The HIMERA annotated corpus contains a set of published historical medical documents that have been manually annotated with semantic information that is relevant to the study of medical history and public health. Specifically, annotations correspond to seven different entity types and two differe...
PhenoCHF is an annotated corpus consisting of documents belonging to two different text types (i.e., narrative reports from electronic health records (EHRs) and literature articles). It is manually annotated by medical doctors with detailed information relating to mentions of phenotype concepts a...
The corpus consists of 1000 MEDLINE abstracts. It is a subset of the original GENIA POS & term corpus, which was selected using the three MeSH terms human, blood cells and transcription factors. In each sentence, three types of information are annotated 1) biomedical terms are identified and assi...
The resource constitues of a hierarchically-structured system of data types, which is intended to be suitable for describing the inputs and output annotation types of a wide range of natural language processing applications which operate within the UIMA Framework. It is being developed in conjunc...
The U-Compare Workbench is a graphical user interface that operates on top of the U-Compare platform. The U-Compare platform allows users to build and evaluate NLP workflows. Workflows consist of one or more components, consisting of corpus readers and tools, such as tokenisers, POS taggers, name...
Yake! (Campos et al. 2020) is a novel feature-based system for multi-lingual keyword extraction, which supports texts of different sizes, domain or languages. Unlike most of the systems, Yake! does not rely on dictionaries nor thesauri, neither is trained against any corpora. Instead, we follow a...