A corpus of 2,000 MEDLINE abstracts, collected using the three MeSH terms human, blood cells and transcription factors. The corpus is available in three formats: 1) A text file containing part-of-speech (POS) annotation, based on the Penn Treebank format, 2) An XML file containing inline POS anno...
The resource constitues of a hierarchically-structured system of data types, which is intended to be suitable for describing the inputs and output annotation types of a wide range of natural language processing applications which operate within the UIMA Framework. It is being developed in conjunc...
The GENIA tagger analyzes English sentences and outputs the base forms, part-of-speech tags, chunk tags, and named entity tags. The tagger is specifically tuned for biomedical text such as MEDLINE abstracts.
The GENIA tagger analyzes English sentences and outputs the base forms, part-of-speech tags, chunk tags, and named entity tags. The tagger is specifically tuned for biomedical text such as MEDLINE abstracts. The tool is provided as a UIMA component, which forms part of the in-built library of...