The corpus consists of 1000 MEDLINE abstracts. It is a subset of the original GENIA POS & term corpus, which was selected using the three MeSH terms human, blood cells and transcription factors. In each sentence, three types of information are annotated 1) biomedical terms are identified and assi...
The GENIA tagger analyzes English sentences and outputs the base forms, part-of-speech tags, chunk tags, and named entity tags. The tagger is specifically tuned for biomedical text such as MEDLINE abstracts.
Web service created by exporting UIMA-based workflow from the U-Compare text mining system. Functionality: Identifies sentences and tokens in plain text. Parts of speech and lemmas are assigned to tokens. Language is automatically identified amongst the supported languages and language-specific ...