Tokenisation is one of the functionalities of the GENIA tagger, which additionally outputs the base forms, part-of-speech tags, chunk tags, and named entity tags. The tagger is specifically tuned for biomedical text such as MEDLINE abstracts. The tool is a UIMA component, which forms part of th...
Part-of-speech tagger tuned to biomedical text, provided as a web service.
Syntactic parser for English. Outputs dependency relations. Also outputs parts-of-speech for each token. The tool is provided as a UIMA component, specifically as Java archive (jar) file, which can be incorporated within any UIMA workflow. However, it is particularly designed use in the U-Com...
Web service created by exporting UIMA-based workflow from the U-Compare text mining system. Functionality: Identifies sentences in plain text Tools in workflow: Freeling sentence splitter web service (service provided by the PANACEA project) NOTE: The licence provided covers the web service o...
The purpose of the tool is to detect sentence boundaries in English text. The tool is provided as a UIMA component, specifically as Java archive (jar) file, which can be incorporated within any UIMA workflow. However, it is particularly designed use in the U-Compare text mining platform (see sepa...
Web service created by exporting UIMA-based workflow from the U-Compare text mining system. Functionality: Identifies biological named entities and disambiguates them according to species, by assigning a species ID from the NCBI taxonomy. Also identifies sentences and tokens. Tools in workflow...
The HIMERA annotated corpus contains a set of published historical medical documents that have been manually annotated with semantic information that is relevant to the study of medical history and public health. Specifically, annotations correspond to seven different entity types and two differe...
This inventory contains a set of terms that are relevant to the study of medical history. The inventory is organised as a set of "heading terms", belonging to one of seven different semantic categories, each of which is accompanied by a set of semantically-related terms. There are around 175,0...
PhenoCHF is an annotated corpus consisting of documents belonging to two different text types (i.e., narrative reports from electronic health records (EHRs) and literature articles). It is manually annotated by medical doctors with detailed information relating to mentions of phenotype concepts a...
Web service created by exporting UIMA-based workflow from the U-Compare text mining system. Functionality: Identifies and categorises syntactic chunks in plain text Tools in workflow: Freeling shallow parser web service (service provided by the PANACEA project) NOTE: The licence provided cove...