U-Compare Named Entity Recognition service

Web service created by exporting a UIMA-based workflow from the U-Compare text mining system. Functionality: Identifies biomedical named entities (genes and proteins) in plain text. Also identifies sentences. Tools in workflow: Cafetiere Sentence Splitter (University of Manchester), NEMine (Univ...

Resource Type:Tool / Service
Language:English
NoSta-D: German NER Dataset Train/Dev

Freely available large dataset, manually annotated for German NER. Includes nested span annotations. Source text from German Wikipedia and news. This dataset does not contain the test data, which is used for the GermEval 2014 NER task at KONVENS. Test data will be available from September 2014.

Resource Type:Corpus
Media Type:Text
Language:German
PhenoCHF Corpus

PhenoCHF is an annotated corpus consisting of documents belonging to two different text types (i.e., narrative reports from electronic health records (EHRs) and literature articles). It is manually annotated by medical doctors with detailed information relating to mentions of phenotype concepts a...

Resource Type:Corpus
Media Type:Text
Language:English
GENIA Tagger

The GENIA tagger analyzes English sentences and outputs the base forms, part-of-speech tags, chunk tags, and named entity tags. The tagger is specifically tuned for biomedical text such as MEDLINE abstracts.

Resource Type:Tool / Service
Language:English
GENIA POS & Term Corpus

A corpus of 2,000 MEDLINE abstracts, collected using the three MeSH terms human, blood cells and transcription factors. The corpus is available in three formats: 1) A text file containing part-of-speech (POS) annotation, based on the Penn Treebank format, 2) An XML file containing inline POS anno...

Resource Type:Corpus
Media Type:Text
Language:English
GENIA Event Corpus with meta-knowledge annotation

The corpus consists of 1,000 MEDLINE abstracts. It is a subset of the original GENIA POS & term corpus, which was selected using the three MeSH terms human, blood cells and transcription factors. In each sentence, three types of information are annotated: 1) biomedical terms are identified and assi...

Resource Type:Corpus
Media Type:Text
Language:English
U-Compare Species Disambiguation Service

Web service created by exporting a UIMA-based workflow from the U-Compare text mining system. Functionality: Identifies biological named entities and disambiguates them according to species, by assigning a species ID from the NCBI taxonomy. Also identifies sentences and tokens. Tools in workflow...

Resource Type:Tool / Service
Language:English
U-Compare Workbench

The U-Compare Workbench is a graphical user interface that operates on top of the U-Compare platform. The U-Compare platform allows users to build and evaluate NLP workflows. Workflows consist of one or more components, which include corpus readers and tools, such as tokenisers, POS taggers, name...

Resource Type:Tool / Service
U-Compare Type system

The resource consists of a hierarchically-structured system of data types, which is intended to be suitable for describing the input and output annotation types of a wide range of natural language processing applications that operate within the UIMA Framework. It is being developed in conjunc...

Resource Type:Language Description
Media Type:Text
Language:English
HIMERA Corpus

The HIMERA annotated corpus contains a set of published historical medical documents that have been manually annotated with semantic information that is relevant to the study of medical history and public health. Specifically, annotations correspond to seven different entity types and two differe...

Resource Type:Corpus
Media Type:Text
Language:English