HIMERA Corpus

The HIMERA annotated corpus contains a set of published historical medical documents that have been manually annotated with semantic information that is relevant to the study of medical history and public health. Specifically, annotations correspond to seven different entity types and two differe...

Resource Type:Corpus
Media Type:Text
Language:English
Treat

Treat is a toolkit for natural language processing and computational linguistics in Ruby. The Treat project aims to build a language- and algorithm- agnostic NLP framework for Ruby with support for tasks such as document retrieval, text chunking, segmentation and tokenization, natural language pa...

Resource Type:Tool / Service
PhenoCHF Corpus

PhenoCHF is an annotated corpus consisting of documents belonging to two different text types (i.e., narrative reports from electronic health records (EHRs) and literature articles). It is manually annotated by medical doctors with detailed information relating to mentions of phenotype concepts a...

Resource Type:Corpus
Media Type:Text
Language:English
U-Compare Workbench

The U-Compare Workbench is a graphical user interface that operates on top of the U-Compare platform. The U-Compare platform allows users to build and evaluate NLP workflows. Workflows consist of one or more components, consisting of corpus readers and tools, such as tokenisers, POS taggers, name...

Resource Type:Tool / Service
UIMA/U-Compare GENIA Tagger

The GENIA tagger analyzes English sentences and outputs the base forms, part-of-speech tags, chunk tags, and named entity tags. The tagger is specifically tuned for biomedical text such as MEDLINE abstracts. The tool is provided as a UIMA component, which forms part of the in-built library of...

Resource Type:Tool / Service
Language:English
UIMA/U-Compare NEMine

The purpose of the tool is to identify gene and protein names in biomedical text. The tool is provided as a UIMA component, which forms part of the in-built library of components provided with the U-Compare platform for building and evaluating text mining workflows. The U-Compare Workbench pr...

Resource Type:Tool / Service
Language:English
A Terminological Inventory for Biodiversity

In order to construct the inventory, we firstly compiled a species name dictionary by combining all of the names available in Catalogue of Life (CoL), Encyclopedia of Life (EoL) and Global Biodiversity Information Facility (GBIF). The terms contained in this dictionary were then located within ...

Resource Type:Lexical / Conceptual
Media Type:Text
Language:English
U-Compare Named Entity Recognition service

Web service created by exporting UIMA-based workflow from the U-Compare text mining system. Functionality: Identifies biomedical named entities (genes and proteins) in plain text. Also identifies sentences. Tools in workflow: Cafetiere Sentence Splitter (University of Manchester), NEMine (Univ...

Resource Type:Tool / Service
Language:English
GENIA POS & Term Corpus

A corpus of 2,000 MEDLINE abstracts, collected using the three MeSH terms human, blood cells and transcription factors. The corpus is available in three formats: 1) A text file containing part-of-speech (POS) annotation, based on the Penn Treebank format, 2) An XML file containing inline POS anno...

Resource Type:Corpus
Media Type:Text
Language:English
NoSta-D: German NER Dataset Train/Dev

Freely available large dataset, manually annotated for German NER. Includes nested span annotations. Source text from German Wikipedia and news. This data set does not contain the test data, which is used for the GermEval 2014 NER task at KONVENS. Test data will be available from September 2014.

Resource Type:Corpus
Media Type:Text
Language:German