PhenoCHF Corpus

PhenoCHF is an annotated corpus consisting of documents belonging to two different text types (i.e., narrative reports from electronic health records (EHRs) and literature articles). It is manually annotated by medical doctors with detailed information relating to mentions of phenotype concepts a...

Resource Type:Corpus
Media Type:Text
Language:English
A Terminological Inventory for Biodiversity

In order to construct the inventory, we firstly compiled a species name dictionary by combining all of the names available in Catalogue of Life (CoL), Encyclopedia of Life (EoL) and Global Biodiversity Information Facility (GBIF). The terms contained in this dictionary were then located within ...

Resource Type:Lexical / Conceptual
Media Type:Text
Language:English
STEPP Tagger

Part-of-speech tagger tuned to biomedical text, provided as a web service.

Resource Type:Tool / Service
Language:English
GENIA Tagger

The GENIA tagger analyzes English sentences and outputs the base forms, part-of-speech tags, chunk tags, and named entity tags. The tagger is specifically tuned for biomedical text such as MEDLINE abstracts.

Resource Type:Tool / Service
Language:English
Enju parser

Enju is a syntactic parser for English. The grammar used by the parser is based on Head Driven Phrase Structure Grammar (HPSG). Enju can analyse syntactic/semantic structures of English sentences can output phrase structure and predicate-argument structures.

Resource Type:Tool / Service
Language:English
HIMERA Corpus

The HIMERA annotated corpus contains a set of published historical medical documents that have been manually annotated with semantic information that is relevant to the study of medical history and public health. Specifically, annotations correspond to seven different entity types and two differe...

Resource Type:Corpus
Media Type:Text
Language:English
GENIA Event Corpus with meta-knowledge annotation

The corpus consists of 1000 MEDLINE abstracts. It is a subset of the original GENIA POS & term corpus, which was selected using the three MeSH terms human, blood cells and transcription factors. In each sentence, three types of information are annotated 1) biomedical terms are identified and assi...

Resource Type:Corpus
Media Type:Text
Language:English
CINTIL-Definitions

The corpus presented here is a collection of several tutorials and scientific papers in the field of Information Technology with 603 annotated definitions from Portuguese. The texts were collected from the Web at the beginning of the 2006 and they are organised in 32 files of three different sub-...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
UIMA/U-Compare STEPP Tagger

Part-of-speech tagger tuned to biomedical text. The tool is provided as a UIMA component, which forms part of the in-built library of components provided with the U-Compare platform (see separate META-SHARE record) for building and evaluating text mining workflows. The U-Compare Workbench (se...

Resource Type:Tool / Service
Language:English
UIMA/U-Compare Stanford Parser

Syntactic parser for English. Outputs dependency relations. Also outputs parts-of-speech for each token. The tool is provided as a UIMA component, specifically as Java archive (jar) file, which can be incorporated within any UIMA workflow. However, it is particularly designed use in the U-Com...

Resource Type:Tool / Service
Language:English

Order by:

Filter by: