GENIA POS & Term Corpus

A corpus of 2,000 MEDLINE abstracts, collected using the three MeSH terms human, blood cells and transcription factors. The corpus is available in three formats: 1) A text file containing part-of-speech (POS) annotation, based on the Penn Treebank format, 2) An XML file containing inline POS anno...

Resource Type:Corpus
Media Type:Text
Language:English
Laws of Malta - English

The corpus contains the Laws of Malta in English from the official government website. The unannotated raw text files were extracted from the pdf files that can be found on the website.

Resource Type:Corpus
Media Type:Text
Language:English
UIMA/U-Compare STEPP Tagger

Part-of-speech tagger tuned to biomedical text. The tool is provided as a UIMA component, which forms part of the in-built library of components provided with the U-Compare platform (see separate META-SHARE record) for building and evaluating text mining workflows. The U-Compare Workbench (se...

Resource Type:Tool / Service
Language:English
GENIA Event Corpus with meta-knowledge annotation

The corpus consists of 1000 MEDLINE abstracts. It is a subset of the original GENIA POS & term corpus, which was selected using the three MeSH terms human, blood cells and transcription factors. In each sentence, three types of information are annotated 1) biomedical terms are identified and assi...

Resource Type:Corpus
Media Type:Text
Language:English
GENIA Tagger

The GENIA tagger analyzes English sentences and outputs the base forms, part-of-speech tags, chunk tags, and named entity tags. The tagger is specifically tuned for biomedical text such as MEDLINE abstracts.

Resource Type:Tool / Service
Language:English
PhenoCHF Corpus

PhenoCHF is an annotated corpus consisting of documents belonging to two different text types (i.e., narrative reports from electronic health records (EHRs) and literature articles). It is manually annotated by medical doctors with detailed information relating to mentions of phenotype concepts a...

Resource Type:Corpus
Media Type:Text
Language:English
U-Compare Named Entity Recognition service

Web service created by exporting UIMA-based workflow from the U-Compare text mining system. Functionality: Identifies biomedical named entities (genes and proteins) in plain text. Also identifies sentences. Tools in workflow: Cafetiere Sentence Splitter (University of Manchester), NEMine (Univ...

Resource Type:Tool / Service
Language:English
Enju parser

Enju is a syntactic parser for English. The grammar used by the parser is based on Head Driven Phrase Structure Grammar (HPSG). Enju can analyse syntactic/semantic structures of English sentences can output phrase structure and predicate-argument structures.

Resource Type:Tool / Service
Language:English
GREC

GREC is a semantically annotated corpus of 240 MEDLINE abstracts (167 on the subject of E. coli species and 73 on the subject of the Human species) which is intended for training IE systems and/or resources which are used to extract events from biomedical literature.

Resource Type:Corpus
Media Type:Text
Language:English
TakeLab Vectors

This resource includes the distributional semantic vectors used for the replication of the TakeLab system (https://github.com/nlx-group/arct-rep-rev). The TakeLab system is an automatic classifier for the Argument Reasoning Comprehension Task (https://www.aclweb.org/anthology/S18-1121/). The ...

Resource Type:Lexical / Conceptual
Media Type:Text
Language:English

Order by:

Filter by:

Text (446)
Audio (18)
Image (1)