An Arabic Twitter Corpus for Subjectivity and Sentiment Analysis

An Arabic twitter data set of 7,503 tweets. The released data contains manual Sentiment Analysis annotations as well as automatically extracted features, saved in Comma Separated (CSV) and Attribute-Relation File Format (ARFF) file formats. Due to twitter privacy restrictions we replaced the orig...

Resource Type:Corpus
Media Type:Text
Language:Arabic
U-Compare Apertium Part-of-Speech Tagging Workflow

This is a workflow that is designed especially for use in the UIMA-based U-Compare workbench (see separate META-SHARE record). The workflow is in "ucz" format (specific to U-Compare) and can be imported via the "Import Workflow" item in the "Workflows" menu of the U-Compare interface. It include...

Resource Type:Tool / Service
Languages:Basque
Catalan; Valencian
English
Galician
Portuguese
Spanish; Castilian
U-Compare Cafetiere English Sentence Detector

The purpose of the tool is to detect sentence boundaries in English text. The tool is provided as a UIMA component, specifically as Java archive (jar) file, which can be incorporated within any UIMA workflow. However, it is particularly designed use in the U-Compare text mining platform (see sepa...

Resource Type:Tool / Service
Language:English
GENIA Tagger

The GENIA tagger analyzes English sentences and outputs the base forms, part-of-speech tags, chunk tags, and named entity tags. The tagger is specifically tuned for biomedical text such as MEDLINE abstracts.

Resource Type:Tool / Service
Language:English
PhenoCHF Corpus

PhenoCHF is an annotated corpus consisting of documents belonging to two different text types (i.e., narrative reports from electronic health records (EHRs) and literature articles). It is manually annotated by medical doctors with detailed information relating to mentions of phenotype concepts a...

Resource Type:Corpus
Media Type:Text
Language:English
HIMERA Corpus

The HIMERA annotated corpus contains a set of published historical medical documents that have been manually annotated with semantic information that is relevant to the study of medical history and public health. Specifically, annotations correspond to seven different entity types and two differe...

Resource Type:Corpus
Media Type:Text
Language:English
Enju parser

Enju is a syntactic parser for English. The grammar used by the parser is based on Head Driven Phrase Structure Grammar (HPSG). Enju can analyse syntactic/semantic structures of English sentences can output phrase structure and predicate-argument structures.

Resource Type:Tool / Service
Language:English
UIMA/U-Compare STEPP Tagger

Part-of-speech tagger tuned to biomedical text. The tool is provided as a UIMA component, which forms part of the in-built library of components provided with the U-Compare platform (see separate META-SHARE record) for building and evaluating text mining workflows. The U-Compare Workbench (se...

Resource Type:Tool / Service
Language:English
UIMA/U-Compare GENIA Tagger

The GENIA tagger analyzes English sentences and outputs the base forms, part-of-speech tags, chunk tags, and named entity tags. The tagger is specifically tuned for biomedical text such as MEDLINE abstracts. The tool is provided as a UIMA component, which forms part of the in-built library of...

Resource Type:Tool / Service
Language:English
UIMA/U-Compare Enju parser

Syntactic parser for English. Outputs predicate-argument structures. Also outputs base forms for each token. The tool is provided as a UIMA component, which forms part of the in-built library of components provided with the U-Compare platform (see separate META-SHARE record) for building and...

Resource Type:Tool / Service
Language:English

Order by:

Filter by: