The GENIA tagger analyzes English sentences and outputs the base forms, part-of-speech tags, chunk tags, and named entity tags. The tagger is specifically tuned for biomedical text such as MEDLINE abstracts.
The EUROPARL Corpus (the Portuguese-English subpart of the parallel corpus), available at http://www.statmt.org/europarl/, was extracted from the proceedings of the European Parliament (Koehn, 2005). It contains transcriptions of sessions from 1996 to 2011, totaling approximately 58...
This resource includes a spoken corpus of approximately 300,000 words, covering both formal (152,755 words) and informal (165,838 words) speech, with aligned sound and orthographic transcription and POS-tag information.