PAROLE Portuguese Lexicon

The resource consists of 20,000 entries, encoded morphosyntactically and syntactically according to the PAROLE common encoding standards.

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese
Treat

Treat is a toolkit for natural language processing and computational linguistics in Ruby. The Treat project aims to build a language- and algorithm-agnostic NLP framework for Ruby with support for tasks such as document retrieval, text chunking, segmentation and tokenization, natural language pa...

Resource Type:Tool / Service
U-Compare Discourse Parsing Service

Web service created by exporting a UIMA-based workflow from the U-Compare text mining system. Functionality: Performs discourse parsing on plain text. Also identifies sentences, tokens, parts of speech, lemmas, clauses and coreference chains. Tools in workflow: UAIC-POSTagger, UAIC-NPChunker, UAI...

Resource Type:Tool / Service
Language:Romanian
U-Compare Segmentation Service

Web service created by exporting UIMA-based workflow from the U-Compare text mining system. Functionality: Identifies clauses/segments in plain text. Also identifies sentences, tokens, POS tags and lemmas. Tools in workflow: Cafetiere Sentence Splitter (University of Manchester), TTL Tokenizer...

Resource Type:Tool / Service
Language:Romanian
U-Compare Part-of-Speech Tagging Service

Web service created by exporting a UIMA-based workflow from the U-Compare text mining system. Functionality: Identifies tokens in plain text and assigns parts of speech. Tools in workflow: MLRS POS Tagger web service (University of Malta). NOTE: The licence provided covers the web service only. To...

Resource Type:Tool / Service
Language:Maltese
LT Corpus

The LT Corpus (Literary Corpus) contains approximately 1,781,083 running words of European and Brazilian Portuguese. It includes 70 copyright-free classics (61 from Portugal and 9 from Brazil) published before 1940.

Resource Type:Corpus
Media Type:Text
Language:Portuguese
UIMA/U-Compare STEPP Tagger

Part-of-speech tagger tuned to biomedical text. The tool is provided as a UIMA component, which forms part of the in-built library of components provided with the U-Compare platform (see separate META-SHARE record) for building and evaluating text mining workflows. The U-Compare Workbench (se...

Resource Type:Tool / Service
Language:English
U-Compare Apertium Part-of-Speech Tagging Workflow

This is a workflow that is designed especially for use in the UIMA-based U-Compare workbench (see separate META-SHARE record). The workflow is in "ucz" format (specific to U-Compare) and can be imported via the "Import Workflow" item in the "Workflows" menu of the U-Compare interface. It include...

Resource Type:Tool / Service
Languages:Basque
Catalan; Valencian
English
Galician
Portuguese
Spanish; Castilian
EUROPARL Corpus Parallel Corpora: Portuguese-English

The EUROPARL Corpus (Portuguese-English subpart of the parallel corpora), available at http://www.statmt.org/europarl/, was extracted from the proceedings of the European Parliament (Koehn, 2005). It contains transcriptions of sessions held from 1996 to 2011, totalling approximately 58...

Resource Type:Corpus
Media Type:Text
Languages:English
Portuguese
UIMA/U-Compare GENIA Tagger

The GENIA tagger analyzes English sentences and outputs the base forms, part-of-speech tags, chunk tags, and named entity tags. The tagger is specifically tuned for biomedical text such as MEDLINE abstracts. The tool is provided as a UIMA component, which forms part of the in-built library of...

Resource Type:Tool / Service
Language:English