LT Corpus

The LT Corpus (Literary Corpus) contains approximately 1,781,083 running words of European and Brazilian Portuguese. It includes 70 copyright-free classics (61 from Portugal and 9 from Brazil) published before 1940.

Resource Type:Corpus
Media Type:Text
Language:Portuguese
PS corpus (Post-Scriptum)-PT

PS Corpus (Post-Scriptum)-PT is a corpus of 2215 informal letters written in Portuguese during the Modern Ages (from the 16th century to the beginning of the 19th century). Each letter is available as a semi-palaeographic transcription, a modernized transcription, and with part-of-speec...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
FLY corpus - morpho

FLY Corpus is a corpus composed of 2000 informal letters written in Portuguese between 1900 and 1974, in the context of war, migration, imprisonment and exile. Each letter is in an XML file with two main parts: (a) the header, which contains metadata about the document (the ...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
FLY corpus - syntax

FLY Corpus is a corpus composed of 2000 informal letters written in Portuguese between 1900 and 1974, in the context of war, migration, imprisonment and exile. Each letter is in an XML file with two main parts: (a) the header, which contains metadata about the document (the ...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
PS corpus (Post-Scriptum)-ES

PS Corpus (Post-Scriptum)-ES is a corpus of 2368 informal letters written in Spanish during the Modern Ages (from the 16th century to the beginning of the 19th century). Each letter is available as a semi-palaeographic transcription, a modernized transcription, and with part-of-speech a...

Resource Type:Corpus
Media Type:Text
Language:Spanish; Castilian
PTPARL Corpus

The PTPARL Corpus contains approximately 975,806 running words of European Portuguese. It includes 1076 texts consisting of adapted transcriptions of the Portuguese parliament sessions, which were made available in 2004.

Resource Type:Corpus
Media Type:Text
Language:Portuguese
U-Compare Type system

The resource consists of a hierarchically structured system of data types, intended to be suitable for describing the input and output annotation types of a wide range of natural language processing applications that operate within the UIMA Framework. It is being developed in conjunc...

Resource Type:Language Description
Media Type:Text
Language:English
UIMA/U-Compare Apertium POS Tagger

This tool assigns a part-of-speech tag and base form to each token in a text. It operates on text that has previously been tokenised and morphologically analysed. The POS tagger is a module of the Apertium machine translation system. The provided tool can currently operate on a subset of the language...

Resource Type:Tool / Service
Languages:Basque
Catalan
English
Galician
Portuguese
Spanish
UIMA/U-Compare Apertium Morphological Analyser

This tool tokenizes text and assigns all possible morphological analyses to each token. These analyses include the base form of the token, its part of speech, and information about number and gender. The morphological analyser is a module of the Apertium machine translation system. The provide...

Resource Type:Tool / Service
Languages:Basque
Catalan
English
Galician
Portuguese
Spanish
UIMA/U-Compare GENIA Tagger

The GENIA tagger analyzes English sentences and outputs the base forms, part-of-speech tags, chunk tags, and named entity tags. The tagger is specifically tuned for biomedical text such as MEDLINE abstracts. The tool is provided as a UIMA component, which forms part of the in-built library of...

Resource Type:Tool / Service
Language:English