Web service created by exporting a UIMA-based workflow from the U-Compare text mining system. Functionality: Identifies sentences and tokens in plain text. Parts of speech and lemmas are assigned to tokens. The language is automatically identified among the supported languages and language-specific ...
The corpus presented here is a collection of several tutorials and scientific papers in the field of Information Technology with 603 annotated definitions in Portuguese. The texts were collected from the Web at the beginning of 2006 and they are organised in 32 files of three different sub-...
The Portuguese-language documents of the Chancery of D. Afonso III constitute the first significant set of texts in Portuguese (34 documents covering a period of 24 years: 1255-1279); it is only from 1279 onward, with D. Dinis (1261-1325), that the systematic use of Portuguese begins ...
PS Corpus (Post-Scriptum)-PT is a corpus of 2215 informal letters written in Portuguese during the Modern Age (from the 16th century to the beginning of the 19th century). Each letter is available as a semi-palaeographic transcription, a modernized transcription, and with part-of-speec...
The corpus consists of 1000 MEDLINE abstracts. It is a subset of the original GENIA POS & term corpus, selected using the three MeSH terms human, blood cells and transcription factors. In each sentence, three types of information are annotated: 1) biomedical terms are identified and assi...
CINTIL-DeepBank (Branco et al., 2010) is a corpus of Portuguese texts annotated with deep grammatical information. This document refers to version 1.4 of the corpus, from January 2016, which adds over 15,400 annotated sentences to the previous version from September 2015. The current version i...
DiZer 2.0 is a web interface for discourse parsing. It is based on DiZer (Pardo and Nunes, 2008), the first discourse parser for Brazilian Portuguese. The system aims at producing the discourse structure of a source text following the Rhetorical Structure Theory – RST (Mann and Thompson, 1987), o...
MSTParser is a non-projective dependency parser (see McDonald et al., 2005a, 2006) that searches for maximum spanning trees over directed graphs. Models of dependency structure are based on large-margin discriminative training methods (see McDonald et al., 2005b). Projective parsing is also suppo...
The DEEB Corpus contains the transcriptions of 1200 narrative texts written by pupils in their 4th, 6th and 9th year Portuguese Language exams in the public school system in Portugal.
MARv-DISAMB is a part-of-speech disambiguation tool (probabilistic disambiguation module).