PS corpus (Post-Scriptum) - treebank

The PS corpus (Post-Scriptum) - treebank is a treebank of Portuguese and Spanish comprising 586 informal letters written from the 16th century to the beginning of the 19th century.

Resource Type: Corpus
Media Type: Text
Languages: Portuguese, Spanish (Castilian)

LX-Chunker

This tool, which was built to handle specific issues concerning the orthographic conventions adopted for Portuguese, marks sentence boundaries with <s>…</s> and paragraph boundaries with <p>…</p>. It also unwraps sentences split over different lines. An F-score of 99.94% was obtained when testing o...
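The listing does not describe how LX-Chunker works internally. The Python below is only a toy sketch of the kind of output described: it unwraps lines, splits sentences with a naive punctuation rule and wraps the result in <s>…</s> and <p>…</p>. The function names `split_sentences` and `chunk`, the blank-line paragraph convention and the `abbreviations` parameter are hypothetical and are not LX-Chunker's API or algorithm.

```python
import re

# Naive boundary (an assumption, not LX-Chunker's rule): sentence-final
# punctuation, whitespace, then an uppercase letter (accented capitals included).
BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-ZÁÀÂÃÉÊÍÓÔÕÚÇ])")


def split_sentences(text, abbreviations=frozenset()):
    """Split text at BOUNDARY, except after words listed in `abbreviations`."""
    sentences, start = [], 0
    for match in BOUNDARY.finditer(text):
        # Word immediately before the candidate boundary, e.g. "Sr."
        last_word = text[start:match.start()].split()[-1]
        if last_word in abbreviations:
            continue  # abbreviation period, not a sentence end
        sentences.append(text[start:match.start()])
        start = match.end()
    sentences.append(text[start:])
    return sentences


def chunk(text, abbreviations=frozenset()):
    """Wrap paragraphs in <p>…</p> and sentences in <s>…</s>."""
    tagged = []
    # Assumption: paragraphs are separated by blank lines.
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        # Unwrap sentences split over several lines by joining the lines.
        unwrapped = " ".join(
            line.strip() for line in paragraph.splitlines() if line.strip()
        )
        sentences = split_sentences(unwrapped, abbreviations)
        tagged.append("<p>" + "".join(f"<s>{s}</s>" for s in sentences) + "</p>")
    return "\n".join(tagged)


letter = "O Sr. Silva chegou.\nEle trouxe\numa carta.\n\nAdeus."
print(chunk(letter, abbreviations={"Sr."}))
# <p><s>O Sr. Silva chegou.</s><s>Ele trouxe uma carta.</s></p>
# <p><s>Adeus.</s></p>
```

Without the abbreviation set, the naive rule wrongly splits after "Sr.", which illustrates why Portuguese orthographic conventions such as abbreviations call for dedicated handling.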

Resource Type: Tool / Service
Language: Portuguese

LX-Abbreviations

The LX-Abbreviations resource is a collection of abbreviations of different types from European Portuguese, composed of 208 words. The entries are manually grouped by abbreviation type and annotated with grammatical category, gender and number, and, finally, with the respective abbreviations.
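As a minimal sketch of how entries annotated in this way might be represented and looked up, the Python below defines one record per word with its category, gender, number and abbreviation. The record layout, the field names, the value codes and the three sample entries are hypothetical illustrations (common Portuguese abbreviations), not the actual contents or file format of LX-Abbreviations.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record layout; field names and value codes are assumptions.
@dataclass(frozen=True)
class AbbreviationEntry:
    word: str          # full form, e.g. "senhor"
    category: str      # grammatical category, e.g. "noun"
    gender: str        # "m" (masculine) or "f" (feminine)
    number: str        # "s" (singular) or "p" (plural)
    abbreviation: str  # abbreviated form, e.g. "Sr."


def index_by_abbreviation(entries):
    """Map each abbreviated form to the list of entries that use it."""
    index = defaultdict(list)
    for entry in entries:
        index[entry.abbreviation].append(entry)
    return dict(index)


# Illustrative entries only; not taken from the LX-Abbreviations files.
SAMPLE = [
    AbbreviationEntry("senhor", "noun", "m", "s", "Sr."),
    AbbreviationEntry("senhora", "noun", "f", "s", "Sra."),
    AbbreviationEntry("doutor", "noun", "m", "s", "Dr."),
]

index = index_by_abbreviation(SAMPLE)
print(index["Sra."][0].word, index["Sra."][0].gender)  # senhora f
```

Indexing by the abbreviated form suits tasks that start from running text, such as expanding an abbreviation or checking its gender and number for agreement; a list can be kept per key because one abbreviation may stand for more than one word.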

Resource Type: Lexical / Conceptual
Media Type: Text
Language: Portuguese

CINTIL-TreeBank

The CINTIL-TreeBank (Branco et al., 2011) is a corpus of syntactic constituency trees of Portuguese texts, comprising 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens) and novels (399 sentences; 3,082 tokens). In addition, there are ...
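To illustrate what a constituency tree is and how token counts like those above can be derived from one, the sketch below parses a tree written in Penn-style bracketed notation and lists its leaves. That the trees are distributed in this notation is an assumption here, and the example sentence and the labels (`S`, `NP`, `VP`, `PNT`, etc.) are illustrative, not taken from CINTIL-TreeBank. The helper names `parse` and `leaves` are hypothetical.

```python
import re

# Tokens of a bracketed tree: "(", ")" or any run of other non-space characters.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse(tree):
    """Parse a bracketed tree into nested (label, children) tuples; leaves are strings."""
    tokens = TOKEN.findall(tree)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            leaf = tokens[pos]  # a word or punctuation token
            pos += 1
            return leaf
        pos += 1  # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            children.append(node())
        pos += 1  # skip ")"
        return (label, children)

    result = node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after tree")
    return result


def leaves(tree):
    """Return the leaf tokens of a parsed tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [leaf for child in children for leaf in leaves(child)]


tree = parse("(S (NP (N Ana)) (VP (V chegou) (PNT .)))")
print(tree[0], leaves(tree))  # S ['Ana', 'chegou', '.']
```

Assuming one leaf per token, summing `len(leaves(t))` over the trees of a domain gives that domain's token count, in the same way that the news and novels figures above are reported per domain.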

Resource Type: Corpus
Media Type: Text
Language: Portuguese
