LexMan-ChunkerTokenizer is a tokenizer and sentence-splitter tool. It marks sentence boundaries and multi-word boundaries. Size: 12,995 verb lemmas; 38,180 noun and adjective lemmas; 7,250 adverb lemmas; 35,201 compound words. Language: Portuguese.
Database of 2,253 citations extracted from the Corpus de Referência do Português Contemporâneo (CRPC; Reference Corpus of Contemporary Portuguese) and manually revised. Format: tab-separated file. Fields:
- context number
- source file id
- citation
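As an illustration of the three-field layout of the citation database, the following is a minimal Python sketch of a reader for it. It assumes one record per line, no header row, UTF-8 text and no quoting inside fields; none of these details is stated in the description, and the names `Citation` and `read_citations` are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, TextIO


@dataclass
class Citation:
    # Assumption: all fields are kept as strings, since the
    # description does not specify their types.
    context_number: str
    source_file_id: str
    citation: str


def read_citations(stream: TextIO) -> List[Citation]:
    """Parse a three-field tab-separated stream (no header row assumed)."""
    # QUOTE_NONE: treat quote characters in the citation text literally.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows: List[Citation] = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields, got {len(fields)}: {fields!r}")
        rows.append(Citation(*fields))
    return rows


if __name__ == "__main__":
    # Hypothetical sample records in the assumed layout.
    sample = "1\tfile_001\tUma citação de exemplo.\n2\tfile_002\tOutra citação.\n"
    for c in read_citations(io.StringIO(sample)):
        print(c.context_number, c.source_file_id, c.citation)
```

For a real file, open it with `open(path, encoding="utf-8", newline="")` and pass the file object to `read_citations`; the encoding here is an assumption and may need adjusting to the distributed file.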
The SIMPLE Portuguese Lexicon consists of 10,438 semantically encoded entries, following the PAROLE common encoding standards.