This lexicon contains multiword expressions (MWEs) of European Portuguese extracted from a balanced 50.8M-word written corpus, a subcorpus of the Reference Corpus of Contemporary Portuguese (CRPC). The corpus covers different genres and consists mainly of journalistic texts (59%), but it a...
Wordlist for spell-checking
This tool, which was built to handle specific issues concerning the orthographic conventions adopted for Portuguese, marks sentence boundaries with <s>…</s> and paragraph boundaries with <p>…</p>. It also unwraps sentences split over different lines. An F-score of 99.94% was obtained when testing o...