TinySVM is an implementation of Support Vector Machines (SVMs) (Vapnik, 1995; Vapnik, 1998) for the problem of pattern recognition.
The CINTIL-WordSenses corpus, built upon the CINTIL International Corpus of Portuguese (Barreto et al., 2006), is composed of 23,825 sentences of written Portuguese with open-class terms manually disambiguated and annotated with synset identifiers from the Portuguese MultiWordNet (MWNPT) (Pianti ...
Porlex (Gomes & Castro, 2003) is a lexical database that provides written and phonetic transcriptions of standard adult vocabulary, along with 44 psycholinguistic characteristics (e.g. orthographic, phonological, phonetic, part-of-speech, and neighborhood characteristics). For each word it contains psychol...
SENTER is a SENtence splitTER for Portuguese.
Portulex is a lexical database in European Portuguese that contains words from reading texts in children’s schoolbooks for reading and language instruction in Grades 1 to 4. It comprises a wordform and a lemma database. The wordform database consists of 17,062 inflected wordforms, and the lemma d...
ACOPOST is a free and open source collection of four part-of-speech taggers (t3, met, tbt, and et). In corpus linguistics, part-of-speech tagging (POS tagging or POST), also called grammatical tagging or word-category disambiguation, is the process of marking up the words in a text (corpus) as co...
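As a rough illustration of what part-of-speech tagging involves, the sketch below implements a most-frequent-tag baseline: each word receives the tag it was seen with most often in a small hand-tagged sample. This is not one of the ACOPOST taggers and does not use ACOPOST's interface; the function names `train` and `tag`, the tag labels, and the toy Portuguese sentences are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def train(tagged_sentences):
    """Count how often each word form occurs with each tag and keep the
    most frequent tag per word (hypothetical helper, not an ACOPOST API)."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, pos in sentence:
            counts[word.lower()][pos] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(words, model, default="N"):
    """Mark each word with its most frequent tag; unknown words fall back
    to `default` (an arbitrary choice for this sketch)."""
    return [(word, model.get(word.lower(), default)) for word in words]


# Toy training data with made-up tag labels (DET, N, V).
TRAINING = [
    [("o", "DET"), ("gato", "N"), ("dorme", "V")],
    [("a", "DET"), ("casa", "N")],
    [("o", "DET"), ("menino", "N"), ("dorme", "V")],
]

model = train(TRAINING)
tagged = tag(["O", "menino", "dorme"], model)
```

A baseline like this ignores context entirely, so it cannot resolve words whose category depends on their neighbours; taggers that model context address exactly that limitation.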