TinySVM is an implementation of Support Vector Machines (SVMs) (Vapnik, 1995; Vapnik, 1998) for the problem of pattern recognition.
A Portuguese as a non-native language learners' corpus of written texts with three independent subcorpora: - Portuguese as a Foreign Language: Subcorpus Português Língua Estrangeira (PEAPL2_PLE) http://teitok2.iltec.pt/peapl2-ple/index.php?action=home - East Timorese Portuguese: Subcorpus T...
This set of materials pertains to a study on the production of explicit pronouns, null pronouns, and repeated-NP anaphors, in European Portuguese. A spreadsheet containing data from 73 participants (young adults), namely, count data for instances of the different types of anaphor that occurred in...
CIPM is a set of historical, religious, notarial, literary texts in prose and verse, written in medieval portuguese. It has around 3.5 million words.
MARv-DISAMB is a part-of-speech disambiguation tool (probabilistic disambiguation module).
RudriCo-TOK is a tokenizer tool that splits contractions. De-contraction rules: 178.
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Forbruker Europa is the Norwegian office of the European...
A computational lexicon for Portuguese that provides mappings between verbs and their nominalizations.