The resource consists of a Portuguese frequency lexicon based on a 16 million words corpus of written and spoken texts from different genres. The lexicon contains 26.443 entries (lemma) and 140
The AuCoPro-Splitting dataset contains compounds annotated with their compound boundaries and linking morphemes. The dataset consists of two files, one for Afrikaans and one for Dutch. The annotation was performed according to annotation guidelines as described in Verhoeven, van Zaanen, van Huyss...
Bilingual dictionaries encoded in XML - Hausa-French dict. for basic cycle, 2008 Soutéba: 7,823 entries; - Kanuri-French dict. for basic cycle, 2004 Soutéba: 5,994 entries; - Tamajaq-French dict. for basic cycle, 2007 Soutéba: 5,205 entries; - Songhai-zarma-French dict. for basic cycle, 2007 Sout...
French-Khmer pivot lexical database
We are creating a large scale, freely available, semantic dictionary of Mandarin Chinese: the Chinese Open Wordnet, inspired by the Princeton WordNet and the Global WordNet Grid. All relations (hypernyms, meronyms ...) come from Princeton WordNet 3.0. We have enriched the synsets with Chinese lex...
This resource is part of Deliverable 5.7 of the European Comission project QTLeap FP7-ICT-2013.4.1-610516 (http://qtleap.eu). This gazetteer comprises multilingual lexicon entries used for the translation of specific IT domain expressions for Basque, Bulgarian, Czech, Dutch, Engli...
This resource comprises multilingual lexicon entries used for the translation of specific IT domain expressions. This gazetteer has been collected from four different sources: VLC, LibreOffice and KDE localization projects and IT domain Wikipedia articles.
The lexicon of discourse markers for European Portuguese contains 252 pairs of discourse marker/rhetorical sense. The lexicon covers conjunctions, prepositions, adverbs, adverbial phrases and alternative lexicalizations with a connective function, as in the PDTB (Prasad et al., 2008; Prasad et al...
This lexicon includes multiword expressions (MWE) of European Portuguese extracted from a balanced 50,8M word written corpus – a subcorpus of the Reference Corpus of Contemporary Portuguese (CRPC). This corpus covers different genres, being mainly constituted by journalistic texts (59%), but it a...
DVPM-SynSem is a lexical database with syntactic and semantic information in Medieval Portuguese. It contains around 3000 verbs.