The AuCoPro-Splitting dataset contains compounds annotated with their compound boundaries and linking morphemes. The dataset consists of two files, one for Afrikaans and one for Dutch. The annotation was performed according to annotation guidelines as described in Verhoeven, van Zaanen, van Huyss...
We are creating a large scale, freely available, semantic dictionary of Mandarin Chinese: the Chinese Open Wordnet, inspired by the Princeton WordNet and the Global WordNet Grid. All relations (hypernyms, meronyms ...) come from Princeton WordNet 3.0. We have enriched the synsets with Chinese lex...
Bilingual dictionaries encoded in XML - Hausa-French dict. for basic cycle, 2008 Soutéba: 7,823 entries; - Kanuri-French dict. for basic cycle, 2004 Soutéba: 5,994 entries; - Tamajaq-French dict. for basic cycle, 2007 Soutéba: 5,205 entries; - Songhai-zarma-French dict. for basic cycle, 2007 Sout...
DVPM-EtyMor is a lexical database. Etymological, morphological and textual exemplification. Around 3000 verbs. Language: Medieval portuguese.
DVPM-SynSem is a lexical database with syntactic and semantic information in Medieval Portuguese. It contains around 3000 verbs.
EMOTAIX.PT (Costa, 2012) is a database of 3,983 emotional words (nouns, verbs, adjectives and adverbs) in European Portuguese based on the original EMOTAIX in French (Piolat & Bannour, 2009). Each word is classified into three hierarchical levels: Supra Category, Super Category and Basic Category...
The lexicon of discourse markers for European Portuguese contains 252 pairs of discourse marker/rhetorical sense. The lexicon covers conjunctions, prepositions, adverbs, adverbial phrases and alternative lexicalizations with a connective function, as in the PDTB (Prasad et al., 2008; Prasad et al...
This lexicon includes multiword expressions (MWE) of European Portuguese extracted from a balanced 50,8M word written corpus – a subcorpus of the Reference Corpus of Contemporary Portuguese (CRPC). This corpus covers different genres, being mainly constituted by journalistic texts (59%), but it a...
This lexicon is a speech lexicon, exported from Crimsonwing’s text-to-speech (TTS) database into a .txt file. In its original form and together with the Maltese Speech Engine Diphone repository, it was used for building Crimsonwing’s text-to-speech system. The file is in txt format, with each ...
French-Khmer pivot lexical database