SIMPLE Portuguese Lexicon

The SIMPLE Portuguese Lexicon is constituted by 10,438 entries semantically encoded, accordingly to the parole common encoding standards.

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese
DiLAF African languages-French dictionaries

Bilingual dictionaries encoded in XML - Hausa-French dict. for basic cycle, 2008 Soutéba: 7,823 entries; - Kanuri-French dict. for basic cycle, 2004 Soutéba: 5,994 entries; - Tamajaq-French dict. for basic cycle, 2007 Soutéba: 5,205 entries; - Songhai-zarma-French dict. for basic cycle, 2007 Sout...

Resource Type:Lexical / Conceptual
Media Type:Text
Languages:Central Kanuri
Hausa
Songhai languages
Tamashek
YamCha: Yet Another Multipurpose CHunk Annotator

YamCha is a generic, customizable, and open source text chunker oriented toward a lot of NLP tasks, such as POS tagging, Named Entity Recognition, base NP chunking, and Text Chunking. We used it for NP chunking.

Resource Type:Tool / Service
Maltese Acquis Communautaire

This is the Maltese version of the Acquis Communautaire (AC), which is the total body of European Union (EU) law applicable in the EU Member States. It consists of selected texts between the 1950s and today, translated to Maltese.

Resource Type:Corpus
Media Type:Text
Language:Maltese
LexMan-ChunkerTokenizer

LexMan-ChunkerTokenizer is a tokenizer and sentence splitter tool. Marks sentence boundaries, multi-word boundaries. Size: Lemmas verbs: 12 995; Lemmas nouns and adj: 38 180; Lemmas adverbs: 7 250; Compound words: 35 201. Language: Portuguese.

Resource Type:Tool / Service
Language:Portuguese
QTLeap LRT-M31-WP4

Treebanks and semantic lexicons for Basque, Bulgarian, Dutch, German and Portuguese. Created within European project QTLeap.

Resource Type:Corpus
Media Type:Text
Languages:Basque
Bulgarian
Dutch; Flemish
German
Maltese Speech Engine Lexicon

This lexicon is a speech lexicon, exported from Crimsonwing’s text-to-speech (TTS) database into a .txt file. In its original form and together with the Maltese Speech Engine Diphone repository, it was used for building Crimsonwing’s text-to-speech system. The file is in txt format, with each ...

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Maltese
PTPARL Corpus

The PTPARL Corpus contains approximately 975,806 running words of European Portuguese. It includes 1076 texts consisting of adapted transcriptions of the Portuguese parliament sessions, which were made available in 2004.

Resource Type:Corpus
Media Type:Text
Language:Portuguese
UIMA/U-Compare OpenNLP Sentence Detector

This is a UIMA wrapper for the OpenNLP Sentence Detector tool. It splits English text into individual sentences. The tool forms part of the in-built library of components provided with the U-Compare platform (see separate META-SHARE record) for building and evaluating text mining workflows. ...

Resource Type:Tool / Service
Language:English
Europarl-QTLeap WSD/NED corpus

The texts are sentences from the Europarl parallel corpus (Koehn, 2005). The textscontain the monolingual sentences from parallel corpora for the following pairs: Bulgarian-English, Czech-English, Portuguese-English and Spanish- English. The English corpus is comprised by the English side of th...

Resource Type:Corpus
Media Type:Text
Languages:Basque
Bulgarian
Czech
English
Portuguese
Spanish; Castilian

Order by:

Filter by:

Text (446)
Audio (18)
Image (1)