Portulex is a lexical database in European Portuguese that contains words from reading texts in children’s schoolbooks for reading and language instruction in Grades 1 to 4. It comprises a wordform and a lemma database. The wordform database consists of 17,062 inflected wordforms, and the lemma d...
Technical Description: http://qtleap.eu/wp-content/uploads/2015/05/Pilot1_technical_description.pdf http://qtleap.eu/wp-content/uploads/2015/05/TechnicalDescriptionPilot2_D2.7.pdf http://qtleap.eu/wp-content/uploads/2016/11/TechnicalDescriptionPilot3_D2.10.pdf
This dataset is a collection of dialogues extracted from the Portugal subreddit with RDET (Reddit Dataset Extraction Tool). It is composed of around 58,964,715 tokens in 218,550 dialogues.
The DepBankPT (Branco et al., 2011a) is a corpus of grammatical dependencies of the translated news composed of 3,406 sentences and 44,598 tokens taken from the Wall Street Journal. The DepBankPT is aligned to a constituency bank, the TreeBankPT (see Branco et al., 2011b). The key bridging eleme...
The LogicalFormBankPT (Branco, 2009, and Branco et al., 2011) is a corpus of semantic dependencies of translated texts composed of 3,406 sentences and 44,598 tokens taken from the Wall Street Journal. The LogicalFormBankPT is composed of MRS representations of each sentence’s semantic relation...
ixa-pipe-ned-ukb is a multilingual Named Entity Disambiguation tool. It is based on UKB (http://ixa2.si.ehu.es/ukb/), a graph-based Word Sense Disambiguation tool. The Wikipedia graph built from the hyperlinks between Wikipedia articles is used for the processing. The input of the tool is ...
The present tool, that was built to deal with specific issues concerning orthographic conventions adopted for Portuguese, marks sentence boundaries with <s>…</s>, and paragraph boundaries with <p>…</p>. Unwraps sentences split over different lines. A f-score of 99.94% was obtained when testing o...
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. English translations of German BMI brochures from the la...
ixa-pipe-dep-eu is a Basque dependency parsing tool. It is based on MATE-tools. This tool takes a document in Natural Language Processing Annotation Format (NAF) format (http://wordpress.let.vupr.nl/naf/) and outputs a new NAF document. This tool is partly funded by the European Commission ...
This is the English version of the Acquis Communautaire (AC), which is the total body of European Union (EU) law applicable in the EU Member States. It consists of selected texts between the 1950s and today.