Multilingual (CEF languages) corpus acquired from website (https://ec.europa.eu/commission/presscorner/) of the EU portal (8th July 2020). It contains 23 TMX files (EN-X, where X is a CEF language) with 151895 TUs in total.
ixa-pipe-ned-ukb is a multilingual Named Entity Disambiguation tool. It is based on UKB (http://ixa2.si.ehu.es/ukb/), a graph-based Word Sense Disambiguation tool. The Wikipedia graph built from the hyperlinks between Wikipedia articles is used for the processing. The input of the tool is ...
Bilingual (EN-PT) corpus acquired from website (https://ec.europa.eu/commission/presscorner/) of the EU portal (14th May 2020).
Corpus of mixed language (French, German,Luxemburguish) sentences from {sc Chamber} (House of Parliament) debate reports manually annotated at segment level with 6 labels : Lux, Fre, Ger, Lux + Fre, Lux + Ger, Lux + Fre + Ger
Bilingual (EN-PT) corpus acquired from the website (https://www.europarl.europa.eu/) of the European Parliament (25th April 2020)
EN-PT Bilingual COVID-19-related corpus acquired from the website (https://globalvoices.org/) of GlobalVoices (28th April 2020)
Bilingual (EN-PT) corpus acquired from website (https://ec.europa.eu/commission/presscorner/) of the EU portal (8th July 2020).
Bilingual (EN-PT) corpus acquired from the website https://antibiotic.ecdc.europa.eu/
PhenoCHF is an annotated corpus consisting of documents belonging to two different text types (i.e., narrative reports from electronic health records (EHRs) and literature articles). It is manually annotated by medical doctors with detailed information relating to mentions of phenotype concepts a...
The Complex Word (CW) Corpus contains 731 sentences each with one annotated CW. These simplifications were mined from Simple Wikipedia edit histories. Each entry gives an example of a sentence requiring simplification by means of a single lexical edit. This resource is primarily designed for t...