Multilingual corpus (CEF languages) acquired from the website of the EU portal (https://ec.europa.eu/*coronavirus-response), 20th May 2020. It contains 23 TMX files (EN-X, where X is a CEF language) with 53,311 translation units (TUs) in total.
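As a rough illustration of how a TU total such as the one above could be checked, the following is a minimal sketch that counts `<tu>` elements in a TMX file with Python's standard library. It assumes the files are plain TMX 1.4 without an XML namespace on the element names; the function name `count_tus` and the inline sample are hypothetical and not part of the corpus.

```python
import io
import xml.etree.ElementTree as ET


def count_tus(tmx_source):
    """Count <tu> elements in a TMX document (file path or binary file object)."""
    count = 0
    # iterparse streams the document, so large TMX files need not fit in memory.
    for _event, elem in ET.iterparse(tmx_source, events=("end",)):
        if elem.tag == "tu":  # assumption: no XML namespace on TMX elements
            count += 1
            elem.clear()  # release the segments of a TU once it is counted
    return count


# Hypothetical two-unit EN-FR sample, for illustration only.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="example" creationtoolversion="1"
          o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Wash your hands.</seg></tuv>
      <tuv xml:lang="fr"><seg>Lavez-vous les mains.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Stay at home.</seg></tuv>
      <tuv xml:lang="fr"><seg>Restez chez vous.</seg></tuv>
    </tu>
  </body>
</tmx>
"""

print(count_tus(io.BytesIO(SAMPLE)))
```

Summing `count_tus(path)` over every `.tmx` file in a local download directory would give a corpus-wide total to compare against the published figure.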
Multilingual corpus (CEF languages) acquired from the website https://antibiotic.ecdc.europa.eu/. It contains 20,981 TUs in total for EN-X language pairs, where X is a CEF language.
The AuCoPro-Splitting dataset contains compounds annotated with their compound boundaries and linking morphemes. The dataset consists of two files, one for Afrikaans and one for Dutch. The annotation follows the annotation guidelines described in Verhoeven, van Zaanen, van Huyss...