Letter of rights for persons arrested on the basis of a European Arrest Warrant (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Letter of rights for persons arrested on the basis of a ...

Resource Type: Corpus
Media Type: Text
Languages: Bulgarian
Dutch; Flemish
English
French
German
Greek, Modern (1453-)
Italian
Latvian
Polish
Romanian
PAROLE Portuguese Annotated Corpus

The PAROLE Portuguese Corpus – tagged subset contains 250,000 tokens and is a subset of the PAROLE Portuguese Corpus of 3 million running words of European Portuguese. The corpus was classified and encoded according to the PAROLE common core encoding standard. The tagged subset reproduces appro...

Resource Type: Corpus
Media Type: Text
Language: Portuguese
LX-Rare Word Similarity Dataset

The LX-Rare Word Similarity Dataset was created from the Stanford Rare Word (RW) Similarity dataset (Luong et al., 2013). This list contains 2,034 words (1,017 word pairs). All the words were extracted from Wikipedia and from WordNet (Miller, 1995), a lexical database where the concepts are gro...

Resource Type: Corpus
Media Type: Text
Language: Portuguese
BDCamões Corpus - Collection of Portuguese Literary Documents from the Digital Library of Camões I.P. (Part I)

BDCamões Corpus is a collection of literary documents written in Portuguese, in plain text .txt format, with close to 4 million words from over 200 complete documents from 83 authors in 14 genres, covering a time span from the 15th to the 21st century, and adhering to different orthographic conve...

Resource Type: Corpus
Media Type: Text
Language: Portuguese
Maltese Speech Engine Database
Resource Type: Corpus
Media Types: Text
Audio
Language: Maltese
English-Slovak corpus of annual reports from the Slovak National Centre for Human Rights website (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. English-Slovak corpus of annual reports from the Slovak ...

Resource Type: Corpus
Media Type: Text
Languages: English
Slovak
ParaCrawl release 7 Portuguese-English

Portuguese-English parallel corpus from release 7 of the ParaCrawl project, specifically "Broader Web-Scale Provision of Parallel Corpora for European Languages". This version was filtered with BiCleaner using a threshold of 0.5. Data was crawled from the web following robots.txt, as is standard practice....

Resource Type: Corpus
Media Type: Text
Languages: English
Portuguese
Multilingual corpora with coreferential annotation of person entities

In-progress corpora with coreferential annotation of person entities. Sources: journals and Wikipedia. Languages: Portuguese: varieties from Portugal, Brazi...

Resource Type: Corpus
Media Type: Text
Languages: Galician
Portuguese
Spanish; Castilian
A Repository of State of the Art and Competitive Baseline Summaries for DUC 2004

Since 2004, many novel, sophisticated approaches to generic multi-document summarization have been developed. Simple, intuitive approaches have also been shown to perform unexpectedly well on the task. Yet it is practically impossible to compare the existing approaches directly, bec...

Resource Type: Corpus
Media Type: Text
Language: English
Bulgarian-English Wikipedia WSD/NED corpus

Bulgarian-English Wikipedia WSD/NED corpus is composed of articles from the Bulgarian version of Wikipedia and their English counterparts.

Resource Type: Corpus
Media Type: Text
Languages: Bulgarian
English