F_Mona_1/ Spoken Newspaper

108 WAV files of spoken Maltese newspaper texts, organized into 12 directories, each containing a variable number of sentences (or, in some cases, clauses). The recordings come with transcriptions and tables of phoneme durations.

Resource Type:Corpus
Media Type:Audio
Language:Maltese
Bilingual collection of documents about the Cyprus Problem (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. A parallel corpus (Greek-English) regarding the Cyprus Pr...

Resource Type:Corpus
Media Type:Text
Languages:English
Greek, Modern (1453-)
GREC

GREC is a semantically annotated corpus of 240 MEDLINE abstracts (167 on the subject of E. coli species and 73 on the subject of the Human species). It is intended for training information extraction (IE) systems and/or resources that are used to extract events from biomedical literature.

Resource Type:Corpus
Media Type:Text
Language:English
Maltese Wikipedia

This corpus is part of the collection of Wikipedia dumps retrieved from wikipedia.org on April 8, 2010. It comes with two separate XML files, one containing the Wikipedia articles and the other containing the metadata about them.

Resource Type:Corpus
Media Type:Text
Language:Maltese
Bilingual documents Bulgarian-English in the field of open data, broadband and information society (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. English-Bulgarian collection in the field of open data, ...

Resource Type:Corpus
Media Type:Text
Languages:Bulgarian
English
Bilingual Bulgarian-English corpus from the National Revenue Agency (BG) (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Bilingual Bulgarian-English corpus of administrative doc...

Resource Type:Corpus
Media Type:Text
Languages:Bulgarian
English
Arquivo Dialetal CLUP - Áudio

Arquivo Dialetal CLUP - Áudio is an audio corpus of spontaneous speech, mainly from Northern Portugal.

Resource Type:Corpus
Media Type:Audio
Language:Portuguese
CRPC-Quotations

Database of 2,253 citations extracted from the Corpus de Referência do Português Contemporâneo - CRPC (Reference Corpus of Contemporary Portuguese) and manually revised. Format: tab-separated file. Fields: context number, source file id, citation.
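As a rough illustration of the format described above, the following sketch reads a three-column tab-separated file into named records. It assumes one citation per line, no header row, and no quoting, none of which the description confirms; the `Citation` type, the `read_citations` helper, and the sample rows are hypothetical and not part of the resource.

```python
import csv
import io
from typing import NamedTuple


class Citation(NamedTuple):
    """One row of the file; field names follow the listed fields."""
    context_number: str
    source_file_id: str
    citation: str


def read_citations(handle):
    """Parse tab-separated rows into Citation records.

    Assumption: no header row and no quote characters with special meaning.
    Rows with fewer than three columns are skipped.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [Citation(*row[:3]) for row in reader if len(row) >= 3]


# Hypothetical sample data, invented for illustration only.
sample = io.StringIO(
    "1\tfile_001\tExemplo de citação.\n"
    "2\tfile_002\tOutra citação.\n"
)
rows = read_citations(sample)
for row in rows:
    print(row.context_number, row.source_file_id, row.citation)
```

For a real file, the same helper could be called on `open(path, encoding="utf-8", newline="")`, where the path and the encoding are again assumptions to check against the downloaded resource.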

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Parallel corpus (en-pl) from the Export Promotion Portal of Poland (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. A parallel corpus constructed from data acquired from th...

Resource Type:Corpus
Media Type:Text
Languages:English
Polish
Financial Stability Reports from the National Bank of Poland (2015-16) (Processed)  

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Financial Stability Reports from the National Bank of Po...

Resource Type:Corpus
Media Type:Text
Languages:English
Polish
