Malta Government Gazette (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Bilingual gazette (English-Maltese) of the government of...

Resource Type: Corpus
Media Type: Text
Languages: English, Maltese
Portuguese Parish Memories (1758)

«The Memórias Paroquiais (Parish Memories) are an essential source for obtaining a radiography of Portugal in 1758-1761. They correspond to a survey, organized in 3 major parts (the locality itself, the mountain and the river), which was printed and sent to those responsible for the dioceses of t...

Resource Type: Corpus
Media Type: Text
Language: Portuguese
Perfil Sociolinguístico da Fala Bracarense

Perfil Sociolinguístico da Fala Bracarense is a Portuguese speech corpus with 90 hours of recorded spontaneous speech, aligned with its transcription in EXMARaLDA format. The corpus is composed of one-hour interviews with speakers from the same area (around Braga, Portugal), stratified according to sex,...

Resource Type: Corpus
Media Types: Text, Audio
Language: Portuguese
Portuguese-English bilingual corpus from the Portuguese Constitution (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Complete text of the Portuguese Constitution in Portugue...

Resource Type: Corpus
Media Type: Text
Languages: English, Portuguese
Priberam Compressive Summarization Corpus

This is a corpus for multi-document summarization for European Portuguese. It contains 80 topics, each of which has 10 documents, for a total of 800 documents. Each topic contains two human summaries. The summaries are compressive: they are the result of a compression of the sentences in the orig...

Resource Type: Corpus
Media Type: Text
Language: Portuguese
Arquivo Dialetal CLUP - Orthographic and phonetic transcription

Arquivo Dialetal CLUP - ORTH is a speech corpus of approximately 40 000 tokens (utterances; spontaneous speech, mainly from Northern Portugal), with orthographic and phonetic transcription.

Resource Type: Corpus
Media Type: Audio
Language: Portuguese
Corpus of State-related content from the Latvian Web (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Latvian Web, home pages of ministries and state public s...

Resource Type: Corpus
Media Type: Text
Languages: English, Latvian
Romanian Ombudsman archive (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Parallel aligned corpus in tmx format built from the Rom...

Resource Type: Corpus
Media Type: Text
Languages: English, Romanian
The CIEMPIESS Proper-Names Pronouncing Dictionary

Transcriptions in the CIEMPIESS-PNPD are based on a phonetic alphabet called Mexbet. Mexbet was designed for the Spanish of Central Mexico and has several levels of granularity. The CIEMPIESS-PNPD comes in two versions: Mexbet T29 and Mexbet T66. Level T29 of Mexbet means that transcriptions ...

Resource Type: Corpus
Media Type: Text
Language: Spanish; Castilian
English Acquis Communautaire

This is the English version of the Acquis Communautaire (AC), which is the total body of European Union (EU) law applicable in the EU Member States. It consists of selected texts between the 1950s and today.

Resource Type: Corpus
Media Type: Text
Language: English
