Maltese Speech Engine Database

Description

Resource Type:Corpus
Media Types:Text, Audio
Language:Maltese
CINTIL-UDep

CINTIL-UDep is a dependency bank of Portuguese with 38,400 sentences (nearly 476,000 tokens), treebanked with Universal Dependencies (UD). This version of CINTIL-UDep supersedes the one included in the v2.11 (2022-11-15) release of the Universal Dependencies (https://universaldepende...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CINTIL-UPos

CINTIL-UPos is a corpus of Portuguese that is annotated with the Universal Part-of-Speech tagset (UPOS), related to the Universal Dependency framework, and that contains around 1 million annotated tokens. It is described in this article: António Branco, João Ricardo Silva, Luís Gomes and Jo...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CINTIL-USuite

CINTIL-USuite is a corpus of Portuguese that is annotated with lemmas, the Universal Part-of-Speech tagset (UPOS) and Universal feature bundles, related to the Universal Dependency framework, and that contains around 1 million annotated tokens. It is described in this article: António Branc...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
A Repository of State of the Art and Competitive Baseline Summaries for DUC 2004

In the period since 2004, many novel sophisticated approaches for generic multi-document summarization have been developed. Intuitive simple approaches have also been shown to perform unexpectedly well for the task. Yet it is practically impossible to compare the existing approaches directly, bec...

Resource Type:Corpus
Media Type:Text
Language:English
Bulgarian-English Wikipedia WSD/NED corpus

Bulgarian-English Wikipedia WSD/NED corpus is composed of articles from the Bulgarian version of Wikipedia and their English counterparts.

Resource Type:Corpus
Media Type:Text
Languages:Bulgarian, English
Brands.Br – a Portuguese Reviews Corpus

The Brands.Br corpus was built from a fraction of the B2W-Reviews01 corpus. We use a set of 252 samples selected by B2W to be enriched. In the Brands.Br corpus we want to address two main challenges in product review corpora. The first: it is very common to find customer reviews referring to distinct thing...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Multilingual corpora with coreferential annotation of person entities

In-progress corpora with coreferential annotation of person entities. Sources: journals and Wikipedia. Languages: * Portuguese: varieties from Portugal, Brazi...

Resource Type:Corpus
Media Type:Text
Languages:Galician, Portuguese, Spanish; Castilian
English-Slovak parallel corpus of texts from The Ministry of Justice of the Slovak Republic (Processed)  

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Dataset of various English-Slovak legal texts within age...

Resource Type:Corpus
Media Type:Text
Languages:English, Slovak
Romanian - English literature corpus (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Bilingual Romanian – English literature corpus built fro...

Resource Type:Corpus
Media Type:Text
Languages:English, Romanian
