Porttinari – PORTuguese Treebank

Porttinari-base (Duran et al., 2023) is the journalistic portion of Porttinari (short for “PORTuguese Treebank”), a planned large multigenre treebank for Portuguese (Pardo et al., 2021) that follows the “Universal Dependencies” international grammar framework (de Marneffe et al., 2021...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CINTIL-Corpus Internacional do Português

CINTIL-Corpus Internacional do Português is a linguistically interpreted corpus of Portuguese. It currently comprises 1 million annotated tokens, verified by human expert annotators. The annotation comprises information on part-of-speech, open-class lemma and inflection, multi-word expres...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
PropBankPT

The PropBankPT (Branco et al., 2012) is a set of sentences annotated with their constituency structure and semantic role tags. It comprises 3,406 sentences and 44,598 tokens taken from translated Wall Street Journal texts. To create this PropBank, we adopted a semi-automatic analysis with...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Central Statistical Office Dataset (Processed)  

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Two Polish-English publications of the Polish Central St...

Resource Type:Corpus
Media Type:Text
Languages:English, Polish
Public Procurement Dataset 2 (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. A collection of parallel Polish-English texts published ...

Resource Type:Corpus
Media Type:Text
Languages:English, Polish
Bilingual Croatian-English Parallel Corpus (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Bilingual Croatian-English Parallel Corpus of 21340 tran...

Resource Type:Corpus
Media Type:Text
Languages:Croatian, English
DependencyBankPT

The DepBankPT (Branco et al., 2011a) is a corpus of grammatical dependencies over translated news, composed of 3,406 sentences and 44,598 tokens taken from the Wall Street Journal. The DepBankPT is aligned with a constituency bank, the TreeBankPT (see Branco et al., 2011b). The key bridging eleme...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
LogicalFormBankPT

The LogicalFormBankPT (Branco, 2009, and Branco et al., 2011) is a corpus of semantic dependencies of translated texts composed of 3,406 sentences and 44,598 tokens taken from the Wall Street Journal. The LogicalFormBankPT is composed of MRS representations of each sentence’s semantic relation...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
English-Slovak corpus of annual reports on immigration and asylum policies from the EMN National Contact Point for the Slovak Republic website (Processed)  

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. English-Slovak corpus of annual reports on immigration a...

Resource Type:Corpus
Media Type:Text
Languages:English, Slovak
English-Estonian Parallel corpus compiled from translated annual reports from Estonian Academy of Sciences  

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. English-Estonian translated annual reports as source dat...

Resource Type:Corpus
Media Type:Text
Languages:English, Estonian
