The Wixarika-Spanish Parallel Corpus

Wixarika is an indigenous language spoken in central-western Mexico by approximately fifty thousand people. For indigenous languages like Wixarika, digital resources are generally scarce, since native speakers do not necessarily leave a digital footprint on public forums. The lack of...

Resource Type:Corpus
Media Type:Text
Language:Spanish; Castilian
The CIEMPIESS Proper-Names Pronouncing Dictionary

Transcriptions in the CIEMPIESS-PNPD are based on a phonetic alphabet called Mexbet. Mexbet was designed for the Spanish of Central Mexico and has several levels of granularity. The CIEMPIESS-PNPD comes in two versions: Mexbet T29 and Mexbet T66. Level T29 of Mexbet means that transcriptions ...

Resource Type:Corpus
Media Type:Text
Language:Spanish; Castilian
PS corpus (Post-Scriptum)-ES

PS Corpus (Post-Scriptum)-ES is a corpus of 2368 informal mail letters written in Spanish during the Modern Ages (from the XVIth century to the beginning of the XIXth century). Each letter is available as a semi-palaeographic transcription, a modernized transcription, and with part-of-speech a...

Resource Type:Corpus
Media Type:Text
Language:Spanish; Castilian
Alignment of Parallel Texts from Cyrillic to Latin

The text of the novel Sania (Eng. The Sledge) served as a training corpus. It was written in 1955 by Ion Druță and originally printed in Cyrillic script. We applied a previously developed recognition technology together with specialized lexicons. In such a way, we have obtained the electr...

Resource Type:Corpus
Media Type:Text
Language:Romanian
PS corpus (Post-Scriptum) - treebank

PS corpus (Post-Scriptum) - treebank is a treebank corpus of 586 informal mail letters written in Portuguese and Spanish during the Modern Ages (from the XVIth century to the beginning of the XIXth century). This treebank is a syntactically annotated subset of the Portuguese "PS corpus (Post-S...

Resource Type:Corpus
Media Type:Text
Languages:Portuguese
Spanish; Castilian
Inquirições reais

Royal inquiries of 1258 (primarily published in the Portugaliae Monumenta Historica).

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Nexing Corpus

Corpus with the transcriptions of syllogistic reasoning protocols. Written transcriptions: verbal data (30 hours) elicited during an experiment on syllogistic reasoning (each of 27 participants × the 64 syllogistic problems): thinking-aloud task; reflexive conversation. Performance data: La...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CINTIL-UDep

CINTIL-UDep is a dependency bank of Portuguese with 38,400 sentences (nearly 476,000 tokens), treebanked with Universal Dependencies (UD). This version of CINTIL-UDep supersedes the one included in the v2.11 (2022-11-15) release of the Universal Dependencies (https://universaldepende...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
