Porttinari – PORTuguese Treebank

Porttinari-base (Duran et al., 2023) is the journalistic portion of Porttinari (short for “PORTuguese Treebank”), which is planned as a large multigenre treebank for Portuguese (Pardo et al., 2021) and follows the "Universal Dependencies" international grammar framework (de Marneffe et al., 2021...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Lince - Conversor para a Nova Ortografia

Lince is a multi-platform stand-alone application that updates the textual contents of documents in a range of popular formats to the spelling prescribed by the 1990 Portuguese language reform. It works with both previously existing Portuguese language orthographic standards (1943, previously val...

Resource Type:Tool / Service
Language:Portuguese
Corpus Desenvolvimento da Escrita no Ensino Básico

The DEEB Corpus contains the transcriptions of 1200 narrative texts written by pupils in their 4th, 6th and 9th year Portuguese Language exams in the public school system in Portugal.

Resource Type:Corpus
Media Type:Text
Language:Portuguese
LexMan-ChunkerTokenizer

LexMan-ChunkerTokenizer is a tokenizer and sentence splitter tool that marks sentence boundaries and multi-word boundaries. Size: verb lemmas: 12 995; noun and adjective lemmas: 38 180; adverb lemmas: 7 250; compound words: 35 201. Language: Portuguese.

Resource Type:Tool / Service
Language:Portuguese
Arquivo Dialetal CLUP - POS

Arquivo Dialetal CLUP - POS is a speech corpus with approximately 40 000 tokens (Utterances; spontaneous speech, mainly from Northern Portugal). Orthographic transcription, POS.

Resource Type:Corpus
Media Type:Audio
Language:Portuguese
C-ORAL-ROM_EXM

This resource includes a spoken corpus with approximately 300 000 words, covering both formal (152 755 words) and informal (165 838 words) speech, with aligned sound and orthographic transcription and POS-tag information.

Resource Type:Corpus
Media Types:Text, Audio
Language:Portuguese
Fundamental Portuguese

This resource includes a spoken Portuguese corpus, with aligned sound and orthographic transcription, collected among sociolinguistically diverse speakers. It consists of recordings of informal conversations.

Resource Type:Corpus
Media Types:Text, Audio
Language:Portuguese
EmoVoicePort

EmoVoicePort, Emotional Vocalization Corpus (see Lima, Castro, & Scott, 2013) is a validated set of nonverbal vocalizations that portray four positive emotions (achievement/triumph, amusement, sensual pleasure, relief) and four negative ones (anger, disgust, fear, sadness). The vocalizations (n =...

Resource Type:Corpus
Media Type:Audio
Language:Portuguese
Arquivo Dialetal CLUP - Orthographic and phonetic transcription

Arquivo Dialetal CLUP - ORTH is a speech corpus with approximately 40 000 tokens (Utterances; spontaneous speech, mainly from Northern Portugal). Orthographic and phonetic transcription.

Resource Type:Corpus
Media Type:Audio
Language:Portuguese
EmoProsodyPort

EmoProsodyPort (see Castro & Lima, 2010) is a speech database with 368 short sentences and pseudosentences with neutral emotional content. Acoustic measurements and behavioral data.

Resource Type:Corpus
Media Type:Audio
Language:Portuguese
