RedditPT Dataset

This dataset is a collection of dialogues extracted from the Portugal subreddit with RDET (Reddit Dataset Extraction Tool). It is composed of around 58,964,715 tokens in 218,550 dialogues.

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Hesita-POS

Hesita-POS is an annotaded corpus. Tv News.

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CORP-ORAL

CORP-ORAL is a spontaneous speech corpus for European Portuguese. It is the main output of two R&D projects: CORP-ORAL and ORAL-PHON. The data consist of unscripted and unprompted face-to-face dialogues between family, friends, colleagues and unacquainted participants. All recordings are orthogra...

Resource Type:Corpus
Media Type:Audio
Language:Portuguese
EmoVoicePort

EmoVoicePort, Emotional Vocalization Corpus (see Lima, Castro, & Scott, 2013) is a validated set of nonverbal vocalizations that portray four positive emotions (achievement/triumph, amusement, sensual pleasure, relief) and four negative ones (anger, disgust, fear, sadness). The vocalizations (n =...

Resource Type:Corpus
Media Type:Audio
Language:Portuguese
EmoProsodyPort

EmoProsodyPort (see Castro & Lima, 2010) is a speech database with 368 short sentences and pseudosentences with neutral emotional content. Acoustic measurements and behavioral data.

Resource Type:Corpus
Media Type:Audio
Language:Portuguese
Porlex

Porlex (Gomes & Castro, 2003) is a lexical database that includes written and phonetic transcription of standard adult vocabulary - 44 psycholinguistic characteristics (e.g. orthographic, phonological, phonetic, part-of-speech, and neighborhood characteristics). For each word it contains psychol...

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese
Albertina PT-PT base

Albertina PT-PT base is a foundation, large language model for European Portuguese from Portugal. It is an encoder of the BERT family, based on the neural architecture Transformer and developed over the DeBERTa model, with most competitive performance for this language. It is distributed free ...

Resource Type:Language Description
Media Type:Text
Language:Portuguese
Portuguese Parliamentary Corpus 4.0

The Portuguese Parliamentary Corpus is part of the Mutlilingual ParlaMint Corpus, a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions. The Portuguese corpus (ParlaMint-PT) comprehends transcripts of sessions in the time pe...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
PsychAnaphora - Event related brain potentials from young and older adults

This set of materials pertains to a study on the processing of explicit pronouns in European Portuguese. Forty spreadsheets containing Event Related Potentials, encoded as voltage variations across 64 electrodes during 1.5 s, in two millisecond steps, are provided, 20 of which pertain to younger ...

Resource Type:Language Description
Media Type:Text
Language:Portuguese
CorPop: a corpus of popular Brazilian Portuguese

This research proposes a corpus of popular Brazilian Portuguese, called CorPop, with texts selected based on the average level of literacy of the country's readers. CorPop’s theoretical and methodological bases are interdisciplinary and fall within the scope of Language Studies and related discip...

Resource Type:Corpus
Media Type:Text
Language:Portuguese

Order by:

Filter by:

Text (444)
Audio (18)
Image (1)