Albertina PT-BR No-brWaC

Albertina PT-* is a foundation large language model for the Portuguese language. It is an encoder of the BERT family, based on the Transformer neural architecture and developed over the DeBERTa model, with highly competitive performance for this language. It has different versions that were...

Resource Type:Language Description
Media Type:Text
Language:Portuguese
Albertina PT-PT

Albertina PT-* is a foundation large language model for the Portuguese language. It is an encoder of the BERT family, based on the Transformer neural architecture and developed over the DeBERTa model, with highly competitive performance for this language. It has different versions that were tra...

Resource Type:Language Description
Media Type:Text
Language:Portuguese
RedditPT Dataset

This dataset is a collection of dialogues extracted from the Portugal subreddit with RDET (Reddit Dataset Extraction Tool). It is composed of around 58,964,715 tokens in 218,550 dialogues.

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CORP-ORAL

CORP-ORAL is a spontaneous speech corpus for European Portuguese. It is the main output of two R&D projects: CORP-ORAL and ORAL-PHON. The data consist of unscripted and unprompted face-to-face dialogues between family, friends, colleagues and unacquainted participants. All recordings are orthogra...

Resource Type:Corpus
Media Type:Audio
Language:Portuguese
EmoProsodyPort

EmoProsodyPort (see Castro & Lima, 2010) is a speech database with 368 short sentences and pseudosentences with neutral emotional content. Acoustic measurements and behavioral data are included.

Resource Type:Corpus
Media Type:Audio
Language:Portuguese
Albertina PT-PT base

Albertina PT-PT base is a foundation large language model for European Portuguese. It is an encoder of the BERT family, based on the Transformer neural architecture and developed over the DeBERTa model, with highly competitive performance for this language. It is distributed free ...

Resource Type:Language Description
Media Type:Text
Language:Portuguese
Portuguese Parliamentary Corpus 4.0

The Portuguese Parliamentary Corpus is part of the Multilingual ParlaMint Corpus, a set of comparable corpora containing transcriptions of parliamentary debates from 29 European countries and autonomous regions. The Portuguese corpus (ParlaMint-PT) comprises transcripts of sessions in the time pe...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
PsychAnaphora - Event related brain potentials from young and older adults

This set of materials pertains to a study on the processing of explicit pronouns in European Portuguese. Forty spreadsheets are provided containing event-related potentials, encoded as voltage variations across 64 electrodes over 1.5 s in two-millisecond steps, 20 of which pertain to younger ...

Resource Type:Language Description
Media Type:Text
Language:Portuguese
CorPop: a corpus of popular Brazilian Portuguese

This research proposes a corpus of popular Brazilian Portuguese, called CorPop, with texts selected based on the average level of literacy of the country's readers. CorPop’s theoretical and methodological bases are interdisciplinary and fall within the scope of Language Studies and related discip...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CINTIL-Definitions

The corpus presented here is a collection of several tutorials and scientific papers in the field of Information Technology with 603 annotated definitions in Portuguese. The texts were collected from the Web at the beginning of 2006 and they are organised in 32 files of three different sub-...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
