Albertina PT-PT base

Albertina PT-PT base is a foundation, large language model for European Portuguese from Portugal. It is an encoder of the BERT family, based on the neural architecture Transformer and developed over the DeBERTa model, with most competitive performance for this language. It is distributed free ...

Resource Type:Language Description
Media Type:Text
Language:Portuguese
Grafone-Tool

Grafone-Tool is a tool for conversion from grapheme to phoneme for European Portuguese. The converter works with the Portuguese spelling, both prior to and after the Orthographic Agreement of 1990.

Resource Type:Tool / Service
Language:Portuguese
Perfil Sociolinguístico da Fala Bracarense

Perfil Sociolinguístico da Fala Bracarense is a Portuguese speech corpus with 90 hours of recorded spontaneous speech, aligned with its transcription in EXMARaLDA format. The corpus is composed by 1h interviews with speakers of the same area (around Braga, Portugal), stratified according to sex,...

Resource Type:Corpus
Media Types:Text
Audio
Language:Portuguese
HESITA database

The HESITA database is a corpus consisting of television daily news collected over a month and was annotated regarding to hesitation events, acoustical environments, speaking styles, speaker characteristics and respiratory events, among other characteristic sounds.

Resource Type:Corpus
Media Types:Text
Audio
Language:Portuguese
Portuguese Parish Memories (1758)

«The Memórias Paroquiais (Parish Memories) are an essential source for obtaining a radiography of Portugal in 1758-1761. They correspond to a survey, organized in 3 major parts (the locality itself, the mountain and the river), which was printed and sent to those responsible for the dioceses of t...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Archivo dos Açores, dir. Ernesto do Canto, 1.ª série, Ponta Delgada, Vol. 1-12

A publicação Arquivo dos Açores, consagrada como obra de referência para a investigação histórica sobre o arquipélago dos Açores, conta com duas séries, num total de 20 volumes. A primeira série do Arquivo dos Açores, composta por 15 volumes, decorreu entre 1878 e 1959, com grandes interrupções r...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Arquivo Dialetal CLUP - Orthographic and phonetic transcription

Arquivo Dialetal CLUP - ORTH is a speech corpus approximately with 40 000 tokens (Utterances; spontaneous speech, mainly from Northern Portugal). Orthographic and phonetic transcription.

Resource Type:Corpus
Media Type:Audio
Language:Portuguese
Summ-it

The corpus was developed as a linguistic resource for Automatic Summarization research and his relation with different issues to engage studies on the discourse treatment. Summ-it consists of fifty texts from Science domain extracted from Science section of Brazilian daily newspaper Folha de Sã...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
BERTimbau - Portuguese BERT-Base language model

This resource contains a pre-trained BERT language model trained on the Portuguese language. A BERT-Base cased variant was trained on the BrWaC (Brazilian Web as Corpus), a large Portuguese corpus, for 1,000,000 steps, using whole-word mask. The model is available as artifacts for TensorFlow and...

Resource Type:Language Description
Media Type:Text
Language:Portuguese
Corpus Desenvolvimento da Escrita no Ensino Básico

The DEEB Corpus contains the transcriptions of 1200 narrative texts written by pupils in their 4th, 6th and 9th year Portuguese Language exams in the public school system in Portugal.

Resource Type:Corpus
Media Type:Text
Language:Portuguese

Order by:

Filter by:

Text (446)
Audio (18)
Image (1)