Portuguese Parliamentary Corpus 4.0

The Portuguese Parliamentary Corpus is part of the Multilingual ParlaMint Corpus, a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions. The Portuguese corpus (ParlaMint-PT) comprises transcripts of sessions in the time pe...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Portuguese Parish Memories (1758)

«The Memórias Paroquiais (Parish Memories) are an essential source for obtaining a radiography of Portugal in 1758-1761. They correspond to a survey, organized in 3 major parts (the locality itself, the mountain and the river), which was printed and sent to those responsible for the dioceses of t...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Porttinari – PORTuguese Treebank

Porttinari-base (Duran et al., 2023) is the journalistic portion of Porttinari (short for “PORTuguese Treebank”), intended to be a large multigenre treebank for Portuguese (Pardo et al., 2021) that follows the "Universal Dependencies" international grammar framework (de Marneffe et al., 2021...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Perfil Sociolinguístico da Fala Bracarense - POS

Perfil Sociolinguístico da Fala Bracarense - POS is a manually verified part-of-speech annotation of the EXMARaLDA transcriptions in "Perfil Sociolinguístico da Fala Bracarense", a Portuguese speech corpus with 90 hours of recorded spontaneous speech, aligned with its transcription in EXMARaLDA f...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Perfil Sociolinguístico da Fala Bracarense

Perfil Sociolinguístico da Fala Bracarense is a Portuguese speech corpus with 90 hours of recorded spontaneous speech, aligned with its transcription in EXMARaLDA format. The corpus is composed of one-hour interviews with speakers from the same area (around Braga, Portugal), stratified according to sex,...

Resource Type:Corpus
Media Types:Text, Audio
Language:Portuguese
PAROLE Portuguese Annotated Corpus

The PAROLE Portuguese Corpus – tagged subset contains 250,000 tokens and is a subset of the PAROLE Portuguese Corpus of 3 million running words of European Portuguese. The corpus was classified and encoded according to the common core PAROLE encoding standard. The tagged subset reproduces appro...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
NPChunks

The NPChunks training corpus contains approximately 1,000 sentences, totalling 24,243 tokens, selected randomly from the written part of the CINTIL corpus (Barreto et al., 2006). The CINTIL corpus is a linguistically interpreted corpus of Portuguese composed of 1 million annotated tokens from ...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Nexing Corpus

Corpus with the transcriptions of syllogistic reasoning protocols. Written transcriptions: verbal data (30 hours) elicited during an experiment on syllogistic reasoning (27 participants, each given the 64 syllogistic problems): thinking-aloud task; reflexive conversation. Performance data: La...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
News corpus categorised

The News corpus developed by LIACC, in JSON format, was complemented with POS and keyword topic annotation. POS-tagging: the tagger described in Généreux et al. (2012) was used. The title and text body were extracted, tokenized and POS-tagged. Two new fields were added...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
