Arquivo Dialetal CLUP - POS is a speech corpus with approximately 40 000 tokens (Utterances; spontaneous speech, mainly from Northern Portugal). Orthographic transcription, POS.
The HESITA database is a corpus consisting of television daily news collected over a month and was annotated regarding to hesitation events, acoustical environments, speaking styles, speaker characteristics and respiratory events, among other characteristic sounds.
EmoVoicePort, Emotional Vocalization Corpus (see Lima, Castro, & Scott, 2013) is a validated set of nonverbal vocalizations that portray four positive emotions (achievement/triumph, amusement, sensual pleasure, relief) and four negative ones (anger, disgust, fear, sadness). The vocalizations (n =...
Arquivo Dialetal CLUP - Áudio is an audio corpus of spontaneous speech, mainly from Northern Portugal.
Arquivo Dialetal CLUP - ORTH is a speech corpus approximately with 40 000 tokens (Utterances; spontaneous speech, mainly from Northern Portugal). Orthographic and phonetic transcription.
EmoProsodyPort (see Castro & Lima, 2010) is a speech database with 368 short sentences and pseudosentences with neutral emotional content. Acoustic measurements and behavioral data.
This resource includes a spoken corpus with approximately 300.000 words, covering both formal (152.755 words) and informal (165.838 words) speech, with aligned sound and orthographic transcription and POS-tag information.
This resource includes a spoken Portuguese corpus exemplifying the Portuguese spoken in Portugal, Brazil, Angola, Cape Verde, Guinea-Bissau, Mozambique, Sao Tome and Principe, Macao, Goa and East-Timor - with aligned sound and orthographic transcription - collected among sociolinguistically diver...
Perfil Sociolinguístico da Fala Bracarense is a Portuguese speech corpus with 90 hours of recorded spontaneous speech, aligned with its transcription in EXMARaLDA format. The corpus is composed by 1h interviews with speakers of the same area (around Braga, Portugal), stratified according to sex,...
This resource includes a spoken Portuguese corpus - with aligned sound and orthographic transcription -, collected among sociolinguistically diverse speakers. It consists of recordings from informal conversations.