108 WAV files of spoken Maltese newspaper texts, subdivided into 12 directories with a variable number of sentences (sometimes: clauses) each. They come together with transcriptions and tables of phoneme durations.
The full editions of ILLUM from 12/11/2006 to 30/05/2010 (185 issues).
Description
CORDIAL-SIN is a corpus of spoken dialectal European Portuguese developed at Centro de Linguística da Universidade de Lisboa (CLUL). The materials for this corpus were drawn from the recordings of dialect speech collected by the CLUL ATLAS team as fieldwork interviews for linguistic atlases betwe...
This is a set of 11.361 biographies of Portuguese people. The compilation of the data involved the biography collection from wikipedia and data conversion. Several filters were applied to remove entries that were mostly empty or non applicable content. Format: JSON (conversion from HTML) ...
LX-AP was created from the translation of Almuhareb-Poesio (ap) benchmark (Almuhareb and Poesio, 2005). The original data set was created considering three aspects: POS, frequency and ambiguity. It contains 402 names from 21 categories of WordNet, with 13 to 21 names from each one of those categ...
This dataset is a collection of dialogues extracted from the Portugal subreddit with RDET (Reddit Dataset Extraction Tool). It is composed of around 58,964,715 tokens in 218,550 dialogues.
CORP-ORAL is a spontaneous speech corpus for European Portuguese. It is the main output of two R&D projects: CORP-ORAL and ORAL-PHON. The data consist of unscripted and unprompted face-to-face dialogues between family, friends, colleagues and unacquainted participants. All recordings are orthogra...