Monolingual text corpus Languages
(318,593 Tokens)
Linguality Linguality type: Monolingual
Text Format
Plain text
(318,593 Tokens)
Size Character encoding
UTF-8
(318,593 Tokens)
Domains Modalities Classification
(62,116 Tokens)
Text type: Formal/Media
Text genre: Spoken
(32,646 Tokens)
Text type: Informal/Public
Text genre: Spoken
(66,274 Tokens)
Text type: Formal/Natural Context
Text genre: Spoken
(133,192 Tokens)
Text type: Informal/Private
Text genre: Spoken
(24,365 Tokens)
Text type: Formal/Telephone
Text genre: Spoken
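The five per-type token counts in the classification can be checked against the corpus total of 318,593 tokens reported elsewhere in this record. The sketch below is a minimal consistency check. It assumes each parenthesised count belongs to the "Text type" line that follows it. The variable names are illustrative and not part of the record.

```python
# Token counts per text type, copied from the classification entries above.
# Assumption: each "(N Tokens)" line describes the text type listed right after it.
tokens_by_text_type = {
    "Formal/Media": 62_116,
    "Informal/Public": 32_646,
    "Formal/Natural Context": 66_274,
    "Informal/Private": 133_192,
    "Formal/Telephone": 24_365,
}

# Corpus-wide total as reported in the record.
REPORTED_TOTAL = 318_593

total = sum(tokens_by_text_type.values())

# Share of each text type, as a percentage of the summed total.
shares = {
    name: round(100 * count / total, 1)
    for name, count in tokens_by_text_type.items()
}

print("Sum matches reported total:", total == REPORTED_TOTAL)
for name, share in shares.items():
    print(f"{name}: {share}%")
```

The per-type counts add up to the reported total, so the five categories partition the whole text corpus with no overlap and no unclassified remainder.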
Annotation Alignment Segmentation level: Utterance
Format: text/xml
Annotation Mode: Manual
Morphosyntactic Annotation - Pos Tagging Segmentation level: Word
Format: text/plain
Annotation Mode: Automatic
Time Coverage
(318,593 Tokens)
Geographic coverage
(318,593 Tokens)
Creation Creation mode: Mixed
Monolingual audio corpus Languages
(30 Hours)
Linguality Linguality type: Monolingual
Size Effective speech duration
30 Hours
Audio duration
30 Hours
Domains Modalities Classification
(30 Hours)
Audio genre: Speech
Speech genre: Conversation
Conformance to classification scheme: ANC_domain Classification
Annotation Speech Annotation - Sound To Text Alignment Segmentation level: Utterance
Format: text/xml
Annotation Mode: Manual
Annotation Manual:
Morphosyntactic Annotation - Pos Tagging Annotated elements: Discourse Markers, Mispronunciations, Other, Speaker Noise, Truncation
Segmentation level: Word
Format: text/plain
Annotation Mode: Automatic
Speech Annotation - Orthographic Transcription Annotated elements: Discourse Markers, Mispronunciations, Other, Speaker Noise, Truncation
Segmentation level: Word
Format: text/plain
Annotation Mode: Manual
Content Speech items: Free Speech
Non-speech items: Other
Noise Level: Low
Setting Naturality: Spontaneous
Conversational type: Multilogue
Scenario: Other
Audience: No
Interactivity: Interactive
Audio Formats wav Number of tracks: 2
Recording quality: High
Quantization: 16 bits
Time Coverage
Geographic coverage
(30 Hours)
Recording Recording environment: Closed Public Place, Conference Room, Lecture Room, Office, Open Public Place, Other
Recording device type: DAT, Other
Source channel: Other, Radio, Telephone, TV
Capture Capturing device type: Close Talk Microphone, Microphone, Studio Equipment, Telephone Fixed
Person SourceSet Origin of persons: Native
Age of persons: Adult
Sex of persons: Mixed
Age range start: 19
Age range end: 71
Hearing impairment of persons: No
Speaking impairment of persons: No
Geographic distribution of persons: Portugal