This resource includes a spoken corpus of approximately 300,000 words, covering both formal (152,755 words) and informal (165,838 words) speech, with aligned sound and orthographic transcription and POS-tag information.
The resource consists of a hierarchically structured system of data types, intended to be suitable for describing the input and output annotation types of a wide range of natural language processing applications that operate within the UIMA Framework. It is being developed in conjunc...