CORDIAL-SIN – Syntax-oriented Corpus of Portuguese Dialects – TreeBank


The CORDIAL-SIN–TreeBank is a collection of 177,596 syntactic parse trees drawn from the Syntax-oriented Corpus of Portuguese Dialects. CORDIAL-SIN is a corpus of spoken dialectal European Portuguese, developed at the Centro de Linguística da Universidade de Lisboa. It compiles excerpts of spontaneous and semi-directed speech selected from fieldwork interviews carried out in 42 locations within Portuguese territory (see also CORDIAL-SIN in this repository).

The CORDIAL-SIN syntactic annotation follows the constituency-based system originally developed for the Penn Parsed Corpora of Historical English. It marks up constituent boundaries, phrase and clause dependencies, categorial information, grammatical relations, discourse functions, sentence and clause types, some null constituents, and certain transformational relations. The annotation guidelines for Portuguese were established in close cooperation with the team of the Tycho Brahe Parsed Corpus of Historical Portuguese (cf. the Portuguese syntactic annotation manual).
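In Penn-style annotation, much of this information is packed into the node labels themselves. The Python sketch below illustrates that idea by splitting a label into a syntactic category, its dash-separated extensions (function or clause-type tags), and a trailing numeric coindexation index. It assumes labels of the shape `NP-SBJ-1` or `IP-MAT`; these examples and the `parse_label` helper are hypothetical illustrations, and the Portuguese syntactic annotation manual remains the authority on the actual tag set and on index conventions this sketch does not handle.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Label:
    category: str                 # syntactic category, e.g. "NP" or "IP"
    extensions: Tuple[str, ...]   # dash-separated tags, e.g. ("SBJ",) or ("MAT",)
    index: Optional[int]          # trailing coindexation index, if any


def parse_label(raw: str) -> Label:
    """Split a Penn-style node label into its parts (hypothetical helper)."""
    # Labels that begin with a dash (e.g. the Penn Treebank's -NONE-) are kept whole.
    if raw.startswith("-"):
        return Label(raw, (), None)
    parts = raw.split("-")
    index = None
    # A purely numeric final segment is treated as a coindexation index.
    if len(parts) > 1 and parts[-1].isdigit():
        index = int(parts.pop())
    return Label(parts[0], tuple(parts[1:]), index)


print(parse_label("NP-SBJ-1"))
print(parse_label("IP-MAT"))
```

Keeping the index separate from the tags makes it easy to pair a null constituent or trace with its antecedent by comparing `index` values.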

The CORDIAL-SIN syntactic annotation uses a labeled bracketing representation and is fully searchable with search engines such as CorpusSearch.
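To make "labeled bracketing" concrete, here is a minimal Python sketch that parses bracketed trees into nested nodes and runs a simple prefix-based "dominates" query, loosely analogous to the kind of question one would put to CorpusSearch. The sample tree (including its unlabeled outer wrapper and `ID` node), and the `parse_bracketing` and `dominates` names, are assumptions made for illustration; the tree is not taken from the corpus, and this is not CorpusSearch's query language.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator, List, Union

# Tokens are "(", ")", or any run of characters that are neither brackets nor spaces.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


@dataclass
class Node:
    label: str
    children: List[Union["Node", str]] = field(default_factory=list)  # subtrees or words


def parse_bracketing(text: str) -> List[Node]:
    """Parse one or more labeled-bracketing trees into Node objects."""
    tokens = TOKEN_RE.findall(text)
    stack: List[Node] = []
    roots: List[Node] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            # A label follows "(" unless another bracket comes next (unlabeled wrapper).
            if i + 1 < len(tokens) and tokens[i + 1] not in ("(", ")"):
                label, i = tokens[i + 1], i + 2
            else:
                label, i = "", i + 1
            node = Node(label)
            (stack[-1].children if stack else roots).append(node)
            stack.append(node)
        elif tok == ")":
            if not stack:
                raise ValueError("unbalanced ')'")
            stack.pop()
            i += 1
        else:
            if not stack:
                raise ValueError(f"word outside brackets: {tok!r}")
            stack[-1].children.append(tok)  # a terminal (word or null element)
            i += 1
    if stack:
        raise ValueError("unbalanced '('")
    return roots


def iter_nodes(node: Node) -> Iterator[Node]:
    """Yield a node and all nodes below it, depth first."""
    yield node
    for child in node.children:
        if isinstance(child, Node):
            yield from iter_nodes(child)


def dominates(tree: Node, upper: str, lower: str) -> List[Node]:
    """Nodes labeled with prefix `upper` that dominate a node labeled with prefix `lower`."""
    return [
        n for n in iter_nodes(tree)
        if n.label.startswith(upper)
        and any(d is not n and d.label.startswith(lower) for d in iter_nodes(n))
    ]


# Hypothetical tree for illustration only.
SAMPLE = "( (IP-MAT (NP-SBJ (PRO eu)) (VB-P moro) (PP (P em) (NP (NPR Lisboa)))) (ID example.1))"
tree = parse_bracketing(SAMPLE)[0]
for hit in dominates(tree, "NP-SBJ", "PRO"):
    print(hit.label, hit.children)
```

Prefix matching lets a query for `NP` also match `NP-SBJ`; CorpusSearch itself offers its own query language with many more search functions and wildcard options.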


