The CINTIL-DepBank (Branco et al., 2011a) is a corpus of Portuguese texts annotated with grammatical dependencies. It comprises 10,039 sentences and 110,166 tokens drawn from different sources and domains: news (8,861 sentences; 101,430 tokens) and novels (399 sentences; 3,082 tokens) (see Section 3.2). These totals also include 779 sentences (5,654 tokens) used for regression testing of the computational grammar that supported the annotation of the corpus (cf. Section 4.6).
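The composition figures above can be cross-checked with a short script. The following Python sketch uses only the counts stated in this section; the dictionary keys and variable names are illustrative labels, not identifiers from the corpus distribution.

```python
# Sentence and token counts per subcorpus, as reported in the text.
# The keys are illustrative labels, not names used in the corpus files.
subcorpora = {
    "news": (8_861, 101_430),
    "novels": (399, 3_082),
    "regression_test_suite": (779, 5_654),
}

# Sum the two columns separately.
total_sentences = sum(sentences for sentences, _ in subcorpora.values())
total_tokens = sum(tokens for _, tokens in subcorpora.values())

print(total_sentences, total_tokens)  # 10039 110166
```

On these figures, the news, novels, and regression-testing parts together account for the reported totals of 10,039 sentences and 110,166 tokens.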
The CINTIL-DepBank is aligned with a constituency bank, the CINTIL-TreeBank (see Branco et al., 2011b). The key bridging elements are the grammatical function tags decorating the nodes in the treebank and the arcs in the dependency bank (see http://lxcenter.di.fc.ul.pt/services/en/LXServicesSearcher.html). The CINTIL-DepBank was also extended from the CINTIL-PropBank, so that, besides the tags for the different dependency relations, the arcs are further decorated with tags indicating the semantic relation at stake.
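To make the double decoration of arcs concrete, the following Python sketch represents a dependency arc that carries both a grammatical function tag and a semantic relation tag. The class name, field names, example sentence, tag values (SJ, DO, SP, ARG1, ARG2), and the hyphenated label format are all assumptions chosen for illustration; they are not taken from the corpus documentation or its file format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Arc:
    """A dependency arc from a head token to a dependent token."""
    head: str
    dependent: str
    function: str                   # grammatical function tag (hypothetical value)
    semantic: Optional[str] = None  # semantic relation tag (hypothetical value)

    def label(self) -> str:
        # Join the two tags; fall back to the function tag alone.
        return f"{self.function}-{self.semantic}" if self.semantic else self.function


# Illustrative analysis of an invented sentence,
# "Maria leu o livro" ("Maria read the book").
arcs = [
    Arc(head="leu", dependent="Maria", function="SJ", semantic="ARG1"),
    Arc(head="leu", dependent="livro", function="DO", semantic="ARG2"),
    Arc(head="livro", dependent="o", function="SP"),
]

labels = [arc.label() for arc in arcs]
print(labels)  # ['SJ-ARG1', 'DO-ARG2', 'SP']
```

Keeping the two tags as separate fields, rather than a single fused string, is one possible design that lets the grammatical function layer be compared with the treebank independently of the semantic layer.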
The main motivation for creating this resource was to build a high-quality data set with dependency information that could support the development of a wide range of automatic resources and tools for Portuguese NLP.