TreeBankPT

TreeBankPT (Branco et al., 2011) is a corpus of syntactic constituency trees for Portuguese. It covers 3,406 sentences (44,598 tokens) of news text translated from the Wall Street Journal.
To create this treebank, we adopted semi-automatic analysis with double-blind annotation followed by adjudication. The resulting dataset contains a single layer of information: phrase constituency.
The main motivation for creating this resource was to build a high-quality dataset of syntactic information. Such a dataset could support the development of a wide range of automatic resources and tools for the natural language processing of Portuguese.
Development of this resource began under the METANET4U project (http://metanet4u.eu/). The project's main goal is to help establish a pan-European digital platform that makes language resources and services available for speech and language processing, covering both datasets and software tools. The platform also supports a new generation of exchange facilities for these resources.
You may also be interested in the related resources DeepBankPT, DependencyBankPT, PropBankPT and LogicalFormBankPT, also available from this repository.
