CINTIL-TreeBank

Handle:	https://hdl.handle.net/21.11129/0000-000B-D2FE-A (persistent URL to this page)

The CINTIL-TreeBank (Branco et al., 2011) is a corpus of syntactic constituency trees for Portuguese texts. It comprises 10,039 sentences and 110,166 tokens drawn from different sources and domains: news (8,861 sentences; 101,430 tokens) and novels (399 sentences; 3,082 tokens). These totals also include 779 sentences (5,654 tokens) used for regression testing of the computational grammar that supported the annotation of the corpus.
To create this TreeBank, we adopted semi-automatic analysis with double-blind annotation followed by adjudication. The resulting dataset contains one level of linguistic information: phrase constituency.
The main motivation for creating this resource was to build a high-quality dataset with syntactic information that could support the development of a wide range of automatic resources and NLP tools for Portuguese.
You may also be interested in the related resources CINTIL-DeepBank, CINTIL-DependencyBank, CINTIL-PropBank and CINTIL-LogicalFormBank, also available from this repository.
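The subcorpus figures quoted in the description can be checked against the stated totals with a short script. This is only a sanity-check sketch: the dictionary keys and the function name are labels chosen here for illustration, not identifiers used in the corpus files.

```python
# Sentence and token counts as quoted in the description.
# Keys are illustrative labels, not identifiers from the corpus itself.
subcorpora = {
    "news": (8861, 101430),
    "novels": (399, 3082),
    "regression_test_suite": (779, 5654),
}


def totals(parts):
    """Sum the (sentences, tokens) pairs over all subcorpora."""
    sentences = sum(s for s, _ in parts.values())
    tokens = sum(t for _, t in parts.values())
    return sentences, tokens


total_sentences, total_tokens = totals(subcorpora)
print(total_sentences, total_tokens)  # prints: 10039 110166
```

The three parts add up to the 10,039 sentences and 110,166 tokens given for the whole corpus, so the regression-testing sentences are counted within those totals.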

Distribution Licence

MS - NC - No ReD - ND

Licensors:

António Branco

http://www.di.fc.ul.pt/~ahb/

University of Lisbon, Faculty of Sciences

FCUL

Associate Professor with Habilitation

Faculdade de Ciências de Lisboa, Departamento de Informática. Campo Grande, 1749-016 Lisboa, Portugal

1749-016 Lisbon

Tel.: +351 217 500 087

Fax: +351 217 500 084

Department of Informatics

http://nlx.di.fc.ul.pt/

FCUL

Faculdade de Ciências de Lisboa, Departamento de Informática. Campo Grande, 1749-016 Lisboa, Portugal

1749-016 Lisbon

Portugal

Tel.: +351 217 500 087

Fax: +351 217 500 084

Distribution rights holders:

António Branco

IPR Holder

University of Lisbon, Faculty of Sciences

Contact Person

António Branco

text

Monolingual text corpus

Languages

Portuguese (10,140 Sentences)

Linguality

Linguality type: Monolingual

Text Format

text/xml (10,140 Sentences)

Size

110,166 Tokens

10,039 Sentences

Character encoding

UTF-8 (10,140 Sentences)

Domains

Novels (403 Sentences)

News (8,952 Sentences)

Test Suite (785 Sentences)

Modalities

Written Language

Geographic coverage

Portugal (10,140 Sentences)

Creation

Creation mode: Mixed

Resource Creation

Resource Creator

António Branco

Funding Project

SemanticShare - Resources and Tools for Semantic Processing (SemanticShare - FCT/PTDC/PLP/81157/2006)

URL: http://nlx.di.fc.ul.pt/projects.html

Funding Type: National Funds

Funder: FCT - Fundação para a Ciência e Tecnologia

Funding Country: Portugal

Project duration: 06/01/2006 - 12/31/2010

Metadata

Created: 06/01/2012

Last Updated: 01/07/2021

Source: META-SHARE

METANET4U

Metadata Language: English

Metadata Creator

Catarina Carvalheiro

http://nlx-server.di.fc.ul.pt/~catarina/

University of Lisbon, Faculty of Sciences

FCUL

Researcher

Departamento de Informática NLX - Grupo de Fala e Linguagem Natural, Faculdade de Ciências da Universidade de Lisboa, Edifício C6

1749-016 Lisbon

Tel.: +351 217 500 087

Fax: +351 217 500 084

Department of Informatics

Version

Version: 1

Last Updated: 06/01/2012

Documentation

Tool Documentation: Online

Samples Location: https://portulanclarin.net/repository/extradocs/treebanksample.txt

Document Type: Other

Catarina Carvalheiro, CINTIL Treebank Narrative Description, http://portulanclarin.net/repository/extradocs/CINTIL-Treebank.pdf, 2012

Document Type: Tech Report

António Branco, João Silva, Francisco Costa, Sérgio Castro, CINTIL TreeBank Handbook: Design options for the representation of syntactic constituency, http://docs.di.fc.ul.pt/jspui/handle/10455/6746, 2011

Publisher: Department of Informatics, University of Lisbon

Document Language: English
