Summ-it

Handle:	https://hdl.handle.net/21.11129/0000-000B-D30E-8 (persistent URL to this page)
URL:	http://www.inf.pucrs.br/ontolp/downloads-ontolpplugin.php

The corpus was developed as a linguistic resource for research on automatic summarization and its relation to other discourse phenomena, in order to support studies of discourse processing.
Summ-it consists of fifty texts from the science domain, extracted from the Science section of the Brazilian daily newspaper Folha de São Paulo (FSP), and comprises:
I. Human summaries produced by experts in summarization (Coelho, 2007), who rewrote the original texts in a compressed form.
II. Automatic summaries, obtained with GistSumm (Pardo et al., 2002, and Pardo et al., 2003) and SuPor-2 (Leite and Rino, 2006a, Leite and Rino, 2006b, and Leite and Rino, 2006c). All summaries were generated with a 70% compression rate, which means that the summaries correspond to roughly 30% of the original texts.
III. Manually underlined sentences that contain relevant information from the original texts (see 3.2).
IV. Texts semi-automatically annotated with morphosyntactic information, with the help of the syntactic parser PALAVRAS (available at: http://visl.sdu.dk/visl/pt/) and the Xtractor converter (available at: http://abc.di.uevora.pt/xtractor/).
V. Texts semi-automatically annotated with coreference information for noun phrases (MMAX) and with rhetorical relations (RST) (cf. Carbonel et al., 2007, Fuchs, 2008, and Collovini et al., 2007). The first annotation identifies the discourse entities (e.g. noun phrases) that are referred to or taken up again in the text; the second structures a text by relating its discourse units through RST relations.
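
The compression rate mentioned in item II can be illustrated with a small calculation. The sketch below is only an illustration, not part of the corpus tooling: the function names are hypothetical, and it assumes that length is measured in words, which this record does not specify.

```python
def target_summary_length(original_words: int, compression_rate: float = 0.70) -> int:
    """Approximate summary length (in words) for a given compression rate.

    Assumption: length is counted in words; the record does not say
    whether words, sentences, or characters were used.
    """
    if not 0.0 <= compression_rate <= 1.0:
        raise ValueError("compression_rate must be between 0 and 1")
    # A 70% compression rate keeps roughly 30% of the original text.
    return round(original_words * (1.0 - compression_rate))


def observed_compression_rate(original_words: int, summary_words: int) -> float:
    """Fraction of the original text that was removed by summarization."""
    if original_words <= 0:
        raise ValueError("original_words must be positive")
    return 1.0 - summary_words / original_words


if __name__ == "__main__":
    # Hypothetical 1000-word source text summarized at the 70% rate.
    print(target_summary_length(1000))
    print(observed_compression_rate(1000, 300))
```

Under these assumptions, a 1000-word text yields a summary of about 300 words.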

Distribution Licence

CC BY-NC-SA

Distribution Access/Medium: Downloadable

IPR Holder

Renata Vieira Female

PUCRS – Pontifícia Universidade Católica do Rio Grande do Sul, Faculdade de Informática, Av. Ipiranga, 6681, Partenon

90619-900 Porto Alegre

Brasil

Tel.: +51 33203558

Fax: +51 33203758

Contact Person

Renata Vieira Female

PUCRS – Pontifícia Universidade Católica do Rio Grande do Sul, Faculdade de Informática, Av. Ipiranga, 6681, Partenon

90619-900 Porto Alegre

Brasil

Tel.: +51 33203558

Fax: +51 33203758

text

Monolingual text corpus

Languages

Portuguese (50 Texts)

Variety: Brazilian Portuguese (Type: Other) (50 Texts)

Linguality

Linguality type: Monolingual

Text Format

text/xml (50 Texts)

Size

50 Texts

Character encoding

UTF-8 (50 Texts)

Domains

Science (50 Texts)

Modalities

Written Language

Geographic coverage

Brazil (50 Texts)

Creation

Creation mode details: Human summaries produced by experts in summarization, and automatic summaries obtained with GistSumm and SuPor-2.

Creation mode: Mixed

Creation Tools

PALAVRAS
MMAX
Xtractor converter

Metadata

Created: 07/10/2012

Last Updated: 11/28/2012

Source: METANET4U

METASHARE

Metadata Language: English

Metadata Creator

Catarina Carvalheiro

http://nlx-server.di.fc.ul.pt/~catarina/

University of Lisbon, Faculty of Sciences

FCUL

Researcher

Departamento de Informática NLX - Grupo de Fala e Linguagem Natural, Faculdade de Ciências da Universidade de Lisboa, Edifício C6

1749-016 Lisbon

Tel.: +351 217 500 087

Fax: +351 217 500 084

Department of Informatics

http://nlx.di.fc.ul.pt/

FCUL

Faculdade de Ciências de Lisboa, Departamento de Informática. Campo Grande, 1749-016 Lisboa, Portugal

1749-016 Lisbon

Portugal

Tel.: +351 217 500 087

Fax: +351 217 500 084

Version

Version: 1

Last Updated: 07/10/2012

Usage

Foreseen Use - NLP Applications

Use NLP Specific: Discourse Analysis, Summarisation

Actual Use - NLP Applications

Use NLP Specific: Discourse Analysis, Summarisation

Documentation

Tool Documentation: Online

Samples Location: http://194.117.45.196:2000/Summ-itsample.zip

Document Type: Other

Catarina Carvalheiro, Summ-it Narrative Description, http://portulanclarin.net/repository/extradocs/Summit.pdf

Document Type: Article

Collovini, S., Carbonel, T., Fuchs, J. T., Coelho, J. C., Rino, L., and Vieira, R., "Summ-it: Um corpus anotado com informações discursivas visando à sumarização automática", http://www.inf.pucrs.br/ontolp/downloads-ontolpplugin.php, 5.º Workshop em Tecnologias da Informação e da Linguagem Humana, 2007

Editor: TIL'2007

Publisher: TIL'2007
