A Repository of State of the Art and Competitive Baseline Summaries for DUC 2004

Handle:	https://hdl.handle.net/21.11129/0000-000D-FE86-E (persistent URL to this page)
URL:	http://lrec2014.lrec-conf.org/sharedlrs2014/LREC_1093_res_1.gz

In the period since 2004, many novel sophisticated approaches for generic multi-document summarization have been developed. Intuitive simple approaches have also been shown to perform unexpectedly well for the task. Yet it is practically impossible to compare the existing approaches directly, because systems have been evaluated on different datasets, with different evaluation measures, against different sets of comparison systems. Here we present a corpus of summaries produced by several state-of-the-art extractive summarization systems or by popular baseline systems. The inputs come from the 2004 DUC evaluation, the latest year in which generic summarization was addressed in a shared task. We use the same settings for ROUGE automatic evaluation to compare the systems directly and analyze the statistical significance of the differences in performance. We show that in terms of average scores the state-of-the-art systems appear similar but that in fact they produce very different summaries. Our corpus will facilitate future research on generic summarization and motivates the need for development of more sensitive evaluation measures and for approaches to system combination in summarization.
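As a rough illustration of why similar average scores can hide very different summaries, the sketch below computes a simplified unigram recall in the spirit of ROUGE-1, plus the vocabulary overlap between two system outputs. It is not the official ROUGE toolkit and does not reproduce the settings used for this corpus: the whitespace tokenization, the single reference, the function names, and the example sentences are all assumptions made up for illustration.

```python
from collections import Counter


def rouge1_recall(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 recall: clipped unigram overlap / reference length.

    Assumption: lowercased whitespace tokens, one reference, no stemming.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    if not ref:
        return 0.0
    # Each reference token is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[token]) for token, count in ref.items())
    return overlap / sum(ref.values())


def jaccard(a: str, b: str) -> float:
    """Vocabulary overlap between two summaries (set of word types)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


# Hypothetical texts, invented for this example only.
reference = "the storm closed roads and schools across the region"
system_a = "the storm closed roads"
system_b = "schools across the region"

print(rouge1_recall(system_a, reference))  # recall of system A
print(rouge1_recall(system_b, reference))  # recall of system B
print(jaccard(system_a, system_b))         # overlap between the two outputs
```

In this toy case both systems receive the same recall against the reference while sharing only one word type with each other, which is the kind of divergence the description points to.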

Distribution

Licence

CC - BY

Restrictions: Attribution

Download location: hidden

Distribution Access/Medium: Downloadable

Contact Person

Kai Hong

text

Monolingual text corpus

Languages

English

Linguality

Linguality type: Monolingual

Size

225 Kb

Modalities

Written Language

Metadata

Created: 09/22/2020

Last Updated: 11/19/2020

Metadata Creator

Sara Grilo

University of Lisbon, Faculty of Sciences FCUL Sala 6.3.32, Edifício C6, Departamento de Informática, Faculdade de Ciências da Universidade de Lisboa

1749-016 Lisboa

Campo Grande

Portugal

Usage

Actual Use - NLP Applications

Use NLP Specific: Summarisation

Documentation

Document Type: In Proceedings

Kai Hong, John Conroy, Benoit Favre, Alex Kulesza, Hui Lin, and Ani Nenkova. A Repository of State of the Art and Competitive Baseline Summaries for Generic News Summarization. 2014.

Editor: Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis

Publisher: European Language Resources Association (ELRA)

Book Title: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

ISBN: 978-2-9517408-8-4

Document Language: English
