PAROLE Portuguese Annotated Corpus

LE-PAROLE

Handle:	https://hdl.handle.net/21.11129/0000-000B-D315-F (persistent URL to this page)
URL:	http://www.clul.ul.pt/en/research-teams/197-le-parole

The tagged subset of the PAROLE Portuguese Corpus contains 250,000 tokens drawn from the full PAROLE Portuguese Corpus, which comprises 3 million running words of European Portuguese. The corpus was classified and encoded according to the common core PAROLE encoding standard.
The tagged subset approximately reproduces the distribution of the whole corpus by medium (Newspaper: about 65%, Book: about 20%, Periodical: about 5%, Miscellaneous: about 10%). It was morpho-syntactically tagged according to the PAROLE common tagset and morpho-syntactic annotation standards. Disambiguation was manually checked.
The corpus was tagged in a collaboration between two Portuguese institutions: the Centre of Linguistics of the University of Lisbon and INESC-ID.

Distribution Licence

MIT

Restrictions: Academic - Non Commercial Use

User Nature: Academic

Distribution Access/Medium: CD-ROM

Licensors:

Isabel Trancoso Female

http://www.l2f.inesc-id.pt/~imt/imt_en.html

Professor

INESC-ID L2f - Spoken Language Systems Lab R. Alves Redol, 9

1000-029 Lisboa

Lisboa

Portugal

Tel.: +351 213 100 268

Fax: +351 213 145 843

Amália Mendes

http://www.clul.ul.pt/en/researcher/146-amalia-mendes

Faculdade de Letras da Universidade de Lisboa

CLUL

Alameda da Universidade

1600-214 Lisbon

Tel.: 00351217904961

Fax: 00351217965622

Centro de Linguística da Universidade de Lisboa

http://www.clul.ul.pt

CLUL

Av. Prof. Gama Pinto, 2

1649-003 Lisbon

Tel.: +351217920000

Fax: +351217965622

Distribution rights holders:

Isabel Trancoso Female

http://www.l2f.inesc-id.pt/~imt/imt_en.html

Professor

INESC-ID L2f - Spoken Language Systems Lab R. Alves Redol, 9

1000-029 Lisboa

Lisboa

Portugal

Tel.: +351 213 100 268

Fax: +351 213 145 843

Amália Mendes

http://www.clul.ul.pt/en/researcher/146-amalia-mendes

Faculdade de Letras da Universidade de Lisboa

CLUL

Alameda da Universidade

1600-214 Lisbon

Tel.: 00351217904961

Fax: 00351217965622

Centro de Linguística da Universidade de Lisboa

http://www.clul.ul.pt

CLUL

Av. Prof. Gama Pinto, 2

1649-003 Lisbon

Tel.: +351217920000

Fax: +351217965622

IPR Holder

Isabel Trancoso Female

http://www.l2f.inesc-id.pt/~imt/imt_en.html

Professor

INESC-ID L2f - Spoken Language Systems Lab R. Alves Redol, 9

1000-029 Lisboa

Lisboa

Portugal

Tel.: +351 213 100 268

Fax: +351 213 145 843

Amália Mendes

http://www.clul.ul.pt/en/researcher/146-amalia-mendes

Faculdade de Letras da Universidade de Lisboa

CLUL

Alameda da Universidade

1600-214 Lisbon

Tel.: 00351217904961

Fax: 00351217965622

Centro de Linguística da Universidade de Lisboa

http://www.clul.ul.pt

CLUL

Av. Prof. Gama Pinto, 2

1649-003 Lisbon

Tel.: +351217920000

Fax: +351217965622

Contact Persons

Amália Mendes

http://www.clul.ul.pt/en/researcher/146-amalia-mendes

Faculdade de Letras da Universidade de Lisboa

CLUL

Alameda da Universidade

1600-214 Lisbon

Tel.: 00351217904961

Fax: 00351217965622

Centro de Linguística da Universidade de Lisboa

http://www.clul.ul.pt

CLUL

Av. Prof. Gama Pinto, 2

1649-003 Lisbon

Tel.: +351217920000

Fax: +351217965622

Isabel Trancoso Female

http://www.l2f.inesc-id.pt/~imt/imt_en.html

Professor

INESC-ID L2f - Spoken Language Systems Lab R. Alves Redol, 9

1000-029 Lisboa

Lisboa

Portugal

Tel.: +351 213 100 268

Fax: +351 213 145 843

text

Monolingual text corpus

Languages

Portuguese (250,000 Tokens)

Linguality

Linguality type: Monolingual

Size

250,000 Tokens

Monolingual text corpus

Languages

Portuguese (250,000 Tokens)

Language Script: pt-PT

Linguality

Linguality type: Monolingual

Text Format

sgml (250,000 Tokens)

Size

250,000 Tokens

Character encoding

UTF-8

Domains

general (250,000 Tokens)

Modalities

Written Language

Classification

(25,000 Tokens)

Text type: Miscellaneous

Text genre: Written

(162,500 Tokens)

Text type: Newspaper

Text genre: Written

(50,000 Tokens)

Text type: Book

Text genre: Written

(12,500 Tokens)

Text type: Periodical

Text genre: Written
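
The token counts listed under Classification can be checked against the approximate distribution by medium given in the description. The following is a minimal sketch of that arithmetic; the variable and function names are illustrative only and are not part of the resource or its tooling.

```python
# Token counts per text type, copied from the Classification entries above.
TOTAL_TOKENS = 250_000
tokens_by_type = {
    "Newspaper": 162_500,
    "Book": 50_000,
    "Periodical": 12_500,
    "Miscellaneous": 25_000,
}


def shares(counts, total):
    """Return each category's share of the total as a percentage."""
    return {name: 100 * n / total for name, n in counts.items()}


# The four categories should account for the whole tagged subset.
assert sum(tokens_by_type.values()) == TOTAL_TOKENS

percentages = shares(tokens_by_type, TOTAL_TOKENS)
for name, pct in percentages.items():
    print(f"{name}: {pct:.0f}%")
```

Under these figures the four categories sum to the full 250,000 tokens, and the computed shares correspond to the "about 65% / 20% / 5% / 10%" split stated in the description.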

Time Coverage

1996-1997 (250,000 Tokens)

Geographic coverage

Portugal (250,000 Tokens)

Creation

Creation mode details: The POS tagging was done automatically with the INESC-ID tool “Palavroso”. Disambiguation was performed by linguists, and the POS annotation was manually verified.

Creation mode: Mixed

Creation Tools

PALAVROSO

Resource Creation

Resource Creator

Maria Fernanda Bacelar do Nascimento

http://www.clul.ul.pt/en/researcher/103-maria-nascimento

Faculdade de Letras da Universidade de Lisboa

CLUL

Head of research group

Av. Prof. Gama Pinto, 2

1649-003 Lisbon

Tel.: 00351217904955

Fax: 00351217965622

Centro de Linguística da Universidade de Lisboa

http://www.clul.ul.pt

CLUL

Av. Prof. Gama Pinto, 2

1649-003 Lisbon

Tel.: +351217920000

Fax: +351217965622

Isabel Trancoso Female

http://www.l2f.inesc-id.pt/~imt/imt_en.html

Professor

INESC-ID L2f - Spoken Language Systems Lab R. Alves Redol, 9

1000-029 Lisboa

Lisboa

Portugal

Tel.: +351 213 100 268

Fax: +351 213 145 843

Creation lasted: 04/01/1996 - 03/31/1998

Funding Project

LE-PAROLE (LE-PAROLE)

URLs: http://www.clul.ul.pt/en/research-teams/197-le-parole, http://www.elda.org/catalogue/en/text/doc/parole.html, ftp://ftp-tei.uic.edu/pub/tei/app/le02.html

Funding Type: EU Funds

Funders: European Commission - DGXIII, Telematics Application of Common Interest - Contract LE2 - 4017

Funding Country: European Commission

Project duration: 04/01/1996 - 03/31/1998

Metadata

Created: 07/16/2012

Last Updated: 01/12/2021

Metadata Creator

Amália Mendes

http://www.clul.ul.pt/en/researcher/146-amalia-mendes

Faculdade de Letras da Universidade de Lisboa

CLUL

Alameda da Universidade

1600-214 Lisbon

Tel.: 00351217904961

Fax: 00351217965622

Centro de Linguística da Universidade de Lisboa

http://www.clul.ul.pt

CLUL

Av. Prof. Gama Pinto, 2

1649-003 Lisbon

Tel.: +351217920000

Fax: +351217965622

Usage

Access tools

http://www.elda.fr/cata/text/W0024.html

Foreseen Use - NLP Applications

Use NLP Specific: Lemmatization, Lexicon Access, Morphosyntactic Tagging, Pos Tagging

Human Use

Use NLP Specific: Linguistic Research

Actual Use - NLP Applications

Use NLP Specific: Lemmatization, Lexicon Access, Morphosyntactic Tagging, Pos Tagging

Usage Report

Document Type: In Proceedings

Bacelar do Nascimento, M. F., P. Marrafa, L. A. S. Pereira, R. Ribeiro, R. Veloso, L. Wittmann, "LE-PAROLE - Do corpus à modelização da informação lexical num sistema-multifunção", XIII Encontro Nacional da Associação Portuguesa de Linguística, pp. 115-134, 1998

Editor: Associação Portuguesa de Linguística

Publisher: Colibri Artes Gráficas

Actual Use - Human Use

Use NLP Specific: Linguistic Research

Usage Report

Document Type: In Proceedings

Editor: Associação Portuguesa de Linguística

Publisher: Colibri Artes Gráficas

Documentation

Document Type: In Proceedings

Editor: Associação Portuguesa de Linguística

Publisher: Colibri Artes Gráficas