The EUROPARL Corpus (Portuguese-English subpart of the parallel corpora), available at http://www.statmt.org/europarl/, was extracted from the proceedings of the European Parliament (Koehn, 2005). It contains transcriptions of sessions from 1996 to 2011, totalling approximately 58,324,562 tokens of European Portuguese (L1) and 49,216,896 tokens of English (translation).
Généreux, M., I. Hendrickx and A. Mendes (2012), “A Large Portuguese Corpus On-Line: Cleaning and Preprocessing”, 10th International Conference PROPOR2012, pp. 113-120, http://www.propor2012.org/
Editor: Caseli, H. et al. (eds.)
Publisher: Heidelberg: Springer-Verlag
Keywords: Corpus cleaning, PoS Tagging, Lemmatization
Use NLP Specific: Information Extraction, Lemmatization, Lexicon Access, Machine Translation, Morphosyntactic Tagging, Pos Tagging, Word Sense Disambiguation
Human Use
Use NLP Specific: Linguistic Research
Actual Use - NLP Applications
Use NLP Specific: Information Extraction, Lemmatization, Lexicon Access, Machine Translation, Morphosyntactic Tagging, Pos Tagging, Word Sense Disambiguation
Actual Use - Human Use
Use NLP Specific: Linguistic Research
Documentation
Document Type: Article
Koehn, P. (2005), “EUROPARL: A Parallel Corpus for Statistical Machine Translation”, Proceedings of the Tenth Machine Translation Summit, Phuket, Thailand, pp. 79-86.
Généreux, M., I. Hendrickx and A. Mendes (2012), “A Large Portuguese Corpus On-Line: Cleaning and Preprocessing”, Proceedings of the 10th International Conference PROPOR2012, Berlin, Heidelberg: Springer-Verlag, pp. 113-120.