BDCamões Corpus - Collection of Portuguese Literary Documents from the Digital Library of Camões I.P. (Part I)
BDCamões Corpus (Part I)
BDCamões Corpus is a collection of literary documents written in Portuguese, in plain text (.txt) format, with close to 4 million words from over 200 complete documents by 83 authors in 14 genres. It covers a time span from the 15th to the 21st century and adheres to different orthographic conventions. This set of characteristics makes BDCamões an invaluable resource for research in Language Science and Technology and in Digital Humanities.
Many texts in this corpus were also automatically annotated with state-of-the-art language processing tools. These annotated subcorpora are also available through PORTULAN CLARIN.
BDCamões Corpus is distributed in two parts: Part I under the CC-BY license, and Part II under the MS NC-NoReD-ND 2.0 license.
Part I consists of 127 complete documents, comprising over 3.1 million words by 37 authors in 10 genres. It covers a time span from the 15th to the 20th century and adheres to different orthographic conventions.