The corpus was developed as a linguistic resource for research on Automatic Summarization and its relation to various issues in discourse processing. Summ-it consists of fifty science texts extracted from the Science section of the Brazilian daily newspaper Folha de Sã...
The LT Corpus (Literary Corpus) contains approximately 1,781,083 running words of European and Brazilian Portuguese. It includes 70 copyright-free classics (61 from Portugal and 9 from Brazil) published before 1940.
Carolina: General Corpus of Contemporary Brazilian Portuguese with provenance and typology information
Carolina is an open corpus for Linguistics and Artificial Intelligence with a robust volume of texts of varied typology in contemporary Brazilian Portuguese (1970-2021).