CorPop: a corpus of popular Brazilian Portuguese


This research proposes a corpus of popular Brazilian Portuguese, called CorPop, with texts selected according to the average literacy level of the country's readers. CorPop's theoretical and methodological bases are interdisciplinary and fall within the scope of Language Studies and related disciplines, such as Corpus Studies, Text Linguistics, Psycholinguistics and Natural Language Processing. CorPop was developed by compiling data on the literacy level of Brazilian readers and on the characteristics that define a standard of textual simplicity in a corpus of texts suitable for these readers. The data were collected from the surveys Indicador de Analfabetismo Funcional (Inaf) and Retratos da Leitura no Brasil, as well as from a questionnaire administered to readers.

