Albertina PT-BR

Albertina PT-* is a foundation large language model for the Portuguese language.

It is an encoder of the BERT family, based on the Transformer neural architecture and built on the DeBERTa model, with highly competitive performance for this language. It has different versions trained for different variants of Portuguese (PT), namely the European variant from Portugal (PT-PT) and the American variant from Brazil (PT-BR), and it is distributed free of charge under a most permissive license.

Albertina PT-BR is the version for American Portuguese from Brazil, trained on the brWaC data set.
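As an encoder, the model can be used, for instance, for masked-token prediction or to produce contextual embeddings. Below is a minimal sketch of loading it with the Hugging Face Transformers library; the repository identifier PORTULAN/albertina-ptbr is an assumption and should be checked against the official distribution page.

    # Minimal sketch: loading Albertina PT-BR as a masked language model.
    # The model identifier below is an assumption; verify it on the official page.
    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    model_id = "PORTULAN/albertina-ptbr"  # assumed repository name
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForMaskedLM.from_pretrained(model_id)

    # Fill-mask example: the encoder predicts the masked token.
    text = f"A capital do Brasil é {tokenizer.mask_token}."
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Locate the mask position and decode the highest-scoring token.
    mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
    predicted_id = logits[0, mask_index].argmax(dim=-1)
    print(tokenizer.decode(predicted_id))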

You may also be interested in Albertina PT-BR No-brWaC, trained on data sets other than brWaC and thus distributed under a more permissive license. To the best of our knowledge, these are encoders developed specifically for this language and variant that, at the time of their initial distribution, set a new state of the art for it, and they are made publicly available and distributed for reuse.

Albertina PT-BR is developed by a joint team from the University of Lisbon and the University of Porto, Portugal.
