Albertina PT-BR No-brWaC

Albertina PT-* is a foundation large language model for the Portuguese language.

It is an encoder of the BERT family, based on the Transformer neural architecture and developed over the DeBERTa model, with highly competitive performance for this language. It comes in versions trained for different variants of Portuguese (PT): the European variant from Portugal (PT-PT) and the American variant from Brazil (PT-BR). It is distributed free of charge under a highly permissive license.

Albertina PT-BR No-brWaC is a version for American Portuguese from Brazil trained on data sets other than brWaC, and is therefore released under a highly permissive license.

You may also be interested in Albertina PT-BR, which is trained on brWaC. To the best of our knowledge, these are encoders built specifically for this language and variant that set a new state of the art for it, and they are publicly available and distributed for reuse.

Albertina PT-BR No-brWaC is developed by a joint team from the University of Lisbon and the University of Porto, Portugal.
