The publication Arquivo dos Açores, established as a reference work for historical research on the Azores archipelago, comprises two series, totalling 20 volumes. The first series of the Arquivo dos Açores, consisting of 15 volumes, ran from 1878 to 1959, with long interruptions ...
Despite many recent papers on Arabic Named Entity Recognition (NER) in the news domain, little work has been done on microblog NER. NER on microblogs presents many complications, such as informal language, shortened named entities, brevity of expression, and inconsistent capitalization (for...
This corpus was produced by running the Portuguese-English parallel data from release 7 of the ParaCrawl project ("Broader Web-Scale Provision of Parallel Corpora for European Languages") through BiRoamer (https://github.com/bitextor/biroamer) in order to anonymise it. This version is filtered with BiCleaner with ...
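For illustration only, a minimal sketch of what sentence-level anonymisation of a parallel corpus can look like; BiRoamer's actual CLI and detection heuristics are documented in its repository, and the naive regex-based entity detector and the tab-separated file layout below are hypothetical stand-ins:

```python
import re

# Hypothetical stand-in for real entity detection: BiRoamer uses proper
# NER; here we naively flag capitalised multi-word spans as person names.
NAME_PATTERN = re.compile(r"\b(?:[A-Z][a-z]+ ){1,2}[A-Z][a-z]+\b")

def anonymise(sentence: str) -> str:
    """Replace detected name-like spans with a placeholder token."""
    return NAME_PATTERN.sub("[NAME]", sentence)

# Assumed layout: one tab-separated source/target sentence pair per line.
with open("paracrawl.en-pt.tsv", encoding="utf-8") as src, \
     open("paracrawl.en-pt.anon.tsv", "w", encoding="utf-8") as out:
    for line in src:
        en, pt = line.rstrip("\n").split("\t")
        out.write(f"{anonymise(en)}\t{anonymise(pt)}\n")
```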
An Arabic Twitter data set of 7,503 tweets. The released data contains manual Sentiment Analysis annotations as well as automatically extracted features, saved in Comma-Separated Values (CSV) and Attribute-Relation File Format (ARFF) files. Due to Twitter privacy restrictions we replaced the orig...
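A minimal sketch of reading such a CSV release with Python's standard library; the file name and column names here are hypothetical, and the actual schema is defined by the release itself:

```python
import csv

# Hypothetical file and column names; adjust to the release's real schema.
with open("arabic_tweets_sentiment.csv", encoding="utf-8", newline="") as f:
    reader = csv.DictReader(f)
    for row in reader:
        tweet_id = row["tweet_id"]   # tweets are referenced by ID only
        label = row["sentiment"]     # manual sentiment annotation
        # the remaining columns hold the automatically extracted features
        print(tweet_id, label)
```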
The text of the novel Sania (Eng. The Sledge), written in 1955 by Ion Druță and originally printed in Cyrillic script, served as a training corpus. We followed a previously developed recognition technology together with specialized lexicons, and in this way obtained the electr...
Albertina PT-PT base is a foundation large language model for European Portuguese, the variant of the language spoken in Portugal. It is an encoder of the BERT family, based on the Transformer neural architecture and built on the DeBERTa model, and offers highly competitive performance for this language. It is distributed free ...
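A minimal usage sketch with the Hugging Face transformers library; the model identifier below is a hypothetical placeholder, so substitute the identifier under which the model is actually published:

```python
from transformers import AutoModel, AutoTokenizer

# Hypothetical identifier; replace with the published model ID.
MODEL_ID = "PORTULAN/albertina-ptpt-base"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

# Encode a European Portuguese sentence and inspect the contextual
# embeddings produced by the encoder.
inputs = tokenizer("A língua portuguesa é falada em vários continentes.",
                   return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, tokens, hidden_size)
```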
Albertina PT-* is a foundation large language model for the Portuguese language. It is an encoder of the BERT family, based on the Transformer neural architecture and built on the DeBERTa model, and offers highly competitive performance for this language. It has different versions that were tra...
Albertina PT-* is a foundation large language model for the Portuguese language. It is an encoder of the BERT family, based on the Transformer neural architecture and built on the DeBERTa model, and offers highly competitive performance for this language. It has different versions that were...
Albertina PT-BR base is a foundation large language model for the American variant of Portuguese spoken in Brazil. It is an encoder of the BERT family, based on the Transformer neural architecture and built on the DeBERTa model, and offers highly competitive performance for this language. It is distributed free of...