EN-PT bilingual corpus built from PDF documents published by the European Medicines Agency (EMEA), https://www.ema.europa.eu (February 2020).
Hugging Face (PyTorch) pre-trained RoBERTa model for Portuguese, with 6 layers and 12 attention heads, totaling 68M parameters. Pre-training was done on 10 million Portuguese sentences and 10 million English sentences from the OSCAR corpus. Please cite: Santos, Rodrigo, João Rodrigues, Antóni...
Terms from mixed fields that have fairly recently been accepted and normalised by Termcat
Terms for Digital Marketing
Terms of Research Thesaurus
Terms for Fairs and Congresses
Industry terms
Terms from different sciences and industries: ecology, economy, law, sociology, medicine, tourism and computation.
This resource comprises multilingual lexicon entries used for the translation of specific IT domain expressions. The gazetteer was collected from four sources: the VLC, LibreOffice and KDE localization projects, and IT domain Wikipedia articles.
This resource is part of Deliverable 5.7 of the European Commission project QTLeap FP7-ICT-2013.4.1-610516 (http://qtleap.eu). This gazetteer comprises multilingual lexicon entries used for the translation of specific IT domain expressions for Basque, Bulgarian, Czech, Dutch, Engli...