A computational lexicon for Portuguese that provides mappings between verbs and their nominalizations.
SpeakerID is a corpus of 100 spoken sentences and pseudosentences in European Portuguese (PT) and Mandarin Chinese (CH) designed to enable research on speaker identity. The utterances were recorded by five male speakers of European Portuguese (Speakers A-E) and five male speakers of Mandarin Chi...
Portuguese-English parallel from release 7 of the ParaCrawl project, specifically "Broader Web-Scale Provision of Parallel Corpora for European Languages". This version is filtered with BiCleaner with a threshold of 0.5. Data was crawled from the web following robots.txt, as is standard practice....
HuggingFace (pytorch) pre-trained roBERTa model in Portuguese, with 6 layers and 12 attention-heads, totaling 68M parameters. Pre-training was done on 10 million Portuguese sentences and 10 million English sentences from the Oscar corpus. Please cite: Santos, Rodrigo, João Rodrigues, Antóni...
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Complete text of the Portuguese Constitution in Portugue...
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Legislation concerning Portuguese Parliament; three bili...
Bilingual (EN-PT) corpus acquired from website (https://ec.europa.eu/*coronavirus-response) of the EU portal (20th May 2020).
Bilingual (EN-PT) corpus acquired from website (https://eur-lex.europa.eu/legal-content) of the EU portal (9th July 2020)
Bilingual (EN-PT) corpus acquired from Wikipedia on health and COVID-19 domain (2nd May 2020)
Bilingual (EN-PT) corpus acquired from the website https://antibiotic.ecdc.europa.eu/