Anonymised ParaCrawl release 7 Portuguese-English
Handle: | https://hdl.handle.net/21.11129/0000-000D-FE7B-C (persistent URL to this page) |
---|---|
URL: | http://paracrawl.eu/ |
This corpus contains the Portuguese-English parallel data from release 7 of the ParaCrawl project, specifically "Broader Web-Scale Provision of Parallel Corpora for European Languages", anonymised with BiRoamer (https://github.com/bitextor/biroamer). This version is filtered with BiCleaner using a threshold of 0.5. The data was crawled from the web in compliance with robots.txt, as is standard practice. The crawl does not target any particular domain and is intended to provide broad coverage. Anonymisation is an automated process driven by named entity recognition and is far from perfect.
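As an illustration of what score-threshold filtering means in practice, the following is a minimal sketch, not the pipeline actually used to build this release. It assumes a hypothetical tab-separated layout in which each line holds the Portuguese segment, the English segment, and a classifier score in the last column. The function name `filter_by_score` and the inclusive comparison (`>=`) are also assumptions made for the example.

```python
from typing import Iterable, Iterator, Tuple

THRESHOLD = 0.5  # the threshold stated in the description above


def filter_by_score(
    lines: Iterable[str], threshold: float = THRESHOLD
) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) pairs whose score meets the threshold.

    Assumed (hypothetical) input layout: source<TAB>target<TAB>score.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed lines with no score column
        try:
            score = float(fields[-1])
        except ValueError:
            continue  # skip lines whose last column is not a number
        if score >= threshold:  # inclusive boundary is an assumption
            yield fields[0], fields[1]


if __name__ == "__main__":
    sample = [
        "Bom dia\tGood morning\t0.81\n",
        "Clique aqui\tLorem ipsum\t0.12\n",
    ]
    for src, tgt in filter_by_score(sample):
        print(src, "|", tgt)
```

A higher threshold would keep fewer but cleaner sentence pairs; a lower one would keep more data at the cost of noisier alignments.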