This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. EUIPO - IP case law (BoA) French-English
Bilingual (EN-PT) corpus acquired from website (https://eur-lex.europa.eu/legal-content) of the EU portal (9th July 2020)
CSTParser is a multi-document discourse parser. Based on machine learning techniques and hand-crafted rules, the system identifies a set of relations predicted by CST (Cross-document Structure Theory) among sentences of different texts on the same topic.
The CINTIL-TreeBank (Branco et al., 2011) is a corpus of syntactic constituency trees of Portuguese texts composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), novels (399 sentences; 3,082 tokens). In addition, there are ...
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Parallel texts Danish-English from the Danish Ministry o...
SENTER is a SENtence splitTER for Portuguese.
The corpus consists of 1000 MEDLINE abstracts. It is a subset of the original GENIA POS & term corpus, which was selected using the three MeSH terms human, blood cells and transcription factors. In each sentence, three types of information are annotated 1) biomedical terms are identified and assi...
Web service created by exporting UIMA-based workflow from the U-Compare text mining system. Functionality: Identifies biomedical named entities (genes and proteins) in plain text. Also identifies sentences. Tools in workflow: Cafetiere Sentence Splitter (University of Manchester), NEMine (Univ...
GREC is a semantically annotated corpus of 240 MEDLINE abstracts (167 on the subject of E. coli species and 73 on the subject of the Human species) which is intended for training IE systems and/or resources which are used to extract events from biomedical literature.
Geo-Net-PT 02 is a public Geospatial Ontology of Portugal (see Chaves et al., 2007), a computational resource (see Rodrigues et al., 2006 and Rodrigues, 2009) for applications demanding geographic information about Portugal, and contains 701,209 concepts stored in a GKB system, most of them admin...