ACOPOST is a free and open source collection of four part-of-speech taggers (t3, met, tbt, and et). In corpus linguistics, part-of-speech tagging (POS tagging or POST), also called grammatical tagging or word-category disambiguation, is the process of marking up the words in a text (corpus) as co...
GREC is a semantically annotated corpus of 240 MEDLINE abstracts (167 on the subject of E. coli species and 73 on the subject of the Human species) which is intended for training IE systems and/or resources which are used to extract events from biomedical literature.
The BioLexicon is a large-scale, wide-coverage computational lexicon covering the biomedical domain. A large part of the lexicon is concerned with covering biomedical terms and their variants. Entries for domain-specific verbs include syntactic and semantic information. The lexicon includes entri...
The corpus was developed as a linguistic resource for Automatic Summarization research and his relation with different issues to engage studies on the discourse treatment. Summ-it consists of fifty texts from Science domain extracted from Science section of Brazilian daily newspaper Folha de Sã...
Europarl QTLeap WSD/NED corpus This corpora is part of Deliverable 5.5 of the European Commission project QTLeap FP7-ICT-2013.4.1-610516 (http://qtleap.eu). The texts are sentences from the Europarl parallel corpus (Koehn, 2005). We selected the monolingual sentences from parallel corpora ...
The lexicon of discourse markers for European Portuguese contains 252 pairs of discourse marker/rhetorical sense. The lexicon covers conjunctions, prepositions, adverbs, adverbial phrases and alternative lexicalizations with a connective function, as in the PDTB (Prasad et al., 2008; Prasad et al...
This resource is part of Deliverable 4.6 of the QTLeap FP7 project (Contract number 610516). In its current development (15% of the intended goal of the project), it is composed of 150 sentences (1,416 English tokens and 1,275 Basque tokens). The sentences are excerpts from journalistic text from...
A collection of language resources for the evaluation of distributional semantic models of Portuguese: LX-SimLex-999: http://metashare.metanet4u.eu/go2/lx-simlex-999 LX-Rare Word Similarity Data set: http://metashare.metanet4u.eu/go2/lx-rare-word-similarity-dataset LX-WordSim-353: h...
ixa-pipe-dep-eu is a Basque dependency parsing tool. It is based on MATE-tools. This tool takes a document in Natural Language Processing Annotation Format (NAF) format (http://wordpress.let.vupr.nl/naf/) and outputs a new NAF document. This tool is partly funded by the European Commission ...
DVPM-EtyMor is a lexical database. Etymological, morphological and textual exemplification. Around 3000 verbs. Language: Medieval portuguese.