The corpus consists of 1000 MEDLINE abstracts. It is a subset of the original GENIA POS & term corpus, which was selected using the three MeSH terms human, blood cells and transcription factors. In each sentence, three types of information are annotated: 1) biomedical terms are identified and assi...
The CINTIL-DependencyBank (Branco et al., 2011a) is a corpus of grammatical dependencies of Portuguese texts composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), novels (399 sentences; 3,082 tokens) (see 3.2.). In additi...
Web service created by exporting a UIMA-based workflow from the U-Compare text mining system. Functionality: Identifies NP chunks in plain text. Also carries out sentence splitting, tokenisation and POS tagging. Tools in workflow: MLRS Sentence Splitter (University of Malta), UAIC-POSTagger, UAIC-...
PS corpus (Post-Scriptum) - treebank is a treebank of 586 informal letters written in Portuguese and Spanish during the Modern Age (from the 16th century to the beginning of the 19th century). This treebank is a syntactically annotated subset of the Portuguese "PS corpus (Post-S...
CSTParser is a multi-document discourse parser. Based on machine learning techniques and hand-crafted rules, the system identifies a set of relations predicted by CST (Cross-document Structure Theory) among sentences of different texts on the same topic.
SENTER is a SENtence splitTER for Portuguese.
Royal inquiries of 1258 (primarily published in the Portugaliae Monumenta Historica).
The U-Compare Workbench is a graphical user interface that operates on top of the U-Compare platform. The U-Compare platform allows users to build and evaluate NLP workflows. Workflows consist of one or more components, which include corpus readers and tools, such as tokenisers, POS taggers, name...
PhenoCHF is an annotated corpus consisting of documents belonging to two different text types (i.e., narrative reports from electronic health records (EHRs) and literature articles). It is manually annotated by medical doctors with detailed information relating to mentions of phenotype concepts a...
This is a workflow that is designed especially for use in the UIMA-based U-Compare workbench (see separate META-SHARE record). The workflow is in "ucz" format (specific to U-Compare) and can be imported via the "Import Workflow" item in the "Workflows" menu of the U-Compare interface. It include...