The Dataset of Nuanced Assertions on Controversial Issues (NAoCI) dataset consists of over 2,000 assertions on sixteen different controversial issues. It has over 100,000 judgments of whether people agree or disagree with the assertions, and of about 70,000 judgments indicating how strongly peopl...
Dundee GCG-Bank contains hand-corrected deep syntactic annotations for the Dundee eye-tracking corpus (Kennedy et al., 2003). The annotations are designed to support psycholinguistic investigation into the structural determinants of sentence processing effort. Dundee GCG-Bank is distributed as a ...
Inquirições reais de 1258, publicadas em Portugaliae Monumenta Historica). Royal inquiries of 1258 (primarily published in the Portugaliae Monumenta Historica)
The test set described in was used as the basis for the assessment of word embeddings. An example entry in this data set would read: ‘Berlin Germany Lisbon Portugal’. With these four words relations – as in this example – one can test semantic analogies by using any of the possible combinations o...
The LX-ESSLLI 2008 data set was created from the ESSLLI 2008 Distributional Semantic Workshop shared-task set, made of 44 concrete nouns grouped in 6 semantic categories (4 animate and 2 inanimate). The grouping is done in an hierarchical way following the top 10 properties from the McRae (2005) ...
Hesita-POS is an annotaded corpus. Tv News.
CIPM is a set of historical, religious, notarial, literary texts in prose and verse, written in medieval portuguese. It has around 3.5 million words.
The DEEB Corpus contains the transcriptions of 1200 narrative texts written by pupils in their 4th, 6th and 9th year Portuguese Language exams in the public school system in Portugal.