FLY corpus - syntax

FLY Corpus is a corpus composed by 2000 informal letters written in Portuguese, in the years spanning from 1900 to 1974, in the context of war, migration, imprisonment and exile. Each letter is in an XML file with two main parts: (a) the header, which contains metadata about the document (the ...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Port-AoA Words

Port-AoA Words (Cameirão & Vicente, 2010) is a lexical database containing 7 psycholinguistic characteristics (e.g. neighborhood density, written-word frequency, familiarity, imageability, etc). Standard adult vocabulary.

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese
PropBankPT

The PropBankPT (Branco et al., 2012) is a set of sentences annotated with their constituency structure and semantic role tags, composed of 3,406 sentences and 44,598 tokens taken from the Wall Street Journal translated. For the creation of this PropBank we adopted a semi-automatic analysis with...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CORDIAL-SIN – Syntax-oriented Corpus of Portuguese Dialects

CORDIAL-SIN is a corpus of spoken dialectal European Portuguese developed at Centro de Linguística da Universidade de Lisboa (CLUL). The materials for this corpus were drawn from the recordings of dialect speech collected by the CLUL ATLAS team as fieldwork interviews for linguistic atlases betwe...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
RudriCo-TOK

RudriCo-TOK is a tokenizer tool that splits contractions. De-contraction rules: 178.

Resource Type:Tool / Service
Language:Portuguese
LX-LR4DistSemEval

A collection of language resources for the evaluation of distributional semantic models of Portuguese: LX-SimLex-999: http://metashare.metanet4u.eu/go2/lx-simlex-999 LX-Rare Word Similarity Data set: http://metashare.metanet4u.eu/go2/lx-rare-word-similarity-dataset LX-WordSim-353: h...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Embeddings for Comparative Probing of Lexical Semantics Theories

Embeddings used in: Branco, António, João Rodrigues, Małgorzata Salawa, Ruben Branco and Chakaveh Saedi, 2020. Comparative Probing of Lexical Semantics Theories for Cognitive Plausibility and Technological Usefulness. In Proceedings of the International Conference on Computational Linguistics (C...

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese
LX-4WAnalogies

The test set described in was used as the basis for the assessment of word embeddings. An example entry in this data set would read: ‘Berlin Germany Lisbon Portugal’. With these four words relations – as in this example – one can test semantic analogies by using any of the possible combinations o...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CORDIAL-SIN – Syntax-oriented Corpus of Portuguese Dialects – TreeBank

The CORDIAL-SIN–TreeBank is a collection of 177596 syntactic parse trees of the Syntax-oriented Corpus of Portuguese Dialects. CORDIAL-SIN is a corpus of spoken dialectal European Portuguese, developed at Centro de Linguística da Universidade de Lisboa, that compiles excerpts of spontaneous and s...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CINTIL-WordSenses

The CINTIL-WordSenses corpus, built upon the CINTIL International Corpus of Portuguese (Barreto et al., 2006), is composed of 23,825 sentences of written Portuguese with open-class terms manually disambiguated and annotated with synset identifiers from the Portuguese MultiWordNet (MWNPT) (Pianti ...

Resource Type:Corpus
Media Type:Text
Language:Portuguese

Order by:

Filter by:

Text (428)
Audio (17)
Image (1)