Affect in Tweets PT

This is a data set of Portuguese tweets, each labeled with the emotion it conveys. It was gathered using a methodology similar to the one used to build the Affect in Tweets data set for SemEval-2018 Task 1. The data set contains 11219 tweets, each labeled with an emotion (anger,...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Lexicon of discourse markers for European Portuguese

The lexicon of discourse markers for European Portuguese contains 252 pairs of discourse marker/rhetorical sense. The lexicon covers conjunctions, prepositions, adverbs, adverbial phrases and alternative lexicalizations with a connective function, as in the PDTB (Prasad et al., 2008; Prasad et al...

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese
MARv4

MARv-POS is a part-of-speech tagger (a probabilistic POS annotation module). MARv4's architecture comprises two submodules: a module of linguistically oriented disambiguation rules and a probabilistic disambiguation module. The linguistically oriented module is no longer used in the STRING chain be...

Resource Type:Tool / Service
Language:Portuguese
LexMan-POSTagger

LexMan-POSTagger is a morphological analyser that assigns morphological tags to all words. Size: verb lemmas: 12 995; noun and adjective lemmas: 38 180; adverb lemmas: 7 250; compound words: 35 201. Language: Portuguese.

Resource Type:Tool / Service
Language:Portuguese
CINTIL-Definitions

The corpus presented here is a collection of tutorials and scientific papers in the field of Information Technology, with 603 annotated definitions in Portuguese. The texts were collected from the Web at the beginning of 2006 and they are organised in 32 files of three different sub-...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Biographies of Portuguese People

This is a set of 11,361 biographies of Portuguese people. The data were compiled by collecting biographies from Wikipedia and converting them. Several filters were applied to remove entries that were mostly empty or contained non-applicable content. Format: JSON (conversion from HTML) ...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
ArgMine Corpus

A corpus of opinion articles annotated with arguments, following a claim-premise model.

Resource Type:Corpus
Media Type:Text
Language:Portuguese
BDCamões Corpus - Collection of Portuguese Literary Documents from the Digital Library of Camões I.P. (Part II)

BDCamões Corpus - Collection of Portuguese Literary Documents from the Digital Library of Camões I.P. is a collection of literary documents written in Portuguese, in plain-text .txt format, with close to 4 million words from over 200 complete documents by 83 authors in 14 genres, covering a ti...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Multifunctional Computational Lexicon of Contemporary Portuguese

The resource consists of a Portuguese frequency lexicon based on a 16-million-word corpus of written and spoken texts from different genres. The lexicon contains 26,443 entries (lemmas) and 140

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese
CRPC-Quotations

A database of 2,253 citations extracted from the Corpus de Referência do Português Contemporâneo (CRPC, Reference Corpus of Contemporary Portuguese) and manually revised. Format: tab-separated file. Fields: context number; source file id; citation.

Resource Type:Corpus
Media Type:Text
Language:Portuguese
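
As an illustration of the tab-separated layout described for CRPC-Quotations, the following is a minimal sketch of how such a file might be read in Python. It assumes the three fields appear in the order listed (context number, source file id, citation) and that the file has no header row; neither detail is confirmed by the listing. The names `Quotation` and `read_quotations` are hypothetical, and the sample rows are made up for the example.

```python
import csv
import io
from typing import NamedTuple


class Quotation(NamedTuple):
    # Hypothetical field names mirroring the three fields in the listing.
    context_number: str
    source_file_id: str
    citation: str


def read_quotations(handle):
    """Yield one Quotation per well-formed three-column row."""
    # QUOTE_NONE keeps quotation marks inside citations as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or malformed lines
        yield Quotation(*row)


# Made-up sample rows, for illustration only (not taken from the corpus).
sample = io.StringIO(
    "1\tfile_a\tfirst example citation\n"
    "2\tfile_b\tsecond example citation\n"
)
quotations = list(read_quotations(sample))
```

With a real file, the `io.StringIO` sample would be replaced by something like `open(path, encoding="utf-8", newline="")`, where the encoding is likewise an assumption.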
