LX-ESSLLI 2008

The LX-ESSLLI 2008 data set was created from the ESSLLI 2008 Distributional Semantic Workshop shared-task set, which consists of 44 concrete nouns grouped into 6 semantic categories (4 animate and 2 inanimate). The grouping is done in a hierarchical way following the top 10 properties from the McRae (2005) ...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
RedditPT Dataset

This dataset is a collection of dialogues extracted from the Portugal subreddit with RDET (Reddit Dataset Extraction Tool). It is composed of around 58,964,715 tokens in 218,550 dialogues.

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Priberam Compressive Summarization Corpus

This is a corpus for multi-document summarization for European Portuguese. It contains 80 topics, each of which has 10 documents, for a total of 800 documents. Each topic contains two human summaries. The summaries are compressive: they are the result of a compression of the sentences in the orig...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Grafone-LEX

Grafone-LEX is a lexical database for conversion from graphemes to phonemes.

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese
Arquivo Dialetal CLUP - POS

Arquivo Dialetal CLUP - POS is a speech corpus of approximately 40,000 tokens of spontaneous speech utterances, mainly from Northern Portugal. It includes orthographic transcription and POS annotation.

Resource Type:Corpus
Media Type:Audio
Language:Portuguese
Grafone-Tool

Grafone-Tool is a grapheme-to-phoneme conversion tool for European Portuguese. The converter handles Portuguese spelling from both before and after the Orthographic Agreement of 1990.

Resource Type:Tool / Service
Language:Portuguese
ArgMine Corpus

A corpus of opinion articles annotated with arguments, following a claim-premise model.

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Portulex

Portulex is a lexical database in European Portuguese that contains words from reading texts in children's schoolbooks for reading and language instruction in Grades 1 to 4. It comprises a wordform and a lemma database. The wordform database consists of 17,062 inflected wordforms, and the lemma d...

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese
LX-UDParser

LX-UDParser is a UD parser for Portuguese, which adopts the Universal Dependency framework, with an initial performance of 90.87 for UAS and 88.01 for LAS under a ten-fold cross validation scheme. It is described in this article: António Branco, João Ricardo Silva, Luís Gomes and João Rodri...

Resource Type:Tool / Service
Language:Portuguese
PsychAnaphora - Reading times in a self-paced reading task

This set of materials resulted from a study on the processing of explicit pronouns in European Portuguese. A spreadsheet containing data from 75 participants (young adults), namely, per-word reading times and accuracy data on comprehension questions, is provided. Complementary materials (Read Fir...

Resource Type:Language Description
Media Type:Text
Language:Portuguese
