Grafone-Tool

Grafone-Tool is a tool for conversion from grapheme to phoneme for European Portuguese. The converter works with the Portuguese spelling, both prior to and after the Orthographic Agreement of 1990.

Resource Type:Tool / Service
Language:Portuguese
Dicionário de Gentílicos e Topónimos

Dicionário de Gentílicos e Topónimos is a list of pairs of toponyms and demonyms. The toponyms and demonyms included have a morphologically compositional relation between each other. The list contains around 1500 such pairs and additionally provides information on the toponym referent (upper unit...

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese
BDCamões Corpus - Collection of Portuguese Literary Documents from the Digital Library of Camões I.P. (Part I)

BDCamões Corpus is a collection of literary documents written in Portuguese, in plain text .txt format, with close to 4 million words from over 200 complete documents from 83 authors in 14 genres, covering a time span from the 15th to the 21st century, and adhering to different orthographic conve...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CRPC Discourse Bank v1.0

The CRPC Discourse Bank is labeled for discourse relations (also referred to as rhetorical relations or coher- ence relations), such as cause and condition, that hold between two spans of text and contribute to ensure the overall cohesion and coherence of the text. The scheme follows the principl...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
PS corpus (Post-Scriptum)-PT

PS Corpus (Post-Scriptum)-PT is a corpus of 2215 informal mail letters written in Portuguese during the Modern Ages (from the XVIth century to the beginning of the XIXth century). Each letter is available as a semi-palaeographic transcription, a modernized transcription, and with part-of-speec...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Inquirições reais

Royal inquiries of 1258 (primarily published in the Portugaliae Monumenta Historica).

Resource Type:Corpus
Media Type:Text
Language:Portuguese
News corpus categorised

The News corpus developed by LIACC in JSON format was complemented with POS and keyword topics annotation. POS-tagging =========== The POS-tagging used the tagger described in Généreux et al. (2012) The title and text body were extracted, tokenized and pos-tagged. Two new fields were added...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CRPC-Named Entity Recognizer

A NER-classifier based on memory-based learning, trained on the CINTIL dataset, a corpus that contains part of the Corpus de Referência do Português Contemporâneo - CRPC (Reference Corpus of Contemporary Portuguese). https://portulanclarin.net/repository/browse/cintil-corpus-internacional-do-por...

Resource Type:Tool / Service
Language:Portuguese
LX-Battig

The LX-Battig was created from Battig test.set (Baroni et al., 2010). This data set has 83 concrete concepts of the following 10 categories: mammals, birds, fish, vegetables, fruit, trees, vehicles, clothes, tools and kitchenware. The categories names and the concepts were translated by two trans...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Hesita-POS

Hesita-POS is an annotaded corpus. Tv News.

Resource Type:Corpus
Media Type:Text
Language:Portuguese

Order by:

Filter by:

Text (446)
Audio (18)
Image (1)