Geo-Net-PT 02

Geo-Net-PT 02 is a public Geospatial Ontology of Portugal (see Chaves et al., 2007), a computational resource (see Rodrigues et al., 2006 and Rodrigues, 2009) for applications demanding geographic information about Portugal, and contains 701,209 concepts stored in a GKB system, most of them admin...

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese
CORAA NURC-São Paulo Minimal Corpus

CORAA NURC-SP Minimal Corpus is a manually annotated corpus of spontaneous Brazilian Portuguese speech (São Paulo variety). The corpus is a subset of the collection of the NURC (‘Cultured Linguistic Urban Norm’) project, one of the most influential in Brazilian linguistics. The corpus was brought to digital...

Resource Type:Corpus
Media Type:Audio
Language:Portuguese
MWN.PT - WordNet of Portuguese

A wordnet is a lexical database. It groups synonymous words into sets, the synsets, which represent distinct concepts. These synsets form nodes in a network, which are interlinked through edges that correspond to semantic relations between those synsets. For instance, the hypernym relation, also ...

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese
Albertina PT-PT base

Albertina PT-PT base is a foundation large language model for European Portuguese, the variant spoken in Portugal. It is an encoder of the BERT family, based on the Transformer neural architecture and developed over the DeBERTa model, with highly competitive performance for this language. It is distributed free ...

Resource Type:Language Description
Media Type:Text
Language:Portuguese
Albertina PT-BR base

Albertina PT-BR base is a foundation large language model for American Portuguese, the variant spoken in Brazil. It is an encoder of the BERT family, based on the Transformer neural architecture and developed over the DeBERTa model, with highly competitive performance for this language. It is distributed free of...

Resource Type:Language Description
Media Type:Text
Language:Portuguese
Albertina PT-PT

Albertina PT-* is a foundation large language model for the Portuguese language. It is an encoder of the BERT family, based on the Transformer neural architecture and developed over the DeBERTa model, with highly competitive performance for this language. It has different versions that were tra...

Resource Type:Language Description
Media Type:Text
Language:Portuguese
LX-Rare Word Similarity Dataset

The LX-Rare Word Similarity Dataset was created from the Stanford Rare Word (RW) Similarity data set (Luong et al., 2013). The list contains 2 034 words (1 017 pairs of words). All the words were extracted from Wikipedia and from WordNet (Miller, 1995), a lexical database where the concepts are gro...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
LX-WordSim-353

LX-WordSim-353 was created from WordSim-353 (Agirre et al., 2009). As the name suggests, this data set contains 353 pairs of words. The two words in a pair can belong to different morphosyntactic categories. The data set is made of nouns, adjectives, verbs and named entities, and has no multiwords...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
HESITA database

The HESITA database is a corpus of daily television news collected over a month and annotated for hesitation events, acoustic environments, speaking styles, speaker characteristics and respiratory events, among other characteristic sounds.

Resource Type:Corpus
Media Types:Text, Audio
Language:Portuguese
Natolin European Centre Dataset (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. The Polish-English parallel corpus is composed of three ...

Resource Type:Corpus
Media Type:Text
Language:Polish
