Termoteca

Terms from different sciences and industries: ecology, economy, law, sociology, medicine, tourism and computation.

Resource Type:Lexical / Conceptual
Media Type:Text
Languages:English
French
Galician
Portuguese
Spanish; Castilian
askIT Dataset

Collection of dialogues from subreddits related to Information Technology (IT), extracted with RDET (Reddit Dataset Extraction Tool). It comprises 61,842,638 tokens in 179,358 dialogues.

Resource Type:Corpus
Media Type:Text
Language:English
CINTIL-UDep

CINTIL-UDep is a dependency bank of Portuguese with 38,400 sentences (nearly 476,000 tokens), treebanked with Universal Dependencies (UD). This version of CINTIL-UDep supersedes the one included in the v2.11 (2022-11-15) release of Universal Dependencies (https://universaldepende...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Parallel corpus (Greek - English) in the public administration domain (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Parallel (el-en) corpus of 12,509 translation units in th...

Resource Type:Corpus
Media Type:Text
Languages:English
Greek, Modern (1453-)
Basic English-Maltese Dictionary

Bilingual wordlist consisting of alphabetically ordered English lemmas with their Maltese translations and Maltese pronunciation (transcribed in an ad hoc system by the original author).

Resource Type:Lexical / Conceptual
Media Type:Text
Languages:English
Maltese
BERTimbau - Portuguese BERT-Base language model

This resource contains a pre-trained BERT language model for the Portuguese language. A BERT-Base cased variant was trained on BrWaC (Brazilian Web as Corpus), a large Portuguese corpus, for 1,000,000 steps, using whole-word masking. The model is available as artifacts for TensorFlow and...

Resource Type:Language Description
Media Type:Text
Language:Portuguese
DVPM-SynSem

DVPM-SynSem is a lexical database with syntactic and semantic information in Medieval Portuguese. It contains around 3000 verbs.

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese
BioLexicon

The BioLexicon is a large-scale, wide-coverage computational lexicon covering the biomedical domain. A large part of the lexicon is concerned with covering biomedical terms and their variants. Entries for domain-specific verbs include syntactic and semantic information. The lexicon includes entri...

Resource Type:Corpus
Media Type:Text
Language:English
Grafone-LEX

Grafone-LEX is a lexical database for grapheme-to-phoneme conversion.

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese
Nexing Corpus

Corpus of transcriptions of syllogistic reasoning protocols. Written transcriptions: verbal data (30 hours) elicited during an experiment on syllogistic reasoning (27 participants x 64 syllogistic problems each): thinking-aloud task; reflexive conversation. Performance data: La...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
