Industry terms
This tool, built to handle Portuguese-specific issues in syntactic categorization, assigns a single morpho-syntactic tag from the tagset below to every token. The tag is attached to the token with a slash (/) as separator: um exemplo → um/IA exemplo/CN ...
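The token/TAG output described above can be read back into pairs with a few lines of Python. This is a minimal sketch, not part of the tool itself: the function name `split_tagged` is hypothetical, and it assumes tagged items are separated by whitespace and that the tag is whatever follows the last slash in each item.

```python
def split_tagged(text):
    """Split whitespace-separated 'token/TAG' items into (token, tag) pairs."""
    pairs = []
    for item in text.split():
        # Assumption: the tag follows the LAST slash, so a token that
        # itself contains a slash keeps it.
        token, sep, tag = item.rpartition("/")
        if not sep:
            raise ValueError(f"no '/' separator in {item!r}")
        pairs.append((token, tag))
    return pairs


print(split_tagged("um/IA exemplo/CN"))
# → [('um', 'IA'), ('exemplo', 'CN')]
```

Splitting on the last slash rather than the first is a design choice made here so that tokens such as URLs or dates with slashes are not cut in the wrong place.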
Economic Crisis terms
The corpus contains 1,030 online communication messages, randomly selected from Network News Transfer Protocol (NNTP) newsgroups and from the bug tracking systems Bugzilla and GitHub. NNTP articles and Bugzilla and GitHub comments were selected randomly so that the sample exhibits sim...
Translation memory of the official tourism portal of Spain, www.spain.info
Database with 2,253 citations extracted from the Corpus de Referência do Português Contemporâneo - CRPC (Reference Corpus of Contemporary Portuguese) and manually revised. Format: tab-separated file. Fields: context number, source file id, citation.
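A tab-separated file with these three fields can be loaded with Python's standard `csv` module. The sketch below is illustrative only: the names `Citation` and `read_citations` and the sample rows are made up, and it assumes one record per line, no header row, and no quoting convention (quote characters inside a citation are kept literally).

```python
import csv
import io
from typing import NamedTuple


class Citation(NamedTuple):
    context_number: str
    source_file_id: str
    citation: str


def read_citations(handle):
    """Read (context number, source file id, citation) rows from a TSV handle."""
    # QUOTE_NONE: treat quote characters in the citation text as plain text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for line_no, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(row)}")
        rows.append(Citation(*row))
    return rows


# Hypothetical sample data, not taken from the actual database.
sample = io.StringIO("1\tfile_a\tprimeira citação\n2\tfile_b\tsegunda citação\n")
for entry in read_citations(sample):
    print(entry.context_number, entry.source_file_id, entry.citation)
```

For a real file, opening it with `open(path, encoding="utf-8", newline="")` and passing the handle to `read_citations` would be the usual approach; the actual encoding of the distributed file is not stated here.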
Wixarika is an indigenous language spoken in central-western Mexico by approximately fifty thousand people. For indigenous languages like Wixarika, digital resources are generally scarce, since native speakers do not necessarily leave a digital footprint on public forums. The lack of...
Terms that have (more or less) recently been accepted and normalised by Termcat, mixed fields
The MLSS Sentence Splitter is a web service tool that takes text as input and outputs the identified sentences surrounded by tags. The tool was tuned for Maltese. The download for this resource contains only the narrative description in a Word file. The web service has one method which can ...
Perfil Sociolinguístico da Fala Bracarense - POS is a manually verified part-of-speech annotation of the EXMARaLDA transcriptions in "Perfil Sociolinguístico da Fala Bracarense", a Portuguese speech corpus with 90 hours of recorded spontaneous speech, aligned with its transcription in EXMARaLDA f...