Code-switched English-Spanish Tweets

This package contains the collection of tweets described in the LREC 2018 paper: "Collecting Code-Switched Data from Social Media", Gideon Mendels, Victor Soto, Aaron Jaech and Julia Hirschberg, LREC 2018. Please remember to cite this paper if you use this resource. The tagged_tweets_ids file con...

Resource Type:Corpus
Media Type:Text
Languages:English, Spanish; Castilian
Civil Aviation Regulations (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. A collection of parallel Polish-English texts published ...

Resource Type:Corpus
Media Type:Text
Languages:English, Polish
CIPM-POS

CIPM-POS is a set of historical, religious, notarial, and literary texts in prose and verse, written in medieval Portuguese. It contains around 88,000 words.

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CIPM-browser

CIPM-browser is a browser for the CIPM corpus, a corpus of medieval Portuguese.

Resource Type:Tool / Service
CIPM

CIPM is a set of historical, religious, notarial, and literary texts in prose and verse, written in medieval Portuguese. It has around 3.5 million words.

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CINTIL-WordSenses

The CINTIL-WordSenses corpus, built upon the CINTIL International Corpus of Portuguese (Barreto et al., 2006), is composed of 23,825 sentences of written Portuguese with open-class terms manually disambiguated and annotated with synset identifiers from the Portuguese MultiWordNet (MWNPT) (Pianti ...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CINTIL-USuite

CINTIL-USuite is a corpus of Portuguese that is annotated with lemmas, the Universal Part-of-Speech tagset (UPOS) and Universal feature bundles, related to the Universal Dependency framework, and that contains around 1 million annotated tokens. It is described in this article: António Branc...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CINTIL-UPos

CINTIL-UPos is a corpus of Portuguese that is annotated with the Universal Part-of-Speech tagset (UPOS), related to the Universal Dependency framework, and that contains around 1 million annotated tokens. It is described in this article: António Branco, João Ricardo Silva, Luís Gomes and Jo...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CINTIL-UDep

CINTIL-UDep is a dependency bank of Portuguese with 38,400 sentences (and nearly 476,000 tokens), that is treebanked with Universal Dependencies (UD). This version of CINTIL-UDep supersedes the one included in the v2.11 (2022-11-15) release of the Universal Dependencies (https://universaldepende...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CINTIL-Treebank Online Searcher

CINTIL-Treebank Online Searcher is a freely available online service to search and view the constituency and dependency trees of the CINTIL-Treebank. This service was developed and is maintained at the University of Lisbon by the NLX-Natural Language and Speech Group of the Department of Informatics. ...

Resource Type:Tool / Service
