Search and Browse – PORTULAN CLARIN

Basic English-Maltese Dictionary

Bilingual wordlist consisting of alphabetically ordered English lemmas with their Maltese translations and Maltese pronunciation (transcribed in an ad hoc system by the original author).

Resource Type:	Lexical / Conceptual
Media Type:	Text
Languages:	English
Languages:	Maltese

AuCoPro - Splitting

The AuCoPro-Splitting dataset contains compounds annotated with their compound boundaries and linking morphemes. The dataset consists of two files, one for Afrikaans and one for Dutch. The annotation was performed according to annotation guidelines as described in Verhoeven, van Zaanen, van Huyss...

Resource Type:	Lexical / Conceptual
Media Type:	Text
Languages:	Afrikaans
Languages:	Dutch; Flemish

A Tweet Dataset Annotated in Four Emotion Dimensions

A corpus of 2,019 tweets annotated along each of four emotion dimensions: Valence, Dominance, Arousal and Surprise. Two annotation schemes are used: a 5-point ordinal scale (using SAM manikins for Valence, Arousal and Dominance) and pair-wise comparisons with an "about the same" option (here 2,01...

Resource Type:	Corpus
Media Type:	Text
Language:	English

A Terminological Inventory for Biodiversity

To construct the inventory, we first compiled a species name dictionary by combining all of the names available in Catalogue of Life (CoL), Encyclopedia of Life (EoL) and Global Biodiversity Information Facility (GBIF). The terms contained in this dictionary were then located within ...

Resource Type:	Lexical / Conceptual
Media Type:	Text
Language:	English

askIT Dataset

Collection of dialogues from subreddits related to Information Technology (IT), extracted with RDET (Reddit Dataset Extraction Tool). It comprises 61,842,638 tokens in 179,358 dialogues.

Resource Type:	Corpus
Media Type:	Text
Language:	English

Arquivo Dialetal CLUP - POS

Arquivo Dialetal CLUP - POS is a speech corpus of approximately 40 000 tokens (utterances of spontaneous speech, mainly from Northern Portugal), with orthographic transcription and part-of-speech (POS) annotation.

Resource Type:	Corpus
Media Type:	Audio
Language:	Portuguese

Arquivo Dialetal CLUP - Orthographic and phonetic transcription

Arquivo Dialetal CLUP - ORTH is a speech corpus of approximately 40 000 tokens (utterances of spontaneous speech, mainly from Northern Portugal), with orthographic and phonetic transcription.

Resource Type:	Corpus
Media Type:	Audio
Language:	Portuguese

Arquivo Dialetal CLUP - Áudio

Arquivo Dialetal CLUP - Áudio is an audio corpus of spontaneous speech, mainly from Northern Portugal.

Resource Type:	Corpus
Media Type:	Audio
Language:	Portuguese

ArgMine Corpus

A corpus of opinion articles annotated with arguments, following a claim-premise model.

Resource Type:	Corpus
Media Type:	Text
Language:	Portuguese

A Repository of State of the Art and Competitive Baseline Summaries for DUC 2004

In the period since 2004, many novel sophisticated approaches for generic multi-document summarization have been developed. Intuitive simple approaches have also been shown to perform unexpectedly well for the task. Yet it is practically impossible to compare the existing approaches directly, bec...

Resource Type:	Corpus
Media Type:	Text
Language:	English
