SETimes.HR

Handle:	https://hdl.handle.net/21.11129/0000-000D-FE9C-6 (persistent URL to this page)

We present SETimes.HR, the first linguistically annotated corpus of Croatian that is freely available for all purposes. The corpus is built on top of the SETimes parallel corpus of nine Southeast European languages and English. It is manually annotated for lemmas, morphosyntactic tags, named entities and dependency syntax. We couple the corpus with domain-sensitive test sets for Croatian and Serbian to support the evaluation of direct model transfer between these closely related languages. We build and evaluate statistical models for lemmatization, morphosyntactic tagging, named entity recognition and dependency parsing on top of SETimes.HR and the test sets, providing the state of the art in all four tasks. We make all resources presented in the paper freely available under a very permissive licensing scheme.

Download

Distribution Licence

CC BY-SA

Restrictions: Attribution, Share Alike

Download location: hidden

Contact Person

Željko Agić

text

Monolingual text corpus

Languages

Croatian

Linguality

Linguality type: Monolingual

Size

89,129 Tokens

Metadata

Created: 10/01/2020

Last Updated: 11/18/2020

Metadata Creator

Sara Grilo

University of Lisbon, Faculty of Sciences FCUL Sala 6.3.32, Edifício C6, Departamento de Informática, Faculdade de Ciências da Universidade de Lisboa

1749-016 Lisboa

Campo Grande

Portugal

Usage

Foreseen Use: NLP Applications

NLP-Specific Use: Parsing, POS Tagging

Documentation

Document Type: In Proceedings

Željko Agić and Nikola Ljubešić, The SETimes.HR Linguistically Annotated Corpus of Croatian, 2014

Editor: Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis

Publisher: European Language Resources Association (ELRA)

Book Title: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

ISBN: 978-2-9517408-8-4

Document Language: English
