U-Compare Discourse Parsing Service

Web service created by exporting a UIMA-based workflow from the U-Compare text mining system. Functionality: Performs discourse parsing on plain text. Also identifies sentences, tokens, parts of speech, lemmas, clauses and coreference chains. Tools in workflow: UAIC-POSTagger, UAIC-NPChunker, UAI...

Resource Type:Tool / Service
Language:Romanian
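The U-Compare services chain several UIMA components, and each component adds an annotation layer that later components read (sentences, then tokens, then tags, and so on). As a rough illustration only, the sketch below mimics that pattern in plain Python: a shared document object stores typed annotations as character offsets, and each annotator reads earlier layers and appends its own. It does not use the UIMA or U-Compare APIs; `Document`, `split_sentences`, `tokenize` and `run_pipeline` are hypothetical names, and the regex splitter and tokenizer are simplistic stand-ins for the real tools.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Annotation:
    type: str   # layer name, e.g. "Sentence" or "Token"
    begin: int  # start offset in the document text
    end: int    # end offset (exclusive)


@dataclass
class Document:
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def select(self, type_name: str) -> List[Annotation]:
        """Return all annotations of one layer, in insertion order."""
        return [a for a in self.annotations if a.type == type_name]

    def covered_text(self, ann: Annotation) -> str:
        return self.text[ann.begin:ann.end]


def split_sentences(doc: Document) -> None:
    # Naive splitter: a sentence runs up to and including ., ! or ?
    for m in re.finditer(r"\S[^.!?]*[.!?]*", doc.text):
        doc.annotations.append(Annotation("Sentence", m.start(), m.end()))


def tokenize(doc: Document) -> None:
    # Reads the Sentence layer added earlier and adds Token annotations inside it.
    for sent in doc.select("Sentence"):
        for m in re.finditer(r"\w+|[^\w\s]", doc.covered_text(sent)):
            doc.annotations.append(
                Annotation("Token", sent.begin + m.start(), sent.begin + m.end())
            )


Annotator = Callable[[Document], None]


def run_pipeline(doc: Document, annotators: List[Annotator]) -> Document:
    # Order matters: each annotator may depend on layers added before it.
    for annotate in annotators:
        annotate(doc)
    return doc
```

For example, `run_pipeline(Document("Ana are mere. Ion scrie."), [split_sentences, tokenize])` yields two Sentence annotations and seven Token annotations. Keeping every layer as offsets into one shared text is what lets independently developed tools be swapped in and out of a workflow.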
U-Compare Co-reference Identification service

Web service created by exporting a UIMA-based workflow from the U-Compare text mining system. Functionality: Identifies co-reference chains in plain text. Also identifies sentences, tokens with parts of speech and lemmas, and NP chunks. Tools in workflow: TTL-Tokenizer (RACAI, Romania), TTL-Tagger...

Resource Type:Tool / Service
Language:Romanian
U-Compare Segmentation Service

Web service created by exporting a UIMA-based workflow from the U-Compare text mining system. Functionality: Identifies clauses/segments in plain text. Also identifies sentences, tokens, POS tags and lemmas. Tools in workflow: Cafetiere Sentence Splitter (University of Manchester), TTL Tokenizer...

Resource Type:Tool / Service
Language:Romanian
U-Compare NP Chunking Service

Web service created by exporting a UIMA-based workflow from the U-Compare text mining system. Functionality: Identifies NP chunks in plain text. Also carries out sentence splitting, tokenisation and POS tagging. Tools in workflow: MLRS Sentence Splitter (University of Malta), UAIC-POSTagger, UAIC-...

Resource Type:Tool / Service
Language:Romanian
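NP chunking groups POS-tagged tokens into flat, non-overlapping noun phrases. The listing does not say how the UAIC chunker works, so the following is only a minimal pattern-based sketch of the general technique: each tag is mapped to one character and a regular expression finds noun-phrase spans. The coarse tags (`DET`, `ADJ`, `NOUN`, `PROPN`), the `TAG_CODES` table and the `chunk_nps` function are hypothetical; the real tool uses its own tagset and rules.

```python
import re
from typing import List, Tuple

# Hypothetical mapping from coarse tags to one character each;
# every other tag becomes "O" (outside any NP).
TAG_CODES = {"DET": "D", "ADJ": "A", "NOUN": "N", "PROPN": "N"}

# Optional determiner, adjectives, one or more nouns, then adjectives
# (Romanian adjectives usually follow the noun).
NP_PATTERN = re.compile(r"D?A*N+A*")


def chunk_nps(tagged: List[Tuple[str, str]]) -> List[List[str]]:
    """Return the NP chunks found in one POS-tagged sentence."""
    # One character per token, so regex offsets double as token indices.
    codes = "".join(TAG_CODES.get(tag, "O") for _, tag in tagged)
    return [
        [word for word, _ in tagged[m.start():m.end()]]
        for m in NP_PATTERN.finditer(codes)
    ]
```

For the tagged sentence "Ion are un pix verde ." ("Ion has a green pen"), this returns `[["Ion"], ["un", "pix", "verde"]]`. Encoding tags as characters keeps the chunk grammar to a single regular expression, at the cost of handling only flat, non-recursive phrases.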
Chancelaria de D. Afonso III: documentos em português (Chancery of D. Afonso III: documents in Portuguese)

The Portuguese-language documents of the Chancery of D. Afonso III constitute the first significant body of texts in Portuguese (34 documents spanning a period of 24 years: 1255 - 1279); it is only from 1279 onwards, under D. Dinis (1261-1325), that the systematic use of Portuguese...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Portuguese RoBERTa language model

Hugging Face (PyTorch) pre-trained RoBERTa model for Portuguese, with 6 layers and 12 attention heads, totaling 68M parameters. Pre-training was done on 10 million Portuguese sentences and 10 million English sentences from the OSCAR corpus. Please cite: Santos, Rodrigo, João Rodrigues, Antóni...

Resource Type:Language Description
Media Type:Text
Languages:English, Portuguese
Spanish to English Machine translation module

Technical descriptions:
http://qtleap.eu/wp-content/uploads/2015/05/Pilot1_technical_description.pdf
http://qtleap.eu/wp-content/uploads/2015/05/TechnicalDescriptionPilot2_D2.7.pdf
http://qtleap.eu/wp-content/uploads/2016/11/TechnicalDescriptionPilot3_D2.10.pdf

Resource Type:Tool / Service
Languages:English, Spanish (Castilian)
Portulex

Portulex is a lexical database in European Portuguese that contains words from reading texts in children’s schoolbooks for reading and language instruction in Grades 1 to 4. It comprises a wordform and a lemma database. The wordform database consists of 17,062 inflected wordforms, and the lemma d...

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese
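A wordform database and a lemma database are two views of the same mapping: each inflected form points to its lemma, and each lemma groups its inflected forms. The sketch below shows one way to build both lookup tables from wordform records. The record fields (`wordform`, `lemma`, `pos`) and the example words are illustrative assumptions, not Portulex's actual schema or data.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class WordformEntry:
    wordform: str  # inflected form as it appears in the texts
    lemma: str     # citation form grouping the inflected forms
    pos: str       # part-of-speech label (hypothetical field)


def build_indexes(
    entries: Iterable[WordformEntry],
) -> Tuple[Dict[str, List[str]], Dict[str, Set[str]]]:
    """Build lemma -> wordforms and wordform -> lemmas lookup tables."""
    lemma_to_forms: Dict[str, List[str]] = defaultdict(list)
    form_to_lemmas: Dict[str, Set[str]] = defaultdict(set)
    for e in entries:
        lemma_to_forms[e.lemma].append(e.wordform)
        # One wordform can belong to several lemmas, so keep a set.
        form_to_lemmas[e.wordform].add(e.lemma)
    return dict(lemma_to_forms), dict(form_to_lemmas)
```

Using a set on the wordform side matters for homographs: "como" is both the adverb/conjunction "como" and a form of the verb "comer", so a lookup should return both lemmas rather than keep whichever record was read last.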
COVID-19 EUROPARL v2 dataset. Bilingual (EN-PT)

Bilingual (EN-PT) corpus acquired from the website of the European Parliament (https://www.europarl.europa.eu/); date: 9 May 2020.

Resource Type:Corpus
Media Type:Text
Languages:English, Portuguese
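The listing does not state the distribution format. Assuming the corpus is available as a TMX file (a common exchange format for parallel corpora, but check the actual download), the sketch below collects aligned EN-PT segment pairs with Python's standard `xml.etree.ElementTree`. The function name `read_tmx_pairs` and its parameters are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# ElementTree expands the xml:lang attribute to this namespaced key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(
    root: ET.Element, src: str = "en", tgt: str = "pt"
) -> List[Tuple[str, str]]:
    """Collect (source, target) segment pairs from a parsed TMX document."""
    pairs = []
    for tu in root.iter("tu"):  # one translation unit per aligned pair
        segs = {}
        for tuv in tu.iter("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain lang attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
            seg = tuv.find("seg")
            if seg is not None:
                # Keep only the primary subtag, e.g. "en-GB" -> "en".
                segs[lang.lower().split("-")[0]] = "".join(seg.itertext()).strip()
        if src in segs and tgt in segs:
            pairs.append((segs[src], segs[tgt]))
    return pairs
```

Usage would look like `read_tmx_pairs(ET.parse(path).getroot())`, where `path` points to the downloaded file. Translation units missing either language are skipped rather than padded.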
Grafone-Tool

Grafone-Tool is a grapheme-to-phoneme converter for European Portuguese. It handles Portuguese spelling both before and after the 1990 Orthographic Agreement.

Resource Type:Tool / Service
Language:Portuguese
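Grapheme-to-phoneme conversion is not a letter-by-letter substitution: some letter pairs spell a single sound, and some letters change value depending on their neighbours. The toy sketch below illustrates only that idea for a few European Portuguese consonant spellings. It is not Grafone-Tool's method or rule set; `consonants_to_ipa` is a hypothetical name, and vowels (reduction, nasalisation), stress and most consonant contexts are deliberately left untouched.

```python
# Toy rules for a handful of consonant spellings; NOT Grafone-Tool's rules.
DIGRAPHS = {"ch": "ʃ", "lh": "ʎ", "nh": "ɲ", "rr": "ʁ", "ss": "s"}
FRONT_VOWELS = set("eéêií")
VOWELS = set("aáàâãeéêiíoóôõuú")


def consonants_to_ipa(word: str) -> str:
    """Rewrite some consonant spellings as IPA; other letters are copied unchanged."""
    word = word.lower()
    out = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:  # two letters spelling one sound
            out.append(DIGRAPHS[pair])
            i += 2
            continue
        ch = word[i]
        prev = word[i - 1] if i > 0 else ""
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch == "c":  # soft before e/i, hard elsewhere
            out.append("s" if nxt in FRONT_VOWELS else "k")
        elif ch == "g":  # soft before e/i, hard elsewhere
            out.append("ʒ" if nxt in FRONT_VOWELS else "g")
        elif ch == "ç":
            out.append("s")
        elif ch == "j":
            out.append("ʒ")
        elif ch == "s" and prev in VOWELS and nxt in VOWELS:
            out.append("z")  # single s between vowels is voiced
        elif ch == "h":
            pass  # h outside a digraph is silent
        else:
            out.append(ch)
        i += 1
    return "".join(out)
```

For instance, "filho" becomes "fiʎo" and "casa" becomes "kaza". Checking digraphs before single letters is what keeps "lh" from being read as "l" followed by a silent "h".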