Technical Description: http://qtleap.eu/wp-content/uploads/2015/05/Pilot1_technical_description.pdf http://qtleap.eu/wp-content/uploads/2015/05/TechnicalDescriptionPilot2_D2.7.pdf http://qtleap.eu/wp-content/uploads/2016/11/TechnicalDescriptionPilot3_D2.10.pdf
Syntactic parser for English. Outputs dependency relations. Also outputs parts-of-speech for each token. The tool is provided as a UIMA component, specifically as a Java archive (jar) file, which can be incorporated within any UIMA workflow. However, it is particularly designed for use in the U-Com...
Conta-me Histórias [http://contamehistorias.pt] is a temporal summarization framework for news articles that allows users to explore and revisit events in the past. To select relevant stories from different time periods, we rely on YAKE! [http://yake.inesctec.pt], a keyword extraction algorithm devel...
SenseClusters is a package of (mostly) Perl programs that allows a user to cluster similar contexts together using unsupervised knowledge-lean methods.
Tokenisation is one of the functionalities of the GENIA tagger, which additionally outputs the base forms, part-of-speech tags, chunk tags, and named entity tags. The tagger is specifically tuned for biomedical text such as MEDLINE abstracts. The tool is a UIMA component, which forms part of th...
CINTIL Corpus Concordancer is a freely available online concordancing service to support the research usage of the CINTIL Corpus. This concordancer was developed and is maintained at the University of Lisbon by the NLX-Natural Language and Speech Group of the Department of Informatics, in coopera...
LX-Translator is a freely available online service for translation between Portuguese and Chinese. This service was developed and is maintained at the University of Lisbon by the NLX-Natural Language and Speech Group of the Department of Informatics. Intrinsic evaluation of the model for the ...
LX-Lemmatizer is a freely available online service for fully-fledged lemmatization of Portuguese verbs. It was developed and is maintained at the University of Lisbon by the NLX-Natural Language and Speech Group of the Department of Informatics. LX-Lemmatizer takes a Portuguese verb form and deliv...
LX-Proficiency is an online service for the quantitative analysis of texts along a range of linguistic metrics, and for the estimation of the proficiency level of texts. These quantitative metrics are meant to provide support in the classification of texts according to the proficiency levels i...
Reddit Dataset Extraction Tool (RDET) is a tool that uses the Reddit comment and submission resources available at 'pushshift.io' to generate new datasets for any given subreddit.
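The idea of building a per-subreddit dataset can be illustrated with a minimal sketch. It is not RDET's actual implementation: it assumes the input is newline-delimited JSON in which each record carries a "subreddit" field, and the function name filter_by_subreddit and the sample records are hypothetical, introduced only for illustration.

```python
import json
from typing import Iterable, Iterator


def filter_by_subreddit(lines: Iterable[str], subreddit: str) -> Iterator[dict]:
    """Yield JSON records whose 'subreddit' field matches (case-insensitive).

    Assumption: one JSON object per line, each with a 'subreddit' key.
    """
    wanted = subreddit.lower()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of aborting the run
        if not isinstance(record, dict):
            continue  # only JSON objects are treated as records
        if str(record.get("subreddit", "")).lower() == wanted:
            yield record


# Hypothetical sample input standing in for a downloaded dump file.
sample = [
    '{"subreddit": "LanguageTechnology", "body": "Any good lemmatizers for Portuguese?"}',
    '{"subreddit": "python", "body": "Use json.loads per line."}',
    "not json",
    '{"subreddit": "languagetechnology", "title": "Keyword extraction tools"}',
]

dataset = list(filter_by_subreddit(sample, "LanguageTechnology"))
```

Because the function consumes any iterable of strings, an open file handle can be passed in place of the sample list, so a large dump could be processed line by line without loading it into memory.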