Reddit Dataset Extraction Tool (RDET) uses the Reddit comment and submission resources available at 'pushshift.io' to generate new datasets for any given subreddit.
RudriCo-POS is a part-of-speech disambiguation tool that applies 188 morphological disambiguation rules.
RudriCo-TOK is a tokenizer that splits contractions using 178 de-contraction rules.
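As a rough illustration of rule-based contraction splitting, the following minimal sketch tokenizes text and expands contractions from a small lookup table. It is not RudriCo-TOK's implementation: the rule table, the choice of Portuguese examples, and the names `DECONTRACTION_RULES` and `tokenize` are all assumptions made for this example.

```python
import re

# Hypothetical rule table mapping a contraction to its expanded tokens.
# These Portuguese entries are illustrative only; they are NOT the 178
# rules shipped with RudriCo-TOK.
DECONTRACTION_RULES = {
    "do": ["de", "o"],
    "da": ["de", "a"],
    "no": ["em", "o"],
    "na": ["em", "a"],
    "pelo": ["por", "o"],
}

# A token is either a run of word characters or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split text into tokens, expanding any contraction found in the table."""
    tokens = []
    for raw in TOKEN_RE.findall(text):
        expansion = DECONTRACTION_RULES.get(raw.lower())
        if expansion is None:
            tokens.append(raw)        # ordinary token, kept unchanged
        else:
            tokens.extend(expansion)  # contraction, replaced by its parts
    return tokens


print(tokenize("O livro do aluno."))
```

A plain lookup like this ignores context and emits expansions in lowercase; a real tokenizer would need to handle ambiguous forms and preserve the original casing.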
SenseClusters is a package of (mostly) Perl programs that allows a user to cluster similar contexts together using unsupervised knowledge-lean methods.
SENTER is a SENtence splitTER for Portuguese.
Technical Description: http://qtleap.eu/wp-content/uploads/2015/05/Pilot1_technical_description.pdf http://qtleap.eu/wp-content/uploads/2015/05/TechnicalDescriptionPilot2_D2.7.pdf http://qtleap.eu/wp-content/uploads/2016/11/TechnicalDescriptionPilot3_D2.10.pdf
Part-of-speech tagger tuned to biomedical text, provided as a web service.
Conta-me Histórias (http://contamehistorias.pt) is a temporal summarization framework for news articles that allows users to explore and revisit events in the past. To select relevant stories of different time-periods, we rely on YAKE! (http://yake.inesctec.pt), a keyword extraction algorithm devel...
TinySVM is an implementation of Support Vector Machines (SVMs) (Vapnik, 1995; Vapnik, 1998) for the problem of pattern recognition.
Treat is a toolkit for natural language processing and computational linguistics in Ruby. The Treat project aims to build a language- and algorithm-agnostic NLP framework for Ruby with support for tasks such as document retrieval, text chunking, segmentation and tokenization, natural language pa...