Yake! (Campos et al. 2020) is a novel feature-based system for multi-lingual keyword extraction, which supports texts of different sizes, domains or languages. Unlike most systems, Yake! does not rely on dictionaries or thesauri, nor is it trained against any corpora. Instead, we follow a...
The present tool, which was built to deal with specific issues concerning orthographic conventions adopted for Portuguese, marks sentence boundaries with <s>…</s>, and paragraph boundaries with <p>…</p>. It also unwraps sentences split over different lines. An f-score of 99.94% was obtained when testing o...
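The kind of output described above can be illustrated with a minimal Python sketch. It is not the tool's own algorithm: it assumes that a blank line separates paragraphs and that a sentence ends at ".", "!" or "?" followed by whitespace, which ignores the Portuguese-specific orthographic issues (abbreviations, dialogue dashes and so on) that the tool was built to handle. The function name `mark_boundaries` is hypothetical.

```python
import re

def mark_boundaries(text: str) -> str:
    """Wrap paragraphs in <p>…</p> and sentences in <s>…</s> (naive sketch)."""
    marked = []
    # Assumption: one or more blank lines separate paragraphs.
    for para in re.split(r"\n\s*\n", text.strip()):
        # Unwrap: join the lines a paragraph was split over into one line.
        flat = " ".join(line.strip() for line in para.splitlines() if line.strip())
        # Assumption: a sentence ends at . ! or ? followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", flat)
        body = "".join(f"<s>{s}</s>" for s in sentences if s)
        marked.append(f"<p>{body}</p>")
    return "\n".join(marked)

sample = "O gato dorme. O cão\nladra muito.\n\nChove hoje."
print(mark_boundaries(sample))
```

In the sample, the second sentence is split over two lines and is rejoined before being wrapped in <s>…</s>.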
Reddit Dataset Extraction Tool (RDET) is a tool that takes advantage of the resources available at 'pushshift.io' that relate to Reddit comments and submissions and generates new datasets based on any given subreddit.
Web service created by exporting a UIMA-based workflow from the U-Compare text mining system. Functionality: identifies paragraphs in plain text. Tools in workflow: MLRS Paragraph Splitter (University of Malta). NOTE: The licence provided covers the web service only. Tools used to create the workf...
The U-Compare Workbench is a graphical user interface that operates on top of the U-Compare platform. The U-Compare platform allows users to build and evaluate NLP workflows. Workflows consist of one or more components, which may be corpus readers or tools such as tokenisers, POS taggers, name...
The purpose of the U-Compare platform is to facilitate easy and rapid development and evaluation of NLP and text mining systems. It includes utilities (including a graphical user interface, the U-Compare workbench, see separate record in META-SHARE) to create workflows from individual, interopera...
This is a UIMA component that provides a visualization of speech-based output from UIMA workflows. It has been developed at the University of Manchester, using libraries of the Java Speech Toolkit (jstk). It has been designed specifically for use with the U-Compare text mining workbench (see sep...
Treat is a toolkit for natural language processing and computational linguistics in Ruby. The Treat project aims to build a language- and algorithm-agnostic NLP framework for Ruby with support for tasks such as document retrieval, text chunking, segmentation and tokenization, natural language pa...
YamCha is a generic, customizable, and open source text chunker oriented toward many NLP tasks, such as POS tagging, Named Entity Recognition, base NP chunking, and text chunking. We used it for NP chunking.
MBT is a memory-based tagger-generator and tagger in one. The tagger-generator part can generate a sequence tagger on the basis of a training set of tagged sequences; the tagger part can tag new sequences. MBT can, for instance, be used to generate part-of-speech taggers or chunkers for natural l...
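The generator-plus-tagger idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not MBT's implementation: the feature set (previous tag, current word, next word), the overlap similarity and the names `features` and `generate_tagger` are all hypothetical choices made for the example.

```python
from collections import Counter

def features(words, i, prev_tag):
    """Context features for position i: previous tag, word, next word."""
    nxt = words[i + 1] if i + 1 < len(words) else "</s>"
    return (prev_tag, words[i].lower(), nxt.lower())

def generate_tagger(tagged_sequences):
    """Store all training instances in memory and return a tagging function."""
    memory = []
    for seq in tagged_sequences:
        words = [w for w, _ in seq]
        prev = "<s>"
        for i, (_, tag) in enumerate(seq):
            memory.append((features(words, i, prev), tag))
            prev = tag

    def tag(words):
        out, prev = [], "<s>"
        for i in range(len(words)):
            feats = features(words, i, prev)
            best, votes = -1, Counter()
            for stored, t in memory:
                # Overlap similarity: number of matching feature values.
                score = sum(a == b for a, b in zip(feats, stored))
                if score > best:
                    best, votes = score, Counter([t])
                elif score == best:
                    votes[t] += 1
            # Majority tag among the most similar stored instances.
            prev = votes.most_common(1)[0][0]
            out.append(prev)
        return out

    return tag

train = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]
tagger = generate_tagger(train)
print(tagger(["the", "cat", "barks"]))
```

Here `generate_tagger` plays the role of the tagger-generator and the returned function that of the tagger; the sketch assumes a non-empty training set.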