Treat is a toolkit for natural language processing and computational linguistics in Ruby. The Treat project aims to build a language- and algorithm-agnostic NLP framework for Ruby with support for tasks such as document retrieval, text chunking, segmentation and tokenization, natural language pa...
TinySVM is an implementation of Support Vector Machines (SVMs) (Vapnik, 1995; Vapnik, 1998) for the problem of pattern recognition.
Part-of-speech tagger tuned to biomedical text, provided as a web service.
The resource consists of 20 thousand entries, encoded morphosyntactically and syntactically according to the PAROLE common encoding standards.
The PAROLE Portuguese Corpus – tagged subset contains 250,000 tokens and is a subset of the PAROLE Portuguese Corpus of 3 million running words of European Portuguese. The corpus was classified and encoded according to the common core PAROLE encoding standard. The tagged subset reproduces appro...
MBT is a memory-based tagger-generator and tagger in one. The tagger-generator part can generate a sequence tagger on the basis of a training set of tagged sequences; the tagger part can tag new sequences. MBT can, for instance, be used to generate part-of-speech taggers or chunkers for natural l...
This tool was built to handle Portuguese-specific issues in a few non-trivial cases involving strings that are ambiguous with respect to tokenization. It segments text into lexically relevant tokens, using whitespace as the separator. Note that, in these examples, the | (vertical bar) symbol is use...
LX-Tagger is a freely available online service for the part-of-speech tagging of Portuguese. It was developed and is maintained by the NLX-Natural Language and Speech Group at the University of Lisbon, Department of Informatics. The service is composed of a set of shallow processing tools: A se...
The LT Corpus (Literary Corpus) contains approximately 1,781,083 running words of European and Brazilian Portuguese. It includes 70 copyright-free classics (61 from Portugal and 9 from Brazil) published before 1940.
The GENIA tagger analyzes English sentences and outputs the base forms, part-of-speech tags, chunk tags, and named entity tags. The tagger is specifically tuned for biomedical text such as MEDLINE abstracts.