This tool, which was built to handle specific issues arising from the orthographic conventions adopted for Portuguese, marks sentence boundaries with <s>…</s> and paragraph boundaries with <p>…</p>. It also unwraps sentences split over different lines. An F-score of 99.94% was obtained when testing o...
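The boundary marking and line unwrapping described in this record can be illustrated with a minimal Python sketch. It is not the tool's implementation: the function name `mark_boundaries`, the blank-line paragraph convention, and the naive punctuation-based splitting rule are all assumptions made for illustration, and the sketch ignores the Portuguese-specific orthographic cases the tool was built for.

```python
import re


def mark_boundaries(text: str) -> str:
    """Wrap paragraphs in <p>...</p> and sentences in <s>...</s>.

    Hypothetical helper: a naive stand-in, not the tool's actual algorithm.
    """
    marked_paragraphs = []
    # Assumption: paragraphs are separated by one or more blank lines.
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        # Unwrap: rejoin a sentence that was split over several lines.
        flat = " ".join(
            line.strip() for line in paragraph.splitlines() if line.strip()
        )
        # Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", flat)
        body = "".join(f"<s>{sentence}</s>" for sentence in sentences if sentence)
        marked_paragraphs.append(f"<p>{body}</p>")
    return "\n".join(marked_paragraphs)


if __name__ == "__main__":
    sample = "Isto é um\nteste. Segunda frase!\n\nNovo parágrafo."
    print(mark_boundaries(sample))
```

A rule this simple would mis-split abbreviations and similar cases, which is presumably where a tool tuned for Portuguese conventions does its real work.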
LX-Lemmatizer is a freely available online service for fully fledged lemmatization of Portuguese verbs. It was developed and is maintained at the University of Lisbon by the NLX-Natural Language and Speech Group of the Department of Informatics. LX-Lemmatizer takes a Portuguese verb form and deliv...
Based on the MXPOST part-of-speech tagger and the UNITEX dictionaries for Portuguese, this tool produces the lemmas of the words in a text stored in a plain text file. The source code is also provided.
LX-TimeAnalyzer is a freely available online service for the extraction of temporal information from Portuguese text. It was developed and is maintained by the NLX-Natural Language and Speech Group at the University of Lisbon, Department of Informatics. LX-TimeAnalyzer extracts temporal inform...
Web service created by exporting a UIMA-based workflow from the U-Compare text mining system. Functionality: Identifies sentences in plain text. Tools in workflow: Freeling sentence splitter web service (service provided by the PANACEA project). NOTE: The licence provided covers the web service o...
This is a UIMA wrapper for the OpenNLP Tokenizer tool. It splits English sentences into individual tokens. The tool forms part of the in-built library of components provided with the U-Compare platform (see separate META-SHARE record) for building and evaluating text mining workflows. The U-Comp...
LX Semantic Similarity is an online service for measuring the semantic similarity between words in Portuguese. This service uses LX-DSemVectors, a distributional semantics model (a.k.a. word embeddings) of the Portuguese language. The model represents each word in its vocabulary by a vecto...
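To illustrate how similarity can be computed once words are represented as vectors, here is a minimal Python sketch using cosine similarity. This is a common choice for word embeddings, but whether the service uses exactly this measure is an assumption. The function name and the three-dimensional toy vectors are invented for illustration and are not taken from LX-DSemVectors.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Made-up toy embeddings (hypothetical values, not real model output).
toy_vectors = {
    "gato": [1.0, 2.0, 0.0],
    "cão": [2.0, 4.0, 0.0],
    "carro": [0.0, 0.0, 3.0],
}

if __name__ == "__main__":
    print(cosine_similarity(toy_vectors["gato"], toy_vectors["cão"]))
    print(cosine_similarity(toy_vectors["gato"], toy_vectors["carro"]))
```

Vectors pointing in the same direction score close to 1, and orthogonal vectors score 0. Real embeddings have hundreds of dimensions, but the computation is the same.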
The GENIA tagger analyzes English sentences and outputs the base forms, part-of-speech tags, chunk tags, and named entity tags. The tagger is specifically tuned for biomedical text such as MEDLINE abstracts. The tool is provided as a UIMA component, which forms part of the in-built library of...
Web service created by exporting a UIMA-based workflow from the U-Compare text mining system. Functionality: Identifies clauses/segments in plain text. Also identifies sentences, tokens, POS tags and lemmas. Tools in workflow: Cafetiere Sentence Splitter (University of Manchester), TTL Tokenizer...
The MLSS Sentence Splitter is a web service that takes text as input and outputs the identified sentences surrounded by tags. The tool was tuned for Maltese. The download for this resource contains only the narrative description in a Word file. The web service has one method, which can ...