Handle: https://hdl.handle.net/21.11129/0000-000E-5998-3 (persistent URL to this page)
LX-Sentence Splitter is a language processing tool for delimiting sentences in Portuguese. It was developed and is maintained by the NLX-Natural Language and Speech Group at the University of Lisbon, Department of Informatics.
LX-Sentence Splitter marks sentence boundaries with <s>…</s> and paragraph boundaries with <p>…</p>. It also unwraps sentences that are split across several lines. It obtained an F-score of 99.94% when tested on a 12,000-sentence corpus that was accurately hand-tagged for sentence and paragraph boundaries.
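As an illustration of how text marked up this way might be consumed downstream, the following minimal Python sketch reads it into a list of paragraphs, each holding a list of sentences. It assumes, without confirmation from the tool's documentation, that each paragraph is emitted as <p>…</p> containing one or more <s>…</s> elements, with no attributes or nested tags. The function name parse_marked_text and the sample string are hypothetical.

```python
import re
from typing import List

# Assumption: output looks like <p><s>...</s><s>...</s></p>, with plain tags
# (no attributes, no nesting). This is not the tool's documented schema.
P_RE = re.compile(r"<p>(.*?)</p>", re.DOTALL)
S_RE = re.compile(r"<s>(.*?)</s>", re.DOTALL)


def parse_marked_text(marked: str) -> List[List[str]]:
    """Return a list of paragraphs, each a list of sentence strings."""
    paragraphs = []
    for p_body in P_RE.findall(marked):
        # Collapse whitespace runs, in case a sentence still contains line breaks.
        sentences = [" ".join(s.split()) for s in S_RE.findall(p_body)]
        paragraphs.append(sentences)
    return paragraphs


# Hypothetical sample in the assumed format.
sample = (
    "<p><s>Olá, mundo.</s><s>Isto é um teste.</s></p>\n"
    "<p><s>Segundo parágrafo.</s></p>"
)

for number, paragraph in enumerate(parse_marked_text(sample), start=1):
    print(number, paragraph)
```

Regular expressions are enough here only because of the flat-tag assumption; if the actual output carries attributes or other markup, a proper parser would be the safer choice.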
You may also be interested in using our LX-Tokenizer, LX-Tagger, or LX-Suite online services for the tokenization, part-of-speech tagging, and sub-syntactic analysis of Portuguese.