Handle: https://hdl.handle.net/21.11129/0000-000B-D2F9-F (persistent URL to this page)
This tool, which was built to deal with specific issues concerning the orthographic conventions adopted for Portuguese, marks sentence boundaries with <s>…</s> and paragraph boundaries with <p>…</p>. It also unwraps sentences that are split over several lines.
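To make the output format concrete, here is a minimal sketch in Python of the kind of markup described above. It is not the tool's actual algorithm: the function name `mark_boundaries`, the blank-line paragraph rule, and the punctuation-based sentence rule are all assumptions made for illustration only.

```python
import re


def mark_boundaries(text: str) -> str:
    """Naive illustration: wrap paragraphs in <p>...</p> and sentences in <s>...</s>."""
    marked = []
    # Assumption: paragraphs are separated by one or more blank lines.
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        # Unwrap: collapse line breaks so a sentence split over lines becomes one line.
        flat = " ".join(paragraph.split())
        if not flat:
            continue
        # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", flat)
        body = "".join(f"<s>{sentence}</s>" for sentence in sentences)
        marked.append(f"<p>{body}</p>")
    return "\n".join(marked)


if __name__ == "__main__":
    sample = "O gato dorme. O cão\nladra.\n\nChove hoje."
    print(mark_boundaries(sample))
```

A rule this simple splits wrongly after abbreviations: "O Sr. Silva chegou." comes out as two sentences. That limitation is a property of this sketch and illustrates why sentence splitting generally needs language-specific handling.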
An F-score of 99.94% was obtained when testing on a 12,000-sentence corpus that had been accurately hand-tagged for sentence and paragraph boundaries.
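The description does not say how the F-score was computed. As a rough sketch, assuming it is the balanced F1 (the harmonic mean of precision and recall) over boundary positions, it could be calculated as follows. The function name `f_score` and the toy boundary offsets are hypothetical and are not taken from the evaluation corpus.

```python
def f_score(predicted: set, gold: set) -> float:
    """Balanced F1 over boundary positions (assumed metric, not confirmed by the source)."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)  # share of predicted boundaries that are correct
    recall = true_positives / len(gold)          # share of gold boundaries that were found
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy character offsets of sentence boundaries, for illustration only.
    gold = {12, 30, 47, 58}
    predicted = {12, 30, 50, 58}
    print(f"F1 = {f_score(predicted, gold):.2f}")
```

In this toy case three of the four predicted boundaries match the gold standard, so precision and recall are both 0.75 and the F1 is 0.75 as well.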
LX-Chunker was developed and is maintained at the University of Lisbon by the NLX-Natural Language and Speech Group of the Department of Informatics.