MLSS Tokeniser Web Service

The web service takes text as input and returns the tokens it contains as a single string. The tokens can be orthographical words, numerals and punctuation marks. The tokeniser was designed to work on Maltese texts. The download for this resource contains only the narrative description, in a Word file.

The WSDL link is http://metanet4u.research.um.edu.mt/services/MtTokeniser?wsdl.

The service has one method which can be invoked:
• String tokenise(String text, Boolean tokenTags, String separator)
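As an illustration of how a client might invoke this method, the sketch below builds a raw SOAP 1.1 request using only Python's standard library. It is a minimal sketch under stated assumptions: the target namespace (SERVICE_NS), the child element names (text, tokenTags, separator) and the endpoint (ENDPOINT, taken here as the WSDL URL without "?wsdl") are placeholders, and the real values must be read from the WSDL. A service generated with JAX-WS, for example, may name its parameters arg0, arg1 and arg2 instead. A SOAP toolkit that reads the WSDL directly avoids this guesswork.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Placeholder namespace (hypothetical): read the real targetNamespace from the WSDL.
SERVICE_NS = "http://example.org/hypothetical/MtTokeniser"
# Assumption: endpoint is the WSDL URL without "?wsdl"; check soap:address in the WSDL.
ENDPOINT = "http://metanet4u.research.um.edu.mt/services/MtTokeniser"


def build_tokenise_request(text: str, token_tags: bool, separator: str) -> bytes:
    """Serialise a SOAP 1.1 envelope for tokenise(text, tokenTags, separator)."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}tokenise")
    # Hypothetical, unqualified child names mirroring the documented parameters.
    params = (
        ("text", text),
        ("tokenTags", "true" if token_tags else "false"),  # xsd:boolean lexical form
        ("separator", separator),
    )
    for name, value in params:
        ET.SubElement(call, name).text = value
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


def make_http_request(payload: bytes) -> urllib.request.Request:
    """Wrap the envelope in an HTTP POST; pass the result to urlopen() to send it."""
    return urllib.request.Request(
        ENDPOINT,
        data=payload,
        headers={"Content-Type": "text/xml; charset=utf-8", "SOAPAction": '""'},
        method="POST",
    )
```

Sending the request is then a matter of calling urllib.request.urlopen(make_http_request(payload)); the string returned by tokenise arrives inside the SOAP response body.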

The method takes three parameters:
• text
This is the text that will be tokenised.

• tokenTags
This is a boolean value. If tokenTags is true, each output token is wrapped in tags
(e.g. <token> tagged_text </token>). If false, the tokens are returned without tags.

• separator
This is a string which will be used to separate one token from another in the output string.
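On the client side, the returned string can be turned back into a list of tokens using the same tokenTags and separator values that were sent. The helper below is a hypothetical sketch, not part of the service: it assumes tokens are joined by the separator and, when tokenTags is true, each is wrapped as <token> ... </token> as in the example above, with any spaces just inside the tags stripped.

```python
import re
from typing import List

# Matches the contents of one <token>...</token> pair (non-greedy, may span lines).
TOKEN_RE = re.compile(r"<token>(.*?)</token>", re.DOTALL)


def parse_tokens(output: str, token_tags: bool, separator: str) -> List[str]:
    """Recover the token list from a tokenise() result string (assumed format)."""
    if token_tags:
        # With tags, the <token> delimiters mark each token; separators are ignored.
        return [t.strip() for t in TOKEN_RE.findall(output)]
    if not separator:
        raise ValueError("an empty separator cannot be split on")
    # Without tags, split on the separator and drop empty pieces (e.g. a trailing separator).
    return [t for t in output.split(separator) if t != ""]
```

In the untagged case, choosing a separator that cannot occur inside a token, such as a newline, keeps the split unambiguous; in the tagged case the <token> delimiters make the separator irrelevant to parsing.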

Input data format: text string with sentences

Output data format: a text string with the tagged sentences in the format <sentence> sentence_text </sentence>

The web service does not need any external tool.
