
File processing
Input format: Input files must be plain text (.txt), UTF-8 encoded, and contain Portuguese text. Input files and folders can also be compressed into a .zip archive.
Privacy: The input file you upload and the respective output files will be automatically deleted from our computer once they have been processed and you have downloaded the result. No copies of your files will be retained after your use of this service.
If your input file is large, its processing may take some time. In that case, you can enter your email address to receive the link from which to download your processed file when it is ready. After being used for this purpose, your email address will be deleted from our computer.
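The input requirements above can be checked locally before uploading. The following is a minimal sketch in Python 3, using only the standard library, that keeps the .txt files of a folder that decode as UTF-8 and packs them into a .zip archive. The function names is_utf8_text_file and pack_txt_files are illustrative assumptions and are not part of this service.

```python
import zipfile
from pathlib import Path


def is_utf8_text_file(path):
    """Return True if the file at `path` decodes cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def pack_txt_files(folder, archive_path):
    """Zip every UTF-8 .txt file under `folder`; return the names added."""
    folder = Path(folder)
    added = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.rglob("*.txt")):
            if is_utf8_text_file(path):
                # Store paths relative to the folder so the archive
                # mirrors the folder layout.
                name = path.relative_to(folder).as_posix()
                archive.write(path, arcname=name)
                added.append(name)
    return added
```

Files that are not valid UTF-8 are skipped rather than converted, so they can be inspected and re-encoded by hand. This check does not verify that the text is Portuguese.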
Instructions to use this web service
The web service for this application is available at https://portulanclarin.net/workbench/lx-sentencesplitter/api/.
Below is an example of how to use this web service with Python 3.
This example uses the requests package. To install this package, run this command in the command line:
pip3 install requests
To use this web service, you need an access key, which you can obtain by clicking on the button below. A key is valid for 31 days. It allows you to submit a total of 1 billion characters, by means of requests with no more than 500,000 characters each. It allows you to make 100,000 requests, at a rate of no more than 200 requests per hour.
For other usage regimes, you should contact the helpdesk.
The input data and the respective output will be automatically deleted from our computer after being processed. No copies will be retained after your use of this service.
import json
import requests # to install this library, enter in your command line:
                #  pip3 install requests

# This is a simple example to illustrate how you can use the LX-Sentence Splitter web service

# Requires: key is a string with your access key
# Requires: text is a string, UTF-8, with a maximum 500000 characters, Portuguese text, with
#           the input to be processed
# Requires: format is a string, indicating the output format, which can be either
#           'CINTIL' or 'JSON'
# Ensures: output according to specification in https://portulanclarin.net/workbench/lx-sentencesplitter/
# Ensures: dict with number of requests and characters input so far with the access key, and
#          its date of expiry

key = 'access_key_goes_here' # before you run this example, replace access_key_goes_here by
                             # your access key

# this string can be replaced by your input
text = '''Esta frase serve para testar o funcionamento do LX-Sentence Splitter pelo Dr. Francisco. Esta outra frase faz o mesmo.
O importante é distinguir, p.ex., os pontos finais dos não finais, etc., Dr.'''

# To read input text from a file, uncomment this block
#inputFile = open("myInputFileName", "r", encoding="utf-8") # replace myInputFileName by
                                                             # the name of your file
#text = inputFile.read()
#inputFile.close()

format = 'CINTIL' # other possible value is 'JSON'

# Processing:

url = "https://portulanclarin.net/workbench/lx-sentencesplitter/api/"
request_data = {
    'method': 'split',
    'jsonrpc': '2.0',
    'id': 0,
    'params': {
        'text': text,
        'format': format,
        'key': key,
    },
}
request = requests.post(url, json=request_data)
response_data = request.json()
if "error" in response_data:
    print("Error:", response_data["error"])
else:
    print("Result:")
    print(response_data["result"])
    # To write output in a file, uncomment this block
    #outputFile = open("myOutputFileName","w", encoding="utf-8") # replace myOutputFileName by
                                                                  # the name of your file
    #output = response_data["result"]
    #outputFile.write(output)
    #outputFile.close()

# Getting access key status:

request_data = {
    'method': 'key_status',
    'jsonrpc': '2.0',
    'id': 0,
    'params': {
        'key': key,
    },
}
request = requests.post(url, json=request_data)
response_data = request.json()
if "error" in response_data:
    print("Error:", response_data["error"])
else:
    print("Key status:")
    print(json.dumps(response_data["result"], indent=4))
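Since each request may carry no more than 500000 characters, longer inputs have to be divided before they are submitted. The following is a minimal sketch of one way to do this, assuming that paragraphs in the input are separated by blank lines, so that cutting at those points does not break a sentence. The names MAX_CHARS and chunk_text are illustrative and are not part of the web service.

```python
MAX_CHARS = 500000  # per-request character limit stated for the access key


def chunk_text(text, max_chars=MAX_CHARS):
    """Greedily pack blank-line-separated paragraphs into chunks of at most max_chars."""
    chunks = []
    current = ""
    for paragraph in text.split("\n\n"):
        # As a last resort, a single paragraph longer than max_chars is cut
        # at fixed offsets (this may split a sentence).
        pieces = [paragraph[i:i + max_chars]
                  for i in range(0, len(paragraph), max_chars)] or [""]
        for piece in pieces:
            candidate = piece if not current else current + "\n\n" + piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as the 'text' parameter of a separate request. Spacing requests at least 18 seconds apart keeps the rate at or below 200 requests per hour.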
LX-Sentence Splitter documentation
LX-Sentence Splitter
LX-Sentence Splitter is a freely available online service for delimiting sentences in Portuguese. It was developed and is maintained by the NLX-Natural Language and Speech Group at the University of Lisbon, Department of Informatics.
You may also be interested in using our LX-Tokenizer, LX-Tagger, or LX-Suite online services for the tokenization, part-of-speech tagging, and sub-syntactic analysis of Portuguese.
Features and evaluation
LX-Sentence Splitter marks sentence boundaries with <s>…</s>, and paragraph boundaries with <p>…</p>. It also unwraps sentences split over different lines.
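To work with the marked-up output programmatically, the sentence and paragraph tags can be read back into lists. The following is a minimal sketch, assuming the output is a string in which each paragraph is wrapped in <p>…</p> and each sentence in <s>…</s>, as described above. The function name extract_sentences and the sample string are illustrative; the sample is hand-written and is not actual output of the service.

```python
import re

PARAGRAPH_RE = re.compile(r"<p>(.*?)</p>", re.DOTALL)
SENTENCE_RE = re.compile(r"<s>(.*?)</s>", re.DOTALL)


def extract_sentences(marked_up):
    """Return a list of paragraphs, each a list of sentence strings."""
    paragraphs = []
    for paragraph in PARAGRAPH_RE.findall(marked_up):
        sentences = [s.strip() for s in SENTENCE_RE.findall(paragraph)]
        paragraphs.append(sentences)
    return paragraphs


# Hand-written sample in the assumed markup, not actual service output.
sample = "<p><s>Esta frase serve de teste.</s><s>Esta outra frase faz o mesmo.</s></p>"
print(extract_sentences(sample))
```

Regular expressions are enough here because the two tags are assumed not to nest inside themselves; output that contains further annotation may need a different approach.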
An F-score of 99.94% was obtained when testing on a 12,000-sentence corpus accurately hand-tagged with respect to sentence and paragraph boundaries.
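For readers who want to run a comparable evaluation on their own data, the sketch below shows how an F-score is usually computed from boundary counts. It assumes the balanced variant (F1, the harmonic mean of precision and recall); the text above does not state which variant was used, and the function name f_score is illustrative.

```python
def f_score(true_positives, false_positives, false_negatives):
    """Balanced F-score (F1) from counts of predicted boundaries.

    true_positives:  predicted boundaries that match the hand-tagged ones
    false_positives: predicted boundaries absent from the hand-tagged corpus
    false_negatives: hand-tagged boundaries that were not predicted
    """
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)
```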
Authorship
LX-Sentence Splitter was developed by António Branco and João Silva at the NLX-Natural Language and Speech Group at the University of Lisbon, Department of Informatics.
Acknowledgments
The development of a state-of-the-art, complete suite of shallow processing tools for Portuguese was supported by FCT-Fundação para a Ciência e Tecnologia under the contract POSI/PLP/47058/2002 for the project TagShare and the contract POSI/PLP/61490/2004 for the project QueXting, and the European Commission under the contract FP6/STREP/27391 for the project LT4eL.
References
Irrespective of the most recent version of this tool you may use, when mentioning it, please cite this reference:
- Silva, João, António Branco, Sérgio Castro and Rúben Reis, 2010, "Out-of-the-Box Robust Parsing of Portuguese", Lecture Notes in Artificial Intelligence, 6001, Berlin, Springer, pp. 75-85.
Contact us
Contact us using the following email address: 'nlxgroup' concatenated with 'at' concatenated with 'di.fc.ul.pt'.
Why LX-Sentence Splitter?
LX because LX is the "code" name Lisboners like to use to refer to their hometown.
License
No fee, attribution, all rights reserved, no redistribution, non commercial, no warranty, no liability, no endorsement, temporary, non exclusive, share alike.
The complete text of this license is here.