LX-Syllabifier

File processing

Input format: Input files must be in .txt FORMAT with UTF-8 ENCODING and contain PORTUGUESE TEXT. Input files and folders can also be compressed to the .zip format.
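As a sketch of how input could be prepared for upload, the snippet below checks that each .txt file in a folder decodes as UTF-8 and packs the valid ones into a .zip archive. The function name `pack_txt_folder` and the folder layout are illustrative assumptions, not part of the service.

```python
import zipfile
from pathlib import Path


def pack_txt_folder(folder, archive_path):
    """Zip every .txt file under `folder` that decodes as UTF-8.

    Returns the list of files skipped because they are not valid UTF-8.
    """
    folder = Path(folder)
    skipped = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.rglob("*.txt")):
            try:
                # Decoding is only a validity check; the bytes are stored unchanged.
                path.read_bytes().decode("utf-8")
            except UnicodeDecodeError:
                skipped.append(path)
                continue
            archive.write(path, arcname=path.relative_to(folder).as_posix())
    return skipped
```

A call such as `pack_txt_folder("my_texts", "my_texts.zip")` would leave out any file saved in another encoding and report it in the returned list, so it can be converted before uploading.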

Privacy: The input file you upload and the respective output files will be automatically deleted from our computer after being processed and the result downloaded by you. No copies of your files will be retained after your use of this service.

Email address validation

The size of your input file is large and its processing may take some time.

To receive by email a URL from which to download your processed file, copy the code displayed below into the "Subject" field of an email message (leaving the message body empty) and send it to request@portulanclarin.net


Privacy: After we reply to you with the URL for download, your email address is automatically deleted from our records.

Designing your own experiment with a Jupyter Notebook

A Jupyter notebook (hereafter simply "notebook") is a type of document that contains executable code interspersed with visualizations of code execution results and narrative text.

Below we provide an example notebook which you may use as a starting point for designing your own experiments using language resources offered by PORTULAN CLARIN.

Pre-requisites

To execute this notebook, you need an access key, which you can obtain by clicking the button below. A key is valid for 31 days. It allows you to submit a total of 1 billion characters, in requests of no more than 10000 characters each, and to make up to 100,000 requests at a rate of no more than 200 requests per hour.

For other usage regimes, you should contact the helpdesk.
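Because each request is limited to 10000 characters, longer texts have to be split before submission. The following is a minimal sketch of one way to do that, breaking only at line ends. The helper name `split_into_requests` is hypothetical, and the sketch assumes the limit is measured in Python string characters, which the service description does not spell out.

```python
def split_into_requests(text, max_chars=10000):
    """Split `text` into chunks of at most `max_chars` characters,
    breaking only at line ends so that no line is cut in half."""
    chunks, current = [], ""
    for line in text.splitlines(keepends=True):
        if len(line) > max_chars:
            raise ValueError("a single line exceeds the per-request limit")
        if len(current) + len(line) > max_chars:
            # The current chunk is full: close it and start a new one.
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks
```

Joining the returned chunks gives back the original text, so the outputs of successive requests can simply be concatenated.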

The input data sent to any PORTULAN CLARIN web service and the respective output will be automatically deleted from our computers after being processed. However, when running a notebook on an external service, such as the ones suggested below, you should take their data privacy policies into consideration.

Running the notebook

You have three options to run the notebook presented below:

Run on Binder — The Binder Project is funded by a 501c3 non-profit organization and is described in detail in the following paper:
Jupyter et al., "Binder 2.0 - Reproducible, Interactive, Sharable Environments for Science at Scale."
Proceedings of the 17th Python in Science Conference. 2018. doi://10.25080/Majora-4af1f417-011
Run on Google Colab — Google Colaboratory is a free-to-use product from Google Research.
Download the notebook from our public Github repository and run it on your computer.
This is a more advanced option, which requires you to install Python 3 and Jupyter on your computer. For anyone without prior experience setting up a Python development environment, we strongly recommend one of the two options above.

This is only a preview of the notebook. To run it, choose one of the three options listed above: Binder, Google Colab, or a download from the GitHub repository.

Using LX-Syllabifier to syllabify all words in a text

This is an example notebook that illustrates how you can use the LX-Syllabifier web service to analyse a text.

Before you run this example, replace access_key_goes_here by your webservice access key, below:

In [1]:

LXSLLABIFIER_WS_API_KEY = 'access_key_goes_here'
LXSLLABIFIER_WS_API_URL = 'https://portulanclarin.net/workbench/lx-syllabifier/api/'

Importing required Python modules

The next cell will take care of installing the requests package, if not already installed, and make it available to use in this notebook.

In [2]:

try:
    import requests
except ImportError:
    !pip3 install requests
    import requests
from IPython.display import HTML, display_html

Wrapping the complexities of the JSON-RPC API in a simple, easy to use function

The WSException class defined below will be used later to identify errors from the webservice.

In [3]:

class WSException(Exception):
    'Webservice Exception'
    def __init__(self, errordata):
        "errordata is a dict returned by the webservice with details about the error"
        assert isinstance(errordata, dict)
        # pass the message (not the exception itself) to the base class
        super().__init__(errordata["message"])
        self.message = errordata["message"]
        # see https://json-rpc.readthedocs.io/en/latest/exceptions.html for more info
        # about JSON-RPC error codes
        if -32099 <= errordata["code"] <= -32000:  # Server Error
            if errordata["data"]["type"] == "WebServiceException":
                self.message += f": {errordata['data']['message']}"
            else:
                self.message += f": {errordata['data']!r}"
    def __str__(self):
        return self.message

The next function invokes the LX-Syllabifier webservice through its public JSON-RPC API.

In [4]:

def syllabify(text):
    '''
    Arguments
        text: a string with a maximum of 10000 characters, Portuguese text, with
             the input to be processed

    Returns a string or JSON object with the output according to specification in
       https://portulanclarin.net/workbench/lx-syllabifier/
    
    Raises a WSException if an error occurs.
    '''

    request_data = {
        'method': 'syllabify',
        'jsonrpc': '2.0',
        'id': 0,
        'params': {
            'text': text,
            'key': LXSLLABIFIER_WS_API_KEY,
        },
    }
    response = requests.post(LXSLLABIFIER_WS_API_URL, json=request_data)
    response_data = response.json()
    if "error" in response_data:
        raise WSException(response_data["error"])
    else:
        return response_data["result"]
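The request and response shapes used by the function above can be examined in isolation. The sketch below builds the same JSON-RPC 2.0 envelope and unpacks a response dict without touching the network. The helper names `build_request` and `unpack_response` and the two sample responses are illustrative assumptions, not part of the service's documented API.

```python
def build_request(method, params, request_id=0):
    """Return a JSON-RPC 2.0 request body as a plain dict."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


def unpack_response(response_data):
    """Return the "result" member, or raise RuntimeError for an "error" member."""
    if "error" in response_data:
        error = response_data["error"]
        raise RuntimeError(f"{error.get('code')}: {error.get('message')}")
    return response_data["result"]


body = build_request("syllabify", {"text": "As armas", "key": "access_key_goes_here"})
print(body["method"], sorted(body["params"]))

# Hypothetical responses, shaped like the ones the notebook code expects.
ok = {"jsonrpc": "2.0", "id": 0, "result": "As ar|mas"}
failed = {"jsonrpc": "2.0", "id": 0, "error": {"code": -32000, "message": "Server error"}}
print(unpack_response(ok))
```

A JSON-RPC response carries either a `result` or an `error` member, which is why a single membership test is enough to tell success from failure.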

The next function will count the number of syllables in a given string (already processed by LX-Syllabifier):

In [5]:

def count_syllables(s):
    # this is a naive tokenization based on whitespace, but in principle it poses no problem
    # because punctuation will be attached to the previous token and that will not change the
    # number of syllables
    return sum(len(token.split("|")) for token in s.split(" "))
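As a quick check of the counting logic, the sketch below applies the same function to the first verse, written with the `|` separator that the notebook later replaces with `·` for display. The sample string is transcribed from the notebook output shown further down, not fetched from the service.

```python
def count_syllables(s):
    # same logic as the notebook cell above
    return sum(len(token.split("|")) for token in s.split(" "))


verse = "As ar|mas e os ba|rões as|si|na|la|dos,"

# Per-token breakdown: each token contributes one more than its number of "|".
per_token = [(token, len(token.split("|"))) for token in verse.split(" ")]
print(per_token)
print(count_syllables(verse))  # 1 + 2 + 1 + 1 + 2 + 5 = 12
```

Note that this counts orthographic syllables word by word. It does not model elision between adjacent words, so the result is not a metrical count of the verse.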

Here are a few stanzas from Luís de Camões' work "Os Lusíadas" that we will use in our experiment:

In [6]:

stanzas = ["""
As armas e os barões assinalados,
Que da ocidental praia Lusitana,
Por mares nunca de antes navegados,
Passaram ainda além da Taprobana,
Em perigos e guerras esforçados,
Mais do que prometia a força humana,
E entre gente remota edificaram
Novo Reino, que tanto sublimaram;
""","""
E também as memórias gloriosas
Daqueles Reis, que foram dilatando
A Fé, o Império, e as terras viciosas
De África e de Ásia andaram devastando;
E aqueles, que por obras valerosas
Se vão da lei da morte libertando;
Cantando espalharei por toda parte,
Se a tanto me ajudar o engenho e arte.
""","""
Cessem do sábio Grego e do Troiano
As navegações grandes que fizeram;
Cale-se de Alexandro e de Trajano
A fama das vitórias que tiveram;
Que eu canto o peito ilustre Lusitano,
A quem Neptuno e Marte obedeceram:
Cesse tudo o que a Musa antígua canta,
Que outro valor mais alto se alevanta.
""","""
E vós, Tágides minhas, pois criado
Tendes em mim um novo engenho ardente,
Se sempre em verso humilde celebrado
Foi de mim vosso rio alegremente,
Dai-me agora um som alto e sublimado,
Um estilo grandíloquo e corrente,
Porque de vossas águas, Febo ordene
Que não tenham inveja às de Hipoerene.
"""]

Next, we will use the functions we defined above to syllabify an excerpt from Luís de Camões' work "Os Lusíadas", and to count the number of syllables in each line:

In [7]:

for stanza in stanzas:
    html = ['<div class="stanza">']
    syllabified = syllabify(stanza)
    html.extend([
        f'<div class="verse"><div class="count">{count_syllables(verse)}</div> {verse.replace("|", "·")}</div>'
        for verse in syllabified.strip().splitlines()
    ])
    html.append('</div>')
    display_html(HTML("".join(html)))
display_html(HTML("""<style>
div.stanza { margin: 5px; font-size: 110%; }
div.count { display: inline-block; text-align: center; width: 3em; background-color: #eee; }
</style>"""))

As ar·mas e os ba·rões as·si·na·la·dos,

Que da o·ci·den·tal prai·a Lu·si·ta·na,

Por ma·res nun·ca de an·tes na·ve·ga·dos,

Pas·sa·ram a·in·da a·lém da Ta·pro·ba·na,

Em pe·ri·gos e guer·ras es·for·ça·dos,

Mais do que pro·me·ti·a a for·ça hu·ma·na,

E en·tre gen·te re·mo·ta e·di·fi·ca·ram

No·vo Rei·no, que tan·to su·bli·ma·ram;

E tam·bém as me·mó·ri·as glo·ri·o·sas

Da·que·les Reis, que fo·ram di·la·tan·do

A Fé, o Im·pé·ri·o, e as ter·ras vi·ci·o·sas

De Á·fri·ca e de Á·si·a an·da·ram de·vas·tan·do;

E a·que·les, que por o·bras va·le·ro·sas

Se vão da lei da mor·te li·ber·tan·do;

Can·tan·do es·pa·lha·rei por to·da par·te,

Se a tan·to me a·ju·dar o en·ge·nho e ar·te.

Ces·sem do sá·bi·o Gre·go e do Troi·a·no

As na·ve·ga·ções gran·des que fi·ze·ram;

Ca·le-·se de A·le·xan·dro e de Tra·ja·no

A fa·ma das vi·tó·ri·as que ti·ve·ram;

Que eu can·to o pei·to i·lus·tre Lu·si·ta·no,

A quem Nep·tu·no e Mar·te o·be·de·ce·ram:

Ces·se tu·do o que a Mu·sa an·tí·gu·a can·ta,

Que ou·tro va·lor mais al·to se a·le·van·ta.

E vós, Tá·gi·des mi·nhas, pois cri·a·do

Ten·des em mim um no·vo en·ge·nho ar·den·te,

Se sem·pre em ver·so hu·mil·de ce·le·bra·do

Foi de mim vos·so ri·o a·le·gre·men·te,

Dai-·me a·go·ra um som al·to e su·bli·ma·do,

Um es·ti·lo gran·dí·lo·quo e cor·ren·te,

Por·que de vos·sas á·guas, Fe·bo or·de·ne

Que não te·nham in·ve·ja às de Hi·po·e·re·ne.

Getting the status of a webservice access key

In [8]:

def get_key_status():
    '''Returns a string with the detailed status of the webservice access key'''
    
    request_data = {
        'method': 'key_status',
        'jsonrpc': '2.0',
        'id': 0,
        'params': {
            'key': LXSLLABIFIER_WS_API_KEY,
        },
    }
    response = requests.post(LXSLLABIFIER_WS_API_URL, json=request_data)
    response_data = response.json()
    if "error" in response_data:
        raise WSException(response_data["error"])
    else:
        return response_data["result"]

In [9]:

get_key_status()

Out[9]:

{'requests_remaining': 99999894,
 'chars_remaining': 999946533,
 'expiry': '2030-01-10T00:00+00:00'}
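The status dict can be used to check how long a key remains valid. A minimal sketch, assuming the `expiry` field is always an ISO 8601 timestamp with a UTC offset, as in the sample output above. `days_until_expiry` is a hypothetical helper, and the `now` parameter exists only to make the example reproducible.

```python
from datetime import datetime, timezone


def days_until_expiry(status, now=None):
    """Return the whole days left before the key described by `status` expires."""
    expiry = datetime.fromisoformat(status["expiry"])
    now = now or datetime.now(timezone.utc)
    return (expiry - now).days


# The dict below is copied from the sample output above.
status = {
    "requests_remaining": 99999894,
    "chars_remaining": 999946533,
    "expiry": "2030-01-10T00:00+00:00",
}
print(days_until_expiry(status, now=datetime(2030, 1, 1, tzinfo=timezone.utc)))  # 9
```

A negative return value would indicate that the key has already expired.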

Instructions to use this web service

The web service for this application is available at https://portulanclarin.net/workbench/lx-syllabifier/api/.

Below you find an example of how to use this web service with Python 3.

This example resorts to the requests package. To install this package, run this command in the command line: pip3 install requests.

To use this web service, you need an access key, which you can obtain by clicking the button below. A key is valid for 31 days. It allows you to submit a total of 1 billion characters, in requests of no more than 10000 characters each, and to make up to 100,000 requests at a rate of no more than 200 requests per hour.

For other usage regimes, you should contact the helpdesk.
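One way to stay under the hourly rate when sending many requests is to space them out by a fixed minimum interval (3600 / 200 = 18 seconds). The class below is a sketch under that assumption. `Throttle` is a hypothetical name, and how the server actually measures its hourly window is not stated here, so the spacing is deliberately conservative.

```python
import time


class Throttle:
    """Space out calls by a fixed minimum interval derived from `max_per_hour`."""

    def __init__(self, max_per_hour=200, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 3600.0 / max_per_hour  # 18 s for 200 requests/hour
        self.clock = clock    # injectable so the class can be exercised without waiting
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Block until the minimum interval since the previous call has elapsed."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now
```

In use, a single `Throttle()` instance would be created once and its `wait()` method called immediately before each request to the web service.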

The input data and the respective output will be automatically deleted from our computer after being processed. No copies will be retained after your use of this service.

import json
import requests  # to install this library, enter in your command line:
                 #  pip3 install requests

# This is a simple example to illustrate how you can use the LX-Syllabifier web service

# Requires: key is a string with your access key
# Requires: text is a string, UTF-8, with a maximum 10000 characters, Portuguese text, with
#           the input to be processed

# Ensures: output according to specification in https://portulanclarin.net/workbench/lx-syllabifier/
# Ensures: dict with number of requests and characters input so far with the access key, and
#          its date of expiry

key = 'access_key_goes_here' # before you run this example, replace access_key_goes_here by
                             # your access key

# this string can be replaced by your input
text = '''As armas e os barões assinalados,
Que da ocidental praia Lusitana,
Por mares nunca de antes navegados,
Passaram ainda além da Taprobana,
Em perigos e guerras esforçados,
Mais do que prometia a força humana,
E entre gente remota edificaram
Novo Reino, que tanto sublimaram;
'''

# To read input text from a file, uncomment this block
#inputFile = open("myInputFileName", "r", encoding="utf-8") # replace myInputFileName by
                                                            # the name of your file
#text = inputFile.read()
#inputFile.close()


# Processing:

url = "https://portulanclarin.net/workbench/lx-syllabifier/api/"
request_data = {
    'method': 'syllabify',
    'jsonrpc': '2.0',
    'id': 0,
    'params': {
        'text': text,
        'key': key,
    },
}
response = requests.post(url, json=request_data)
response_data = response.json()
if "error" in response_data:
    print("Error:", response_data["error"])
else:
    print("Result:")
    print(response_data["result"])


# To write output in a file, uncomment this block
#outputFile = open("myOutputFileName","w", encoding="utf-8") # replace myOutputFileName by
                                                             # the name of your file
#output = response_data["result"]
#outputFile.write(output)
#outputFile.close()


# Getting access key status:

request_data = {
    'method': 'key_status',
    'jsonrpc': '2.0',
    'id': 0,
    'params': {
        'key': key,
    },
}
response = requests.post(url, json=request_data)
response_data = response.json()
if "error" in response_data:
    print("Error:", response_data["error"])
else:
    print("Key status:")
    print(json.dumps(response_data["result"], indent=4))
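The commented-out file blocks in the script above open and close files by hand. As an alternative sketch, the helper below reads a UTF-8 file, passes its text through a processing function, and writes the result, letting `pathlib` take care of closing the files. `process_file` is a hypothetical name, and `process` stands for any function that takes and returns a string, such as one wrapping the request shown above.

```python
from pathlib import Path


def process_file(input_path, output_path, process):
    """Read UTF-8 text, pass it through `process`, and write the result as UTF-8."""
    text = Path(input_path).read_text(encoding="utf-8")
    result = process(text)
    Path(output_path).write_text(result, encoding="utf-8")
    return result
```

Keeping the network call behind the `process` argument also makes the file handling easy to try out with a stand-in function before spending any of the key's request quota.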

Access key for the web service


Email address validation

To receive your access key for this webservice by email, copy the code displayed below into the "Subject" field of an email message (leaving the message body empty) and send it to request@portulanclarin.net


Privacy: When your access key expires, your email address is automatically deleted from our records.

LX-Syllabifier's documentation

Functionality

LX-Syllabifier is a freely available on-line service for the syllabification of Portuguese text. This service was developed and is maintained at the University of Lisbon by the NLX-Natural Language and Speech Group of the Department of Informatics.

LX-Syllabifier performs syllabification following a rule-based approach. The syllabification of Portuguese is defined in the Acordo Ortográfico da Língua Portuguesa (1990).

Authorship

LX-Syllabifier was developed by Francisco Costa, João Rodrigues and João Silva, under the coordination of António Branco, at the NLX-Natural Language and Speech Group.

License

LX-Syllabifier is distributed under an MIT license.

Release

You can download the program here.

Publications

Irrespective of the most recent version of this tool you may use, when mentioning it, please cite this reference:

Rodrigues, João, Francisco Costa, João Silva and António Branco, 2016, "Automatic Syllabification of Portuguese", 1 (10), Revista da Associação Portuguesa de Linguística, Associação Portuguesa de Linguística e Faculdade de Letras do Porto, pp. 715-720.


License

The complete text of this license is here.