
File processing
Input format: Input files must be in .txt FORMAT with UTF-8 ENCODING and contain PORTUGUESE TEXT. Input files and folders can also be compressed to the .zip format.
Privacy: The input file you upload and the respective output files will be automatically deleted from our computer after being processed and the result downloaded by you. No copies of your files will be retained after your use of this service.
If your input file is large, its processing may take some time. To receive by email the URL from which to download your processed file when it is ready, enter your email address below. After being used for this purpose, your email address will be deleted from our computer.
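The input requirements above (.txt files, UTF-8 encoding, optionally packed into a .zip archive) can be checked locally before uploading. The following is a minimal sketch under stated assumptions, not part of the service: the helper names `is_utf8` and `zip_txt_files` are hypothetical, and the check only verifies the encoding, not that the text is Portuguese.

```python
import zipfile
from pathlib import Path


def is_utf8(path):
    """Return True if the file at `path` decodes cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def zip_txt_files(folder, archive_path):
    """Pack every valid UTF-8 .txt file under `folder` into a .zip archive.

    Returns the list of archived file names, relative to `folder`.
    Files that are not valid UTF-8 are skipped (hypothetical policy).
    """
    folder = Path(folder)
    archived = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for txt in sorted(folder.rglob("*.txt")):
            if is_utf8(txt):
                name = txt.relative_to(folder).as_posix()
                archive.write(txt, arcname=name)
                archived.append(name)
    return archived
```

Calling `zip_txt_files("my_corpus", "my_corpus.zip")` (both names are placeholders) returns the files that were packed, so any file missing from the result is one whose encoding should be fixed first.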
Instructions to use this web service
The web service for this application is available at https://portulanclarin.net/workbench/lx-parser/api/.
Below you find an example of how to use this web service with Python 3.
This example uses the requests package. To install this package, run this command in the command line:
pip3 install requests
To use this web service, you need an access key, which you can obtain by clicking the button below. A key is valid for 31 days. It allows you to submit a total of 10 million characters by means of requests with no more than 2,500 characters each. It allows up to 100,000 requests, at a rate of no more than 200 requests per hour.
For other usage regimes, you should contact the helpdesk.
The input data and the respective output will be automatically deleted from our computer after being processed. No copies will be retained after your use of this service.

    import json
    import requests # to install this library, enter in your command line:
                    # pip3 install requests

    # This is a simple example to illustrate how you can use the LX-Parser web service

    # Requires: key is a string with your access key
    # Requires: text is a string, UTF-8, with a maximum 2500 characters, Portuguese text, with
    #           the input to be processed
    # Requires: format is a string, indicating the output format, which can be either
    #           'parentheses', 'table' or 'JSON'

    # Ensures: output according to specification in https://portulanclarin.net/workbench/lx-parser/
    # Ensures: dict with number of requests and characters input so far with the access key, and
    #          its date of expiry

    key = 'access_key_goes_here' # before you run this example, replace access_key_goes_here by
                                 # your access key
    format = 'parentheses' # other possible values are 'table' and 'JSON'

    # this string can be replaced by your input
    text = '''A Praça Luís de Camões será embelezada.
    Já passámos a fase das grandes produções com muitos violinos e orquestras.'''

    # To read input text from a file, uncomment this block
    #inputFile = open("myInputFileName", "r", encoding="utf-8") # replace myInputFileName by
                                                                # the name of your file
    #text = inputFile.read()
    #inputFile.close()

    # Processing:

    url = "https://portulanclarin.net/workbench/lx-parser/api/"
    request_data = {
        'method': 'parse',
        'jsonrpc': '2.0',
        'id': 0,
        'params': {
            'text': text,
            'format': format,
            'key': key,
        },
    }
    request = requests.post(url, json=request_data)
    response_data = request.json()
    if "error" in response_data:
        print("Error:", response_data["error"])
    else:
        print("Result:")
        print(response_data["result"])
        # To write output in a file, uncomment this block
        #outputFile = open("myOutputFileName","w", encoding="utf-8") # replace myOutputFileName by
                                                                     # the name of your file
        #output = response_data["result"]
        #outputFile.write(output)
        #outputFile.close()

    # Getting access key status:

    request_data = {
        'method': 'key_status',
        'jsonrpc': '2.0',
        'id': 0,
        'params': {
            'key': key,
        },
    }
    request = requests.post(url, json=request_data)
    response_data = request.json()
    if "error" in response_data:
        print("Error:", response_data["error"])
    else:
        print("Key status:")
        print(json.dumps(response_data["result"], indent=4))

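Because each request may carry no more than 2,500 characters, longer texts have to be split before they are submitted. The sketch below shows one possible way to do this, breaking at line ends so that sentences placed one per line stay whole. The function name `split_into_chunks` and the hard-cut fallback for overlong lines are assumptions made for illustration; they are not part of the web service.

```python
MAX_CHARS = 2500  # per-request character limit stated for the access key


def split_into_chunks(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, breaking at line ends.

    A single line longer than max_chars is cut at the limit as a last
    resort (this may split a sentence, which can degrade its parse).
    """
    chunks = []
    current = ""
    for line in text.splitlines(keepends=True):
        # Hard-cut lines that cannot fit in one request on their own.
        while len(line) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        # Start a new chunk when adding this line would exceed the limit.
        if len(current) + len(line) > max_chars:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as the `text` parameter of a separate `parse` request. With a limit of 200 requests per hour, spacing requests at least 18 seconds apart (3600 / 200) keeps a long job within the stated rate.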
LX-Parser's documentation
LX-Parser
LX-Parser is a freely available on-line service for constituency parsing of Portuguese sentences. This service was developed and is maintained at the University of Lisbon by the NLX-Natural Language and Speech Group of the Department of Informatics.
LX-Parser performs a syntactic analysis of Portuguese sentences in terms of their constituency structure.
Supporting parser
LX-Parser is supported by the Stanford Parser, a statistical parser developed at Stanford University that is trained on a previously annotated corpus.
A total of 22,118 sentences from CINTIL-Treebank were used for training. This treebank is being developed and maintained at the University of Lisbon by the NLX-Natural Language and Speech Group of the Department of Informatics.
The parser uses probabilistic grammars. Under the Parseval metric, it achieves an F-score of 89% (value obtained through 10-fold cross-validation).
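To make the Parseval figure more concrete, the following is a simplified sketch of how a labeled-bracket F-score can be computed for one sentence. It is not the evaluation code used for LX-Parser: the function names are hypothetical, the example trees are hand-made rather than actual parser output, and standard evaluation tools apply further conventions (for instance regarding punctuation) that are left out here.

```python
from collections import Counter


def labeled_spans(tree):
    """Count (label, start, end) for each phrasal node of a bracketed tree.

    Preterminal nodes (a tag directly over a word) are skipped. Words that
    themselves contain parentheses are not supported by this sketch.
    """
    tokens = tree.replace("(", " ( ").replace(")", " ) ").split()
    spans = Counter()
    stack = []    # open nodes: [label, start position, number of child nodes]
    position = 0  # number of words consumed so far
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == "(":
            if stack:
                stack[-1][2] += 1
            stack.append([tokens[i + 1], position, 0])
            i += 2  # skip the bracket and the label
        elif token == ")":
            label, start, children = stack.pop()
            if children > 0:  # phrasal node: dominates at least one node
                spans[(label, start, position)] += 1
            i += 1
        else:
            position += 1  # a word
            i += 1
    return spans


def parseval(gold_tree, predicted_tree):
    """Return (precision, recall, F-score) over labeled brackets."""
    gold = labeled_spans(gold_tree)
    predicted = labeled_spans(predicted_tree)
    matched = sum((gold & predicted).values())
    precision = matched / sum(predicted.values()) if predicted else 0.0
    recall = matched / sum(gold.values()) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

For a gold tree with brackets S, NP and VP and a predicted tree that has the same three plus one extra AP bracket, precision is 3/4 and recall is 3/3, so the F-score is 6/7, roughly 0.857.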
Annotation guidelines
The syntactic analyses produced by LX-Parser are similar to the analyses found in the treebank on which LX-Parser was trained. This treebank was designed along the principles described in the following handbook:
- Branco, António, João Silva, Francisco Costa, Sérgio Castro, 2011, CINTIL TreeBank Handbook: Design options for the representation of syntactic constituency. Department of Informatics, University of Lisbon, Technical Reports series, nb. di-fcul-tp-11-02.
Authorship
LX-Parser was developed by Patricia Gonçalves and João Silva, managed by António Branco, at the NLX-Natural Language and Speech Group, partly in the scope of the SemanticShare Project, funded by FCT-Fundação para a Ciência e Tecnologia.
Publications
Irrespective of the most recent version of this tool you may use, when mentioning it, please cite this reference:
- Silva, João, António Branco, Sérgio Castro and Ruben Reis, 2010, "Out-of-the-Box Robust Parsing of Portuguese". In Proceedings of the 9th International Conference on the Computational Processing of Portuguese (PROPOR2010), Lecture Notes in Artificial Intelligence, 6001, Berlin, Springer, pp.75–85.
Contact us
Contact us using the following email address: 'nlx' concatenated with 'at' concatenated with 'di.fc.ul.pt'.
Acknowledgments
This work was partly supported by FCT-Foundation for Science and Technology under the grant FCT/PTDC/PLP/81157/2006 for project SemanticShare. The system uses the PHPSyntaxTree Visualizer and the Stanford Parser.
Release
LX-Parser is made available as a standalone parser that you can download and run locally on your computer.
License
LX-Parser is distributed under an MIT license.
Required download
- The parser model file, cintil.ser.gz.
- Stanford Parser (requires Java 5 or later). Note that the model was created with version 1.6.5 of the parser. More recent versions of the software seem to be unable to load the model.
- LX-Tokenizer to tokenize input prior to parsing.
Instructions
Example command line:

    java -Xmx500m -cp /path/to/stanford-parser.jar \
        edu.stanford.nlp.parser.lexparser.LexicalizedParser \
        -tokenized -sentences newline -outputFormat oneline \
        -uwModel edu.stanford.nlp.parser.lexparser.BaseUnknownWordModel \
        cintil.ser.gz input.txt

A quick explanation of the options:
- For some more complex sentences, the default heap size used by Java might not be enough. We increase the maximum heap size to 500 megabytes with the `-Xmx500m` option.
- The path to the Stanford Parser JAR file is provided with the `-cp` option.
- The name of the Java class we wish to run (`LexicalizedParser`).
- The input to the parser must already be tokenized (see LX-Tokenizer for details on tokenization decisions). We indicate this through the `-tokenized` option.
- Each sentence in the input is separated by newline. We indicate this through the `-sentences newline` option.
- The output format is one parse per line, selected with the `-outputFormat oneline` option. NB: The parser always adds a ROOT node. You can remove it in a post-processing step.
- A class (`BaseUnknownWordModel`, part of the Stanford Parser package) that implements a baseline word model is used to handle unknown words. It is chosen by the `-uwModel` option.
- The final two arguments are the model file and the input file.
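The post-processing step mentioned above, removing the ROOT node that the parser adds, can be sketched as follows. This is a minimal illustration, assuming that each output line is a single bracketed parse whose outermost node is labeled ROOT and wraps exactly one inner tree; the function name `strip_root` is hypothetical.

```python
def strip_root(parse_line):
    """Remove an outer (ROOT ...) wrapper from a one-line bracketed parse.

    Lines that do not start with a ROOT node are returned unchanged
    (apart from surrounding whitespace being trimmed).
    """
    line = parse_line.strip()
    prefix = "(ROOT"
    if line.startswith(prefix) and line.endswith(")"):
        inner = line[len(prefix):-1].strip()
        # Only accept the result if what remains is itself a bracketed tree.
        if inner.startswith("("):
            return inner
    return line
```

Applied to every line of the parser output, for example `[strip_root(line) for line in open("output.txt", encoding="utf-8")]` with `output.txt` as a placeholder file name, this yields the parses without the added node.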
Tagset
Tag | Category |
---|---|
A | Adjective |
AP | Adjective Phrase |
ADV | Adverb |
ADVP | Adverb Phrase |
C | Complementizer |
CL | Clitics |
CP | Complementizer Phrase |
CARD | Cardinal |
CONJ | Conjunction |
CONJP | Conjunction Phrase |
D | Determiner |
DEM | Demonstrative |
N | Noun |
NP | Noun Phrase |
O | Ordinals |
P | Preposition |
PP | Preposition Phrase |
PPA | Past Participles/Adjectives |
POSS | Possessive |
PRS | Personals |
QNT | Predeterminer |
REL | Relatives |
S | Sentence |
V | Verb |
VP | Verb Phrase |
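When inspecting parser output, it can help to look up what each label means. The sketch below copies the tagset table into a dictionary and lists the labels found in a bracketed parse. The names `TAGSET` and `describe_labels` are hypothetical, the example parse is hand-made, and actual output may contain labels that are not in this table (such as the ROOT node), which are reported as unknown.

```python
import re

# Tag descriptions copied from the tagset table above.
TAGSET = {
    "A": "Adjective", "AP": "Adjective Phrase",
    "ADV": "Adverb", "ADVP": "Adverb Phrase",
    "C": "Complementizer", "CL": "Clitics", "CP": "Complementizer Phrase",
    "CARD": "Cardinal",
    "CONJ": "Conjunction", "CONJP": "Conjunction Phrase",
    "D": "Determiner", "DEM": "Demonstrative",
    "N": "Noun", "NP": "Noun Phrase",
    "O": "Ordinals",
    "P": "Preposition", "PP": "Preposition Phrase",
    "PPA": "Past Participles/Adjectives",
    "POSS": "Possessive", "PRS": "Personals",
    "QNT": "Predeterminer", "REL": "Relatives",
    "S": "Sentence", "V": "Verb", "VP": "Verb Phrase",
}


def describe_labels(parse):
    """Return (label, description) pairs for each node label in a parse.

    A label is taken to be the text that immediately follows an opening
    parenthesis; labels missing from TAGSET are described as "unknown".
    """
    labels = re.findall(r"\(([^\s()]+)", parse)
    return [(label, TAGSET.get(label, "unknown")) for label in labels]
```

For example, `describe_labels("(NP (D A) (N Praça))")` pairs NP with "Noun Phrase", D with "Determiner" and N with "Noun".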
Why LX-Parser?
LX because LX is the shorthand form Lisboners often use to refer to their hometown.
License
No fee, attribution, all rights reserved, no redistribution, non commercial, no warranty, no liability, no endorsement, temporary, non exclusive, share alike.
The complete text of this license is here.