The interface of CINTIL-Treebank Online Searcher is simple.
(1) To help you, we have examples of 3 different levels of difficulty: simple,
complex and advanced.
(2) There is a text box where you must type the syntactic pattern you want to
search for.
(3) You can tick the option to show the POS tags in the trees.
(4) You can choose the number of results returned (between 1 and 20
sentences).
(5) Once the search results are returned, use the navigation buttons and
arrows to browse the next results.
(6) To view the tree just place the cursor on the sentence you want and
click.
(7) The syntactic tree corresponding to the sentence will appear below.
(8) The dependency tree corresponding to the sentence will appear beneath the
syntactic tree.
Searching by linguistic tags
To search by linguistic tags, you must know the tags and the search syntax.
The tagsets used in the annotation of CINTIL-Treebank are available for quick
reference under the tab "Tagsets" at the top of this panel.
The table below presents the syntax and symbols used for searching in the
CINTIL-Treebank. When searching by linguistic tags, always write the tags in
upper case.
Expression   Meaning                                Example
A << B       A dominates B                          NP << N
A >> B       A is dominated by B                    V >> VP
A < B        A immediately dominates B              PP < P
A > B        A is immediately dominated by B        CONJ > NP
A $ B        A is a sister of B                     NP $ CONJ
A .. B       A precedes B                           P .. POSS-M
A . B        A immediately precedes B               CONJ . VP
A ,, B       A follows B                            CARD ,, VP
A , B        A immediately follows B                D-SP , NP-C
A <<, B      B is a leftmost descendant of A        VP <<, P
A <<- B      B is a rightmost descendant of A       PP <<- N
A >>, B      A is a leftmost descendant of B        ADV >>, S
A >>- B      A is a rightmost descendant of B       S >>- VP
A <, B       B is the first child of A              PP <, P
A >, B       A is the first child of B              V >, VP
A <- B       B is the last child of A               PP <- NP-C
A >- B       A is the last child of B               CARD >- D-SP
A <i B       B is the ith child of A                NP-C <1 D-SP
A >i B       A is the ith child of B                ADV >1 ADVP
A <: B       B is the only child of A               NP-C <: N
A >: B       A is the only child of B               N >: NP
A <<# B      B is a head of phrase A                D-SP <<# CARD
A <# B       B is the immediate head of phrase A    NP <# N
@A           All tags that contain the string A     @NP
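To make a few of these relations concrete, here is a minimal Python sketch that checks "dominates" (<<), "immediately dominates" (<) and "is a sister of" ($) on a toy tree. The tree, the tags in it and all function names are made up for illustration; this is not how the Searcher itself is implemented.

```python
# Toy constituency tree: (label, [children]); leaves are plain strings (words).
# The sentence and its analysis are invented for this example.
TREE = ("S", [
    ("NP", [("D", ["o"]), ("N", ["gato"])]),
    ("VP", [("V", ["dorme"])]),
])

def nodes(tree):
    """Yield every labelled node, top-down and left to right."""
    yield tree
    for child in tree[1]:
        if isinstance(child, tuple):
            yield from nodes(child)

def children(node):
    """Labels of the labelled children of a node (word leaves are skipped)."""
    return [c[0] for c in node[1] if isinstance(c, tuple)]

def immediately_dominates(tree, a, b):
    """A < B: some node labelled a has a child labelled b."""
    return any(n[0] == a and b in children(n) for n in nodes(tree))

def dominates(tree, a, b):
    """A << B: some node labelled a has a descendant labelled b."""
    for n in nodes(tree):
        if n[0] == a:
            descendants = list(nodes(n))[1:]  # drop the node itself
            if any(d[0] == b for d in descendants):
                return True
    return False

def sisters(tree, a, b):
    """A $ B: two different children of the same node are labelled a and b."""
    for n in nodes(tree):
        labels = children(n)
        for i, x in enumerate(labels):
            if x == a and any(y == b for j, y in enumerate(labels) if j != i):
                return True
    return False

print(dominates(TREE, "S", "N"))               # True
print(immediately_dominates(TREE, "S", "N"))   # False: N is a grandchild of S
print(sisters(TREE, "NP", "VP"))               # True
```

The contrast between the first two calls mirrors the difference between NP << N and NP < N in the table: the double symbol accepts any depth, the single symbol only a direct child.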
Searching by regular expressions
It is possible to search with regular expressions. The usual notational
conventions are followed:
Alternation
Alternatives are separated by the | (vertical bar) character:
NP|VP matches all parse trees with a noun phrase and
all parse trees with a verb phrase.
Iteration
There are three forms of expressing iteration.
One of them is the .* (dot + star) sequence, which matches any
sequence of zero or more characters, provided the pattern is
enclosed in slashes /:
/NP.*/ matches any parse tree with an NP tag, for
example: NP, NP-C, NP-M and NP-SJ.
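The effect of .* can be tried out offline with Python's re module. The sketch below assumes that a pattern written between / / behaves like an unanchored regular expression search over each tag; the tag list is a small hypothetical sample, not the full tagset.

```python
import re

# Hypothetical sample of tags; the real tagsets are listed under "Tagsets".
TAGS = ["NP", "NP-C", "NP-M", "NP-SJ", "VP", "PP", "N"]

def tags_matching(pattern, tags):
    """Return the tags in which the regular expression is found."""
    regex = re.compile(pattern)
    return [tag for tag in tags if regex.search(tag)]

# Equivalent of typing /NP.*/ in the search box (under the assumption above).
print(tags_matching(r"NP.*", TAGS))  # ['NP', 'NP-C', 'NP-M', 'NP-SJ']
```

The bare tag N is not returned, because the pattern requires the two letters NP before the optional remainder.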
Delimiters
To delimit the beginning and the end of a tag, you can use the special
characters ^ and $. This type of search is useful when
you want to find parse trees whose tags combine grammatical tags and
semantic roles, provided the pattern is enclosed in slashes /:
/^NP.*.ARG1$/ matches any parse tree with a tag that begins
with NP, has any material in the middle, and ends with
ARG1, which indicates the semantic role of first argument, for
example: NP-DO-ARG1 and NP-SJ-ARG1.
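The following Python sketch shows what the ^ and $ delimiters add, again assuming the Searcher's regular expressions behave like Python's re for these constructs. The tags are a hypothetical sample; NP-SJ-ARG11 in particular is invented only to show the contrast.

```python
import re

# Hypothetical tags; "NP-SJ-ARG11" is made up to illustrate the role of $.
TAGS = ["NP-DO-ARG1", "NP-SJ-ARG1", "NP-SJ-ARG2", "PP-ARG1", "NP", "NP-SJ-ARG11"]

anchored = re.compile(r"^NP.*.ARG1$")   # must start with NP and end with ARG1
unanchored = re.compile(r"NP.*.ARG1")   # NP ... ARG1 may occur anywhere

anchored_hits = [tag for tag in TAGS if anchored.search(tag)]
unanchored_hits = [tag for tag in TAGS if unanchored.search(tag)]

print(anchored_hits)    # ['NP-DO-ARG1', 'NP-SJ-ARG1']
print(unanchored_hits)  # ['NP-DO-ARG1', 'NP-SJ-ARG1', 'NP-SJ-ARG11']
```

Without the delimiters, a tag that merely contains the sequence is also accepted; with them, the whole tag has to fit the pattern.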
Searching by words
The search can also be performed on the leaves of the trees, where the words
occur. To find a word, type it in the text box. For example:
Portugal
Click the "Search" button and all sentences where the word occurs will be
shown below. The search by words depends on their spelling in the
treebank, where a word can be written in upper or lower case.
To broaden the search, try the word with different spellings. For example:
Portugal|portugal|PORTUGAL
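As an illustration of why the spelling variants matter, the Python sketch below builds such a query from a single word and applies it to a few invented sentences. It assumes that word search compares whole tokens exactly as spelled; the helper names and the sentences are hypothetical.

```python
# Invented sentences, already split into tokens by spaces.
SENTENCES = [
    "Portugal venceu o jogo .",
    "PORTUGAL EM FESTA",
    "O vinho de portugal é famoso .",
    "A Espanha empatou .",
]

def case_variants(word):
    """Build a query with the original, lower-case and upper-case spellings."""
    return "|".join(dict.fromkeys([word, word.lower(), word.upper()]))

def search_words(query, sentences):
    """Return the sentences containing a token equal to one of the alternatives."""
    alternatives = set(query.split("|"))
    return [s for s in sentences if alternatives & set(s.split())]

query = case_variants("Portugal")
print(query)                                     # Portugal|portugal|PORTUGAL
print(len(search_words("Portugal", SENTENCES)))  # 1
print(len(search_words(query, SENTENCES)))       # 3
```

The single spelling finds only one of the three sentences that mention the country, while the alternation finds all of them.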
Searching by sentence identifier
All sentences in the CINTIL-Treebank have a unique numeric identifier. The
identifier is shown when the sentence is returned on the screen.
You can use this number to find a sentence in the CINTIL-Treebank directly.
To do so, make a note of the identifier returned with the
sentence. The search uses the pattern "ID:". For example, ID:a102
will select the sentence with identifier 102 in the
CINTIL-Treebank. To view the parse tree, just click on the sentence.
Searching for non-matching trees
The CINTIL-Treebank Searcher provides an option to find parse trees
that do not contain a given pattern. To use this option, type
the word "INV" followed by a colon ":" before the pattern. The
parse trees where the pattern is not found are then returned as the result.
For example, INV:VP will select all sentences that do not
contain a verb phrase.
To view the parse tree, just click on the sentence.
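The logic of an inverted search can be sketched in a few lines of Python. The miniature treebank, its identifiers, the bracketed notation and the function names are all assumptions made for this example; only the INV: prefix comes from the Searcher.

```python
import re

# Made-up bracketed trees keyed by hypothetical identifiers.
TREEBANK = {
    "1": "(S (NP (N Portugal)) (VP (V venceu)))",
    "2": "(NP (D a) (N casa))",
    "3": "(S (NP (N chuva)) (VP (V cai)))",
}

def tags_of(bracketed):
    """Collect the labels that follow each opening bracket."""
    return set(re.findall(r"\(([^\s()]+)", bracketed))

def search(query, treebank):
    """Return identifiers of trees that contain the tag, or lack it with INV:."""
    inverted = query.startswith("INV:")
    tag = query[len("INV:"):] if inverted else query
    return [sid for sid, tree in treebank.items()
            if (tag in tags_of(tree)) != inverted]

print(search("VP", TREEBANK))      # ['1', '3']
print(search("INV:VP", TREEBANK))  # ['2']
```

The two result lists are complementary: every tree is returned by exactly one of the two queries.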
The CINTIL-Treebank Online Searcher allows the use of generic structural
patterns of the syntactic trees in order to find those trees in the treebank
that conform to these patterns. This service is a robust search tool that
finds linguistic structures of great complexity.
The annotation of CINTIL-Treebank is performed according to the method of
annotation presented in the literature as that which ensures the highest
reliability in the results obtained: multiple independent annotation,
followed by adjudication. Each sentence is automatically analysed by LXGram, a grammar for the computational
processing of Portuguese. Once a grammatical analysis is obtained, two
independent annotators choose the analysis they each consider to be correct.
In case of divergence between annotators, an adjudicator reviews their
decisions and makes the final choice. The annotators and adjudicators
are specialists with post-graduate degrees in Linguistics.
The CINTIL-Treebank is currently under development. At present it is composed
of 35,499 sentences. The treebank is composed of sentences taken from the CINTIL-International Corpus of Portuguese and sentences of
the regression corpus of the grammar LXGram.
Acquiring CINTIL-Treebank
CINTIL-Treebank is distributed through PORTULAN CLARIN, and may be found
here.
Authorship
CINTIL-Treebank Online Searcher was developed by Patrícia Gonçalves and
managed by António Branco, at the NLX-Natural Language and Speech Group, partly in the scope of the
SemanticShare Project, funded by FCT-Fundação para a Ciência e Tecnologia.
Publications
Irrespective of the most recent version of this tool you may use,
when mentioning it, please cite this reference:
The parse trees chosen in the annotation are
produced by LXGram, a grammar for the computational
processing of Portuguese. LXGram is being developed under the following major
design features:
precision:
it is a precision grammar delivering accurate, linguistically grounded
information on natural language sentences.
deep processing:
it is a grammar for deep linguistic processing inasmuch as, besides
information on the major syntactic dimensions of grammatical
constituency and dependency, it delivers (and generates from)
fully-fledged logical representations of the meaning of natural language
sentences.
large-scale:
it is planned not to leave out any sort of regular grammatical
construction or phenomenon.
multi-purpose:
it is intended to make available as much linguistic information as
can possibly be made explicit by automatic means, given the current
state of the art in language technology, with the goal of offering
itself to support the largest possible range of language technology
applications.
Annotation guidelines
The treebank was designed along the principles described in the following
handbooks:
License
No fee, attribution, all rights reserved, no redistribution, non commercial, no
warranty, no liability, no endorsement, temporary, non exclusive, share alike.