«The Memórias Paroquiais (Parish Memories) are an essential source for obtaining a radiography of Portugal in 1758-1761. They correspond to a survey, organized in 3 major parts (the locality itself, the mountain and the river), which was printed and sent to those responsible for the dioceses of t...
The test set described in was used as the basis for the assessment of word embeddings. An example entry in this data set would read: ‘Berlin Germany Lisbon Portugal’. With these four words relations – as in this example – one can test semantic analogies by using any of the possible combinations o...
The test set described in was used as the basis for the assessment of word embeddings. An example entry in this data set would read: ‘Berlin Germany Lisbon Portugal’. With these four words relations – as in this example – one can test semantic analogies by using any of the possible combinations o...
SENTER is a SENtence splitTER for Portuguese.
TimeBankPT, a TimeML annotated corpus of Portuguese, is the first corpus of Portuguese with rich temporal annotations (i.e. it includes annotations not only of temporal expressions but also about events and temporal relations). The annotation scheme used is similar to TimeML. TimeBankPT is the...
CINTIL-QATreebank is a treebank composed of Portuguese sentences that can be used to support the development of Question Answering systems. This Treebank includes 111 declarative sentences from the pre-existing CINTIL-Treebank (see Branco et al. 2011) whose syntactic structure was manually transf...
The CINTIL-LogicalFormBank (Branco, 2009, and Branco et al., 2011) is a corpus of semantic dependencies of sentences from Portuguese texts composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), novels (399 sentences; 3,082...
The CINTIL-PropBank (Branco et al., 2012) is a corpus of sentences annotated with their constituency structure and semantic role tags, composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), and novels (399 sentences; 3,082...
ViPER is a verb lexical database with +7,000 verb senses, along with their structural, distributional, and transformational properties. The verb senses are classified based on the main syntactic properties of their construction. Around 70 formal classes have been devised. For each verb sense, its...
LX-Tagger is a freely available online service for the part-of-speech tagging of Portuguese. It was developed and is mantained by the NLX-Natural Language and Speech Group at the University of Lisbon, Department of Informatics. The service is composed by a set of shallow processing tools: A se...