TimeBankPT, a TimeML-annotated corpus of Portuguese, is the first corpus of Portuguese with rich temporal annotations (i.e., it includes annotations not only of temporal expressions but also of events and temporal relations).
The annotation scheme used is similar to TimeML. TimeBankPT is the result of adapting the English corpus used in the first TempEval challenge to the Portuguese language.
TimeML is a rich annotation scheme in that it covers several time-related phenomena: the times, dates and periods denoted by temporal expressions, events, temporal relations, etc.
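To make the three kinds of annotation concrete, here is a minimal sketch in Python that reads a TimeML-style fragment. The sentence, the identifiers and the attribute values are invented for illustration and are not taken from TimeBankPT; the tag and attribute names (`EVENT`, `TIMEX3`, `TLINK`, `eid`, `tid`, `relType`) follow general TimeML conventions, and the exact attribute set used in the corpus files may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical TimeML-style fragment (NOT an excerpt from TimeBankPT).
SAMPLE = (
    '<s>A empresa <EVENT eid="e1" class="OCCURRENCE">anunciou</EVENT> '
    'os resultados na <TIMEX3 tid="t1" type="DATE" value="1998-02-16">'
    'segunda-feira</TIMEX3>.'
    '<TLINK lid="l1" relType="OVERLAP" eventID="e1" relatedToTime="t1"/></s>'
)

def summarize(fragment):
    """Collect events, temporal expressions and temporal relations."""
    root = ET.fromstring(fragment)
    # Event id -> event word
    events = {e.get("eid"): e.text for e in root.iter("EVENT")}
    # Temporal expression id -> (surface text, normalized value)
    timexes = {t.get("tid"): (t.text, t.get("value")) for t in root.iter("TIMEX3")}
    # (source event, relation type, target time or event)
    links = [
        (l.get("eventID"), l.get("relType"),
         l.get("relatedToTime") or l.get("relatedToEvent"))
        for l in root.iter("TLINK")
    ]
    return events, timexes, links

events, timexes, links = summarize(SAMPLE)
```

In this sketch the event "anunciou" and the temporal expression "segunda-feira" are each given an identifier, and the `TLINK` element records the temporal relation that holds between them.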
Some of the features of TimeBankPT:
- It uses the new Portuguese spelling (official document describing it, Wikipedia article).
- It was automatically checked for errors using reasoning code.
- It contains around 70,000 words of text, divided into a training set and a test set.
- It contains annotations for events, temporal expressions and temporal relations.
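The description above does not say how the reasoning code works. As one illustration of the kind of error such a check might catch, the following Python sketch flags a set of temporal relations as inconsistent when its BEFORE/AFTER links form a cycle (for example, A before B, B before C, C before A). The function name, the triple format and the restriction to BEFORE and AFTER are assumptions made for this example, not a description of the tool actually used for TimeBankPT.

```python
from collections import defaultdict

def has_ordering_cycle(tlinks):
    """Return True if BEFORE/AFTER relations imply that something precedes itself.

    `tlinks` is a list of (source, relation, target) triples (hypothetical format).
    Relations other than BEFORE and AFTER are ignored in this sketch.
    """
    graph = defaultdict(set)
    for source, rel, target in tlinks:
        if rel == "BEFORE":
            graph[source].add(target)
        elif rel == "AFTER":
            # "source AFTER target" is the same as "target BEFORE source"
            graph[target].add(source)

    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = defaultdict(int)

    def visit(node):
        # Depth-first search; reaching an in-progress node means a cycle.
        state[node] = IN_PROGRESS
        for nxt in graph.get(node, ()):
            if state[nxt] == IN_PROGRESS:
                return True
            if state[nxt] == UNSEEN and visit(nxt):
                return True
        state[node] = DONE
        return False

    return any(state[n] == UNSEEN and visit(n) for n in list(graph))

consistent = [("e1", "BEFORE", "e2"), ("e2", "BEFORE", "t1")]
inconsistent = consistent + [("e1", "AFTER", "t1")]
```

Here `has_ordering_cycle(consistent)` is false, while adding the link "e1 AFTER t1" contradicts the chain e1 before e2 before t1, so `has_ordering_cycle(inconsistent)` is true. A fuller checker would also need to handle overlap and other relation types.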