Testing HeidelTime

Allegoria della vita umana, Guido Cagnacci

How can we differentiate novelistic subgenres? In my dissertation, I want to find out how different types of novels cluster together when they are analyzed in terms of their topics and other textual and stylistic features. As a starting point, I am trying to identify which features are characteristic of a particular subgenre. Temporal expressions are one such feature. Do dates, for example, occur more often in historical novels than in other types of texts?

HeidelTime (on GitHub) is a “multilingual and cross-domain temporal tagger” developed at Heidelberg University, and I would like to test it on the novels I am interested in. Luckily, there is an online demo of HeidelTime, so my first test is to run the demo on two text files: one containing a historical novel, the other a non-historical novel. The output consists of XML files in TimeML format. Here is a snippet from the historical novel El falso Inca by Roberto Payró:

<TimeML> Dos viajeros, un hombre y una mujer, indígenas a juzgar por su
aspecto y traje, cruzaban al caer <TIMEX3 tid="t4" type="TIME"
value="XXXX-XX-XXTAF">la tarde</TIMEX3> de <TIMEX3 tid="t5"
type="DURATION" value="P1D">un tibio día</TIMEX3> de <TIMEX3 tid="t3"
type="DATE" value="1656-05">mayo de 1656</TIMEX3>, el amplio valle de
Catamarca...
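In TimeML, each recognized expression is wrapped in a TIMEX3 element: the type attribute gives its category (DATE, TIME, DURATION or SET) and the value attribute gives a normalized value, for example 1656-05 for “mayo de 1656”. The following is a minimal sketch, assuming Python 3.9+ and only the standard library's xml.etree.ElementTree, that lists these annotations. The sample string is a shortened copy of the snippet, closed with </TimeML> so that it parses, on the assumption that the complete output file is well-formed.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the snippet above, closed with </TimeML> so that it is
# well-formed XML (assumption: the full HeidelTime output file is complete).
TIMEML = """<TimeML> Dos viajeros, un hombre y una mujer, cruzaban al caer
<TIMEX3 tid="t4" type="TIME" value="XXXX-XX-XXTAF">la tarde</TIMEX3> de
<TIMEX3 tid="t5" type="DURATION" value="P1D">un tibio día</TIMEX3> de
<TIMEX3 tid="t3" type="DATE" value="1656-05">mayo de 1656</TIMEX3>, el
amplio valle de Catamarca</TimeML>"""


def list_timexes(timeml: str) -> list[tuple[str, str, str, str]]:
    """Return (tid, type, value, surface text) for every TIMEX3 element."""
    root = ET.fromstring(timeml)
    # iter() walks the tree in document order.
    return [
        (t.get("tid"), t.get("type"), t.get("value"), t.text or "")
        for t in root.iter("TIMEX3")
    ]


for tid, ttype, value, text in list_timexes(TIMEML):
    print(f"{tid}\t{ttype}\t{value}\t{text}")
```

The tid values reflect HeidelTime's own numbering, so they need not appear in ascending order in the text.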

The second novel is La navidad en las montañas by Ignacio Manuel Altamirano.

As a first test, I use this XPath expression: //TIMEX3[@type="DATE"]. How many temporal expressions of the type “date” are there?

  • 69 in the historical novel
  • 68 in the non-historical novel

Hm…
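The same count can be sketched in Python. This is not how the numbers above were produced; it is a minimal illustration, assuming the HeidelTime output has been saved as well-formed XML files (the file names passed on the command line are placeholders). ElementTree supports only a subset of XPath, but attribute-equality predicates such as [@type='DATE'] are part of that subset.

```python
import sys
import xml.etree.ElementTree as ET


def count_dates(root: ET.Element) -> int:
    """Count TIMEX3 elements of type DATE, like //TIMEX3[@type="DATE"]."""
    # ".//" searches all descendants of the <TimeML> root element.
    return len(root.findall(".//TIMEX3[@type='DATE']"))


if __name__ == "__main__":
    # Usage: python count_dates.py historical.xml other.xml
    # (placeholder file names for saved HeidelTime output)
    for path in sys.argv[1:]:
        print(path, count_dates(ET.parse(path).getroot()))
```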

What kind of temporal expressions are these? Among others, I see “este día” (this day), “pronto” (soon), “hoy” (today), “ahora” (now)… These are mostly relative expressions rather than calendar dates.

Next try: //TIMEX3[@type="DATE"][matches(@value,"\d+")]. How many temporal expressions of the type “date” have a value attribute that contains digits, i.e., could be identified as a specific date?

  • 23 in the historical novel
  • 4 in the non-historical novel

That looks interesting! Apparently, the details matter.
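The matches() function belongs to XPath 2.0, so it is not available in XPath 1.0 engines such as Python's lxml or in ElementTree's XPath subset. A minimal Python sketch of the same filter, again assuming well-formed output files with placeholder names, applies a regular expression to the value attribute instead. Because matches() is unanchored, the pattern \d+ is true whenever the value contains at least one digit, which is exactly what re.search with \d checks.

```python
import re
import sys
import xml.etree.ElementTree as ET

# Equivalent of matches(@value, "\d+"): true if the value contains any digit.
HAS_DIGIT = re.compile(r"\d")


def count_specific_dates(root: ET.Element) -> int:
    """Count DATE expressions whose value attribute contains a digit."""
    return sum(
        1
        for timex in root.findall(".//TIMEX3[@type='DATE']")
        if HAS_DIGIT.search(timex.get("value", ""))
    )


if __name__ == "__main__":
    # Usage: python count_specific_dates.py historical.xml other.xml
    for path in sys.argv[1:]:
        print(path, count_specific_dates(ET.parse(path).getroot()))
```

Restricting the search to DATE elements first matters: a DURATION value such as P1D also contains a digit but is not a date.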

As a next step, I would like to set up a HeidelTime workflow on my own computer. I hope that I can find out how to do that.
