How to differentiate novelistic subgenres? In my dissertation, I want to find out how different types of novels cluster together when they are analyzed in terms of their topics and other textual and stylistic features. As a starting point, I try to identify which features are typical of a given subgenre. Temporal expressions are one such feature. Do dates, for example, occur more often in historical novels than in other types of texts?
HeidelTime (available on GitHub) is a “multilingual and cross-domain temporal tagger” developed at Heidelberg University, and I would like to test it on the novels I am interested in. Luckily, there is an online demo of HeidelTime, so the first test is to run the demo with two text files: one contains the text of a historical novel, the other the text of a non-historical novel. The output is a pair of nice XML (TimeML) files. Here is a snippet from the historical novel El falso Inca by Roberto Payró:
<TimeML> Dos viajeros, un hombre y una mujer, indígenas a juzgar por su
aspecto y traje, cruzaban al caer <TIMEX3 tid="t4" type="TIME"
value="XXXX-XX-XXTAF">la tarde</TIMEX3> de <TIMEX3 tid="t5"
type="DURATION" value="P1D">un tibio día</TIMEX3> de <TIMEX3 tid="t3"
type="DATE" value="1656-05">mayo de 1656</TIMEX3>, el amplio valle de
The second novel is La navidad en las montañas by Ignacio Manuel Altamirano.
I run a first test with this XPath expression: //TIMEX3[@type="DATE"]. How many temporal expressions of the type “date” are there?
- 69 in the historical novel
- 68 in the non-historical novel
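The same count can also be reproduced outside an XML editor. The following is only a minimal sketch in Python, assuming the HeidelTime output is well-formed XML; the sample string is a shortened version of the snippet above that I closed by hand, and the function name count_dates is my own choice.

```python
import xml.etree.ElementTree as ET

# Shortened sample based on the snippet above, closed by hand so that
# it is well-formed. A real run would parse the full output file instead.
SAMPLE = """<TimeML>Dos viajeros cruzaban al caer <TIMEX3 tid="t4" type="TIME"
value="XXXX-XX-XXTAF">la tarde</TIMEX3> de <TIMEX3 tid="t5"
type="DURATION" value="P1D">un tibio día</TIMEX3> de <TIMEX3 tid="t3"
type="DATE" value="1656-05">mayo de 1656</TIMEX3>, el amplio valle.</TimeML>"""


def count_dates(root):
    """Count TIMEX3 elements whose type attribute is DATE."""
    # ElementTree supports this limited XPath predicate syntax.
    return len(root.findall(".//TIMEX3[@type='DATE']"))


root = ET.fromstring(SAMPLE)
date_count = count_dates(root)
print(date_count)
```

For a whole file, ET.parse(path).getroot() would take the place of ET.fromstring(SAMPLE).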
What kind of temporal expressions are those? Among others, I see: “este día”, “pronto”, “hoy”, “ahora”…
Next try: //TIMEX3[@type="DATE"][matches(@value,"\d+")]. How many temporal expressions of the type “date” have a value attribute that contains digits, i.e., could be resolved to a specific date?
- 23 in the historical novel
- 4 in the non-historical novel
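The matches() function belongs to XPath 2.0, which Python's built-in ElementTree does not support, so in a script the digit check has to be done separately. Here is a rough sketch of how that could look; the sample sentence and its attribute values are invented for illustration and are not taken from either novel, and specific_dates is a name I made up.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample: one vague DATE expression and one with digits.
SAMPLE = """<TimeML>Llegaron <TIMEX3 tid="t1" type="DATE"
value="PRESENT_REF">ahora</TIMEX3>, en <TIMEX3 tid="t2" type="DATE"
value="1656-05">mayo de 1656</TIMEX3>.</TimeML>"""


def specific_dates(root):
    """Return DATE expressions whose value contains at least one digit."""
    return [
        el for el in root.iter("TIMEX3")
        if el.get("type") == "DATE" and re.search(r"\d", el.get("value", ""))
    ]


root = ET.fromstring(SAMPLE)
found = [el.text for el in specific_dates(root)]
print(len(found), found)
```

Note that this check, like the XPath expression, also accepts partially resolved values as long as they contain a digit somewhere.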
That looks interesting! Apparently, the details matter.
As a next step, I would like to set up a HeidelTime workflow on my own computer. I hope that I can find out how to do that.