Now it’s really time to leave and to head home! So, bye, bye, Day of DH… see you next year.
How to differentiate novelistic subgenres? In my dissertation, I want to find out how different types of novels cluster together when they are analyzed in terms of their topics and other textual and stylistic features. As a starting point, I try to identify which features are typical of a certain subgenre. Temporal expressions are one such feature. Do dates, for example, occur more often in historical novels than in other types of texts?
HeidelTime (on GitHub) is a “multilingual and cross-domain temporal tagger” developed at Heidelberg University, and I would like to test it with the novels I am interested in. Luckily, there is an online demo of HeidelTime, so the first test is to run the demo with two text files: one is the text of a historical novel, the other that of a non-historical novel. The output consists of nice XML (TimeML) files. Here is a snippet from the historical novel El falso Inca by Roberto Payró:
<TimeML> Dos viajeros, un hombre y una mujer, indígenas a juzgar por su
aspecto y traje, cruzaban al caer <TIMEX3 tid="t4" type="TIME"
value="XXXX-XX-XXTAF">la tarde</TIMEX3> de <TIMEX3 tid="t5"
type="DURATION" value="P1D">un tibio día</TIMEX3> de <TIMEX3 tid="t3"
type="DATE" value="1656-05">mayo de 1656</TIMEX3>, el amplio valle de
The second novel is La navidad en las montañas by Ignacio Manuel Altamirano.
I run a test with this XPath expression: //TIMEX3[@type="DATE"]. How many temporal expressions of the type “date” are there?
- 69 in the historical novel
- 68 in the non-historical novel
What kind of temporal expressions are those? Among others, I see: “este día”, “pronto”, “hoy”, “ahora”…
Next try: //TIMEX3[@type="DATE"][matches(@value,"\d+")] (the matches() function requires XPath 2.0). How many temporal expressions of the type “date” are there whose value attribute contains digits, i.e., that could be identified as a specific date?
- 23 in the historical novel
- 4 in the non-historical novel
That looks interesting! Apparently, the details matter.
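For anyone who wants to repeat the two counts outside an XPath 2.0 tool, here is a minimal sketch in Python. It rests on a few assumptions: the sample string reuses the snippet above, but the closing `</TimeML>` tag and the second DATE element (“hoy” with the value `PRESENT_REF`) are added by hand for illustration. Python's standard `xml.etree.ElementTree` only supports a small XPath subset without `matches()`, so the digit test is done with a regular expression instead.

```python
import re
import xml.etree.ElementTree as ET

# Sample based on the snippet above. The closing </TimeML> tag and the
# "hoy" element are assumptions added so the example is well-formed and
# contains one DATE without digits in its value.
SAMPLE = """<TimeML> Dos viajeros, un hombre y una mujer, indígenas a juzgar por su
aspecto y traje, cruzaban al caer <TIMEX3 tid="t4" type="TIME"
value="XXXX-XX-XXTAF">la tarde</TIMEX3> de <TIMEX3 tid="t5"
type="DURATION" value="P1D">un tibio día</TIMEX3> de <TIMEX3 tid="t3"
type="DATE" value="1656-05">mayo de 1656</TIMEX3>, el amplio valle de
<TIMEX3 tid="t6" type="DATE" value="PRESENT_REF">hoy</TIMEX3>
</TimeML>"""


def count_dates(root):
    """Return (all DATE expressions, DATE expressions whose value has digits)."""
    # Equivalent of //TIMEX3[@type="DATE"]
    dates = root.findall(".//TIMEX3[@type='DATE']")
    # Stand-in for [matches(@value, "\d+")]: keep values containing a digit
    with_digits = [t for t in dates if re.search(r"\d", t.get("value", ""))]
    return len(dates), len(with_digits)


root = ET.fromstring(SAMPLE)
all_dates, specific_dates = count_dates(root)
print("DATE expressions:", all_dates)
print("DATE expressions with digits in @value:", specific_dates)
```

For a complete HeidelTime output file, `ET.parse(path).getroot()` could replace `ET.fromstring(SAMPLE)`, assuming the file is well-formed XML.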
As a next step, I would like to set up a HeidelTime workflow on my own computer. I hope that I can find out how to do that.
time for chocolate…
& time for a selfie to document this year’s outward appearance
… on the CLiGS blog: OCRing Spanish American texts
The “Book of the Dead” was a research project of the North Rhine-Westphalian Academy of Sciences, Humanities and the Arts, concerned with the collection of metadata and images of objects (mainly papyri) bearing spells which were part of the Egyptian book of the dead.
From 2011 to 2012 a digital project was brought into being at the Cologne Center for eHumanities, accompanying the main project during its remaining life span. Now it is the year 2016!
Both projects have ended officially and people have moved on, but the website, the technical system behind it, and its contents are still there. And people are still interested in the site: now and then, Egyptologists from around the world write an email and ask for small updates to the content.
Who cares about the “remains”? (A typical question for DH projects?)
During the last half hour, I participated in a meeting discussing that question. My warning bells rang: “I cannot, may not, should not invest any time here” (and I will not in the near future because I am doing my PhD in Würzburg). But as I was responsible for the programming back then, the least I can do is to be available for questions from a new staff member of the Center who is trying to make overdue updates…
By the way, this same meeting has been documented by Patrick Sahle in this post: http://dayofdh2016.linhd.es/patrick/2016/04/08/meetings/
… with the DH people from Cologne at the locally famous “Pizzabude”…
Who do you recognize?
And what happened before the coffee break? See Patrick Sahle’s post on the Mensa Express!
Today, I received three emails from the University of Würzburg and am now officially enrolled as a PhD student! Yay! It is fascinating that there was almost no real paperwork (the kind that involves the raw material made of fibers) and no physical visit to administrative buildings involved. There is an electronic system where one can apply, and now I am following the instructions I got via email to finish the enrollment – a video tutorial.
The Institute for Documentology and Scholarly Editing (IDE) is a group of DH researchers who are especially interested in the application of digital methods to historical documents. The institute has existed since 2006. Among other activities, it organizes schools on methods and technologies for the creation of digital scholarly editions. The next summer school will be in Graz in September. Check out the IDE website if you are interested!
I have been a member of the IDE since 2011. Who else is around here today?
Where are all the others???
Here again, after participating last year and in 2012. Last year I was in the office in Würzburg, today I am in Cologne. I work at the University of Würzburg as a member of the CLiGS group (see our group’s blog as well, where my colleague José Calvo has already done some posting!). I left the CCeH officially last year at the end of July, which still feels strange after having been there for four years. But as I continue to live in Cologne (my 2-person-family is based here) when not staying in the cosy little room in Gerbrunn, my former colleagues still tolerate me in their ‘DH castle’ when I am here. Thank you! The room is a different one, but the plants are still there, just like in 2012. One is new: the orchid.