OCRing Spanish American texts

(by Ulrike Henny)

Like my colleague José Calvo, I have spent much of the last year collecting the digital texts I want to use in my PhD project. I am building a corpus of 19th-century Spanish American novels. The texts date from 1830 to 1910, and I am including novels from Argentina, Mexico and Cuba.

I found fewer texts in HTML, clean plain text or good-quality ebooks than I had hoped. As a result, I am now in a phase where I have to prepare many of the full texts myself with the help of OCR. Fortunately, there are many PDF and image files online that I can use, because I want to avoid scanning books myself as much as possible.

Last chapter of the novel “El casamiento original” by the Cuban-Spanish author Felicia Auber Noya from 1844.

I feel that I need to improve my bibliographic search skills, because lesser-known Spanish American novels seem hard to access from Europe, even as physical copies through the library system.

If anyone out there knows of hard-to-find digital collections of Spanish and Spanish American novels that we could use or reuse, we would be very happy to hear about them.

And if you are looking for this kind of text yourself: the CLiGS group has already published the first batch of novels prepared in TEI as a “textbox” on GitHub, and you are free to use them.
