Stylesheets, modularity, soup, and tv

So, reader, I came home. On leaving work shortly after 6:30pm, I sent a Glympse (an Android app) to my lovely wife (who had left work only 30 minutes before me), and, unprompted, she very kindly had a large cup of Earl Grey rooibos tea and a couple of biscuits waiting for me as I came through the door. We made some yummy soup for supper, and then, while catching up on various recorded televisual programmes, I got out my laptop to work on some TEI to PDF conversions for the project from which other work had distracted me during the day.

Now the TEI Consortium has standard XSLT Stylesheets for the conversion of TEI files to PDF. These were mostly authored and maintained by Sebastian Rahtz, whom I mentioned in an earlier post. The project had asked me to change some things in these stylesheets; I had already made some of the changes and was going through the rest of their list of requests. Some of these are easy, such as their noticing that linebreaks within <term> elements weren’t producing a linebreak in the output. Others are more difficult, like handling the amount of whitespace between front and back sections.

Why? The TEIC Stylesheets are organised in a modular way across the different formats they can convert to. Just glancing at the GitHub repository you can see there are conversions to/from bibtex, cocoa, CSV, docbook, docx, DTD, EPub, EPub3, XSL-FO, HTML, HTML5, InDesign, json, latex, TEI Lite, markdown, mediawiki, NLM, TEI ODD, ODT, TEI P4, PDF, RDF, RelaxNG, RNC, Slides, TEI Simple, TBX, TCP, TEI Tite, Text, WordPress, XLSX, and XSD. Now the observant will notice that some of those are lossy formats (so you most likely go to them rather than from them), but some are also used for legacy conversion, and others produce schemas to validate TEI files. This is because TEI files can also contain a TEI ODD customisation, a sort of literate programming meta-schema from which you can generate standard schema languages like Relax NG and XSD. (No one these days should be using DTDs; if you are working for a project which is, slap your technical person upside the head.)

That doesn’t really answer why it is complicated to make some changes and less complicated to make others. Well, at the heart of it is this modularity of the TEI Stylesheets. They are not only modular by format (sharing common aspects, but devolving format-specific aspects to those conversions), but also allow for individual profiles for different uses. For example, the profile for jTEI (the Journal of the TEI) has profile-specific conversion updates for odt, openedition, and pdf. In my case I already had a profile for conversions to HTML for the project and was building up the one for PDF. The difficulty is then to track down where a specific transformation is made. Is it in your profile, the common stylesheets, or the stylesheets for that format? Moreover, the individual stylesheets are often modular themselves: for example, those for conversion to latex are made up of 15 stylesheets, each handling a different portion of the TEI Guidelines, and include another 12 common stylesheets which handle conditions that are common between these. Inside the stylesheets as well, they are fairly modular, using lots of <xsl:call-template> instructions to call named templates (which may be in a completely different file) to do complicated (or sometimes trivial but frequently repeated) processing.
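To make the call-by-name pattern concrete, here is a minimal sketch of it. It is not taken from the TEI Stylesheets: the template name `wrapItalic` and its `content` parameter are made up for illustration, and in a real setup the named template could just as well live in a separate, imported file.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <xsl:output method="text"/>

  <!-- Hypothetical named template: wraps whatever it is given in \textit{...}.
       Nothing requires it to be in the same file as its callers. -->
  <xsl:template name="wrapItalic">
    <xsl:param name="content"/>
    <xsl:text>\textit{</xsl:text>
    <xsl:copy-of select="$content"/>
    <xsl:text>}</xsl:text>
  </xsl:template>

  <!-- A matching template that delegates the actual formatting by name -->
  <xsl:template match="tei:emph">
    <xsl:call-template name="wrapItalic">
      <xsl:with-param name="content">
        <xsl:apply-templates/>
      </xsl:with-param>
    </xsl:call-template>
  </xsl:template>

</xsl:stylesheet>
```

The catch when debugging is visible even at this size: reading the `tei:emph` template alone tells you nothing about what ends up in the output until you have found where `wrapItalic` is defined.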

In the individual profile for the project, it is easy to override a template matching an XPath like “term/lb” (an <lb> element appearing in a <term>) and provide the latex equivalent of a linebreak. However, this relies on us knowing or assuming that the way <term> is processed takes account of other templates. (Usually it does, which is why I was confused that it wasn’t working for this project.) In other cases, the output for overall structure can be deeply nested inside multiple templates which also determine how the layout of the document as a whole functions. This is why changing that kind of thing can be difficult. Why don’t we rewrite these to have more duplication but a clearer and more transparent system? Well, the modular system works: Sebastian’s redesign means it is fairly easy for a project to have its own profile to do just the kinds of changes projects usually want. However, it is just hard for others to maintain, and who has the time to do it? We tend to make changes as and when bugs are reported, or when projects we’re working on expose how something could be better.
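As a rough sketch of the easy case, a profile-level override for “term/lb” might look something like the following. The import path is an assumption about where the profile sits relative to the latex stylesheets, so adjust it to your own layout; the template also only has an effect if the imported template for <term> processes its children with <xsl:apply-templates/>.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <!-- Assumed path: pull in the format-level latex stylesheets.
       Templates in this file take precedence over imported ones. -->
  <xsl:import href="../../../latex/latex.xsl"/>

  <!-- An <lb> inside a <term>: emit a LaTeX linebreak (\\) and a newline -->
  <xsl:template match="tei:term/tei:lb">
    <xsl:text>\\&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Because of import precedence, a template declared in the importing stylesheet wins over any imported template matching the same node, which is what lets a profile stay this small.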

I spent a couple of hours this evening working through some of the more complicated requests. Some of them just work because Sebastian had anticipated that people might want to make that kind of modification and so built in a hook to make it easy; on others I spent some time banging my head, because it would seem that major surgery is needed to make them work.

Oh, it is 11:45pm, I’m starting to think it is time to get ready for bed. I’ll have to get back to this tomorrow.

1 Comment
  1. Conal 2 years ago

    I have always thought that pipelining is a superior method for XSLT modularization, frankly. It gives more ontological status to the stages, allowing validation etc. and providing strong encapsulation of each module, reducing the scope of these unwanted/accidental interdependencies. Then your conversions can be viewed at an overview level as various pathways through a graph.
