
A Publisher's InDesign to BITS and EPUB Infrastructure: Conventions, Configuration, Conversion, Checks



Speech transcript
I'm Gerrit Imsieke from le-tex publishing services in Leipzig, in the former eastern part of Germany. We are a publishing services vendor with approximately 100 full-time employees, so one of the largest ones still active in Germany. I'm going to present a workflow that we have set up for one of our customers. It's a publisher specialized in psychology, Hogrefe. They also have English-language publications and a subsidiary here, and they also do books; this is the Clinical Handbook of Psychotropic Drugs. We did that one from DocBook XML, but today I'm going to tell you something about BITS and InDesign. Just to mention it: they do journals too, and they are doing JATS. What's different with books, compared to journals, is that they tend to be less standardized. There is always an author or editor with special wishes, or you have extra layout requirements or extra workflow requirements, an additional correction process, etc.
You somehow have to adapt to that if you set up an XML production system for books. Currently the most widespread formats are Word and InDesign, even in STM. We have a strong background in math typesetting, in LaTeX and such, but even STM publishers often require their books to be typeset in InDesign. Nowadays it's essential for books that you also have EPUB output, and XML is increasingly important for STM publishers. As Tommie mentioned this morning, there are use cases for it, although for most publishers not yet. EPUB production is the first real use case, and there you could probably do without XML.

For this customer, though, it was important to have JATS-compatible XML, so that the EPUB and other formats are derived from an XML version. They already have their journal articles produced in XML, and the book XML should be mostly compatible with that. There should be quality assurance, so that the result is not just a jumble of tags. The electronic versions should be available simultaneously with print or before it. A special requirement was that they wanted to include CrossRef lookup, so they want to enhance the references with DOIs. Another requirement was to continue using InDesign for layout, and particularly to use the skills of the existing typesetters.

Now, there are several approaches to XML workflows. I think there are more than the three that I have tried to categorize here. What I call XML-first, type 1, is that you create the tagging, create XML, import it into InDesign and then keep the tagging inside, kind of piggybacking: InDesign carries these tags along, and you can export them later, after typesetting and after all the corrections have been carried out. That is one approach. The other one, type 2, is probably more the Typefi and eXtyles kind of approach (correct me if I'm wrong): you work in Word as long as possible, then the Word file is converted to XML, and from there the InDesign layout is generated automatically.

The XML-last approach comes more from our point of view as publishing services professionals. It works like this: you create the InDesign file, somehow importing from Word or generating from XML, but without the tagging. You don't carry the XML tagging along. Then you just have conventional typesetting, of course with some conventions and some rules that the typesetters have to abide by, and you convert to XML after it has been typeset. In the middle of the process you can always convert to XML, send the XML to CrossRef lookup and inject the DOIs into the InDesign file, so that the DOIs will also be included in the printed version.

There are some drawbacks to each of these approaches. What I call XML-first type 1, piggyback XML, is really... (Maybe I should make this full screen now. No, the corners would be cut off.) Three years ago I blogged about that; you can read it there. One thing: I want to tweet the link to my presentation; it's already uploaded to a server.

The first approach, XML in InDesign, has several disadvantages. We tried it once, but we don't want to continue on this path, because essentially you have to use flat XML within InDesign and then convert it using XSLT. You can achieve the same with the same XSLT without XML tagging in InDesign, which only causes trouble.

The drawback of the second approach is that the publisher's expectation, or the author's expectation, is that they want to see the layout and approve it. They don't want to work in Word until the content is finished, have that converted to InDesign, and then let some typesetting system have the final word on how it looks. So this was not an option for them, because they want to stick with the typesetting process.

The drawback of the XML-last approach is that the typesetters have too much leeway in InDesign. You are allowed almost the same amount of creativity as in Word, and they can do all kinds of nasty things that stand in the way of a deterministic XML conversion. So what to do? Select the lesser evil. We thought we could fix the InDesign approach by checking. We didn't want to create plug-ins for InDesign or other stuff that would interfere with InDesign directly, mostly because we don't want to have the on-premise support issue, where you have to adapt to all the different InDesign versions etc.
We like to be agile with our checking rules. If we can deploy the checking and conversion rules on the server, we are more flexible in establishing new rules than if we had to send them to all the typesetters and see how they upgrade to the new rules. So essentially we wanted a server-based solution, and since we are XML people, we turned to the XML stack for solutions. There are plenty. The most important one, I think, is Schematron, and of course XSLT 2. The combination of XSLT 2 and Schematron makes this approach feasible. And if you convert from file formats such as IDML or docx, these are more or less standardized formats, and they are not such moving targets as, for example, the APIs of the different versions of InDesign and Word. So this seemed to be a natural approach.

This is how the workflow is structured on a macroscopic scale. We start from IDML. We convert it (I presented this last year in October) to a flat, DocBook-based format where essentially all paragraphs are flat, with the tables in between. These boxes symbolize the amount of configuration that goes into each of these macroscopic steps. Then we put in knowledge about the section heading formats or table caption formats, so that we are able to create a hierarchy of sections and to group figures and tables with their titles. For historical reasons we have chosen DocBook as the central format, because we don't want to implement all these complex conversion rules for every XML vocabulary. (You could also make these tools flexible so that they adapt to any vocabulary.) So we chose DocBook, and then we have a rather thin conversion, without very much configuration, from DocBook to the so-called HoBoTS.

If you click on the link (I presented this last year): HoBoTS is the Hogrefe adaptation of BITS. It's essentially a wrapper around BITS in which you can carry along a bit of layout information that will facilitate these checks, so that with Schematron you can also check for styling, or heuristically up-convert paragraphs depending on their styling if you want. And you can carry this information along to the EPUB: if you don't care about the semantic meaning of some table background color, you can just say, OK, background colors will be forwarded to the EPUB. And we create an error report like this one; I'll get back to that later.

I will just show you with a small file how this works. The whole process is implemented in XProc, so it runs on Java. We have made a primitive web-based interface where you can just upload IDML files. We know which publisher and which series they belong to from the file name; we have established naming conventions. Then we upload the file, and it takes a couple of minutes, depending on the size. I think this file is approximately 40 pages. So now we see the step to flat DocBook, and then it quickly goes through some Schematron checks and the conversion to the so-called HoBoTS, the BITS variant. Then it will be rendered to HTML, and then the EPUB will be created. If there are errors from the EPUB check, we will also see them here. Finally we can download the EPUB and have a look at it.

Maybe it's most interesting if we look at the underlying InDesign file first. It looks something like this. It's plain InDesign work with some conventions established, and these conventions are documented in this manual. It's in German. It's a mix of normative text and documentation of workarounds: how to join split table cells, which special formatting to use if you have a table footnote, which isn't possible in InDesign, so we use some workarounds. All these workarounds are also documented here. This manual is itself produced by that workflow, using our own modules for that purpose. All these modules (I'll get to that later) are open source, so you can essentially download them and play around with them if you want to.

Now, the important thing for books is that you want to be flexible. There are different conventions for different book series or for individual works. You don't have these problems in journal production, and maybe not in standards (we are also working for the German standards body). But with books there is always something: strange layout requirements etc. You want to cope with that without building too much special-case handling into the code. You want to keep the code maintainable and stable while being able to add work-specific, series-specific or imprint-specific configurations.

We said, OK, let's keep it down to four levels. We have our core modules. Then, for the whole installation, for an organization for example, we have a common set of rules: Schematron checks, XSLT, CSS for the EPUB. Then we can override it per publisher or imprint: one imprint has somewhat different styles, as with Hogrefe North America, and they can establish that. Then there are individual settings for series, mainly around CSS, so they have a different layout, but also some special treatment in XSLT. And it can go down to the individual work, so you can store exceptions for a single work.

Previously we had to build all these different cases into the central converter. Now we have a kind of import model: in XSLT you import the more central stylesheet and wrap your specialized code around it. It can be as simple as this (I don't know whether you can see it). For most works you don't need any special configuration, but if you do, it often looks as simple as this: just the xsl:import statement and three additional templates, and that's it for a special work. In most cases you don't have to do it, but you need to be flexible; this needs to be feasible.
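The four-level override scheme described here (common rules, then publisher or imprint, then series, then individual work) can be illustrated with a small sketch. In the workflow itself the overriding is done with XSLT imports; the Python below only mimics the lookup order. The names `CASCADE` and `resolve_config`, the dictionary layout and all sample values are hypothetical and not taken from the talk.

```python
from typing import Dict, Optional, Tuple

Settings = Dict[str, str]

# Hypothetical configuration store: each key is the path from the
# installation-wide defaults, (), down to one individual work.
CASCADE: Dict[Tuple[str, ...], Settings] = {
    (): {"css": "common.css", "caption_style": "FigureCaption"},
    ("acme",): {"css": "acme.css"},
    ("acme", "handbooks"): {"css": "handbooks.css"},
    ("acme", "handbooks", "vol1"): {"caption_style": "Legend"},
}


def resolve_config(
    cascade: Dict[Tuple[str, ...], Settings],
    publisher: str,
    series: Optional[str] = None,
    work: Optional[str] = None,
) -> Settings:
    """Merge settings from the most general level to the most specific one."""
    path: Tuple[str, ...] = (publisher,)
    if series is not None:
        path += (series,)
        if work is not None:
            path += (work,)
    merged: Settings = {}
    # Visit (), (publisher,), (publisher, series), ... in that order,
    # so that later, more specific levels overwrite earlier ones.
    for depth in range(len(path) + 1):
        merged.update(cascade.get(path[:depth], {}))
    return merged


if __name__ == "__main__":
    print(resolve_config(CASCADE, "acme", "handbooks", "vol1"))
```

Most works would have no entry of their own and simply inherit everything from the series level and above, which matches the remark that special configuration is rarely needed.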
Then we have the Schematron checks. After each of these macroscopic conversion steps you can run some Schematron checks, and we will just have a look at the report that comes out here.

What we do is calculate a so-called srcpath at the beginning. After we have unzipped the IDML (and likewise for docx), every paragraph and every span receives this attribute, and this is the common key to which we can attach the individual error messages.

We have checks for unknown styles. There is a list of acceptable styles, or you specify a regular expression for style names, and anything else will be flagged, because if they use unknown style names, the probability is high that they won't be handled properly. The next set of rules works on the raw IDML, at the output of the first step; we check for anchored marginal notes, for example. Then you have checks on the other stages. For example, this is what happens sometimes: they have a figure caption and they used two paragraphs with the figure caption style for it. Of course this is something that might break your converter if you only accept one figure caption. You can report this, and you can jump to the next occurrence of that error.

This one is also interesting. It's towards the end of the book. It's called an appendix, but they used the format for a preface or foreword. This is the message from a Schematron rule that detects whether there are any ordinary chapters before the appendix, and this will also be flagged. And you see lots of checks.

Now let's look at the resulting BITS XML, or rather BITS with this layout information. I copy the location and open it in oXygen, to get some screen real estate, format it and validate it. We are getting five errors, and this is the consequence of that mistake: what they report is that the book body finishes here and then front matter is being produced again, because of the foreword style that they have been using. Sometimes, with these conversion errors, you just get schema validation messages that a typesetter in particular won't understand. So we also do schema validation. We converted the BITS schema to Relax NG, because we use Relax NG validation here, so you have all these schema validation messages here too. But you need to point the typesetters toward the actual cause of the schema validation error, and we try to achieve that by being able to flexibly deploy more and more Schematron checks that give them hints on how to avoid it.
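Two of the checks mentioned in this part of the talk, the unknown-style check and the check for a figure caption spread over two paragraphs, can be sketched in a few lines. The real workflow expresses them as Schematron rules over XML; this Python version is only an illustration of the logic. The classes `Para` and `Message`, the style names, the allowed-style regular expression and the srcpath strings are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Para:
    srcpath: str  # key that ties a message back to its place in the source
    style: str    # paragraph style name as used by the typesetter
    text: str


@dataclass
class Message:
    srcpath: str
    rule: str
    text: str


# Hypothetical whitelist of style names, given as a regular expression.
ALLOWED_STYLE = re.compile(r"^(Heading[1-3]|Body|FigureCaption|TableCaption)$")


def check_unknown_styles(paras: List[Para]) -> List[Message]:
    """Flag every paragraph whose style name is not on the whitelist."""
    return [
        Message(p.srcpath, "unknown-style", f"Unknown paragraph style '{p.style}'")
        for p in paras
        if not ALLOWED_STYLE.match(p.style)
    ]


def check_double_captions(paras: List[Para], caption_style: str = "FigureCaption") -> List[Message]:
    """Flag a caption paragraph that directly follows another caption paragraph."""
    messages = []
    for previous, current in zip(paras, paras[1:]):
        if previous.style == caption_style and current.style == caption_style:
            messages.append(
                Message(current.srcpath, "double-caption",
                        "Figure caption continues in a second paragraph")
            )
    return messages


if __name__ == "__main__":
    sample = [
        Para("story1/p[1]", "Heading1", "Introduction"),
        Para("story1/p[2]", "MyOwnStyle", "Some text"),
        Para("story1/p[3]", "FigureCaption", "Figure 1. A diagram"),
        Para("story1/p[4]", "FigureCaption", "Source: own work"),
    ]
    for message in check_unknown_styles(sample) + check_double_captions(sample):
        print(message.srcpath, message.rule, message.text)
```

Because every message carries the srcpath of the offending paragraph, a report can list messages per location and let the reader jump from one occurrence of an error to the next.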
Last October (you can quickly go there) I presented some more examples of Schematron checks that you need with this ex-post conversion of InDesign. For instance, you later have to join, in the XML, table cells that were originally split, and the typesetters have to give some hints for that; there are conventions for it.

What we also do is CrossRef lookup. So if I start a conversion... I think this file doesn't have many references, or none at all. Maybe I'll skip that one; this one should have more references. After it has been converted, there will be a couple of posts to CrossRef, and the results will be converted to an InDesign script that the typesetter applies to the InDesign file. It adds the DOIs to the references. So we have a bit of a combination with scripting.

We also do scripting for structuring the references. Of course there are better tools, but since Hogrefe's references all have the same style, a regex-based structuring is acceptable. They just run it on the references, and then all the atomic items of the references are detected and marked up in InDesign, so that they can be exported to XML and then grouped into mixed citations with the person names etc. But it's still working.
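As a rough illustration of regex-based reference structuring, the sketch below splits one reference string into its atomic items. It assumes a uniform author-year style of the form "Surname, I., & Surname, I. (Year). Title. Source."; the talk does not spell out the actual patterns, and the production scripts run inside InDesign, not in Python. The names `REFERENCE`, `AUTHOR` and `split_reference` are hypothetical.

```python
import re
from typing import Dict, Optional

# Assumed overall shape: Authors (Year). Title. Source.
REFERENCE = re.compile(
    r"^(?P<authors>.+?)\s+\((?P<year>\d{4}[a-z]?)\)\.\s+"
    r"(?P<title>.+?)\.\s+(?P<source>.+)$"
)
# One author: surname, comma, then one or more initials such as "J. A."
AUTHOR = re.compile(r"([^,&]+),\s*((?:[A-Z]\.\s?)+)")


def split_reference(ref: str) -> Optional[Dict[str, object]]:
    """Return the atomic items of a reference, or None if it does not fit."""
    match = REFERENCE.match(ref.strip())
    if match is None:
        return None
    names = [
        (surname.strip(), initials.strip())
        for surname, initials in AUTHOR.findall(match.group("authors"))
    ]
    return {
        "names": names,
        "year": match.group("year"),
        "title": match.group("title"),
        "source": match.group("source"),
    }


if __name__ == "__main__":
    example = ("Smith, J. A., & Doe, R. (2010). A study of things. "
               "Journal of Examples, 12(3), 45-67.")
    print(split_reference(example))
```

A pattern like this only works because all references follow one style; a title that itself contains a full stop followed by a space would be cut too early, which is the kind of case a dedicated reference-parsing tool handles better.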
Never mind. The other challenge with this workflow is to give the typesetters an incentive to actually use and look at the checking results, because they often don't look at them. Then, close to the deadline, they convert and say, "Oh, we need to fix something or implement more conversion rules," and then it's often too late. That undermines the concept that print, XML and EPUB are drawn from a single master file, the IDML, because they get into a situation where the file is converted only afterwards, at the end. It is the typesetter's task to look at the EPUB outcome and see whether it all looks fine or whether there is some hidden stuff that won't be converted correctly. Somebody has to monitor that.

The whole framework, transpect, is open source. We have been developing it for approximately 18 months now. It's all based on standard technology: XSLT 2, XProc, Schematron, Relax NG. You can do whatever you want with it as long as you give us credit; it's under a liberal BSD license. There is a demo. You can check it out with a Subversion client, or you can test it in a web interface that doesn't convert IDML to anything, but converts docx to EPUB and IDML. It's rather primitive, but I think you get the idea of how it works, and you can do all sorts of stuff with that framework.

In the 18 months, or rather in the 12 months since we've been marketing it, we have already acquired major customers, including a large textbook publisher. I also think it's a good idea to offer this as an open source tool, because despite a size of a hundred people, as a software vendor we are quite small. It gives the customers trust that somebody will take care of it if we can't anymore, or if we don't want to anymore. So I think it's a good idea to have such a central infrastructure component as open source. We have built, I think, a decent business model around it, so we won't go bankrupt because of open source, and that's OK.

To conclude: it's perfectly feasible to do the XML conversion last. You have to establish some checks. And you can do all kinds of other things with it, not limited to InDesign. For example, if you have a web-based editor and people produce any kind of garden-variety HTML, you can also use Schematron to check what they edited, and you can use that infrastructure. As an aside, what we also presented at XML Prague in February is a web-based editor, which could also be put in front of that. OK, that's it, thank you.

OK, if you run out of questions, you know...


Formal Metadata

Title: A Publisher's InDesign to BITS and EPUB Infrastructure: Conventions, Configuration, Conversion, Checks
Title of Series: JATS-Con 2013
Part Number: 7
Number of Parts: 16
Author: Imsieke, Gerrit
License: CC Attribution 3.0 Unported. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
DOI: 10.5446/21797
Publisher: River Valley TV
Release Date: 2016
Language: English
Production Place: Washington, D.C.

Content Metadata

Subject Area: Information technology
Abstract: Deploying advanced XML technologies such as XProc, XSLT 2.0, and Schematron, an "ex-post" conversion of InDesign files may be a viable alternative to XML-first publishing production workflows.


