
An XML Workflow for Book Publishing

Speech transcript
So, the MIT Press, which celebrated its golden anniversary this year, is the only university press in the United States whose list is based primarily in science and technology. That said, we publish about 200 new book titles a year and some 50 paperback reprints in fields as diverse as architecture, social theory, economics, cognitive science, engineering, and computational science. Our journals program comprises over 30 titles in the arts and humanities, economics, international affairs and political science, linguistics, and physical and life sciences. We have a long-term commitment to both design excellence and the efficient, creative use of new technologies. That applies to all our publishing programs, be they print or digital, book or journal.
As digital publishing manager, my role at the press is to oversee the production and distribution of digital products, including e-books and apps, and their attendant metadata. My group also does pre-composition work, offers publishing infrastructure support, and manages the digital archive. One of our mandates is to explore potential points of collaboration between the books and journals divisions, with an eye toward streamlining production and creating hybrid products.
At any given time, about 80% of our book front list is in our XML pipeline. Titles excluded from the XML workflow include distributed books, books in TeX that are not converted to Word, and certain art, architecture, and design books. We have over 4,000 titles in our digital archive. The file types and quality are inconsistent, and files tend to become less usable the deeper into the backlist we go. Of the total number of titles in the archive, roughly half are available as bookmarked, web-ready PDFs. These files are distributed to vendors that use PDF as a base format and to vendors that will convert PDF content to reflowable text without financial penalties. A portion of the titles available as web-ready PDFs are also archived in the EPUB format. These are delivered to vendors that mandate the provision of EPUB or penalize for conversion of PDF to EPUB. Right now we deliver EPUB files to Google, Barnes & Noble, Kobo, and Apple. Until recently we had been using Calibre to convert EPUB to Mobi for Amazon, but we now use Kindle Previewer to generate the file. All other vendors receive PDFs.
The press's primary repository currently is a deployment of Really Strategies' proprietary RSuite application for content management. The CMS, which is built atop a MarkLogic XML server, facilitates seven workflows with a total of approximately 140 task nodes for the books division. One of the major problems we initially encountered with the system stemmed from an attempt to create a node for every task in our physical workflows, which led to unnecessary complexity. As a result, users were spending an inordinate amount of time pushing documents through the system. We have gradually been paring down the number of nodes to capture the pivotal steps in the publishing process. The resulting workflows will be ported to an installation of the open source CMS Alfresco, which, for financial reasons, we are adopting in place of RSuite in the coming calendar year.
Both in RSuite and in its replacement, book projects are indexed by a unique top-level ID that is keyed to our bibliographic database, a deployment of AllBooks, the FileMaker solution developed by Johns Hopkins University Press. AllBooks contains scheduling information that is updated automatically as titles pass through the various stages of a workflow. There is an exciting screenshot of AllBooks. This database contains over 10,000 records, with the title information accumulated by the press over its five decades of operation. Up until 2007 this data was spread across 140 satellite databases orbiting our fulfillment system. It is from AllBooks that the ONIX feeds and other metadata exports that accompany our EPUB and PDF files are produced.
Most of our book content is authored in Word. Our capacity to produce in-house XML, and hence EPUB, is presently dependent on our ability to secure Word files from the author or from a compositor that converts other formats, in particular TeX in all its myriad flavors, to Word. Following book launch and ingestion of a manuscript into the CMS, a subset of our editorial staff tags the content using eXtyles. As those of you familiar with the product know, eXtyles enables our editors to work in a familiar word processing environment without having to interact with the underlying markup. In addition to its XML editing capabilities, it offers automated, macro-driven cleanup and auto-redact functionality that eliminates time spent by editors performing low-level, repetitive edits on documents, such as removing extraneous whitespace, replacing idiosyncratic formatting with standard formatting, and converting British to American spelling.
One of eXtyles' more powerful features is its ability to parse and reformat references. We have discovered that this functionality requires a little consistency on the author's part and a fair amount of diligence on the eXtyler's part to operate optimally. In general, it appears to work better for our journals division, which tends to have less garbage in. Style application, and thus the underlying markup in a manuscript, may change somewhat during book development, so a post-processing sweep of the manuscript is conducted by editorial to clean up any outstanding formatting issues. Chapter-by-chapter XML files are then exported from their finalized Word counterparts using a command in the eXtyles ribbon.
In the current books workflow, the XML exports adhere to a robust custom DTD (I hate the word "robust") produced by Apex CoVantage that incorporates both books and journals taxonomies. The business case for this approach was the potential to repurpose book and journal content to create hybrid products. However, the journals division ultimately determined that they were better served working with JATS. This posed a challenge. In books, we have a solution that works for our compositors and yields the XHTML we employ to create EPUB in house, but we would also like to be operating with well-supported industry standards, and ideally with a tag set that aligns with what journals has deployed. The issue was finally decided when a consultant suggested that the markup we use for tagging the content on the next version of our online cognitive sciences aggregation, CogNet, which offers both books and journals content, should employ the [unclear] DTDs. With that, we began testing NCBI Book 3.0. We are still in the testing phase now.
So, I have lost my pointer. [Aside unclear.]
In the current workflow, exported XML is further transformed into XML that has been optimized for InDesign. This transformed markup is also validated and error corrected. It is then delivered to a comp. Since corrections may continue to come in subsequent to file delivery, we retrieve the XML files, with any additional corrections made by the comp, after the book goes to press. I'll refrain from showing any more gripping footage of oXygen, but we use the application to run another XSL process to revert the final InDesign-optimized markup to base stock XML. Then we run another transform to convert the chapter files to XHTML, which we use to create the EPUB. The XHTML is poured into a template containing the EPUB directory structure, mimetype and container files, CSS, and template OPF and NCX files.
For the projected NCBI Book 3.0 workflow, we have a few quandaries at these stages. We would have to flatten the markup so it works with InDesign, and that involves an XSLT regardless of what direction we take. With a single, highly taxed in-house programmer to write the transform, we could well be several months out before even testing. This suggests a third-party solution, which I'll get to in a minute.
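The EPUB template described above (directory structure, mimetype and container files, CSS, OPF and NCX) could be assembled roughly as follows. This is only a minimal sketch, not the press's actual tooling: the `OEBPS/` folder and the file names `content.opf`, `toc.ncx`, and `styles.css` are assumptions chosen for illustration. The one firm requirement it demonstrates is that the `mimetype` entry is written first and stored without compression.

```python
import io
import zipfile

# Standard OCF container file; the rootfile path below is a hypothetical layout.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub(opf, ncx, css, chapters):
    """Return the bytes of an EPUB archive.

    `chapters` maps XHTML file names to their content. All file names
    used here are illustrative assumptions.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        # The mimetype entry must come first and must not be compressed.
        archive.writestr(
            zipfile.ZipInfo("mimetype"),
            "application/epub+zip",
            compress_type=zipfile.ZIP_STORED,
        )
        deflated = zipfile.ZIP_DEFLATED
        archive.writestr("META-INF/container.xml", CONTAINER_XML, compress_type=deflated)
        archive.writestr("OEBPS/content.opf", opf, compress_type=deflated)
        archive.writestr("OEBPS/toc.ncx", ncx, compress_type=deflated)
        archive.writestr("OEBPS/styles.css", css, compress_type=deflated)
        for name, xhtml in chapters.items():
            archive.writestr(f"OEBPS/{name}", xhtml, compress_type=deflated)
    return buf.getvalue()
```

Calling `build_epub(opf_text, ncx_text, css_text, {"ch01.xhtml": chapter_text})` returns bytes that can be written to a `.epub` file and passed to a validator.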
This is one of those moments when I really do need a pointer. [Brief exchange about the pointer; partly unclear.] OK, so, sorry. Another area of concern is corrections. Traditionally, any changes that come in after revised pages are entered directly into InDesign by the comp. Corrections never reach the source Word files. In the projected workflow, the onus for keying corrections is likely to fall on the manuscript editor or another person or group as yet undetermined by upper management, but almost certainly not on a comp. XML will need to be re-exported from Word and re-transformed for import into InDesign after each correction phase. This is not an intractable problem, but it will require a regressive redistribution of effort.
One potential solution to some of the DTP issues that the NCBI Book 3.0 workflow introduces is Typefi. This composition engine uses the pagination capabilities of InDesign to enable publishers to create scriptless templates that automatically transform Word or XML documents, in batch, into paginated, fully composed InDesign, PDF, and EPUB documents. The press is currently piloting a Typefi workflow using a cloud-based implementation for both books and journals composition. The pilot has yielded promising results. This slide shows a sample page produced from NCBI Book XML by Typefi.
In the present workflow, EPUB creation is a labor of love at MIT. Placement of boxes and tables is not automated, and these elements must be moved from the end of the chapter documents in which they appear to the locations within the files where they are called out. Notes must also be linked from the notes file back to the individual chapter files via a series of find-and-replace actions, and populating the NCX file template is also a manual process. We validate our EPUB files using the IDPF online validator. The latest version of oXygen incorporates EpubCheck, and it is likely we will use this tool after our next upgrade.
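Populating the NCX template, described above as a manual step, is the kind of task that lends itself to a small script. The sketch below is an illustration under assumptions, not the press's process: the function name `build_ncx`, the `navPoint-N` id scheme, and the flat (non-nested) chapter list are all hypothetical. The element names and namespace are those of the NCX 2005-1 format.

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"


def _q(tag):
    """Qualify a tag name with the NCX namespace."""
    return f"{{{NCX_NS}}}{tag}"


def build_ncx(book_id, title, chapters):
    """Build an NCX document from (label, file name) pairs in reading order."""
    # Serialize NCX elements without a prefix (affects ElementTree globally).
    ET.register_namespace("", NCX_NS)
    ncx = ET.Element(_q("ncx"), {"version": "2005-1"})
    head = ET.SubElement(ncx, _q("head"))
    ET.SubElement(head, _q("meta"), {"name": "dtb:uid", "content": book_id})
    doc_title = ET.SubElement(ncx, _q("docTitle"))
    ET.SubElement(doc_title, _q("text")).text = title
    nav_map = ET.SubElement(ncx, _q("navMap"))
    for order, (label, src) in enumerate(chapters, start=1):
        point = ET.SubElement(
            nav_map,
            _q("navPoint"),
            {"id": f"navPoint-{order}", "playOrder": str(order)},
        )
        nav_label = ET.SubElement(point, _q("navLabel"))
        ET.SubElement(nav_label, _q("text")).text = label
        ET.SubElement(point, _q("content"), {"src": src})
    return ET.tostring(ncx, encoding="unicode")
```

A call such as `build_ncx("urn:example:book", "Sample Title", [("Chapter 1", "ch01.xhtml")])` returns the NCX as a string. The chapter list could come from whatever system already tracks chapter files.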
MIT Press EPUB is based on a style sheet of some 50 elements. We deliver the same file to all our vendors regardless of whether a particular platform enables features that we include in our source files, such as linking and color. We recently signed with Apple and were pleased to discover that our EPUB files pass muster without additional work, which is actually a big feat, according to Apple. Note that we include a distinct retail e-book ISBN on our CIP page. The press employs three such ISBNs based on market: third-party consumer programs such as Amazon Kindle and Barnes & Noble Nook receive one e-ISBN, aggregators of content for institutional markets receive another, and our direct-to-consumer program gets a third. We have found that this approach provides sufficient flexibility to enable decent reporting and market-differentiated pricing and discount schemes. We have also mounted records tied to these ISBNs in a virtual warehouse in the Bookmaster fulfillment system to facilitate royalty payment and management reporting unavailable from other sources, like AllBooks or our vendors.
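The three-ISBN scheme described above amounts to a lookup from sales channel to identifier. The sketch below is purely illustrative: the channel names paraphrase the talk, the vendor names other than Amazon Kindle and Barnes & Noble Nook are hypothetical, and real ISBNs would be supplied per title.

```python
from enum import Enum


class Channel(Enum):
    """The three markets named in the talk."""
    THIRD_PARTY_CONSUMER = "third-party consumer program"
    INSTITUTIONAL_AGGREGATOR = "aggregator for institutional markets"
    DIRECT_TO_CONSUMER = "direct-to-consumer program"


# Vendor-to-channel assignments; the last two vendor names are hypothetical.
VENDOR_CHANNEL = {
    "Amazon Kindle": Channel.THIRD_PARTY_CONSUMER,
    "Barnes & Noble Nook": Channel.THIRD_PARTY_CONSUMER,
    "Example Aggregator": Channel.INSTITUTIONAL_AGGREGATOR,
    "Press web store": Channel.DIRECT_TO_CONSUMER,
}


def eisbn_for(vendor, title_isbns):
    """Return the e-ISBN a vendor should receive for one title.

    `title_isbns` maps each Channel to that title's e-ISBN.
    """
    return title_isbns[VENDOR_CHANNEL[vendor]]
```

Because reporting and pricing key off the ISBN, every vendor in the same channel resolves to the same identifier for a given title.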
Our print and EPUB content currently share a single art package, which is less than ideal but necessitated by the limited bandwidth of our design staff. My group batch converts EPS files received from design into PNG format. Equations are also stored in the EPUB images directory as PNG files. We will probably be abandoning this strategy in favor of MathML once devices with full EPUB 3 support achieve decent market share. One best practice I read about, recommended by the developers of MathType, is to use a switch, the OPS switch, to select MathML, SVG, or PNG cases depending on which format the reading system supports, and to include all these formats in the EPUB file. It is possible we will implement this scheme down the road.
So, here are some challenges we face.
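Before turning to those challenges, the switch-based equation markup mentioned above can be sketched as follows. This is a hedged illustration, not the press's markup: the helper name `equation_switch` and the image path are hypothetical. The `epub:switch`, `epub:case`, and `epub:default` elements and the `required-namespace` attribute come from the EPUB 3.0 content document vocabulary. A reading system renders the first case whose namespace it supports and otherwise falls back to the default.

```python
from xml.sax.saxutils import quoteattr

OPS_NS = "http://www.idpf.org/2007/ops"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"
SVG_NS = "http://www.w3.org/2000/svg"


def equation_switch(mathml, svg, png_src, alt):
    """Wrap one equation in three formats inside an epub:switch element.

    `mathml` and `svg` are markup strings that declare their own
    namespaces; `png_src` is the fallback image path.
    """
    return (
        f'<epub:switch xmlns:epub="{OPS_NS}">\n'
        f'  <epub:case required-namespace="{MATHML_NS}">\n'
        f"    {mathml}\n"
        f"  </epub:case>\n"
        f'  <epub:case required-namespace="{SVG_NS}">\n'
        f"    {svg}\n"
        f"  </epub:case>\n"
        f"  <epub:default>\n"
        f"    <img src={quoteattr(png_src)} alt={quoteattr(alt)}/>\n"
        f"  </epub:default>\n"
        f"</epub:switch>"
    )
```

The PNG in the default branch is the same kind of image file the current workflow already produces, so the fallback costs nothing extra to generate.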
As the process and the infrastructure evolve, one of the most significant hurdles is getting TeX to play well with the XML workflow. At present, any title authored in TeX that we want eXtyled must be farmed out for conversion to Word. We would like to have this work pulled in house and automated. Another less-than-ideal situation we are working with now is that XSLT processing all occurs in oXygen, not in RSuite. While this is not our most serious obstacle, it means that, for the purpose of transforming XML content, we are not taking advantage of the MarkLogic Server that powers our current content management system.
eXtyling is labor intensive. Even automated tasks are time consuming, because auto-redact rules may only be applied a chapter at a time, and only a subset of our editorial staff is strongly skilled with the application. One possible means of addressing the workload would be to purchase more licenses and acquire additional training. Another possible solution would be to centralize the eXtyling function outside editorial. We are currently exploring both options.
Even with a template, EPUB creation still requires more hands-on work than we would like. It can take anywhere from a half hour to a full day to complete a single file. We know that further automation is possible, and we are working toward a significant reduction in processing time. Of course, deploying Typefi would also help address the comp overhead required to do [unclear] books. And this is also a good time to make a plea to the experts at NCBI to consider building on [unclear] groundwork and developing an open source JATS-to-EPUB solution using Calabash [remainder unclear].
Quality assurance is limited by the inability to test every file on every platform. We don't have access to all the devices on which our content could possibly appear. If we did, we still wouldn't have the time or human resources to conduct all the testing.
Finally, our markup and its attendant transforms are currently supported by a single in-house programmer. While the business case for deploying a flat, cross-divisional proprietary DTD was sound, we are now moving forward toward the adoption of JATS and NCBI 3.0 journal and book standards for our eXtyles exports. [Closing remarks unclear.]

Metadata

Formal Metadata

Title An XML Workflow for Book Publishing
Series Title JATS-Con 2012
Part 07
Number of Parts 16
Author Furbush, Jake
License CC Attribution 3.0 Unported:
You may use and modify the work or its content for any legal purpose, and copy, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify.
DOI 10.5446/30575
Publisher River Valley TV
Release Year 2016
Language English
Production Year 2012
Production Place Washington, D.C.

Content Metadata

Subject Area Computer Science
