NLM Conversion to Build "Atomic" Physics Content in an Agile Fashion

Speech transcript
This morning we're going to tell you a story of how a publisher and a conversion service provider learned to work together in an agile fashion over a period of about four or five years now.
First, let me tell you who the publisher is and what the content looks like. The Optical Society is a scholarly publisher. We have our own platform, and we have about 19 journals that publish this content. It is fundamental research on topics like biomedical imaging, laser sciences, and fiber optics, so it's very multidisciplinary. The largest journal publishes about 33 thousand pages a year, and we have a growing number of journals; we're getting ready to launch a new title. A lot of my colleagues here work in pubs production, and I like to think that we tend to be on the cutting edge.
We were early adopters of markup, for example: we've used XML in production processes since about 2000, and we migrated our content to NLM 3.0 starting in 2010. When we were migrating our content over to NLM, we started building a lot of processes that really made everything more efficient and made it quicker to develop new products. Schematron, for example: we have Sasha, and we made him write hundreds of Schematron rules. So, like a lot of you, we've invested heavily in production and development tools that are based around NLM. The problem was that only the 2010 to 2012 content had NLM markup, which was a shame, and we thought how nice it would be to take our complete backfile, back to 1917, have all that stuff in NLM XML, and begin to use all our wonderful tools on top of that.
Fortunately, around the same time we were adopting NLM XML, OSA's governance started talking about what we should do over a five-year plan. As a society publisher, our governance has a lot of say in how we spend money, and our governance is made up of physicists. The discussion was: well, we should take all of these articles and break them up into tiny little atoms (right, physicists), and they should be very well polished. They had a notion that there must be some kind of standard out there to form these little atoms into. And so, fortunately, we were given a budget and a mandate to convert all of the OSA journal content, every bit of it, to robust NLM XML.
The success that we were able to show OSA leadership and the research community came really quickly, and we were really happy about that. We were able to display HTML articles with a lot of interactive features immediately when we converted the content. We developed some derivative products; we'll show you a screenshot of an image bank that we built, which has been quite popular. And I think we underestimated the amount of business intelligence we'd be able to gather from the content, from a deep backfile. For example, consider the topic of finding who's funding research articles: the ability to see trends that go back decades and decades is, especially these days, extremely valuable.
Most of the OSA authors tell us, when we ask them, that they just want to grab PDF files, put them in their library, and email them to their friends. They kind of say they don't have a lot of use for the HTML article. But when we ask them a little more carefully, most of them do look at the HTML, and what they say is interesting: the HTML article lets me know whether I want to grab the PDF file and invest time. So we've tried to optimize the HTML display to show the reader as quickly as possible whether the article might be relevant, whether that PDF is worth grabbing. As you can see in this screenshot of citation data, being able to see at a glance the citations that inform the research can help researchers decide whether they want to invest more time in the article.
For better or for worse, this next slide is about MathML. OSA was a very early adopter of MathML markup. We probably published one of the first formal peer-reviewed journal articles using MathML on the web; it was 2001 or 2002, very early days. We've made a substantial investment in MathML and, Devorah, you can nod your head along, this was the most expensive and problematic proposition: to convert millions of legacy equations to MathML. And they had to be accurate. A lot of times the equations are the most important parts of the articles, so we really can't get even one gamma or higher-order moment wrong in the conversion. Since MathML is quite a loose standard, the way we went about the conversion is that we targeted the MathJax display software, which fortunately so many publishers are supporting and have invested in.
And it's been really helpful to have both the DCL conversion vendor and the publisher using the identical environment, the same environment as the readers using an iPad or a computer.
Here's a look at the image bank. Once you convert a deep backfile to rich XML and have images, you can pull the images and the metadata into lots of different compelling products, like an image bank for optics and photonics. It's a pretty good proposition, because the images can be pretty spectacular and also scientifically very useful. I won't dwell on these; we'd like to get to the story of how we were able to create these products.
So we did the conversion. We knew we had about 750 thousand pages to convert, dense material, and we knew some of the challenges: the equations, figures, tables, algorithms, bulleted lists, etc. But we didn't expect that the early society journals really served multiple purposes. These journals served as a newsletter, as a way to disseminate meeting minutes, even as something like a college yearbook at times; there would be informal photographs, interesting items but not scientific content. We had to capture all of these using NLM 3.0; we did not allow ourselves to do a superset of the tag set.
Before I hand the podium over to Devorah, I would like to share with you just three or four quick ideas of items we found particularly problematic and that were pervasive. You can imagine that in the thirties, when articles were only being read in print (and you don't have to go back to the thirties, in fact), every available space on the page was very valuable. So if I had a book review or a piece of news, I wouldn't necessarily create additional pages to put in the journal; I would tuck that information into the white space at the back of a scientific article. That's what our community did, and it was commonplace among scholarly journals at the time. But you could only put little bits and pieces on the page; there was only so much white space. So we have a lot of articles that skip around. Devorah brought us this example of an article that skips in multiple directions, and DCL had to find all those pieces and stitch them together.
Some of you might have been around in 1995, when born-digital PDF Normal was first coming into use. People weren't really used to it, and production systems that had been set up for a PostScript workflow or some other workflow were suddenly being introduced to PDF Normal, and some crazy things happened. We certainly didn't expect to give DCL PDF files that didn't match the print at all. That was really surprising, and DCL had to deal with that.
One of the most pervasive issues that we had to deal with, one that certainly affected how we did markup: there are a lot of photos of people and devices and institutions that had nothing to do with the paper they appeared on. So there was a danger that these people you see on the screen could have been marked up as the authors of papers, and they weren't.
One more slide here. We did a balancing act. At the beginning of the project we had to weigh: do we just take all these pages and give them to DCL, and then go back and redo whatever didn't work out? Or do we (Sasha is here, and Jennifer Mayfield) spend a year of staff time analyzing the content in detail, making our best guess about how these different items should be treated? We really opted for a compromise and ended up working in a very agile fashion. We were very pleased with the way the working arrangement settled, and I hope the agile fashion in which we worked comes out in the balance of the talk.
Devorah, I think it's your turn; I'll step back in at the end.
Thank you. So, as Scott mentioned, OSA hired DCL to convert their legacy content, and he already introduced it: about 100 years' worth of material. Just quickly about DCL: we convert content from any format to any other format. We have lots of experience with JATS, with NLM, and with most industry-standard DTDs. As a company we've converted over a billion pages of content, and we have expertise in complex conversion projects.
So let's talk about how we can manage this large legacy conversion effort. We had to think of a phased approach to this conversion because of the various sources that were coming in. We had PDF, XML, and SGML, and the content changed over time. We're talking about almost 100 years of content, so you can imagine the differences (Scott pointed out a couple of those) that would have to be addressed. This way we would be able to allow for a phased release of new products. We could work on a collection at a time, and even if the whole project takes five years, within six months or so we would have something to actually put up and make available. So we decided to focus on one source type at a time, and we did the newest materials first and then went back.
So what were some of the sources? First, XML came in, and it came in multiple formats. Some of it was tied to OSA's proprietary DTD. This XML was created as part of the production process, so it was really very consistent and highly mappable. This conversion was the easiest case, and we were able to do it in a highly automated way; because of the consistency and the quality of the source XML, we were able to move it over to JATS fairly simply.
You would think that the conversion from NLM 2.3 to 3.0 would be the same kind of situation, highly mappable. However, this XML was created after the fact, not as part of the production process. It came from different vendors, and the quality and consistency just weren't there. So we had to add an additional level of normalizing and checking to get to consistent XML.
When the source is PDF, we have the concerns of going from a print page to an electronic source and all that entails: page layout, hyphenation, things that don't come up when you have an electronic source but definitely have to be dealt with when the source is PDF. And if the PDF is image-based, we have added layers of OCR and proofreading to make sure that conversion goes through properly.
We also converted SGML. Again, it's electronic and structured, so you would think that it would be highly automated and highly mappable. However, this was also created after the fact; the quality and consistency weren't really there, and it came from many different vendors. So we had to add a lot of layers of QA to make sure that the conversion would go through properly. In addition, certain things are allowed in SGML that are not allowed in XML, and those also had to be handled in the conversion. For example, tables in SGML might be marked up in varying ways, and when they're moved over to XML you really need to bring them into a common table model.
Because of the wide range of material, we knew that we would have to build flexibility and lots of collaboration into the conversion process. In a typical conversion project, you would want to look at everything, develop a very tight conversion specification, and be able to handle everything before you actually start the conversion. We knew that in this situation that wasn't going to be possible. So we developed an overall specification with that knowledge, and with the allowance that we would have to make changes as new scenarios came up. We developed the software in pieces, a sprint at a time, knowing that we would have to incorporate changes as things came in. And we worked in very close collaboration with OSA to manage new situations as they came up, with weekly meetings and very detailed agendas to talk about everything that had come up during that time. At those points we would have to make a decision: do we want to go back and fix what we've already converted, or is this something we just do going forward? But then we'd have a consistency issue. It was really a collaborative effort, and we had lots of email discussions about these kinds of things. So the communication and the collaboration were extremely important.
Our software was built in a way that would help with that flexibility as well. We can take in a lot of different source formats and go to lots of different output formats, but the actual conversion identifies and deals with each element by itself. So if we have to make a change to the conversion of the references, for example, it doesn't necessarily affect the conversion of the rest of the article.
One of the things we find when working on any conversion is that a lot of questions come up, and here are some of them. As Scott mentioned, OSA did not want a superset of the DTD; they wanted to use it as it is, so we had to figure out how to fit all the existing structures into the existing tag structure.
Equations were a very big piece of the conversion. When the source is electronic, an equation doesn't necessarily have a line break in it; when the source is something such as PDF, it does. So how are we going to handle that? Do we incorporate line breaks or not?
Then there are cross-reference ranges. In a PDF document a range is a very typical kind of reference, naming the first and last item. But there are items in between, too; does OSA want references to those retained in there for easier linking? These are all things that had to be discussed.
There are rendering limitations. If you look at the table, you see the columns are aligned on the decimal point. That's allowed in the DTD, but OSA's rendering, going through CSS, does not support character alignment. So is that something to retain, and what do we do with it?
We had a lot of questions about content errors. You can see here we had a missing figure. What do we do here? We can't have a reference to a figure that is just not found in the document. Again, these are all things where we had to ask what OSA wanted to do.
Scott also mentioned this issue of jumping pages. We can have an article that starts and ends with pieces of other items tucked in around it. In this example specifically there's a piece on chemical bonds with nothing to tie it to the article. What do we do with that kind of content?
Then there are nonstandard structures: how are we going to fit these into this kind of DTD? And there's segmenting the white-space filler. In PDF source material there's lots of filler information that doesn't have to do with the article but is found within the document. Is that something that's going to be retained, and how is it going to be retained?
So, on to evaluation of the conversion.
We had lots of different levels of quality assurance, and I'll talk about a few of them. As much as we try to put in as much automation and as much software testing as possible, there are just some things that need to be visually reviewed. We identified what those things were, and we had to make sure that everything would be of the highest quality possible: making sure that the correct entities are used, that math displays correctly, that tables are aligned, and that the right images are being called.
OSA also developed a Schematron, and Sasha deserves a round of applause for the work he did on this. It includes over 300 checks. OSA has the subject matter expertise that DCL obviously doesn't have, so we were able to work together using OSA's Schematron, because it has checks that we wouldn't even think about including.
In addition to that, we have our own DCL software that checks for anomalies and a number of different things, including the external files associated with the XML.
I mentioned hyphenation; it's a big issue when PDF is the source. It's a matter of soft hyphens: when a hyphen should be retained and when it should not. So we built a special hyphenation spell checker that has a number of different checks. Specifically, what you see on the screen here: within a specific article you might have the same word sometimes appearing with a hyphen and sometimes without, and that gives a very clear picture of whether the hyphen should be there or not.
We also use reporting style sheets that provide an easier way to review metadata components for a complete set of articles. Looking at a report like this, it's very easy to see whether there's an incorrect issue, a missing volume, or something of that sort.
For our PDF sources that go through OCR, we use fonts that display characters in a way that makes it very easy to distinguish between letters and numbers that would otherwise look very similar.
Very important here: as we learned new things, we fed them back. We constantly updated the conversion software, the QA software, the Schematron, and the editorial guidelines. As we learned new things we were able to add more checks and make changes to the rules in a more agile fashion than you would normally think would happen in a conversion process.
Thanks, Devorah. So OSA has nearly completed this large backfile project with the help of DCL's crack staff, as Devorah was saying, and we have some advice to leave you with. We would strongly urge publishers to stay engaged with the conversion project. The publishers are the subject experts, and the publisher will know, with each case that needs a decision, what the business rationale is. Each one of these decisions is going to take more effort or less effort, and we as publishers have to decide what we are trying to achieve. When the collaboration works right, it energizes the project. We have staff, I think on both sides, vendor and publisher, who are really engaged, emotionally and in every other way, in this project to make sure that it succeeds, including attention to each detail, the decision-making, etc. So thanks very much; you've been a great audience. Thanks again, and if there are any questions, we'll be happy to respond to them now.
What do you do in cases of special characters or obvious errors, for example an x instead of a times sign? Do you correct such things, or do you leave them as is?
This came up a lot. Imagine you published a citation 40 years ago with the wrong volume number, and you know the correct volume number: do you correct it? What about an equation, if it has the wrong symbol? I would say we weren't totally cavalier about making changes. If the errors were in metadata, in titles, author names, or the references, we tended to change them. If they occurred in other places in the manuscripts, we left them alone. We always added notes to the document indicating the change, so that we retain both the original and the correction.
So you still have the original there as well as the correction?
Yes, we have the option of sending, let's say, the computer the correct data to make a link for the citation, and then the user could see either the as-published or the corrected version.
Thanks, really interesting work, and I can see we're only looking at the very tip of the iceberg. What would interest me is the same question, except stepping back a few steps. What strikes me as most interesting about this is that you're right on the line between developing a product that's going to move forward and, at the same time, turning back and looking retrospectively, in effect a history-of-science project, which is not necessarily the intention or the purpose of the work, however valuable it might be. That takes you into another domain, where markup really works in a different way; there, for example, the mandate is to describe what is there rather than to correct it. My question has to do with the analysis in this work, because you could spend all your time analyzing this fascinating stuff and extending JATS or TEI or NLM or whatever it might be, and take up all the time. How were you triaging this problem?
I'll mention two techniques we're using. One is saving a lot of stuff for later. Imagine all the different document types: there's news, and all the different types of news there might be, and how valuable it would be for historians and researchers to pull out those different types of news. We're not stopping to analyze all the different items, and we're not asking DCL to do that. But we can ask them to tag it as news and then go back next year, if we have a project, and get more granular about things like document types. The other example is MathML. OSA is using MathML, as are many publishers, but it's not content MathML; it's just describing what the tagger thinks the math looks like. That's all it's doing. So 100 years from now, or even five years from now, MathJax may be able to parse the equation and model it in different ways. But this is a big project with an enormous budget, and I think the budget would triple if we had to hire content experts to do scientific-level tagging of equations and those types of items. So those are the two areas where we made compromises.
I'm from IEEE, and I have a related question. When you encountered unusual editorial practices, like random photos at the end of an article, are you trying to faithfully represent them in the new format, or are you taking the opportunity to correct history, to say that shouldn't have been there in the first place?
In most of those cases we're using the first technique I mentioned, where we're isolating that material from the scientific article and putting it to the side. We're not spending a lot of time trying to make it reusable, that is, doing the additional work that would be needed to make those items part of a product offering or something really useful to researchers in the future. So we're capturing them, but not with enough information that we think we can do engaging things with that content. It's probably going to be better at the end of the project to see how many of these things we really have and what all the variety is, instead of taking time in the middle of the project to do that. We do that level of analysis with the scientific article content, but with these things that appear in the back matter, we're generally trying to capture them into categories and isolate them, without a lot of polishing and deep description of what those items are. We're on NLM 3.0 at this time, but I know our team is really anxious to move to JATS.
Just a minor and much more practical remark; sorry to be practical. One of the slides mentions CSS not supporting character alignment in tables. This is a bit pedantic, but it's a false statement, because the CSS spec does include support for table character alignment. However, web browsers don't support it, because apparently no one could possibly have wanted it. Nobody has ever come to them and said, "We need this; wake up, guys." Have you filed a bug with Mozilla? Have you filed a bug with Chromium? Have you talked with the CSS working group? Because if you don't come and tell us, we can happily believe that no one in the world needs this feature, because we don't know. So come and tell us. There is in fact a feature request for Mozilla for supporting character alignment. It was opened in the 1990s, and since then there have been a total of 66 comments, of which the most recent says this is not an important feature and there is no demand for it. I'm absolutely serious, because you would think a community like this, and similarly the business community, would have a lot of demand for this feature, but they don't come and tell standards bodies, so we don't know. The web browsers won't implement features unless they think there is demand. So, as I was saying yesterday, we need people to come and tell us "we need this feature" so that we can get it implemented. It is supported, for example, by Antenna House for generating PDF from CSS, with reasonable results. So we shouldn't throw this information away.
Right, right. We should capture it and save it, and ultimately the browsers will support it.
And if there is anything else that you need and you feel the web platform isn't supporting, tell us.
Great, thanks for the comment.
Hi, Elizabeth Windsor with the Johns Hopkins University Press. This is obviously a grand effort, and I think you mentioned it's quite extensive. Can you help us understand how you've been able to monetize the completed work?
Well, we certainly have some new derivative products, like the image bank. Let me back up: OSA has its own publishing platform, called Optics InfoBase. In order to be competitive with Silverchair and all the other platforms that are becoming increasingly sophisticated, we felt we really needed a platform that had all the features one sees with our peers' journals, and the ability to serve HTML was essential. The IEEE journals and other peer journals have these features. But we also want to continue evolving our platform. This year we're working heavily on semantic enrichment, and having the XML as a basis for what we need to do with semantic enrichment is invaluable; it's absolutely essential. So that was part of the rationale: to support the platform. As long as OSA is going to host large numbers of journals and maintain a platform, we need to have the content that makes all those features viable.
I have a question and a comment. I'm working at a museum where we're doing markup of legacy literature about biodiversity, and we're also thinking about semantic enrichment. I'd like to include your case as one scenario outside of biodiversity. The short question: is any of your content under an open license?
OSA has about half of its content open access, in author-funded open access journals.
That just means free to read?
Yes. That was going to be my follow-up comment: we're not exposing the full NLM markup outside of OSA, other than descriptive metadata. That's the current business position; that could change at some time. We realize that it would be valuable for others in some ways.
We're running over time; one more question.
I'm from Dartmouth Journal Services. I'm curious about the level of QC that you did on the OCR'd, scanned material, given the different typographic conventions from decades ago. What did you use to ensure the quality? Did you do a literal character-by-character check, or did you have a list of certain things to look out for?
I have no idea; that's a Devorah question.
Yes. When the source is PDF, we do a character-by-character check. We also have automated checks that bring up potential problem areas, things like special characters, hyphens, ligatures, and things like that, but it is a character-by-character check.
Thank you very much.
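The automated checks mentioned in the last answer (special characters, hyphens, ligatures) and the hyphenation check described earlier in the talk could look roughly like the sketch below. This is only an illustration under stated assumptions: the talk does not describe how DCL's software is implemented, and the function names `flag_characters` and `inconsistent_hyphenation`, the ligature list, and the "anything outside ASCII is special" rule are all hypothetical choices made for this example.

```python
import re
import unicodedata
from collections import defaultdict

# Assumption for illustration: the Latin ligatures U+FB00..U+FB04 (ff, fi, fl, ffi, ffl).
LIGATURES = {"\ufb00", "\ufb01", "\ufb02", "\ufb03", "\ufb04"}
SOFT_HYPHEN = "\u00ad"


def flag_characters(text):
    """Return (index, character, reason) for each character worth a manual look."""
    flags = []
    for i, ch in enumerate(text):
        if ch in LIGATURES:
            flags.append((i, ch, "ligature"))
        elif ch == SOFT_HYPHEN:
            flags.append((i, ch, "soft hyphen"))
        elif ord(ch) > 127:
            # Assumption: every other non-ASCII character counts as "special".
            name = unicodedata.name(ch, "unknown")
            flags.append((i, ch, "special character: " + name))
    return flags


def inconsistent_hyphenation(text):
    """Return words that occur both with and without a hyphen in the same text."""
    variants = defaultdict(set)
    for word in re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", text):
        # Group spellings by their hyphen-free, lower-cased form.
        variants[word.replace("-", "").lower()].add(word.lower())
    return {key: sorted(forms) for key, forms in variants.items() if len(forms) > 1}
```

A reviewer would run both functions per article and look only at what they return; as the answer stresses, flags like these narrow down where to look and do not replace the character-by-character check.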
Ordinalzahl
Software
Vorlesung/Konferenz
E-Mail
Widerspruchsfreiheit
Subtraktion
Inferenz <Künstliche Intelligenz>
Content <Internet>
Zahlenbereich
Gebäude <Mathematik>
Gleichungssystem
Element <Mathematik>
Computeranimation
Eins
Entscheidungstheorie
Bildschirmmaske
Einheit <Mathematik>
Theoretische Physik
Stichprobenumfang
Kontrollstruktur
OSA
Datenstruktur
Funktion <Mathematik>
Drucksondierung
Kollaboration <Informatik>
Mathematik
Prozess <Informatik>
Datenmodell
Dichte <Stochastik>
Quellcode
Teilmenge
Kollaboration <Informatik>
Software
Programmfehler
Datenstruktur
Client
Dateiformat
Inklusion <Mathematik>
Schnittstelle
Hypermedia
Ordinalzahl
Spannweite <Stochastik>
Mathematische Logik
Entwurfsautomation
Klasse <Mathematik>
Datenmodell
Soundverarbeitung
Vorlesung/Konferenz
Spiegelung <Mathematik>
OSA
Computeranimation
Punkt
Dichte <Stochastik>
Content <Internet>
Content <Internet>
DTD
Steuerwerk
Computeranimation
Homepage
Numerische Taxonomie
Ordinalzahl
Flächeninhalt
Einheit <Mathematik>
OSA
Figurierte Zahl
Gerade
Cross-site scripting
Ordinalzahl
Content <Internet>
Unicode
Content <Internet>
MIDI <Musikelektronik>
OSA
Datenstruktur
Bildschirmfenster
Computeranimation
Fitnessfunktion
Homepage
Subtraktion
Content <Internet>
Gebäude <Mathematik>
Schriftzeichenerkennung
Dichte <Stochastik>
Quellcode
Computeranimation
Übergang
Ordinalzahl
Software
Verkehrsinformation
Information
OSA
Laplace-Operator
Leistungsbewertung
Tabelle <Informatik>
Softwaretest
Fehlermeldung
Datensichtgerät
Ruhmasse
Gebäude <Mathematik>
Nummerung
Unrundheit
Mathematik
Quellcode
Computeranimation
Open Source
Ordinalzahl
Mereologie
Vererbungshierarchie
OSA
Sehne <Geometrie>
Bildgebendes Verfahren
Tabelle <Informatik>
Instantiierung
Tabelle <Informatik>
Addition
Subtraktion
Extrempunkt
Zahlenbereich
Systemaufruf
Gebäude <Mathematik>
Dichte <Stochastik>
Quellcode
Elektronische Publikation
Computeranimation
Homepage
Ordinalzahl
Software
Datensatz
Datenstruktur
Fahne <Mathematik>
Software
Datentyp
Wort <Informatik>
OSA
Touchscreen
Vervollständigung <Mathematik>
Gruppe <Mathematik>
Zahlenbereich
Gebäude <Mathematik>
Schriftzeichenerkennung
Dichte <Stochastik>
Quellcode
Ähnlichkeitsgeometrie
Quick-Sort
Computeranimation
Komponente <Software>
Metadaten
Menge
Font
Verkehrsinformation
Phasenumwandlung
Zusammenhängender Graph
Spezifisches Volumen
OSA
Versionsverwaltung
Verkehrsinformation
Prozess <Physik>
Mathematik
Beschreibungssprache
Systemplattform
Content <Internet>
Schlussregel
Computeranimation
Entscheidungstheorie
Schlussregel
Spezialrechner
Software
Ordinalzahl
Rückkopplung
Datenstruktur
Software
Biprodukt
OSA
Datenstruktur
Expertensystem
Cracker <Computerkriminalität>
Beschreibungssprache
Stab
Content <Internet>
Systemplattform
Computeranimation
Entscheidungstheorie
Entscheidungstheorie
Bildschirmmaske
Kollaboration <Informatik>
Derivation <Algebra>
Vorlesung/Konferenz
Projektive Ebene
Biprodukt
OSA
Hilfesystem
Bit
Minimierung
Browser
Versionsverwaltung
Gleichungssystem
Computer
Computeranimation
Übergang
Metadaten
Deskriptive Statistik
Bildschirmfenster
Randomisierung
Vorlesung/Konferenz
Spielkonsole
Metropolitan area network
Befehl <Informatik>
Gruppe <Mathematik>
Dichte <Stochastik>
Content <Internet>
Kategorie <Mathematik>
Biprodukt
Konfiguration <Informatik>
Rechenschieber
Erweiterte Realität <Informatik>
Dateiformat
Projektive Ebene
Information
Fehlermeldung
Tabelle <Informatik>
Varietät <Mathematik>
Subtraktion
Wasserdampftafel
Stab
Zahlenbereich
Domain-Name
Informationsmodellierung
Digitale Photographie
Datentyp
Äußere Algebra eines Moduls
OSA
Spezifisches Volumen
Maßerweiterung
Inklusion <Mathematik>
Cross-site scripting
Analysis
Autorisierung
Expertensystem
Mathematik
Binder <Informatik>
Ordinalzahl
Flächeninhalt
Offene Menge
Mereologie
Wort <Informatik>
Ordinalzahl
Total <Mathematik>
Dichte <Stochastik>
Gebäude <Mathematik>
MIDI <Musikelektronik>
Computeranimation
Cross-site scripting
Ordinalzahl
Dichte <Stochastik>
Gruppe <Mathematik>
Browser
Vorlesung/Konferenz
Dichte <Stochastik>
OSA
Information
Computeranimation
Cross-site scripting
Internetworking
Standardabweichung
Subtraktion
Vektorpotenzial
Ortsoperator
Formale Sprache
Familie <Mathematik>
Zahlenbereich
Systemplattform
Technische Optik
Formale Semantik
Übergang
Benutzerbeteiligung
Weg <Topologie>
Existenzsatz
MIDI <Musikelektronik>
Bildgebendes Verfahren
Content <Internet>
Peer-to-Peer-Netz
Mailing-Liste
Dichte <Stochastik>
Strömungsrichtung
Quellcode
Biprodukt
Ordinalzahl
Dienst <Informatik>
Flächeninhalt
Offene Menge
Mereologie
Wort <Informatik>
Ordnung <Mathematik>

Metadata

Formal Metadata

Title NLM Conversion to Build "Atomic" Physics Content in an Agile Fashion
Series Title JATS-Con 2013
Part 12
Number of Parts 16
Author Dineen, M. Scott
Gross, Mark
Ashlem, Devorah
Friedman, Beth
Schwarzman, Alexander
Kupferstein, Gitty
License CC Attribution 3.0 Unported:
You may use, modify, and reproduce, distribute, and make publicly available the work or its content, in unchanged or modified form, for any legal purpose, provided that you credit the author/rights holder in the manner they have specified.
DOI 10.5446/21800
Publisher River Valley TV
Year of Publication 2016
Language English
Place of Production Washington, D.C.

Content Metadata

Subject Area Computer Science
Abstract When faced with the challenge of converting 8 highly technical journals spanning 95 years, how do you divide responsibility between the content owner and the conversion vendor? Do you spend a year on document analysis and developing conversion specifications, or do you hand the project over to a well-regarded service provider and rely on their expertise entirely? This paper demonstrates how an agile approach to content conversion with close collaboration between the publisher and the conversion vendor has allowed The Optical Society (OSA) and Data Conversion Laboratory, Inc. (DCL) to navigate between the two extremes and create a high-quality digital archive that will serve OSA's strategic aims for developing innovative products and services.
