
When the “One Size Fits Most” tagset doesn’t fit you


Automated Media Analysis

Recognized Entities
I submitted this talk as a late submission to JATS-Con in response to the call for participation, and I have the advantage of not only having written the submission but also being on the selection committee, so I get to see the unedited, unexpurgated comments of the peer reviewers. My favorite one said, "There is no news here. Take it as filler if you absolutely need it." I'm taking that as a challenge to see if I can make there be news here, because I believe there is.

People talk to me about JATS. They talk to me about JATS a lot. Sometimes they complain; other times they call for help or advice. Sometimes they don't know the difference between the two, and I think that's fair. But their complaints, in my opinion, fall into two categories: JATS is insufficiently specific, or JATS is insufficiently generic. When they're complaining that it's insufficiently specific, that's easier: it's because there is no way in JATS to tag something that's really, really important, or because there are several ways to tag the same information, which is unacceptable. This, I have been told many times, is unacceptable because JATS is a standard, and there should be a standard way to do all the things that are in JATS's purview: one and only one. JATS should provide a standard way to encode journal articles. I have a lot of trouble with "should be", but I can tell you that there isn't one and only one way to encode everything in JATS, and there isn't going to be.

This request predates JATS. Way back in prehistory (most of you are not old enough to remember): how many of you remember the GenCode committee? Yeah, I see two hands. The GenCode committee was a whole bunch of really, really smart people who figured out that as you moved material from one typesetter to another, you had to completely recode it, and that was a huge amount of work. So they set out initially to make a list of all of the tags that you needed for typesetting, so that we could have standard tags as we typeset all of our material. They figured out after a few years that what they had really set out to do was to make a list of everything that anybody cared about on planet Earth, and they figured out that that was a pretty big task. So instead they came up with a mechanism for people to say, "These are the tags I care about in this circumstance, and this is the way they relate to each other." They called that mechanism SGML, and from SGML we have derived a slightly less convoluted mechanism for doing that, which we're calling XML. JATS is a set of tags in XML for encoding journal articles, but it is not the complete list of anything that anybody might be interested in at any time on planet Earth, which is in fact the scope of the journal literature on the planet. I don't think there's anything anybody's interested in that somebody doesn't publish a journal about, probably several. So we don't cover everything, and we can't.

When I get panicky phone calls from people, it's usually either mid-afternoon on Friday or the 29th or 30th of the month, when they really need to know how to tag something right now because they have a deadline. Those people don't really want to have a philosophical conversation about the nature of tagging and about making tag lists of all of the things that anybody could possibly tag. That's not a conversation production people want to have at 3 o'clock on Friday afternoon, right? So instead we have conversations where, most of the time, what I can say is: there are the words you used in describing the problem; have you looked in the Tag Library index under those words? Because usually, if you actually look in the index, you'll find the thing that you're shouting about not being present. Sometimes it's a little harder than that: you actually have to find an element and know what attribute to put on it. And sometimes they're really asking how to do something that JATS out of the box doesn't have a way to precisely identify, and that makes them really, really anxious, because it's really bad to have to tag something by what it looks like rather than what it is. Except at 3 o'clock on Friday afternoon, when the thing needs to ship, that's what you do. And besides which, if this is the first time anybody has ever had this kind of information in your environment, that's still what you do, and it's OK, because you need to get your documents published. That's what publishing is about.

It does, however, give people a chance to think about why they're tagging documents. What is the point of your XML? It is rarely the goal of creating tagged XML documents to create tagged XML documents.

Not all that many years ago, I spent a couple of days with a fairly large publisher of periodical materials: journals, magazines, and things that are sort of in the slush area between journals and magazines, that might kind of be one or might kind of be the other. They wanted me to help them improve their XML workflow. OK, that's the sort of thing I do. Let's talk about what you're doing. So: they get the documents in Word, they copyedit in Word, they run them back and forth, and eventually they send them to their page design folks, who make pages in InDesign. Then they've got the PDF, and they send it to a conversion vendor who makes XML, which they get back and put on their server. I said, "And then what do you do with it?" They said, "Nothing." I said, "How much of it have you got?" "Ten years' worth." "How much is that in terms of data?" "A lot." "Has anybody ever looked at the XML?" "No." "Any guess how good it is?" "Well..." The final recommendation I made was that the best way they could turn that XML resource into money was to release all that storage; then they would have some fresh storage they could use for something else. It would cost a lot more to clean it up than it was really worth.

But let's assume for a moment that you are not making XML for the sake of making XML. You want XML so you can interchange it with somebody, so that you can do multiple things with your documents, so that you can do the things that the people who talked you into XML in the first place said XML was good for. The power of XML, we told you, was that it is possible, even easy, to transform your XML into other things. Yeah, that's true. But how well you can do that depends on how you handle complex cases. And the dirty little secret: you may need to do one thing with it for one purpose and something else for another purpose. All uses for the same data are not alike.

Let's look at an example; a theoretical example, I believe. I don't know of anyone who's actually doing this. Let's imagine that there is a publisher in a field of study in which it turns out to be enormously important what languages were spoken by the parents and grandparents of the authors of the article. I'm not aware of any such field of study; if there is one, well, that's kind of cool. Let's imagine that without that information readers simply cannot properly evaluate the quality of the article they're reading. It's sort of like how you need to know, when you're reading an article, whether the research was paid for by the maker of one of the products discussed in the article. If a manufacturer pays to do research, you want to know that when you read the article. And in this imaginary field of study, you want to know what the author's grandfather spoke at home in order to evaluate the quality of the study. So this is important.

You're a publisher of journals in this field. You want to tag your journals in JATS. Why? Well, first of all because JATS is wonderful, and more than that because you want to send JATS articles to some vendors who will make your electronic products and some archives who will make your material available forever, and they want JATS. So what are you doing with this information about the language of the parents and grandparents of each author? Well, you call Tommie and ask her where you put this in your JATS article, because obviously there's a place to tag it. And Tommie looks in the index and says, "No, we don't have that. We have a way to record the language of the name of the person, but actually we don't have any way to record anything about authors' parents or authors' grandparents, much less their native languages."

If you had called at least one of your service vendors, you would have been told not to publish that. If you were at least one publisher (I just started talking about people, and the people are in this room), if you were at least one of the publishers in this room, you would then have not published it. That, my friends, is the only absolutely wrong answer. If this is important material to the people reading your content, don't let somebody say, "The tag set doesn't allow you to tag it, so you can't publish it." That's the wrong answer.

The next option: put it in a footnote. All authors can have footnotes. You can put anything you want in the footnote. Now you've published it. Have you done anything useful? Yes and no. It is on the published document for human readers to see, so a person reading the document will see it. But is it available in a way that people can, for example, search for it and find all of the articles in your field written by people whose grandparents spoke Latvian? No. If you search on the word "Latvian", you're going to find a lot of stuff, all kinds of linguistic stuff. Can you do data mining on that? No. Can you compare things on it? No. But it's there; it can be seen.

You could tag the languages as named content inside those footnotes: named content of type "language history", with the words "paternal grandfather spoke Urdu". Well, it has to say which author's paternal grandfather's original language, so you've got named content with a content type like "author-language-grandfather". Is anybody ever going to search that? What are the odds that any other publisher anywhere will use that same tagging for the same information, even if they have the same information? So what you've done is a huge amount of tagging, and you've accomplished nothing. It's not even likely that your own staff will use these codes consistently, much less anybody else. So you haven't accomplished much.

There's another possibility, of course: you could use custom metadata. JATS provides, in the metadata, a thing called custom-meta-wrap, which can contain custom-meta, which consists of a tag that says what this custom metadata is and a tag that says what its value is. It's not associated with contributors, so it's also not associated with the authors. So now we have custom metadata named "second author's paternal grandfather's second language". We still haven't accomplished anything, right? We're tagging it, but we're not tagging it in a way that's going to be good for anybody.

So this is important.
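To make the footnote and named-content trade-off described above concrete, here is a minimal Python sketch under stated assumptions. The two article fragments are invented, the element names (fn, named-content, author-notes) follow JATS, and the content-type value "paternal-grandfather-language" is a purely hypothetical local convention of the kind the talk warns about. It only illustrates why full-text search over-matches and why a private content-type value helps nobody who spells it differently.

```python
import xml.etree.ElementTree as ET

# Two made-up article fragments. Element names follow JATS; the content-type
# value "paternal-grandfather-language" is an invented, publisher-local label.
ARTICLES = {
    "a1": """<article><front><article-meta><author-notes>
        <fn><p>The first author's paternal grandfather spoke
        <named-content content-type="paternal-grandfather-language">Latvian</named-content>.</p></fn>
      </author-notes></article-meta></front>
      <body><p>Results are reported below.</p></body></article>""",
    "a2": """<article><front><article-meta/></front>
      <body><p>We survey vowel length in Latvian dialects.</p></body></article>""",
}


def full_text_hits(word):
    """Return ids of articles whose text contains the word anywhere."""
    return sorted(
        aid for aid, xml in ARTICLES.items()
        if word in "".join(ET.fromstring(xml).itertext())
    )


def tagged_hits(content_type, value):
    """Return ids of articles with a named-content element of this type and value."""
    hits = []
    for aid, xml in ARTICLES.items():
        for nc in ET.fromstring(xml).iter("named-content"):
            if nc.get("content-type") == content_type and (nc.text or "").strip() == value:
                hits.append(aid)
                break
    return sorted(hits)


# Plain-text search cannot tell the linguistics article from the author note.
print(full_text_hits("Latvian"))                                # ['a1', 'a2']
# The private label works, but only for someone who knows its exact spelling.
print(tagged_hits("paternal-grandfather-language", "Latvian"))  # ['a1']
# Another publisher's (equally hypothetical) label for the same fact finds nothing.
print(tagged_hits("grandfather-lang", "Latvian"))               # []
```

The last call is the point of the example: the data is tagged, yet a second party using a different label gets no interchange out of it.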
Why don't we fill out the form on the NISO website that goes to the JATS Standing Committee, the one that says, "We want you to add this to JATS because it's really important and there's no good way to tag it"? And you've already talked to somebody on the JATS Standing Committee, who's told you that if you want to send in a suggestion and have it taken seriously, you have to explain what you want and why it's important, and give us a sample of the data. So you do all of those things: you fill out the form, you give us a sample of the data, you explain how important it is. The next time the Standing Committee gets together, the Standing Committee reads your recommendation and says, "This may be important to some very small number of people, but it's not important to a lot of people." Why is the Standing Committee not likely to take this request seriously? Well, partly because I made it up and I don't think it's true, but more important than that, because there is pressure, from the people who keep calling us, to make the tag set smaller. So you want to keep JATS small, but you want to tag absolutely everything. That's a problem. I don't know what the answer is.

There's another possibility. The JATS documentation goes into great detail on how you can customize it for your own use. We explain how; we show you examples. BITS, the Book Interchange Tag Suite (did I get that right? I got close, anyway), is a customization of it. It's out there on the web, and there are a lot of people who will be talking about it in the next couple of days. There is an example of a customization of JATS that you can look at. Maybe what you need to do is make your own customization. Or maybe you can get all of the people who work in your field, for whom this is important information, together, make a customization, publish it for your use, and then you can all do it the same way. Now you've actually accomplished something, right? What's the downside? It's a ton of work. You're going to have to pull together a committee. Maybe you can do this as a NISO activity; at least you might want to do that, because that gets you away from antitrust issues of getting together and talking with your competitors about how you're going to publish, and it gives you a way to publish this thing you've accomplished. It's not a bad idea, but it's a lot of work.

Now you still have the challenge of figuring out what you can do about sending these tagged documents to the archives and service vendors and libraries who do not live in your little world and don't know what this stuff is. Here's my suggestion: XSLT. Take this stuff, tagged the way you really want, the way that's really accessible to you, and make it into something that they will handle, until you can persuade them that you really, really, really are important and they need to handle this stuff in your rich form.

I mentioned when I started talking, and I mentioned it again just now, the pressure to make JATS smaller as well as the pressure to make it bigger. There are a couple of other pressures. My favorite one is the "greenification" of Blue: people wanting to use the Publishing tag set, not the Archiving tag set, because it's much smaller and much narrower and really describes what they want to do, except there's that one thing sitting over there in Green that they're going to need. The challenge is that everybody's "just that one thing" is different, so there's a lot of pressure to make Blue a lot greener. I'm not quite sure how to fight that; I'm not sure if we should fight that.

But the other pressure that we get is from people who want the tag set to not grow, or even to shrink. I get people calling me and saying, "That's just syntactic sugar. We don't need a named structure for that. It's structured just like a section; it uses the section model; just call it a section. We don't need a tag for bio; it's a section of type bio. We don't need tags for introduction or, in the book world, preface or foreword; those are sections. Call them sections, or call them parts." You could do that. In fact, one of the things that we talked about at JATS-Con last time (which I guess makes it 2012) was sort of refactoring JATS. There were a couple of conversations about it, and I actually spent some time thinking about it. If we want to make it as generic as possible, I think we could encode everything in JATS in about a dozen tags with about 150 attributes. I don't think we would have improved much. I think we could also do it with about a dozen tags and about six completely open CDATA attributes, and I know we would have accomplished nothing, because nobody would be able to use those things consistently and we wouldn't have interchange. But we would have a much smaller tag set. I think the people who keep pushing on us to make it smaller haven't figured out that there's actually value in richness as well as in smallness.

Another conversation that I find myself having with JATS users is, "Should I be using JATS at all? Should I be using BITS instead? Am I allowed to use JATS for this kind of information? Would it be OK with you if I don't use JATS for something?" I don't understand the presumption of religious loyalty to a tag set. If it works for what you're trying to accomplish, use it. If it doesn't work for what you're trying to accomplish, find something that does. If you have a vendor who is pushing you to use a tag set in a way that doesn't work for you and what you're trying to accomplish, tell the vendor that this tag set doesn't work for you; if they can't find a way to work with you, you're going to find a different vendor. You don't have to use JATS for everything.
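The suggestion made earlier in this talk, to tag richly for yourself and transform down to something recipients will accept, can be sketched as follows. The talk names XSLT for this; the sketch below swaps in Python's standard library purely for illustration. The ancestor-language element and its relation attribute are hypothetical stand-ins for a field-specific customization and are not part of JATS; the output moves that information into an ordinary author footnote and makes no claim to validate against any JATS DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical "rich" input: <ancestor-language> is an invented customization
# element, not a JATS element. Everything else uses JATS element names.
RICH = """<article><front><article-meta>
  <contrib-group>
    <contrib contrib-type="author">
      <name><surname>Ozols</surname><given-names>A.</given-names></name>
      <ancestor-language relation="paternal-grandfather">Latvian</ancestor-language>
    </contrib>
  </contrib-group>
</article-meta></front></article>"""


def downgrade(xml_text):
    """Replace each custom element with a plain footnote in author-notes."""
    root = ET.fromstring(xml_text)
    meta = root.find("./front/article-meta")
    notes = meta.find("author-notes")
    # Snapshot the contributors first so the tree can be edited safely.
    for contrib in list(meta.iter("contrib")):
        surname = contrib.findtext("name/surname", default="An author")
        for anc in list(contrib.findall("ancestor-language")):
            if notes is None:
                notes = ET.SubElement(meta, "author-notes")
            para = ET.SubElement(ET.SubElement(notes, "fn"), "p")
            relation = anc.get("relation", "ancestor").replace("-", " ")
            para.text = f"{surname}: {relation} spoke {(anc.text or '').strip()}."
            contrib.remove(anc)  # recipients never see the custom element
    return ET.tostring(root, encoding="unicode")


print(downgrade(RICH))
```

The rich source stays the archive of record; the downgraded copy is what goes to recipients who cannot yet handle the customization.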
We also have people who get anxious about using JATS for things that aren't exactly journal articles. "It's the Journal Article Tag Suite; am I allowed to use it for other things at all? What about conference proceedings? Textbooks? Isn't that, like, against the law?" There is no law. Is it convenient? Does it work? If it works, it works. If it doesn't work, find a way to make it work. This is a tag set, for goodness' sake; it's not a religion. We find people using JATS for a lot of things that aren't journal articles, particularly in environments in which they have a lot of journal articles, because they understand it, they know how it works, they like it, and they want to mix their non-journal things into the same databases and the same search systems as the journal articles. They want to search them in the same way and store them in the same way. So they put their tractor manuals into the same tag set as their journal articles, because they don't know whether the thing they're going to be looking for next time is in a maintenance manual or in an article about how to maintain a tractor. It's just the information; they want it consistently.

I get calls these days: "Would it be better if I used BITS than JATS?" Why do you think it might be better? "Well, because it's newer." True. Does it work better for your content? "Probably not." Then why are you asking the question? "Well, should we be moving to it, because we used to use JATS?" If it works for you, it works for you. I don't understand why I have these conversations.

The reason I'm bringing it up at the beginning of this meeting is that, as you start having conversations with each other over the next two days, I want you to think about the difference between tagging-fashion conversations and tagging-productively conversations. At previous JATS-Cons, the majority of the conversations, in my opinion, have been about how we actually do things that actually work. Those are what I think of as productive conversations. But from time to time we've gotten sidetracked into what's fashionable, and I'm asking that you not think about that.

Suppose the content that you want to tag and mix in with your JATS documents is the text of the prose on the bus wrapper that somebody is trying to manufacture. What's a bus wrapper? A bus wrapper is a huge tube of plastic that a city bus typically gets wrapped in and that is then shrink-wrapped on with heat, so you've got printing all over the outside of the bus. Is that something you want to be able to search? Is this something important to keep with your archive? There's nothing wrong with having it in JATS. Do I know of anybody who has written an application that will take a JATS article and lay it out as a bus wrapper printed on that plastic? No. But if you know of one, I'd love to hear about it; that would be really cool.

There are helpful people out there who call fairly regularly, or send email to various lists, or whatever, pointing out errors in JATS and in the JATS documentation. I really mean it when I say those people are being helpful. I want to encourage you to do that. Nobody claimed that stuff was perfect; nobody thinks it's going to be. If you find things that are wrong, tell us about them. But be aware that if what you mean by "wrong" is "the example in the documentation does not match the guidelines of the vendor to whom I'm trying to supply this data", that might not be an error. There is more than one way to tag a lot of things in JATS. If you want to minimize that, use the Authoring tag set. Oh, anybody use the Authoring tag set? Yeah, that's what I thought. But there is more flexibility in Blue than in Pumpkin, and there's way more flexibility in Green than in Blue.

People who want to receive documents frequently don't want to have to deal with all the possible variations of the tagging that we allow, so they publish guidelines for their local profile of JATS. The JATS standard is not any of those local profiles, and I don't believe it's going to become one. So don't be all that surprised if a local profile you want to use does not match the documentation; it is a subset of JATS. More irritating than that: if you're sending JATS documents to two or three different places, you may be challenged to discover that they have three different local profiles. Maybe you can get them to talk to each other; maybe you can get them to reconcile. And maybe you'd just better figure out how to make your JATS tagging into what all three of those people want. That's imperfect, but it might be the requirement.

I don't believe I have answered any questions in the last however long I've been talking. I think I haven't. But what I hope I have done is raise some questions. I hope I have got you to think about what the role of JATS is and what the future of JATS ought to be. I'd like you, in the next couple of days, to think about and talk about the directions that we ought to be going. We have a lot of speakers who are going to be talking about something they have accomplished with JATS. I think of them as "how we done good at my place" papers, and those are really encouraging; I like that. In addition to the "how we done good at my place" papers, we're also going to have some "what I want to do" papers, or perhaps "what I think we all ought to do" papers. Let's think about those in terms of who this whole community is and how it would work for that community, and in discussions let's look at the directions we want to be moving.

Questions or comments? Or have I bored you to tears?
Implementation <computer science>
Natural number
Formal language
Cartesian coordinates
Law <physics>
Unit <mathematics>
Group <mathematics>
Power <statistics>
Completion <mathematics>
Category <mathematics>
Goodness of fit
Profile <aerodynamics>
Density <stochastics>
Pointer <computer science>
Service <computer science>
Right angle
Automatic indexing
Order <mathematics>
Reading <data processing>
Web site
Data mining
Path <topology>
Wrapper <programming>
Endogenous variable
Content <mathematics>
Storage <computer science>
Attribute grammar
Power <physics>
Materialization <physics>
Imaginary number
Physical system
Object <category theory>
Open set
Bus <computer science>
Word <computer science>
Burning <data processing>


Formal Metadata

Title When the “One Size Fits Most” tagset doesn’t fit you
Series title JATS-Con 2013
Part 02
Number of parts 16
Author Usdin, Tommie
License CC Attribution 3.0 Unported:
You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify.
DOI 10.5446/21807
Publisher River Valley TV
Release year 2016
Language English
Production place Washington, D.C.

Content Metadata

Subject area Computer science
Abstract JATS does not actually claim to be a "one size fits all" specification. However, many information content consumers (libraries, archives, on-line services) accept only content that is valid to one of the JATS models, and in many cases specify a subset of the model defined in one of the JATS instantiations (Archiving, Publishing, or Authoring). Thus, content creators find that their vendors and tools often assume that they will be using one of the JATS models "out of the box". This can present a real problem when a publisher has, and wants, information that is not modeled in JATS, or is not modeled in the JATS DTD their vendors and publishing partners require. In this case, the publisher has several options: drop the inconvenient information, use "Custom Metadata", hide the inconvenient information in prose, abuse a tag, suggest a modification of the standard, or modify the tag set to encode the information that matters to you. None of these options are ideal, and which to choose in large part depends on circumstances.
