
Living in a World of Book Tagging

Speech transcript
...and so I'm not sure; I should use a little bit from what we've heard so far. But I think you'll agree it's fair to say that there are some things we've already picked up on, and as I begin here I want to gesture a little bit toward some of those things.
So basically, what I'm talking about here today is the relationship (which of course we don't know yet) that is surely going to come about between BITS, and the JATS community that is just now starting to do books, and the TEI community. TEI has been doing books, or something you could call books, for quite a long time and has a lot of experience in this area. Of course, many of you have already been around this issue. I'd like a show of hands: who has experience with TEI here? A good solid handful, maybe 20% or so of the group, which is about right. So you will already know what it is I'll be addressing as we get into this.

And of course it's not just TEI. There are also other communities using markup languages, and XML in particular, to mark up that kind of content, including DocBook, DITA, and so on. So we could start by asking why we are looking at this, and I'll come back to that. I'm also leaving aside, for a moment, the question of what a book is, which of course underlies many of these complicated problems.

So of course we have a Venn diagram. I have to say that this is not an accurate representation of reality; it is merely to remind you that we are going to have some overlap in the scope of these tag sets. In particular, it's quite easy to imagine an application to book content for which you might well ask: why shouldn't this be DocBook? Why TEI? Why does it need to be BITS? It could well be any of those things. The other thing I'd like to stress is that I've drawn the BITS circle in dots, because the precise scope of BITS isn't really known yet. It doesn't have a clearly defined scope, and in fact we've already heard a little of the discussion on this topic between groups about what kind of content is expected. One legitimate answer to the question of scope is that, as with the uses of JATS, we can do basically just about anything. But once you begin to draw your scope, you also begin to raise questions about how you relate yourself to other projects that are also part of the same scope.
So this is already something we're going to have to think about. As I get into the detail here with TEI, I'd like to point out that Bruce, of course, started by saying that book publishing is just like journal publishing, except when it isn't. We've already heard two things that are really important when you look at books as distinguished from journals. Number one is the scoping question: books have a different kind of heterogeneity across a set of books, and a different kind of fractal organization of their content, than journals do. This is related to the second difference, which is that the locus of control in publishing books is very different from publishing journals. As was just pointed out, when a book is being produced, you can't exert your control from the center, at the editorial level, in the same way. This relates back to heterogeneity and the scoping question. So I want to keep these larger questions in the background. I think that as we go forward they will be really critical determinants. We need to keep in mind that working with books is very different from working with journals: these are not exactly the same kinds of problems. This is well known, and it makes for an interesting exploration of the tensions.

So let's go to the next slide and get into this in a little more detail. I'm going to talk about TEI here, because it's the one that's closest to my heart, but many of the same issues will come up in different ways with the other tag sets.

TEI has been around for a long time. The project was started in 1987. In the very earliest days it did not even have a commitment to SGML, but it quickly became an SGML application, because generic markup seemed to be the obvious solution to many of its problems. TEI is intended to deal specifically with academic projects and scholarly work. That of course includes scholarly publishing, which, as we were just told, is within the scope of BITS. But it's not just publishing. It's also other academic problems and issues having to do with creating electronic resources and electronic editions, which may or may not have publishing as their main goal. So already you're seeing an interesting distinction between the focus of TEI's culture and where we're coming from in JATS.

The other thing is that TEI's mandate is extremely broad. Beyond the fact that this is academic scholarship, very little is said about what kind of work you're doing, what domain you work in, or what your academic specialty is. For example, manuscript transcription is something people do in TEI, but they also do very different things in TEI, for example dictionary work. So within the TEI community you've got a huge amount of variation, oriented toward particular kinds of problems.

So, for example, people who have tried to build publishing systems using TEI out of the box have often run into problems, because TEI out of the box isn't designed to facilitate one kind of work to the exclusion of other kinds. TEI does have some famous shortcomings, which anyone who works with TEI understands come with the territory. It has a very large number of element types. You want to cut them down, and that means customization, at a level of technical investment that may not be comfortable for many organizations. Certain modeling decisions have been made, and sometimes compromises have been made with respect to element semantics, that don't always fit. So profiling and customization become another area of work, and TEI projects begin to be subspecialties within TEI. Of course that has a price in terms of the interchangeability of TEI data: users cannot expect that their data is just going to work nicely in some other TEI system. This particular problem is well known and well understood within the TEI community. For those focused on publishing, this alone might scare them away from this particular community and markup application.

But on the other hand, it would be a mistake not to pay close attention to people who are, after all, friendly competitors of ours. They actually have a lot of experience with the kinds of problems we are going to be running into, and their experience with heterogeneity across the board is really critical. TEI does offer a unifying framework for dealing with very different, heterogeneous kinds of content. In particular, because it is an active community, TEI has also been able to sponsor, over time, the development of specific TEI profiles. These profiles address particular problem spaces much more deliberately and specifically than TEI as a whole does. So in that respect it's not BITS versus TEI. BITS is easier to get started with, and you don't have to customize it, but the difference is that BITS has already done a lot of the necessary customization. So we shouldn't really be comparing ourselves to TEI in general; we should be looking at specific initiatives. That's basically what I'm going to be doing here.

I'm going to focus specifically on one initiative, the Best Practices for TEI in Libraries. A lot of work has gone into this over the years in terms of profiling TEI for libraries, and specifically for electronic publishing projects within libraries. In fact, Kevin Hawkins is one of the co-authors of version 3 of this document. So even within the JATS community we actually have a good bit of experience looking at TEI in the context of work that is in many ways much more like our own. Compare the professor who wants to prepare an electronic edition of a medieval French text, which might be a very different thing. These library projects are going to be very similar to ours.

In particular, the Best Practices for TEI in Libraries sets out to solve the general problem by breaking it down into parts. It specifies how a project that is getting into electronic publishing, and wants to move its content into electronic format, can do so through a set of levels. These run from lights-out, straight OCR plus an automated process that gives you minimal tagging, enough to get your hands around the data, all the way up to full-blown, detailed, semantically tagged TEI. On this sort of ladder, I'd like to point in particular to the middle steps, levels 3 and 4, because that's really the place where BITS would sit. That is maybe a little more than simple analysis, but maybe a little less than what the Best Practices for TEI in Libraries calls basic content analysis.

So the question there is: is there a body of knowledge here that we in this community can get access to relatively easily? And of course the answer is yes. It has actually been codified to this extent, and it is available from these profiles. That gives us a really good place to start in understanding how we can relate to this one. Given the formalized specifications, schemas, and documentation, we can take the BITS tag set with its schema and its annotations, and we can take TEI, and we can look at them together, because here we have one stable point of reference for alignment.

So what I'd like to propose is that we think of this as a technical problem of mapping from one tag set to the other and back. This doesn't have to take the form of an automated process. An automated process that performs this transformation might be a relatively straightforward spin-off of the project. But the real project is the intellectual work of actually understanding what TEI has decided to model and what BITS has decided to model, and seeing where and how those things align. We shouldn't be surprised to see 80 or 85% alignment between the two, because, as Bruce reminded us, it's all the same stuff. And yet there is also going to be a 15 or 20% area where they don't align well, and we have a lot to learn from paying attention to those points.
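To make the proposed mapping a little more concrete, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. It renames BITS elements to TEI elements from a lookup table. Any element the table does not cover is kept and recorded, so the unaligned "delta" stays visible rather than being silently dropped. The mapping table `BITS_TO_TEI`, the helper names, and the sample fragment are hypothetical illustrations, not a published crosswalk. TEI namespace handling is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical, deliberately partial BITS -> TEI element-name mapping.
# These pairs are illustrative assumptions, not a published crosswalk.
BITS_TO_TEI = {
    "book-part": "div",
    "sec": "div",
    "title": "head",
    "p": "p",
    "list": "list",
    "list-item": "item",
    "fn": "note",
    "disp-quote": "quote",
}


def map_bits_to_tei(src: ET.Element, unmapped: set[str]) -> ET.Element:
    """Copy `src`, renaming elements via BITS_TO_TEI.

    Names without an entry are kept unchanged and recorded in `unmapped`,
    so the delta between the two vocabularies remains visible.
    """
    name = BITS_TO_TEI.get(src.tag)
    if name is None:
        unmapped.add(src.tag)
        name = src.tag
    dst = ET.Element(name, dict(src.attrib))
    dst.text, dst.tail = src.text, src.tail
    for child in src:
        dst.append(map_bits_to_tei(child, unmapped))
    return dst


def alignment(root: ET.Element) -> float:
    """Share of element occurrences under `root` that the mapping covers."""
    tags = [el.tag for el in root.iter()]
    return sum(tag in BITS_TO_TEI for tag in tags) / len(tags)


# Illustrative BITS-like fragment (not validated against the BITS schema).
sample = ET.fromstring(
    "<sec><title>One</title><p>Text<fn><p>Note</p></fn></p>"
    "<verse-group><verse-line>Line</verse-line></verse-group></sec>"
)
delta: set[str] = set()
tei = map_bits_to_tei(sample, delta)
print(ET.tostring(tei, encoding="unicode"))
print(sorted(delta), round(alignment(sample), 2))
```

In this toy fragment, five of seven element occurrences are covered (about 0.71). The verse elements end up in the unmapped set: exactly the kind of point the talk suggests paying attention to.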
Of course, the benefit of this is that we can understand that this isn't necessarily an either/or. To the extent that these things can be aligned, we can in a sense have both at the same time, because if we have a formal mapping, we can think of content according to either BITS or TEI. But understanding the delta is also very useful, because it lets us point specifically to those places where there may be a trade-off. There may be areas where BITS actually has a real advantage. Its level of granularity is not so precise, so it is going to be much more straightforward to take this content into your system. TEI, by contrast, has more mental overhead, because its granularity is more precise.

In addition, I think it's worth pointing out that this is an intellectual exercise that people will be doing anyway, so it is better if we do this work in public. That comes back to the last point this morning: the best way of dealing with this is to be receptive but proactive at the same time, and to find that balance. If we can be proactive and establish this mapping in a public form, it doesn't have to be done over and over again. And where the mapping itself is controversial or raises questions, those can be raised in public and understood better as well. The general principle here is that good fences make good neighbors. If we understand the relationship between BITS and TEI at a technical level, we will be in a much better position going forward to live together in a world of book tagging.
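One way to locate the trade-off points in a mapping is to look for places where it is many-to-one. There, two distinct elements on one side collapse into a single element on the other, so a round trip by element name alone loses information. The sketch below assumes a hypothetical BITS-to-TEI table (not a published crosswalk) and simply reports such collisions.

```python
from collections import defaultdict

# Hypothetical BITS -> TEI element-name pairs (an illustrative assumption,
# not a published crosswalk).
BITS_TO_TEI = {
    "book-part": "div",
    "sec": "div",
    "title": "head",
    "p": "p",
    "fn": "note",
    "disp-quote": "quote",
    "list-item": "item",
}


def collisions(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Return target names reached from more than one source name.

    These are the spots where a name-only mapping cannot be reversed:
    the round trip needs extra information (for example an attribute)
    to know which source element to restore.
    """
    by_target: defaultdict[str, list[str]] = defaultdict(list)
    for source, target in mapping.items():
        by_target[target].append(source)
    return {t: sorted(s) for t, s in by_target.items() if len(s) > 1}


print(collisions(BITS_TO_TEI))  # {'div': ['book-part', 'sec']}
```

In a real mapping, a collision like this one might be resolved by carrying the distinction in TEI's `@type` attribute on `<div>`. Recording that kind of decision is what a public mapping would make visible.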
This was originally my last slide, and I'll tell you in a second why it isn't. My question for you is this: is this as obvious an idea as it sounds, and something that would be relatively straightforward to do, or are there hidden gotchas here? And then, in terms of practicalities: if this is a good thing to do, how should it be done, and who should do it? Is it a volunteer effort? You want people who have expertise here, and there are plenty of JATS users and TEI users who could contribute usefully. But on the other hand, if you want it done well and out the door, it may need to be organized as a committee activity. So I'm very interested in your thoughts about this whole problem. I'm also interested in the more general question of how the scope of BITS is defined over the long term. I think this also dovetails very importantly with the effort of understanding what the proper scope is for BITS, not just as an application but as a community.

So I showed these slides to Bruce, who said: well, yes, but people are going to want to know how to decide whether to use TEI or BITS. Of course, that was a question I was prepared not to answer; I was going to leave you with that. But I had also had a similar question with respect to DITA. So, Bruce having said this to me, I had to admit that yes, in fact, it is a question people will have. It wouldn't be fair to punt on it completely, and I should at least set the stage for talking about it.

So, number one is to understand that there is a big intersection in this Venn diagram. In fact, there are many cases where you could really just use one or the other. They do have particular strengths, but sometimes the differences might not be as big as you might want. On the other hand, if you want to focus on the differentiation, there are things to say.

Number one: technical considerations. Each tag set has a particular specialization with respect to the kind of content it models. If, say, you're trying to transcribe the collected correspondence of a Nobel Prize winner, maybe you want to look into TEI. TEI does letters, and it already has precooked all of the tags you need for epistolary material, as opposed to simply administrative correspondence. Do you have models for letters in BITS? Not yet; that has not been added. When you do this analysis, look specifically at the metadata and at references and citations, because those are always areas where you find differences. With TEI in particular, many publishers find the metadata underspecified. So you want to look specifically at the metadata chunks and see whether you have differences there. And then, of course, there's a question about tools and toolkits.
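The advice to compare the metadata chunks specifically can be sketched as a small field-by-field comparison. The Python sketch below uses `xml.etree.ElementTree`. It assumes a hypothetical table of where a few fields are expected to live in a BITS `<book-meta>` and in a TEI `<teiHeader>`. Real documents may place these fields elsewhere, and the sample records (including the ISBN placeholder) are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical field table: where each metadata field is assumed to live in a
# BITS <book> (index 0) and in a TEI document (index 1).
FIELDS = {
    "title": ("book-meta/book-title-group/book-title",
              "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title"),
    "publisher": ("book-meta/publisher/publisher-name",
                  "tei:teiHeader/tei:fileDesc/tei:publicationStmt/tei:publisher"),
    "isbn": ("book-meta/isbn",
             "tei:teiHeader/tei:fileDesc/tei:publicationStmt/tei:idno[@type='ISBN']"),
}


def metadata(root: ET.Element, side: int) -> dict[str, Optional[str]]:
    """Pull each field's text (None if absent); side 0 = BITS, 1 = TEI."""
    return {f: root.findtext(paths[side], namespaces=TEI_NS)
            for f, paths in FIELDS.items()}


def metadata_delta(bits_root: ET.Element, tei_root: ET.Element) -> dict:
    """Fields whose values differ, as (BITS value, TEI value) pairs."""
    b, t = metadata(bits_root, 0), metadata(tei_root, 1)
    return {f: (b[f], t[f]) for f in FIELDS if b[f] != t[f]}


# Made-up sample records for illustration only.
bits = ET.fromstring(
    "<book><book-meta><book-title-group><book-title>Letters</book-title>"
    "</book-title-group><publisher><publisher-name>Example Press</publisher-name>"
    "</publisher></book-meta></book>"
)
tei = ET.fromstring(
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><fileDesc>'
    '<titleStmt><title>Letters</title></titleStmt><publicationStmt>'
    '<publisher>Example Press</publisher><idno type="ISBN">placeholder-isbn</idno>'
    '</publicationStmt></fileDesc></teiHeader></TEI>'
)
print(metadata_delta(bits, tei))  # {'isbn': (None, 'placeholder-isbn')}
```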
But the most important set of considerations are the non-technical ones. The analogy I came up with is joining a country club. OK, they both have a golf course, and this one is nice in this way and that one in that way. But it's also about which crowd you want to be hanging out with. Where are you going to find the support and the help you need, and whose style of working is the right one for you? Those are questions you really have to be asking yourself as you get into this. Obviously, any organization that can support both is going to benefit immensely.

For an organization, probably the main consideration in choosing between BITS and TEI is this. If you're already using JATS heavily, as was pointed out, it's kind of a no-brainer. You get a whole lot of know-how and tools for free, and it's a simple and ready transition. Whereas if you're using TEI, you shouldn't necessarily think of BITS as the next thing you should transition into, because TEI has proven itself to be serviceable. What you should be thinking of there is not so much "should I move my system to BITS" as looking at BITS as an export format: a format for exposing data that you maintain in a form that is more comfortable for you.
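The idea of keeping data in TEI and exposing it as BITS can be sketched as a one-way export. The Python sketch below uses `xml.etree.ElementTree` and hypothetical renaming rules. It also assumes that chapter-level divisions carry `type="chapter"`, which is a labeled assumption about the source data, not a TEI requirement. The output is not schema-valid BITS: a real export would also need to generate the wrapper structure BITS expects, which this sketch ignores. Namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI -> BITS renaming rules (an illustrative assumption, not a
# published crosswalk). Input is assumed to be TEI-style markup without the
# TEI namespace, to keep the example short.
SIMPLE = {"head": "title", "p": "p", "note": "fn", "quote": "disp-quote",
          "list": "list", "item": "list-item"}


def bits_name(el: ET.Element) -> str:
    """Choose a BITS element name; TEI's single <div> splits two ways here."""
    if el.tag == "div":
        # Assumption: the source marks chapters with type="chapter".
        return "book-part" if el.get("type") == "chapter" else "sec"
    return SIMPLE.get(el.tag, el.tag)  # unknown names pass through unchanged


def export_to_bits(el: ET.Element) -> ET.Element:
    """Build a BITS-named copy of `el`; TEI attributes are dropped."""
    out = ET.Element(bits_name(el))
    out.text, out.tail = el.text, el.tail
    for child in el:
        out.append(export_to_bits(child))
    return out


source = ET.fromstring(
    '<div type="chapter"><head>One</head>'
    "<div><head>1.1</head><p>Text<note>n</note></p></div></div>"
)
print(ET.tostring(export_to_bits(source), encoding="unicode"))
# <book-part><title>One</title><sec><title>1.1</title><p>Text<fn>n</fn></p></sec></book-part>
```

The design point is that TEI remains the maintained source. The export only has to be good enough for the systems that consume BITS, and the places where it needs an assumption (such as the `div` split above) are exactly the points a shared mapping would document.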

Metadata

Formal metadata

Title: Living in a World of Book Tagging
Series title: JATS-Con 2012
Part: 06
Number of parts: 16
Author: Piez, Wendell
License: CC Attribution 3.0 Unported:
You may use, modify, and reproduce, distribute, and make publicly available the work or content, in unchanged or modified form, for any legal purpose, provided that you credit the author/rights holder in the manner they specify.
DOI: 10.5446/30572
Publisher: River Valley TV
Year of publication: 2016
Language: English
Production year: 2012
Production location: Washington, D.C.

Content metadata

Subject area: Computer science
