An Introduction To BITS

Formal Metadata

Title
An Introduction To BITS
Title of Series
Part Number
5
Number of Parts
16
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Production Year
2012
Production Place
Washington, D.C.

Content Metadata

Subject Area
Genre
Transcript: English (auto-generated)
Okay, I'm Martin. As Bruce said, I work at the NCBI Bookshelf project. We're an online archive for books and reports, and we cover the life sciences. We have just north of 1,400 titles in our collection. It's growing nicely, about 200 a year, maybe more next year. We exclusively use the NCBI Book DTD, converting mostly PDFs to it, but also other XML, and we do have a Word authoring program. So we're an archive, but we do online publishing and we do know a little bit about XML authoring also.

Bruce basically took my first slide, a very brief background on the NCBI Book Tag Set, but that's okay. I'm going to introduce BITS. I'm going to run over some basics, talk about structure and metadata, and then introduce the TOC and index elements, which I think are particularly exciting.

Let's talk about what Bruce said: books and journal articles are really not all that different. Blasphemy to some, trivial to others maybe. I looked around at why people like the NCBI Book DTD, and here are some quotes from real publishers. It's really interesting to see. It seems like we were right back then. It's particularly interesting to publishers who already have article XML. From a business perspective, it's incredibly convenient: your staff knows it already, your vendors may know it already, and you can share software, tools, and infrastructure. So looking over these quotes, I don't think it's surprising that the NCBI Book DTD is interesting to people beyond NCBI and has gained quite widespread use. I think every JATS-Con has had something about the Book DTD.

So then why a new DTD? Well, there were complaints out there. The most common complaints: we don't have indexes, we can't tag series metadata, why is there no table of contents, why do you not have a preface, why don't you have an introduction or foreword, things like this. A global complaint has been that it's just a retooled article DTD.
It doesn't look at books in their own right. And the NISO process for the article DTD offered a nice break to take another fresh look and really do, beyond Bookshelf, beyond the original purpose, a broader review of the book model and make it useful for all publishers.

Some basics. It's based on NISO JATS, so you're going to get all the goodies that are included in the latest NISO version: multi-language support, things like the specific-use attribute. It's not part of the NISO process. It's generic. It's especially, but not only, for organizations who already have article content in JATS. The scope shouldn't come as a surprise: it's STM literature, scholarly and professional books. If you have material that requires you to put a lot of effort into design and form, it may not be the right choice for you. So cookbooks or travel guides are not the kind of content that the DTD intends to cover. So let's look at the book model.
Here we have a book. I don't know how many of you are familiar with the NCBI Book Tag Set, but if you are, you're going to see some differences. Most noteworthy, right at the beginning, there is now collection-meta for sets or series or anything that you define a collection to be. Then we have book-meta. Front matter, I think, is also interesting. You've got new elements there: dedication, foreword, preface. You also have a generic front-matter-part for anything else that doesn't fit the other elements.
You can type it. The book body has changed: it's now only a sequence of book parts. Again, this is for those of you who are already familiar with the NCBI Book Tag Set. Back matter now includes an index, and it includes book parts. This is also new.
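The top-level model just described can be sketched as a small Python script that builds an empty skeleton. This is only an illustration: the element names (`collection-meta`, `book-meta`, `front-matter`, `book-body`, `book-back`, `front-matter-part`, `book-part`) and the `book-part-type` attribute are written as I understand the BITS tag set and should be checked against the Tag Library, and the helper name `build_book_skeleton` is made up for this example.

```python
import xml.etree.ElementTree as ET


def build_book_skeleton() -> ET.Element:
    """Build an empty <book> with the top-level parts named in the talk.

    Element and attribute names are assumptions; verify them against
    the BITS Tag Library before relying on them.
    """
    book = ET.Element("book")
    ET.SubElement(book, "collection-meta")   # set or series metadata
    ET.SubElement(book, "book-meta")         # metadata for the book itself

    front = ET.SubElement(book, "front-matter")
    for name in ("dedication", "foreword", "preface"):
        ET.SubElement(front, name)
    # Generic container for front matter that fits no named element.
    ET.SubElement(front, "front-matter-part")

    # The body is only a sequence of book parts.
    body = ET.SubElement(book, "book-body")
    ET.SubElement(body, "book-part", {"book-part-type": "chapter"})

    # Back matter can hold an index and further book parts.
    back = ET.SubElement(book, "book-back")
    ET.SubElement(back, "index")
    ET.SubElement(back, "book-part", {"book-part-type": "appendix"})
    return book


book = build_book_skeleton()
print(ET.tostring(book, encoding="unicode"))
```

The skeleton carries no content; it only shows where each new element sits relative to the familiar front, body, and back division.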
Basically, what you see here is a typical model. It's quite close to other DTDs: front matter, body, back matter, but with many new elements. Now, a book is very large, so it must be possible to handle and exchange a single component as well, for example a chapter. For that, we came up with a new element, the book-part-wrapper. You can take basically any component, a chapter (named book-part), an appendix, a front-matter-part, or maybe just a TOC, join it together with the book-meta, wrap it in a book-part-wrapper, and send it off to your archive or to your customer. Previously, the book-part played that role. It still exists. Again, for those of you who are familiar with the old model, you don't have book-meta there anymore, but you do have book-part-meta, front matter, body, and back.

A few things about metadata. Collection-meta is modeled quite similarly to book-meta.
It has specifics, but you're going to be able to put the title, editors, publication date, and publishers in there. It's from the perspective of the book. It's the metadata that travels with the book, so it looks at its collection parent, and it does not include all the other book siblings that may be in the collection. I'm really excited about book-part-id. This was also a complaint in some of the quotes that I showed. It was really difficult up till now to properly mark up, say, the DOI or PubMed ID of a chapter, and for that we have the book-part-id now. What's also new is pub-history at the book and at the chapter level. I think it's an interesting element. You're now able to model events in the publication history together with the date: "first published in 1997", or really anything that you'd like to model.
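The single-component exchange and the chapter-level metadata described above can be sketched together in Python. This is a rough illustration under stated assumptions, not the working group's tooling: the sample XML, the DOI value `10.9999/example.ch2`, and the helper names `wrap_book_part` and `chapter_summary` are invented for the example, and the exact content models of `book-part-wrapper`, `book-part-id`, `pub-history`, `event`, and `event-desc` should be checked against the Tag Library.

```python
import copy
import xml.etree.ElementTree as ET

# Invented sample book; the DOI below is a placeholder, not a real identifier.
BOOK_XML = """<book>
  <book-meta>
    <book-title-group><book-title>Sample Book</book-title></book-title-group>
  </book-meta>
  <book-body>
    <book-part id="ch1" book-part-type="chapter">
      <book-part-meta>
        <title-group><title>Chapter One</title></title-group>
      </book-part-meta>
    </book-part>
    <book-part id="ch2" book-part-type="chapter">
      <book-part-meta>
        <book-part-id book-part-id-type="doi">10.9999/example.ch2</book-part-id>
        <title-group><title>Chapter Two</title></title-group>
        <pub-history>
          <event>
            <event-desc>First published</event-desc>
            <date><year>1997</year></date>
          </event>
        </pub-history>
      </book-part-meta>
    </book-part>
  </book-body>
</book>"""


def wrap_book_part(book: ET.Element, part_id: str) -> ET.Element:
    """Copy one book part and the book-meta into a new book-part-wrapper."""
    wrapper = ET.Element("book-part-wrapper")
    book_meta = book.find("book-meta")
    if book_meta is not None:
        # The book metadata travels next to the chapter, not inside it.
        wrapper.append(copy.deepcopy(book_meta))
    for part in book.iter("book-part"):
        if part.get("id") == part_id:
            wrapper.append(copy.deepcopy(part))
            return wrapper
    raise KeyError(f"no book-part with id {part_id!r}")


def chapter_summary(wrapper: ET.Element):
    """Return the chapter DOI and its (event description, year) pairs."""
    meta = wrapper.find("book-part/book-part-meta")
    doi = next(
        (el.text for el in meta.findall("book-part-id")
         if el.get("book-part-id-type") == "doi"),
        None,
    )
    events = [
        ((ev.findtext("event-desc") or "").strip(), ev.findtext("date/year"))
        for ev in meta.findall("pub-history/event")
    ]
    return doi, events


book = ET.fromstring(BOOK_XML)
wrapper = wrap_book_part(book, "ch2")
print(chapter_summary(wrapper))
```

Because the chapter and the book metadata are copied rather than moved, the same chapter could be wrapped again with a different book's metadata without touching the source book.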
If you do not want to do that, you can still use only a series of dates in pub-history: accepted, received, revised. But I think this could be very useful to maybe unload some of the meaning that we find on the date types these days and really have event descriptions there.

Okay, we talked about the TOC in the journal matter. Maybe you want to use that indeed. It's now possible to tag a single TOC or multiple TOCs in front matter and back matter, for example the main TOC separate from the list of figures. You have toc-div as an element, which will allow you to hold together, for example, all appendices, or the chapters in part one. And at the core is the toc-entry element, which essentially consists of a title and a cross-reference, the nav-pointer. Since we've seen authors on lots of TOCs, contrib-group is also included.
You can add paragraphs, graphic material, even abstracts to the TOC model. I would suggest that journal people look into adapting this for their front matter DTD.
And whether you convert this from print or whether you generate the TOC from the main book XML is actually up to you. Lastly, the index element is brand new and equally interesting.
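Generating the TOC from the main book XML, as opposed to converting a printed one, might look roughly like the Python sketch below. It assumes that each `toc-entry` holds a `title` and a `nav-pointer` whose `rid` attribute points at the `id` of the book part; those names follow my reading of the model described above and should be verified against the Tag Library. The sample book and the function name `build_toc` are invented.

```python
import xml.etree.ElementTree as ET

# Invented sample: two chapters with ids and titles.
BOOK_XML = """<book>
  <book-body>
    <book-part id="ch1" book-part-type="chapter">
      <book-part-meta><title-group><title>Basics</title></title-group></book-part-meta>
    </book-part>
    <book-part id="ch2" book-part-type="chapter">
      <book-part-meta><title-group><title>Metadata</title></title-group></book-part-meta>
    </book-part>
  </book-body>
</book>"""


def build_toc(book: ET.Element) -> ET.Element:
    """Create a flat <toc> with one <toc-entry> per top-level book part."""
    toc = ET.Element("toc")
    for part in book.findall("book-body/book-part"):
        entry = ET.SubElement(toc, "toc-entry")
        title = ET.SubElement(entry, "title")
        title.text = part.findtext("book-part-meta/title-group/title", default="")
        # The nav-pointer is the cross-reference back to the book part.
        ET.SubElement(entry, "nav-pointer", {"rid": part.get("id", "")})
    return toc


toc = build_toc(ET.fromstring(BOOK_XML))
print(ET.tostring(toc, encoding="unicode"))
```

A fuller version would group entries in `toc-div` elements and copy contributors into a `contrib-group`, which this sketch leaves out.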
For wrappers, you essentially have the same situation as for TOCs. It's not listed here. You can have an index-group, grouping together two separate indices. You also have index-divs, for example if you have alphabetic sections in your index, and at the core you have the index-entry. You can nest them, for example "Winston Churchill, travels of", just like you find it in print. You can also redirect terms with the see-entry element. This is also typical in print. There is additionally a see-also-entry, so you can redirect to preferred terms or you can redirect to related terms. People should be excited about this. There were complaints that we don't have an index. Now we do. In the narrative of the document itself, you can embed the index terms.
This is where it actually really gets complicated. You can include redirects in the text. You can also include lower level terms. You can basically nest and nest and nest the index terms and have all sorts of secondary terms, related terms,
preferred terms in the main narrative of the document. You see here in the example also how you can tag the anchors in the text. So the index term with ID T1 is a range. It ends when they actually stop talking about Churchill in Cuba.
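Collecting the embedded, nested terms from the narrative could be sketched in Python as follows. The sample sentence is invented, and the markup (`index-term` containing a `term` and optionally a nested `index-term`, with `index-term-range-end` closing a range by `rid`) is written as I understand the model described above; treat it as an assumption to check against the Tag Library. The function names `term_paths` and `collect_index` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented narrative with one nested range term and one simple term.
SAMPLE = (
    '<sec id="s1"><p>'
    '<index-term id="t1"><term>Churchill, Winston</term>'
    "<index-term><term>travels of</term></index-term></index-term>"
    "Churchill sailed for Cuba."
    '<index-term-range-end rid="t1"/>'
    " <index-term><term>Cuba</term></index-term>The island is described next."
    "</p></sec>"
)


def term_paths(term_el, prefix=()):
    """Return one tuple per leaf term, e.g. ('Churchill, Winston', 'travels of')."""
    path = prefix + ((term_el.findtext("term") or "").strip(),)
    nested = term_el.findall("index-term")
    if not nested:
        return [path]
    paths = []
    for child in nested:
        paths.extend(term_paths(child, path))
    return paths


def collect_index(root):
    """Walk the narrative and gather every top-level embedded index term."""
    paths = []

    def walk(element):
        for child in element:
            if child.tag == "index-term":
                paths.extend(term_paths(child))  # nested terms handled here
            else:
                walk(child)

    walk(root)
    return sorted(paths)


index_paths = collect_index(ET.fromstring(SAMPLE))
print(index_paths)
```

Each tuple is one index line with its levels in order, which is the raw material an assembled index would be built from.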
I think it's going to be interesting how this will be used. You have the option to just convert your print index, but this offers really interesting possibilities also for text mining and more complex operations. One idea was that the index term actually should enable building an index on the fly from the narrative of the document.

And I am under time. But to sum up: we've got a lot of new stuff, but we're also definitely faithful to all proven principles. It's based on JATS. It's content that matters, not form. We really have in mind publishers who use the article XML. And it's just a draft. I quickly put the JATS list URL there after Laura's introduction.
So please look at the library and please comment and get a copy on the FTP site.
Questions? We'll take a few questions right now and we'll have more time for questions at the end.

Why did you choose book-part-wrapper instead of reusing book-part and extending that?

Why do we use book-part-wrapper instead of using book-part? Well, book-part can be included in the body of the book. So you have book-part-wrapper, which is really an element that's nowhere else, that has to be at the top.

Couldn't you just make book-part a root element in another document? Just say that book-part is okay to use as a root element for another document?
You can. One reason also was that if you do take a chapter and only a chapter and want to send it away, you do have to say which book it belongs to. So you do have to add the book-meta. Everybody felt quite uneasy about having the book-meta in the chapter as opposed to next to it, because we do model the book.

What you're suggesting, Chris, is what we did in the old NCBI Book DTD. It worked very well for some things and didn't work very well for chapters that may appear in more than one book and that you wanted to ship around. If you wanted to send a book as a chapter or send a chapter as a part of more than one book, you'd have to rip the book XML out of the chapter and insert different book XML. Now you can just include it in a different wrapper and your interchange is good to go.

An appendix, for example. Say you have an appendix, you've got app in back matter. You need to convert that to a book-part of type appendix. You need to put the book-meta in there. You need to change the metadata, the title, to book-part-meta. So you have some operation there. Now you take the appendix, you wrap it in book-part-wrapper, you add the book-meta, ship it off.

Hi, Marilu Heppner, NCBI. About the indexes, I assume that you can have overlapping indexes as well as multiple indexes as well as glossaries, glossary references for the same terms, right? Is it designed to handle all of those?

You can have an index and you can have a glossary.

But can you also have multiple indexes?

You can have multiple indices.

And overlapping indexes? So for instance, a term and a phrase, maybe two parts of the index. And you could have overlapping indexes?

Yes, you can. That's why the index term is separate from the range. The ranges can overlap because they're actually pointers, point pointers. They're milestones, not wrappers, so they can overlap, which makes it a little more complex, but allows you to do that.

And what about multiple indexes?

You have the same. To save space, I didn't put the outer wrappers, but you do have index for one index. You do have index-group where you can group several indices: places, people, subjects.
And if you're embedding the index terms in the prose rather than having them separate, you can identify which index a term or range is for. So you can have multiple indexes embedded as well as multiple indexes supported as pre-assembled documents. Okay, thank you.
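The two points made in this exchange, that ranges are milestones and so may overlap, and that an embedded term can say which index it belongs to, can be illustrated with a short Python sketch. It assumes an `index-type` attribute on `index-term` and an `index-term-range-end` milestone pointing back by `rid`; both are my reading of the tag set and not confirmed by the talk, and the sample paragraph and function names are invented.

```python
import xml.etree.ElementTree as ET

# Invented paragraph: range t2 starts before range t1 has ended.
SAMPLE = (
    "<p>"
    '<index-term id="t1" index-type="people"><term>Churchill, Winston</term></index-term>'
    "first passage "
    '<index-term id="t2" index-type="places"><term>Cuba</term></index-term>'
    "shared passage "
    '<index-term-range-end rid="t1"/>'
    "second passage "
    '<index-term-range-end rid="t2"/>'
    "</p>"
)


def range_spans(root):
    """Map each range id to (start, end) positions in document order."""
    starts, spans = {}, {}
    for position, element in enumerate(root.iter()):
        if element.tag == "index-term" and element.get("id"):
            starts[element.get("id")] = position
        elif element.tag == "index-term-range-end":
            rid = element.get("rid")
            if rid in starts:
                spans[rid] = (starts[rid], position)
    return spans


def overlaps(a, b):
    """True when two (start, end) spans share part of the document."""
    return a[0] < b[1] and b[0] < a[1]


def terms_by_index_type(root):
    """Group embedded terms by the index they are declared for."""
    grouped = {}
    for element in root.iter("index-term"):
        kind = element.get("index-type")
        if kind:
            grouped.setdefault(kind, []).append(
                (element.findtext("term") or "").strip()
            )
    return grouped


root = ET.fromstring(SAMPLE)
spans = range_spans(root)
print(spans, overlaps(spans["t1"], spans["t2"]))
print(terms_by_index_type(root))
```

Wrapper elements could not express the `t1` and `t2` ranges here, since neither contains the other; empty milestone elements avoid that nesting constraint.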
Oh yeah, the thing I wanted to address was you said that this book model was for STM books.

Not only.

Yeah, I'd actually characterize it as being for scholarly and technical books.

Yes.

It's certainly not a model optimized for cookbooks or the TV guide, but it's for a much greater subject range than just STM.

It's for scholarly and professional literature. It's not only for STM. If I gave that impression, it's not the case.
You do social science, it'll work.

Ah, I didn't realize I had control of this. In the working group, we had discussions about what the scope should be. And while there are certain books that we put out of scope, it's conceivable that you could do something like a high school textbook. Certainly, I think a secondary textbook would work reasonably well. Although, for example, there's no specific Q&A model in the DTD, so that might be a limitation. But we also deliberately put things like K-8 textbooks out of scope. So really, if you're interested in using the DTD, one of the things you should do is look at your content and then look at the DTD and see if it will map. We're not setting hard boundaries, because even more so with books than journals, it's very difficult to set firm boundaries. But do try it out, and please, please, please give us feedback as to what you need. If, for example, you really need a Q&A model, let the working group know and send us samples of what you want in the Q&A model, because that's one of the things that we would consider.