
Welcome and Keynote: (Packaged) Web Publication


Formal Metadata

Title
Welcome and Keynote: (Packaged) Web Publication
Title of Series
Number of Parts
16
Author
License
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared, also in adapted form, only under the conditions of this license.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
The publication of EPUB3 has been a major step forward for digital publishing. Relying on Web technologies like HTML, CSS, SVG, and others, EPUB3 offers a solid basis to publish not only digital books, but all sorts of digital publications in a portable, adaptable and accessible manner. However, it is possible to bring the publishing and the Web world even closer together, making the current format- and workflow-level separation between offline/portable and online (Web) document publishing eventually disappear. These should be merely two dynamic manifestations of the same publication: content authored with online use as the primary mode can easily be saved by the user for offline reading in portable document form. Content authored primarily for use as a portable document can be put online, without any need for refactoring the content. Essential features flow seamlessly between online and offline modes; examples include cross-references, user annotations, access to online databases, as well as licensing and rights management. W3C and IDPF have recently started to work towards this vision, exploring the technical challenges to make it a reality; this talk will give an overview on where we are.
Transcript: English (auto-generated)
I wonder what happened to Gloria?
Hello, hello, okay, the microphone is working. Good morning, everybody, welcome to this year's SWIB.
Nice to see you all. I'm doing the moderation for the first session. My name is Adrian Pohl, I'm from hbz. Yeah, we start with three welcome remarks, first by Tom Baker, who is today representing our host,
the Friedrich-Ebert-Stiftung. He actually has a past with SWIB: he gave the keynote in 2011 and he participated in various other SWIB conferences. But today you only have a few minutes, not the 45 minutes you had for the keynote in 2011.
This is the eighth conference in the series on Semantic Web in Libraries
and the second to be hosted here by the library of the Friedrich-Ebert-Stiftung in Bonn. The first two SWIB conferences, in 2009 and 2010, created a community around Semantic Web technologies in libraries here in Germany. The driving forces were the hbz, represented here today by Silke Schomburg,
and the ZBW, the Leibniz Information Center for Economics, represented by Professor Klaus Tochtermann. SWIB was the right idea at the right time, and the event quickly became international.
After the amount of linked data relevant to libraries grew by an estimated 1,000% in just one year, SWIB 2012 drew 170 participants from 23 countries. SWIB has established itself as one of the liveliest gatherings of Semantic Web innovators in the library world. It showcases projects from many countries, many of which involve international cooperation.
Here at the Friedrich-Ebert-Stiftung, the library is preparing to publish its subject headings with persistent URIs and as linked open data, subject headings which reflect the focus of its special collections on the German and international labor movement,
social history, political parties, and trade unions. The library, which functions as the global repository for all printed and digital publications of the Friedrich-Ebert-Stiftung, with offices and projects in over 100 countries,
is also looking to the colleagues assembled here today for inspiration and ideas on how we might use Semantic Web technologies to improve access to these publications and reach new users. Promotion of democracy has been the mission of the Friedrich-Ebert-Stiftung since its foundation in 1925,
first and foremost with its scholarship program, which was shut down by the Nazis, and then, in the decades after 1945, with its programs for adult education and political analysis within Germany, and in the form of international cooperation through its offices across Europe,
the Americas, Africa, the Middle East, and Asia-Pacific. Closer to home, the Friedrich-Ebert-Stiftung welcomes the movement towards transparency and open access to information in local government, as exemplified by Open North Rhine-Westphalia, a project of the German Social Democratic Party and the Greens.
The rise of populism in North America and here in Europe is but one reminder why we cannot take our historical memory for granted. The library looks to the community of innovators represented here today
for partners in building a more robust information future. So we are delighted to receive you once again into this house. Our social and networking activities will be supported by the house catering services. And anyone wishing a tour of the library is welcome to stop by the conference desk.
We hope you all enjoy the conference. Thank you, Tom. The North Rhine-Westphalian Library Service Center, the hbz, as Tom already said,
is one of the two organizations that have organized and funded the SWIB conference since its inception in 2009. And representing the hbz, here is my boss, Silke Schomburg, giving the second welcome remarks.
Ladies and gentlemen, dear colleagues, it's my great pleasure to welcome you all to SWIB 16 here in Bonn. We are happy to have you here, participating and sharing in this year's SWIB, organized by ZBW and hbz.
I would like to give my thanks to the program committee for arranging such a great agenda and to all those who submitted contributions to the conference. Thanks also to the workshop leaders who made yesterday an instructive experience for many of us.
Finally, we'd also like to extend warm thanks to the organizing committee from the ZBW, hbz, and also the Friedrich-Ebert-Stiftung library for preparing the venue and helping us to have a smooth conference experience.
After discussing web standards for all those years, this year's SWIB conference proudly welcomes a representative of the World Wide Web Consortium, Ivan Herman, the Digital Publishing Activity Lead of W3C.
Ivan will shortly be speaking about efforts to harmonize the creation processes and formats of online and offline documents with EPUB 3. Technology isn't everything; community is the key.
Although conferences like SWIB are important for bringing communities together, they don't ensure a long-term connection. Tomorrow, Andromeda Yelton, a distinguished speaker on open source programming, community, and culture, as well as president-elect of the Library and Information Technology Association, LITA,
will give a keynote about creating long-term communities for building open source software. We are delighted to have Andromeda here. As in past years, SWIB 16 offers different ways for participants to engage and be part of the program.
Besides exchanging experiences during the long lunch breaks, everybody is invited to hold a lightning talk or even to offer a breakout session.
Thank you all for coming. Please feel invited to get involved, to discuss, and to learn together with us. Thank you.
Thank you, Silke. Now, the other organization funding SWIB is the ZBW, the German National Library of Economics, and representing the ZBW, now speaking, Klaus Tochtermann. Good morning, everybody, and welcome here to SWIB 2016.
Since its inception in 2009, SWIB has always been reflecting current trends and topics in the world of semantic technologies, linked open data, and libraries. But the key achievement, I think, is that SWIB managed to integrate
semantic technologies into library technologies. And this was a very, very long path to go. Yesterday evening, I browsed through the web pages of SWIB, and I browsed through the different programs starting in 2009.
And in 2009, the key questions addressed during SWIB were really basic questions; nobody would talk about them these days anymore. The questions were: what actually is the Semantic Web? What actually is linked data? And how can libraries benefit from semantic technologies and linked library data?
That was seven years ago. Then, two years later, in 2011, the key questions were still on basic principles and technologies of linked open data and library technologies. For example, the questions addressed in 2011 were:
how can we as libraries get involved in this fascinating movement? How can we publish our own catalog data in the linked open data cloud, and what technologies and infrastructures do we need? I guess in 2011, we had many discussions about these triple stores,
which one is the most performant one, like Sesame and others, and under which circumstances should we host it in our own IT infrastructure or somewhere else. And then in 2013, the topics became more sophisticated. We were discussing SKOS, knowledge organization systems,
the Simple Knowledge Organization System, data integration, and, for the first time, open data licenses, because it was a pressing issue to become clear under which license we should publish our catalog data in the linked open data cloud. Still, even in 2013, the topics were more about setting the scene for applications.
All the more I'm happy that now, in 2016, we have reached the point at which we can discuss more about applications, because the applications, they bring, you know, the linked open data to our customers,
to the users of the libraries, and they generate benefit for our users, and this must be the overall goal of our activities in linked open data. So this year, we talk about quality of metadata; that is a big issue also at the ZBW, and the key question is always the cost-benefit ratio.
We will talk about sophisticated search interfaces, because not everybody outside this community is searching in the common search interfaces we use.
So we design new interfaces which allow very simple queries using linked open data triple sets. And finally, of course, since all the information is networked with one another, visualization becomes a new emerging trend in linked open data.
For the first time, we can, you know, build up networks showing how different catalog sets are connected with one another. So this is good news, because now, after seven years, we have reached the point at which we can discuss the applications, and I hope that this conference will be the starting point
for really pushing out all the solutions and the technologies we have developed over the last six, seven years. Before I close, I would like to thank the organizers, mainly Adrian Pohl and Joachim Neubert. Where is he?
Still working outside, organizing something. So, Joachim, thank you very much for your efforts, and also, of course, your colleagues who were involved in the organization of this event. Many thanks go also to the program committee and the presenters here during the next two days, and of course, thank you very much to the Friedrich-Ebert-Stiftung
for being the host of this year's conference. I wish you an inspiring and interesting conference with plenty of interesting talks and enough space and room for networking and discussions. Thank you very much.
Thank you. Okay, after this round of openings, we start with the first keynote by Ivan Herman. Ivan started as a mathematician and then decided to do other work in programming,
first focusing on the field of computer graphics. Since 2001, Ivan has been part of the World Wide Web Consortium staff, first as Head of Offices, and from 2006 to 2013, Ivan was lead of the W3C Semantic Web Activity.
Since 2013, he has been technical lead of the digital publishing work at W3C, and we are happy that Ivan could make it to talk about the work in this activity. The stage is yours.
Thank you. I must, oh, let me see, do that properly. So, I must say, when I was invited to come here, I was a bit surprised, because,
yes, I had my past in the Semantic Web, but I haven't worked on the Semantic Web for about three, four years, so it was not clear why I was invited. Sorry. But, you know, thinking about it, what I talk about is digital publishing, and when you are in libraries,
you cannot avoid digital publishing. For a long time, many libraries avoided touching electronic books, but I don't think that this can be done for long. And therefore, anything that's related to digital publishing and the way it builds on web standards in general is relevant, and that's what I will talk about.
All these slides are in HTML. They are on the web, and you don't have to write the URL down right now, because I will put it up again at the end; just so you know. So today, the standard for digital books is essentially EPUB 3.
There is a company in Seattle bearing the name of a South American river which does something of its own, but apart from that one, it's really EPUB 3 wherever you see electronic books. And it's a very mature one. It's used for digital publications. It's a really non-proprietary format.
It's really the standard format for digital books. But it can also be used, and is used, for journals, magazines, and other things; it's not only for books, in spite of what people believe. And a large variety of books can be created today. This is just a screen dump of my iPad
with part of my own collection. Well, you can see the children's books that influenced me a long time ago. So, books with all kinds of drawings and illustrations, actually very faithfully reproducing what you have in the printed books, if you know the originals. You can have scientific presentations
using mathematics or other things. You can have art books. All these are simply dumps of books that I have on my iPad or elsewhere. For Japanese or Korean or Chinese, manga is an enormously big and important market. And again, you can have technical books
of different kinds, which are all in EPUB. And, of course, you can use all kinds of character sets. The right-hand side, which is in Hebrew, is even more interesting, because if you have ever tried to typeset HTML pages with right-to-left languages and you want to mix them with left-to-right,
then you will have a hard time. But this is all there. And it's not only for books, as I said. You can use it for conference proceedings; this is actually a Linked Data proceedings that Springer published, and it's available in EPUB. You can have journals, so magazines, now, okay?
This journal is the Journal of Electronic Publishing, so they sort of have to do it in EPUB. But this is something that is coming to the fore as well. Not only the journal; each individual article is separately available in EPUB. This is the annual report of the European Commission.
I try to be precise: they have to come up with a report every year on what they produce and what they do, and it's also available in EPUB, among other formats. And in fact, these days, more and more, you can produce just about anything. The left side is a screen dump of Google Docs,
and you can store any document you produce in EPUB. The right-hand side is the same for Apple's Pages, its own text-processing package. As far as I know, there are extensions that you can install for LibreOffice, and I have no idea whether MS Word does it.
But I wouldn't be surprised if they did, because, for example, Microsoft has just announced, I think two weeks or ten days ago, that the next release of Edge will have EPUB reading built into the browser. So these things are coming.
So what is the secret of this success? Well, one of the secrets is that EPUB internally is entirely based on technologies that have been developed at W3C. To be very clear, EPUB has not been defined by W3C; it's done by a sister organization,
which is called IDPF. But they make use of HTML, CSS, SVG; they make use of various APIs that are available for JavaScript, et cetera. So they make use of what, at W3C, we call the Open Web Platform. So it is, in some sense, an extension of what's there.
They did not have to define anything related to rendering, to formatting, to, you know, how things look on the screen. They could rely on the specifications of what W3C has done. And roughly, the structure of an EPUB file
is simply a bunch of administrative files that you have to have, like the table of contents and things like that. The rest is web content: HTML and CSS and PNG images and maybe JavaScript, and the whole thing is put into, essentially, a ZIP file with some restrictions.
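The container layout just described, administrative files plus ordinary web content in a ZIP, can be sketched in a few lines of Python. This is purely an illustration, not a conforming EPUB producer: the file names and contents below are invented, and a real EPUB 3 also needs a complete package document and navigation document. One real restriction is shown: the `mimetype` entry must come first and be stored uncompressed.

```python
import zipfile

# Minimal sketch of the EPUB container layout: administrative files
# plus ordinary web content, zipped up with some restrictions.
# File names and contents here are hypothetical.
def build_epub(path):
    with zipfile.ZipFile(path, "w") as z:
        # Restriction: 'mimetype' must be the first entry, uncompressed.
        z.writestr(zipfile.ZipInfo("mimetype"),
                   "application/epub+zip",
                   compress_type=zipfile.ZIP_STORED)
        # Administrative file: container.xml points at the package document.
        z.writestr("META-INF/container.xml",
                   '<?xml version="1.0"?>\n'
                   '<container version="1.0" '
                   'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">\n'
                   '  <rootfiles>\n'
                   '    <rootfile full-path="OEBPS/package.opf" '
                   'media-type="application/oebps-package+xml"/>\n'
                   '  </rootfiles>\n'
                   '</container>')
        # The rest is ordinary web content: HTML, CSS, images, maybe JS.
        z.writestr("OEBPS/package.opf", "<!-- package document: manifest, spine -->")
        z.writestr("OEBPS/chapter1.xhtml", "<html><body><h1>Chapter 1</h1></body></html>")
        z.writestr("OEBPS/style.css", "h1 { color: #333; }")

build_epub("book.epub")
print(zipfile.ZipFile("book.epub").namelist())
```

Unzipping the resulting file and opening `OEBPS/chapter1.xhtml` in a browser shows the point of the talk: the content inside is just a web page.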
So that's what an electronic book is all about. What you gain is quite a lot. If I take this example on the right-hand side, you can see the photo of a tiger, and there is a text which is overlaid, and the background image, I mean, the tiger part, is sort of blurred.
There is transparency involved, all kinds of things like that, which is simply done because all these things exist in CSS, and what they use in the book is simply reusing CSS. So there is nothing else involved. Well, the book itself is about tricks of CSS,
so they really make use of advanced features of CSS. But that's fine; that's the way it works, and it works very well. So, EPUB and digital publishing in general are very, very related to what's happening on the web, and it's really fundamental.
So the question that may come: are we all done? There is a marriage of digital books and the web platform, everything is fine, and we can go home. Of course, it's never like that. There are two major areas
that need further work. What I would say is: bringing the basic technologies and digital publishing closer, and, on a slightly higher level, bringing the web as a whole and digital publishing closer. I will just spend a few minutes on the first one,
bringing the basic technology level and digital publishing closer, because there are many things that publishing does need that are not yet really available. The most glaring example is pagination. On the web, you don't have this concept, and if you have digital books,
it's absolutely necessary to have pagination, and headers and footers and all kinds of things that publishers know about. And then there are other things: if you do manga, then you want to have all kinds of nice transitions among the images. The manga themselves can be done in SVG, but how you transition from one SVG to the other
would require quite a lot of things which are not there. If you have a very complex document, HTML will give you a certain level of information on sections and headers, et cetera, but if you have a real publication, you need much more about the structure itself, and the question is how you would express that properly.
And there are all kinds of typesetting requirements that we, here in Europe, may not know about, but from Asia through India through, I don't know, South American indigenous cultures, they have their own typesetting requirements and traditions
that have to be reproduced, and these are all problems. There is a Digital Publishing Interest Group at W3C that tries to gather these kinds of issues and sort of build the bridges to the various groups, like, let's say, the CSS group, to put forward those issues that have to be solved, and that's an ongoing thing.
There are also a number of other things happening on the Open Web Platform that the digital publishing world has not really made use of yet. There are a number of dynamic libraries on the web which are not really used. You may want to use external services:
if you think about the educational world, you want to get to educational services; web authentication and payment issues when it comes to selling books or reading books or accessing books, let's say, in a library.
These are all things that are happening that the digital publishing world has not really taken up yet, so there are also things that have to happen in that area. But these things should and will happen. There are all kinds of details that have to be settled, but there is no fundamental issue. What is, in a way, more interesting is to bring the web in general
closer to digital publishing. What I mean by that: a book, in general, is there to transmit ideas from the authors or the editors to the readers, in various formats,
and it should be independent of the format. So the book should be available on various media: digital books, web pages, printed books. And today this is not the case. Today, digital publishing and the web are two different worlds that very rarely talk to one another.
Let me give you an example; let's get back to this European Commission report. This is the dump of the web page that you get when you go there, and as you can see, there are those logos: you can download the report in EPUB and PDF and maybe other formats.
That's fine, that's what they do. And you get a very traditional and, sorry to say, pretty boring page. There is a photo of Juncker, of course, and some text, and that's what you get in EPUB. On the other hand, if on the same web page you begin to scroll down,
then suddenly it looks a little bit different. Whether the content is the same or not, I won't go into that, but at least the look is a bit different. On the left-hand side, you can choose; I didn't play it here, but if you go to the lower right-hand side, then you can go into more details, or not,
so there is interaction involved. And so the web part of it is much more lively than the digital book side, and that should not be the case. It's the same message that I want to give to the reader; it should be independent of what format I use.
So what "this should not be the case" really means is that the separation between digital publishing and the web, as well as between online and offline, should be diminished to essentially zero. It should be the same thing.
That means it should be possible to take a digital book, a digital publication, onto a web page, and display it and enjoy it. It should be possible to take the book offline or online, because obviously you are offline sometimes, and it should be the same book, not some sort of a conversion of what you have,
and the author should be able to author the book regardless of how and where it will be used. All this sounds nice, but why would I spend my time on that? Let's take some examples. This is the PhD thesis of one of my former colleagues,
which was originally published in HTML as well as in a printed book, and turned also into EPUB. On the desktop, on my machine, if I read it, then I want to read it as a web page, because there may be links there that I want to follow easily.
I want to be able to put bookmarks anywhere within the book, not only on a page, but on a specific section, for example. I may have a number of additions to my browser, add-ons or extensions depending on what browser you use,
and I want to make use of those, for example to create annotations. And sometimes, not necessarily for this book, you may need other things and more computing power than you have on your laptop, for example for text analysis. Other times, I want to put it on a reader device
and read the book on the beach. Not this one, it's a very juridical text, you don't want that, but anyway, you want to read it on the beach, and it should be the same book, not a conversion of one format to the other, like I had to do for this one in practice; I want the same content to be available all the time.
I am not always online. It's still true that when I come to Germany, I roam, which means that I pay a much higher price for my phone, so I avoid roaming, obviously, so I am offline, and a lot of people are. But when I read something while I'm offline, or I create annotations,
I want those annotations to go back to the internet when I am online, and to do that without me dealing with it too much, so more or less automatically. Scholarly publishing has changed a lot these days:
papers may be published online, and today, at least in our areas, they usually are, and people may want to download them. But today you get a PDF, and try to read a PDF page on a phone: it's a big problem. So the format of the paper should be much better for that;
you know these two-column, ACM-type typesettings of articles that we all have. And of course, today a paper may include data; it may include video, audio, programs, you name it. This is all more and more part of a digital scholarly publication, and you want to be able to handle that.
So then you can put things the other way around, and there are people who say, you know, forget about EPUB and digital publishing altogether: we have the web, and the web can do everything, so the web provides it all, finished, we can go home.
Well, it's not really the case, not quite. There are a lot of things that the web can do, but even if I consider things only as web pages and web content, and forget about the fact that I want to package it into a file, it's not always there. The fact is that when I talk about the concept of a publication,
a publication consists of many, many resources: text, images, videos. On the web, these are sort of linked together, and magically they work in a browser. But when I talk about a publication,
which has all kinds of technical as well as social constraints around it, then I want to have a single unit on the web which somehow represents the whole publication, the whole collection of resources, not only the CSS and SVG files. Also because, for practical reasons,
if you have a very large book, then you will have that book cut into 15 different HTML pages, but all together they give you the concept of a publication, the concept of the book, and not as individual entities. So we need something which represents the whole content as one.
And there are a number of use cases; we collected a bunch of use cases, and here are some of them. The whole book should be archivable in a national library; you are, partially, librarians, you know what that means and how important it is. The publication has to be checked for integrity, let's say for legal publications.
You want to share a publication with your friend: you have to know what you take to the other side to be able to do that. And there are a bunch of business models that in fact rely on the fact that a publication is one entity. There are also technical things. As I said, a large book can be cut into many chapters,
and technically this means cutting it into many HTML files; nevertheless, you want a bunch of counters, like section numberings or footnote numberings, etc., to run continuously through the whole publication. You want search, text analysis, and all kinds of external tools like that to be valid for the publication as a whole.
And if you can set user preferences, like font size, again you want to do it for the whole thing and not for each chapter individually. Technically speaking, it sounds relatively simple, but the devil is in the details.
Currently, as I said, the web builds on the notion of a single resource: you have pointers to each of those HTML, CSS, and other files, and the rendering is based on the interlinking of these things, and, as I said, it magically happens. What you need here is that on the web you have a single web resource
that somehow represents the whole publication. And on the web, that means something that has a clear, unique URI that represents the book, and not the individual chapters; by that you make the web publication part of the web
and give it the necessary identity. You also need something which, for the time being, we coined as a kind of abstract entity: a web publication processor, a kind of conceptual client-side processor that can be part of the browser,
either built in, or an extension, or whatever, that performs things that are specific to a web publication. It knows how to find the individual resources and pull them together, and it takes care of the section numbering issues, for example. And it also hides the fact that sometimes
I want to read the web publication offline or online. And there is also, and this is more a discussion that we have if you go down deeper, sorry if it's a little bit too technical: today, on the JavaScript programming level, you have access to an individual publication
through what we call the DOM. There is no abstraction on the programming level for a collection of DOMs, a collection of documents. So there may be a need, even down at the API level, for something that represents this. I must admit not everybody agrees with that, so these are all discussion points.
So of course there are architectural challenges to be able to do that. One of them is handling online and offline. That sounds, and you will see it's not that bad, but it sounds like a major thing. So this is what happens usually: you go out to the web
via an HTTP request, you get back the result, you display it. That's essentially it, it's simple, that's all the web is. The envisaged architecture is that you have some sort of a local cache in the browser, and if the browser itself tries to get a page,
then somebody somewhere, namely the web publication processor, looks at whether that page is already in the cache or not. If it's in the cache, that's what it will display; otherwise, if it's really necessary, it will get it from the web. And if you happen to be offline and everything is in the cache: no problem, you have it there.
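The cache-first lookup just described can be sketched abstractly. This is written in Python purely for illustration, not as a real service worker: the cache class, the fetch function, and the URLs below are all invented, but the decision logic, cache hit, offline failure, network fetch with caching, is the one the talk outlines.

```python
# Sketch of the cache-first proxy behavior of a web publication processor:
# intercept a request, serve from the local cache if possible, otherwise
# go to the network (and fail gracefully when offline). Illustrative only.
class PublicationCache:
    def __init__(self, fetch, online=True):
        self.store = {}        # local cache: URL -> content
        self.fetch = fetch     # network fetch function (an assumption)
        self.online = online

    def get(self, url):
        if url in self.store:              # cache hit: no network needed
            return self.store[url]
        if not self.online:                # offline and not cached: give up
            raise LookupError("offline and not cached: " + url)
        content = self.fetch(url)          # cache miss: go to the network
        self.store[url] = content          # remember it for offline reading
        return content

# Hypothetical usage: read a chapter once while online, then again offline.
network = {"https://example.org/book/ch1.html": "<h1>Chapter 1</h1>"}
cache = PublicationCache(fetch=network.__getitem__)
cache.get("https://example.org/book/ch1.html")         # fetched and cached
cache.online = False
print(cache.get("https://example.org/book/ch1.html"))  # prints "<h1>Chapter 1</h1>"
```

In a browser, this role is played by a service worker intercepting fetch events, as the talk explains next; the sketch only shows why the same URL can keep working when the network goes away.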
Now the question, of course: is it feasible at all? Well, not only is it feasible, it's happening, regardless of what publishers do. There is something in the web world which is called web workers and service workers. This is a new technology that's coming to the fore.
A web worker is a truly parallel thread within a browser, so not tricky things like in traditional JavaScript, but really parallel threads. And a service worker, what it does is essentially what I just said: it can cache local content, it acts like a kind of network proxy.
So if you send out a request to a URI, it will catch it, it will check whether it's in the local cache; if it's there, it will give it back, etc. So it essentially does what I described as necessary for online and offline. It is work in progress, so it's not something that's already fixed,
but what is already true is that a WP processor can be implemented using service workers. There are details to check, so it's not 100%, but we are getting there, and service workers are coming. It's a living document which is out there, and a number of browsers are already heading towards service workers.
The latest versions of Chrome and Firefox already have service workers as part of their implementations, so it's something that's happening. So, I hope I have time for that.
Here is an example. I am offline now; I made sure to be really offline. But this is a book that I looked at before, and it can be read there, in HTML, using service workers, so I have it offline on my machine. I don't want to go back to the
non-full-screen view, but if you look at the URL up there, it's not a local URL, and it's still there, simply because the service worker is part of Firefox, and I use Firefox to display it; I could have done it in Chrome as well. You also need all kinds of information about a web publication,
And you know that very well: you need lots of different metadata, in various formats and various syntaxes. You have to have the list of resources, which is important because you have to be able to handle things like the table of contents. You have to have unique identifiers, again something that you know much more about than I do. And you have to have references to the very complex metadata that book publishers and digital book publishers use; there is the ONIX metadata specification, a huge XML specification that they use, etc.
So there is a necessity for something that we call a manifest, which describes this information in one place, in some well-specified syntax. At some point in time these manifests will have to be specified: the format, the content, etc.
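No such manifest format had been specified at the time of this talk, so the shape below is purely hypothetical: every field name is an assumption, invented to illustrate what a JSON manifest carrying the information just listed (identifier, metadata, resource list, a reference out to ONIX) could look like.

```javascript
// Hypothetical web publication manifest: one place that lists the
// identifier, the metadata, and the resources of the publication.
// None of these field names come from a real specification.
const manifest = {
  id: "urn:isbn:978-0-00-000000-0",         // unique identifier
  title: "An Example Publication",
  modified: "2016-11-07T18:00:00Z",
  readingOrder: [                            // ordered list of resources,
    "cover.html",                            // e.g. for a table of contents
    "chapter1.html",
    "chapter2.html"
  ],
  resources: ["style.css", "figures/fig1.svg"],
  links: [
    // reference out to richer publisher metadata, e.g. an ONIX record
    { rel: "metadata", href: "onix.xml", type: "application/xml" }
  ]
};
```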
Now, we also need packaging; we cannot avoid it, because in many cases packaging is necessary. There are use cases, like scholarly publication, where actually, if you think about it, packaging is not really important: if offline and online are given, I think researchers would usually be happy with that, maybe not for full books, but certainly for articles. But there are a number of other places where you need the possibility to put everything into one package. Archival is a use case which comes up, a usual and obvious one, and of course the whole business of publishing is based on packages; we cannot forget about an existing business from one day to the next.
But after all, this is just a layer on top of a web publication. It simply means that you put everything that you have as a web publication, all the resources, everything, into one package. You have some additional issues to solve: where would you find the manifest, what archival format do you use, etc. But none of these are really heavy issues; they just have to be done. And if you remember the structure of the EPUB file
(that is today's structure), if you put it next to the packaging of a web publication, well, it's a very similar structure. Probably the administrative parts, which in the current EPUB are various types of XML dialects, would change; these days XML has turned into a dirty word for web application developers, so everything has to be in JSON. Well, then we use JSON. There are some things like that that have to be changed; maybe we have to look again at what syntaxes are used for, let's say, storing RDF data, because nobody wants to use RDF/XML anymore, etc.
But these are all, again, details that can be solved, and what you would get is essentially the same structure. The administrative files look a little bit different; maybe the packaging format is different because it's better prepared for streaming. Again, none of these are very fundamental issues.
And you would need a processor that also has the packaging part in it, so it's a kind of extension of what we already have for web publications, taking care of packaging and unpackaging on the fly. If you remember the figure I showed earlier, it's a very small extension there that does the packaging and unpackaging; if I use zip, let's say, then zipping and unzipping in JavaScript is an existing thing, it's just adding one more library.
There is a different kind of technical challenge, which is addressing and identification,
something that you, again, know very well. We have to realise, and when you talk about it with web developers you would be surprised that for them the distinction is not always that clear, that addressing and identification are very different things. The two roles are very different: you may use some sort of URI to uniquely identify a resource, but that may not be the same as the address you use to access that resource on the web, and these two things must be separated very clearly. In many cases they do coincide, but when people use a DOI or an ISBN or ISSN or whatever else, that's not necessarily the same as the address they use.
Now, our approach is that W3C should not get into the business of identifiers. There are already many organisations competing to handle that, for good or for bad, but they do it, and that community doesn't need one more organisation doing the same. So we keep away from that; we concentrate on the locators on the web and leave identification to those organisations.
One of the issues that we will have to solve, for example, and there are others,
is this: I said that a web publication has a URI; then you have to answer the question, if I dereference that URI via HTTP, what do I get? There are various alternatives, various things that we have to look at. We may get back, let's say if it's a packaged one, the whole package, where the manifest is somewhere inside. Or I can get back an HTML file, let's say the table of contents, with a link to the manifest, and the link can be through a link element, through a link header, or an embedded one.
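As an illustration of the link-header alternative just mentioned, here is a small helper that pulls a manifest URL out of an HTTP `Link` header. This is only a sketch: the `rel="manifest"` value is borrowed from the existing web app manifest world and was not, at the time, a specified discovery mechanism for web publications.

```javascript
// Extract the target of a Link header entry with a given rel value,
// e.g.  Link: <manifest.json>; rel="manifest"
// A deliberately simple parser for illustration, not a full RFC 5988 one.
function findLinkTarget(linkHeader, rel) {
  for (const part of linkHeader.split(",")) {
    const m = part.match(/<([^>]+)>\s*;\s*rel="?([^";]+)"?/);
    if (m && m[2] === rel) return m[1]; // URL between the angle brackets
  }
  return null; // no matching link entry
}
```

The same information could equally well come from a `<link rel="manifest" href="manifest.json">` element in the HTML table of contents, the link element option.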
So there are various approaches that can be taken, and it has to be specified what you get and how. One answer might be, and we had that for a while, that it could actually return the manifest file itself, the JSON file itself. The web browser people pushed back on that very strongly, because their approach is that whatever you get back should be displayable right away in a browser, and a JSON file is not. So these are communities that have to find some agreement.
There are also differences in terms of ergonomics that we have to look at. Let's come back to pagination:
if I take War and Peace, which is, I don't know, 5,000 or 6,000 pages, I mean, long, obviously you cut it into chapters, and when you read it, having pagination in some way or other is a very helpful ergonomic thing, something that the traditional web doesn't provide, and we have to develop that. So that's something that has to be solved, and not only for books; even for these slides. These slides are done by a tricky HTML file, sorry, a tricky JavaScript file in the background on top of an HTML file, but it is some sort of pagination, and it should be done much more naturally than by JavaScript tricks.
Another ergonomic thing which is interesting, and different, is personalization; here the traditions are very different.
If you take any digital reader, any e-reader, you have facilities like changing the background colour, changing the font, the font size, things like that; they are all there, at least in all the readers that I know. And these are things that you cannot do on a webpage, even if it's just a document. If you cannot change certain things on, let's say, Gmail, I understand that, because it's very well specified what goes where. But if I just read a long text and I want to change the font size, I cannot do it, because in the web browser world this doesn't exist. So these things have to be harmonized
as well; we have to reconcile these traditions.
Now, what about EPUB in all that? The packaged web publications of today do not replace EPUB 3, simply because everything I am talking about is some sort of general plan of where we want to go. It is a future evolution, a vision of where we should go, and how exactly it will relate to a future version of EPUB, to be very frank, nobody knows. This is something that we have to define; it's a future thing. But we have to realize that from the publisher's point of view many things stay the same. Most of the work is: I get the content, I put it in an HTML file, I give it a decent look, or a very good look, using HTML and CSS, and that doesn't change. It's the surrounding things, the administrative things, that would change; we do not touch the content here, we are talking about the same thing, and that's very important to remember.
Now, as I said, EPUB was developed at IDPF, and the web technologies, as you know, have been developed at W3C.
The two organizations have had a very strong cooperation since 2013, and that's how a separate Digital Publishing Interest Group was set up; many of the ideas around this packaged web publication have been developed in that interest group, and we are actually still working on gathering the use cases and setting up a kind of general framework where we can do that.
Technically, the plan is that, starting hopefully in January, at least certainly in 2017, IDPF and W3C would merge. There have been a number of votes among IDPF members and W3C members in agreement, and now the two organizations are working on a memorandum of understanding on how this works exactly. We have to work out a bunch of legal and technical details, but the idea is that the two organizations would work together, and then the development of all that towards a real standard would happen in a working group, or maybe several working groups, at W3C, starting sometime next year.
Let me see, I still have, I think, five minutes, right?
Okay, good. Some more words which might be interesting: if the publishing world and the web world converge much more, there are a number of other areas where the synergy is really interesting. For the publisher community, one would ask, what does that bring them? Well, as I said, the main interest of the publishers is to produce content, to produce great content, and today the publishers undergo a change insofar as they are getting technical people onto the team to work on technical issues, but it's a long and tedious process for publishers. By bringing these two communities together, they get, in inverted commas, "access" to a large community that is already very lively in the web developer space. That's really important, because they can have many more resources to do what they need in terms of digital publishing, and they also get a larger audience, because getting things onto the web always gives you a larger audience. It also avoids reinventing wheels.
There are a number of developments at W3C that today are not used by publishers; I already hinted at some of them at the beginning. But once this convergence happens, they can use them for their various developments, and all kinds of APIs as well. I listed some APIs that are being defined at W3C as we speak, on, you know, databases, battery status, geolocation, you name it; these are the kinds of things that, let's say, a specialized digital book reader or specialized software can really reuse for its own purposes, maybe even as part of a new type of publication. So there are features there that can trickle down into publishing, which might be very interesting, and that's an advantage for them.
I think that the web community can also learn a lot,
because the publishers have accumulated knowledge over the centuries on things that, frankly, web developers and website publishers or website designers do not know: on issues of ergonomics, on aesthetics, on typography. When you look at an average web page from a typographic point of view, or from a designer's point of view, well, many of them are not all that great, so there is an expertise we can gain. There is also experience in the whole workflow that is necessary to get from the authors down to the publication.
It's also a tedious process that publishers know how to handle, certainly more and better than many website developers.
This example is also a very interesting one; some of you may know it. There was a person called Oliver Byrne, who recreated Euclid's Elements by putting graphical proofs in as a replacement for the abstract mathematical proofs that we usually learn about at universities. So these are very precise mathematical proofs, but experimenting with colours and shapes, and all that was done in the middle of the 19th century. So this kind of experimentation has been there in publishing, and it can be useful for anybody else.
So to conclude: let us create a real web publication on the web. That's where we want to go. As I said, there is a use cases and requirements draft which is up there; it's under development, and there is also a draft that we work on, trying to work out the technical consequences and relationships. If you can see it, all the URIs are on GitHub, which shows that they are constantly evolving; the last update I made on the PWP editor's draft was yesterday evening.
So that's the way it works, it's really cutting edge. But that's about it. One more thing: this is not how development works at W3C. It's not like there are three or four wise people sitting somewhere and defining everything while everybody else watches. The work is done in meetings. It's messy, it's tiring, it's dirty, but that's the way it works. So join it, join the work, because that's the only way it can happen; otherwise it's just theoretical talking.
Thank you very much for your attention.
Thank you, Ivan. Any questions?
You didn't mention anything about annotation. Is there anything already thought about that?
I only have 45 minutes! There is work going on at W3C on defining what we call the annotation model. If everything works as planned, the recommendation should be published in early February, so it's relatively new, but practically done. What this does is define the basic data structure and the protocols that can be used to exchange annotations among servers and clients. There is a layer above that, which is all the user interface around it, all the systems around it, and we don't define that at W3C; usually we don't, usually we let the marketplace build on that basic technology. What we ensure is an interoperability level: it should be possible to achieve interoperability among annotation servers.
From the publication point of view,
what we would like to see, and there is a real need for that, is this: if we have a setup with an annotation server, then in any environment that uses web publications the user can annotate things, those annotations go to an independent server, and then you can move with that publication wherever you are. As I said, if you are offline, this thing should work while you are offline, and when it goes online, it goes back to the server. This is all future; we hope that it will happen. There are already systems that fundamentally work this way, it's just that they are not deployed that openly. There's a small company called Hypothesis that distributes a kind of plug-in or extension to make annotations. For the time being, the annotations are stored in the cloud on their own servers, but as far as I know, there are plans that you can install your own instance, if you like, using their technology, so you have your own cloud where you put your own server, and then suddenly what I describe becomes much more possible.
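The annotation model referred to in this answer became the W3C Web Annotation Data Model, which represents each annotation as a JSON-LD object with a body (what is said) and a target (what it is about). A minimal annotation in that model looks roughly like the sketch below; the comment text, target URL, and quoted phrase are invented for the example.

```javascript
// Minimal annotation in the style of the W3C Web Annotation Data Model:
// a textual body attached to a selected quote in a publication resource.
const annotation = {
  "@context": "http://www.w3.org/ns/anno.jsonld",
  "type": "Annotation",
  "body": {
    "type": "TextualBody",                 // the reader's note itself
    "value": "A reader's comment on this paragraph",
    "format": "text/plain"
  },
  "target": {
    "source": "https://example.org/publication/chapter1.html",
    "selector": {
      "type": "TextQuoteSelector",         // pin the note to an exact quote
      "exact": "packaged web publication"
    }
  }
};
```

Because this is plain JSON, the same object can be queued locally while the reader is offline and pushed to an annotation server once the connection returns, which is exactly the round trip described above.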
Okay, that sounds nice. I wonder how all of this publication structure can be related to the semantic web: addressing the whole publication is maybe relatively easy, but how do you address parts of a document, and how do you embed data into these publications? Did you discuss that in the working group as well?
So there are several aspects of that. First of all, in terms of a web publication, any web resource can be part of the publication. So if the authors or the editors or the publishers choose to do it, they can get any kind of linked data, any kind of metadata, any kind of RDF data into the publication itself, and it becomes an integral part of it. In a sense it opens the door for something like that, and it remains an issue for the linked data community whether they want to make use of it.
But there is one more thing which is probably, I would say, more important. Today you have all the digital books that are out there, and there are lots of them which are completely outside the world of linked data, because they are just considered to be opaque files. If this integration happens, these publications become an integral part of the web. Now, whether enough data is being collected for linked data purposes from web resources, you can have a debate about that, but that's a separate debate. The point is that publications, by becoming part of the web, potentially become the subject of anything related to it.
And of course, what already happens: there have been discussions with the latest releases
of EPUB 3 on what happens if I want to add structural metadata to the various publications. The older EPUB 2, and even EPUB 3.0, were created, some of you may remember, when there was all this war between should it be RDFa, should it be microdata; there wasn't JSON-LD yet, and schema.org was, you know, the big threat, or the big I-don't-know, nobody knew what it was. And so the IDPF community back then said, ooh, we don't get into this war, so they didn't really go there; they do talk, I think, about RDFa, but I'm not even sure about that. Now, today you might consider this war to be over, maybe, and already in EPUB 3.1 they say: if you want to use microdata or RDFa, be my guest, do it. You can add this information into the HTML content, or into the SVG content for RDFa, and that's already there. And if it becomes part of the web, then there is no difference from an average web
page.
Okay. We have time for one more question. Anybody? No? Otherwise, we have more time in the coffee break. I am here until about four o'clock this afternoon, so if anybody has questions, I'm happy to answer. Okay. I have two things to say before we leave for the coffee break. In the afternoon we will have lightning talks. If you are interested in presenting a lightning talk, it will be around three to five minutes; please put the title of the talk next to your name.