
From Pixels and Minds to the Mathematical Knowledge in a Digital Library


Formal Metadata

Title
From Pixels and Minds to the Mathematical Knowledge in a Digital Library
Subtitle
Learning from one project
Series title
Part
2
Number of parts
14
Author
License
CC Attribution 3.0 Unported:
You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify.
Identifiers
Publisher
Release year
Language

Content Metadata

Subject area
Genre
Transcript: English (automatically generated)
The aim of my talk is to give a brief overview of one project which aimed at the digitization of some portion of the mathematical literature. When I say digitization, this is not really correct, because our aim was much wider. Our aim was, and is, not only to digitize the literature but to create a full-featured
mathematical library with all the features one would expect. This is a big task, and I will give an overview not of the technical details of our work,
but rather of some steps which we had to take, which in general have to be taken in every such effort, and mention some conclusions which came out of our effort.
So, what was our motivation? Our motivation, of course, was to follow some initiatives which had been running for a couple of years, but soon we recognized that there are many such initiatives which are not bound together.
The approaches vary widely. Each initiative has a different level of elaboration, from very simple to very complex, like Numdam and similar projects. The initiatives have different presentations.
And what was surprising for us, and this was a bad surprise, was that there is actually no big cooperation between these initiatives, so everybody is beating his own path, individually crossing the river his own way. So that was in fact our way too.
There is no detailed, or even not too detailed, instruction manual on how to create a digital mathematical library. So the question is, of course: are we all together building a new Tower of Babel?
Of course we are not, because that's why we are here, we share the experience, but still I have the feeling that maybe a lot of work need not have been done if we had communicated earlier. What are the requirements for such a project? Manpower, this is the most important, of course.
Expertise, equipment, time, this is very important, it was also surprising for us how much time it takes, and money. The last one, money, was the obstacle which prevented our Czech activity from starting for a couple of years,
because we had no resources to do it, and we tried to take part in several activities which were prepared on the European level, under the auspices of the European Mathematical Society. Unfortunately, several such attempts failed.
The last one is now under evaluation, so let us hope that it will work this time, but after several such attempts we saw that in the near future there would be no funds from the EU, from outside.
So we looked for some possibilities inside the Czech Republic, and we were very lucky that in 2005 the Academy of Sciences opened a research and development program, Information Society, into which this type of project fitted. So in 2005 we started the project DML-CZ, the Czech Digital Mathematics Library, and the project will be finished next year,
and that's actually the topic I'm going to talk about. The aim, as I told you, is to create a user-friendly digital library, as full-featured as possible,
which will contain the main part of the relevant Czech mathematical literature. When I say Czech literature, this is of course nonsense; we mean all the publications which were published in the area of today's Czech Republic, or the former Czechoslovak Republic, or, say, the former part of the Austro-Hungarian Empire.
Whatever you want. That means in this region. It doesn't really mean literature in the Czech language. And, of course, to create an archive and a source of information for all possible stakeholders. Our estimate at the beginning, and it was quite realistic,
was that this will cover maybe some 200,000 pages by the end of the project, and we have great hope that this will be a good contribution to a possible European DML or even the world DML.
So, as I told you, we are working on a project basis. That means there is a limited amount of money and a limited time for the project. There must be some goals which we have to reach by the end, and this of course also has some weaknesses.
The best way would be to have some institute, some department, which would run for many years, without a time limit, and work on this task, but this is not our case. So, what are the strengths of this approach? We have the funds. We are lucky that we could put together a group of people who have expertise
in all, or at least most, of the fields which are necessary for creating the DML. We have many young people who are very enthusiastic. This is very nice. And we have quite a lot of students who can be hired and work,
write some student papers or just work for money. The weakness of this approach is, as I told you, the time and capacity limitations. The problem is that since we are working on a project basis,
most of us are employed on other tasks, and so we don't have time to work eight hours a day on this library. So sometimes we are really in difficulties, because we depend on each other according to profession and orientation,
and if somebody has no time to work, then maybe the others have to wait for him. That's the weakness. The opportunities are that we have the support of the Czech Mathematical Society and of all publishers of the Czech mathematical journals.
This is very important. When I say support, that doesn't mean financial support, but moral and technical support, which is also very important. What are the threats of this approach? That's the future, because before the project is finished, we have to set up everything so that it will keep working,
and the funds will run out. So the sustainability of the library is very important. As I told you, we took inspiration from several other initiatives. The main one was from our French colleagues, the projects Numdam and CEDRAM, which were mentioned just now by Thierry Bouche.
Then, of course, the German digitization activity in Göttingen. They have done a lot of work, and they helped us to find good equipment for scanning. Of course, we follow the recommendations of the Committee on Electronic Information and Communication,
and some others that we could find around. The team is formed by five groups, in fact. This is the map of the Czech Republic.
The project is coordinated by the Institute of Mathematics of the Academy of Sciences. I myself am a mathematician. I'm not a specialist in computer science, nor a librarian, so I depend totally on my colleagues. But I'm trying to coordinate and to look after the project
from the point of view of the stakeholder, of the mathematician who will use the library. Then there is the Institute of Computer Science of Masaryk University and the Faculty of Informatics of the same university in Brno. You see, there's a distance of some 200 kilometers from Prague to Brno. There is a good, I would say excellent, internet connection, so that we can all work together online.
Then the Faculty of Mathematics and Physics of Charles University, and last but not least, the library of the Academy of Sciences, which is equipped with the scanning facilities. They have a very good digitization center, built after the floods
of 2002 in the Czech Republic, when quite a lot of scientific literature was damaged, frozen, and afterwards dried, and then it had to be scanned because it is no longer possible to work with it.
Of course, when you start to create a digital library, the question at the beginning must be: what will the content be? So we started to discuss, not only within the project but mainly with the mathematical community in the Czech Republic, what should be included in the library.
So we decided, of course, that first of all there should be all relevant or important journals. When I say all important, in the first step these are the scientific research journals which are internationally recognized. Then maybe in the second row, some other journals of local importance.
But our aim is also to include in the future library journals which are important not only for researchers but also for teachers. We have several journals published for high school teachers,
so we would like to include such journals as well, and then, of course, maybe some society bulletins and so on, et cetera. But at the beginning, we started with research journals. We decided that we would also like to include some proceedings,
some proceedings series. Of course, this might be a very, very wide range, but we decided to include just some very important established conferences. We would also like to include some monographs,
and maybe in the future some others as well. There is no power, unfortunately.
Why did we start with the journals? Because the acquisition is simple. Journals are mostly held in libraries, and we can get all of them.
The structure is relatively clear and unified. Mostly, they are printed in good quality. Of course, we don't go too far back in history. The journals are covered quite well in databases like Zentralblatt and Mathematical Reviews.
Of course, there are some differences, like different types of typesetting. The old volumes of journals were printed in the classical way. Afterwards, beginning in the 90s, most of the journals were typeset in TeX, electronically.
But everything is still quite simple. The specialty of our journals is that they are very, very multilingual. Our journals cover, in general, seven or eight languages.
We decided to include in the library altogether 14 titles, covering, as I told you, research, education, and popularization. So let me go a bit further.
The different inputs we can get for journals are, of course, in the beginning, scans of the old volumes. That's what we started with. We could also get some files from the Göttingen Digitization Center, where they digitized some of the journals,
but they didn't create the type of library that we did. So for some volumes we could use the bitmaps already obtained from Göttingen. Then, another phase is the journals which were already printed via computers, but where we had to use PostScript or PDF files,
because it was not reasonable to reconstruct the original TeX files. And then, for the brand new journals, which are now already published digitally, we can use a system similar to what Thierry Bouche mentioned just now.
We hope to use this route for semi-automatic input to the library. Conference proceedings, there is no need to say much about them. This is very similar to journals, only a little more complicated.
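The journal input routes described above can be summarized in a small sketch. This is not the project's software. The names `InputRoute` and `choose_route` are hypothetical, and the preference order (born-digital feed first, then PostScript or PDF, then bitmaps from a partner, then fresh scanning) is an assumption read from the order in which the talk presents the options.

```python
from enum import Enum


class InputRoute(Enum):
    """Hypothetical labels for the input routes mentioned in the talk."""
    PUBLISHER_FEED = "semi-automatic import of issues published digitally"
    PS_OR_PDF = "use existing PostScript or PDF files"
    PARTNER_BITMAPS = "reuse bitmaps already scanned elsewhere"
    SCAN = "scan the printed volume"


def choose_route(has_publisher_feed: bool,
                 has_ps_or_pdf: bool,
                 has_partner_bitmaps: bool) -> InputRoute:
    """Pick the least labour-intensive route that is available.

    The priority order is an assumption, not something stated in the talk.
    """
    if has_publisher_feed:
        return InputRoute.PUBLISHER_FEED
    if has_ps_or_pdf:
        return InputRoute.PS_OR_PDF
    if has_partner_bitmaps:
        return InputRoute.PARTNER_BITMAPS
    # Nothing digital exists yet, so the printed volume must be scanned.
    return InputRoute.SCAN


# An old volume with no digital material at all falls through to scanning.
print(choose_route(False, False, False).value)
```

Keeping the decision in one function makes the assumed priority explicit and easy to change if a project prefers, say, rescanning over reusing lower-resolution bitmaps.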
The problem is with the monographs, especially when you go to very old ones, as we did. Each one is an authentic original, and so there is a very complicated discussion about how to create and how to organize the metadata.
There is, of course, a much stronger emphasis on the IPR issues. That's why we go so far back, to the old prints, etc. The core of the monographs in our library consists of about 20 books written by the well-known Czech mathematician Bernard Bolzano,
which is, I think, of good value. As concerns the workflow, I will not spend time on this, because it will be covered by my colleague Bartošek in a lecture later on, so let us skip it. When we start on the pixel level, that means scanning or starting with these bitmaps,
we decided that our standard, following the Committee for Electronic Publishing recommendations, would be 600 dpi. We scan at 4-bit depth, because this enables geometrical and other transformations of the images
using the Book Restorer software. The material which we took over from Göttingen was sometimes done at 400 dpi and bitonal, which required slightly different approaches, but this was still possible.
But we made several tests, and we found out that there really is a difference between 600 and 400, and so we recommend not scanning below 600 dpi in any case, and, maybe, depending on the facilities, scanning at 4-bit depth.
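To give a feeling for what the two scanning settings mentioned above mean in raw data, here is a minimal sketch comparing 600 dpi at 4-bit depth with 400 dpi bitonal (1 bit per pixel). The page size of 7 by 10 inches is an assumption chosen only for illustration, the function name is hypothetical, and the figures are uncompressed sizes; real files would be compressed.

```python
def uncompressed_page_bytes(width_in: float, height_in: float,
                            dpi: int, bits_per_pixel: int) -> int:
    """Raw (uncompressed) size in bytes of one scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Assumed page size in inches; not taken from the talk.
PAGE_W_IN, PAGE_H_IN = 7.0, 10.0

for dpi, bits in [(600, 4), (400, 1)]:
    size = uncompressed_page_bytes(PAGE_W_IN, PAGE_H_IN, dpi, bits)
    print(f"{dpi} dpi, {bits}-bit: {size / 1e6:.1f} MB per page")
```

Independently of the page size, going from 400 to 600 dpi multiplies the pixel count by (600/400)^2 = 2.25, and 4-bit depth stores four times as many bits per pixel as bitonal, so the raw data grows by a factor of 9.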
OCR, we use FineReader... You are very strict, but I started too late. No, I already... I will ask whether there are questions, and if there are not, you may continue.
Are there any questions at this point? The same question as before: what's the IPR on the results of your work? Yes, this is one. Let me go to this one. The IPR is a complicated question, and of course there are two ranges of IPR questions.
That means the original work, and then the electronic material. Yes, as concerns the original work, the owner can be the author, the publisher, or the distributor or administrator of the digital library. It depends on what kind of material it is.
In general, we have the following scheme, because this is important here. The Czech Copyright Act says that a newly created digital copy of an existing work is a new original. So we have to ask the author again for permission.
This is quite crazy. Of course, we don't care too much, because the fines or the threats are rather theoretical. So what is our scheme concerning journals and proceedings? Usually the publisher has the rights, and it's his duty to obtain the rights from the author so that he can publish it.
So we negotiate with the publisher. We don't check whether the publisher really has the right to pass this right on to us, but they mostly do it, because nobody cares. How to say it otherwise?
As concerns monographs, this is more complicated. If there are existing rights, we have to get written permission from the author. If the rights have expired, then of course there is just the question of who owns the digital copy, and that's the library, or let's say the Institute of Mathematics.
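The permission scheme just described can be written down as a small decision function. This is only a sketch of the rules as stated in the talk, not legal advice and not the project's actual tooling. The names `Kind`, `Work`, and `permission_needed_from` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    JOURNAL = "journal"
    PROCEEDINGS = "proceedings"
    MONOGRAPH = "monograph"


@dataclass
class Work:
    kind: Kind
    rights_expired: bool = False  # only consulted for monographs here


def permission_needed_from(work: Work) -> Optional[str]:
    """Return whom to ask before putting the work online, or None.

    Mirrors the scheme in the talk: journals and proceedings are
    negotiated with the publisher; monographs need written permission
    from the author unless the rights have expired.
    """
    if work.kind in (Kind.JOURNAL, Kind.PROCEEDINGS):
        return "publisher"
    if work.rights_expired:
        return None  # only ownership of the digital copy remains open
    return "author"


print(permission_needed_from(Work(Kind.MONOGRAPH, rights_expired=True)))
```

A real workflow would also have to record the evidence of each permission; the sketch only captures who has to be asked.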
How are you going to license your results, if you have the rights? Are you going to license them to other people? Or are you going to keep them? Our aim is to have open access as much as possible.
Are you going to license this so that others can run their own OCR or their own semantic recognition for search, or something like that? We have just a very simple announcement about the copyright at the beginning of each paper.
Who is the owner of the digital material? Who is the owner of the copyright of the original work? And people have to ask us if they want to copy, and so on, or work with any part of this. And if they want to make a copy of what is displayed, then they should attach the first page with this copyright and terms of use.
That's our policy. You should probably consider using... Like what? There is a licensing scheme called Creative Commons that was developed for situations like this.
It is very well made, and it has all kinds of national versions. I'm pretty sure it has a Czech version; I know it has a German version, and it has a US version. And they really know what they are doing. A group of the premier IPR lawyers of the world developed this.
These licenses are better than anything that people like us, who care about other things, can come up with. So you should probably just look into this and license it with Creative Commons. And it should do everything you want.
And it's very likely to be tested in court and all of those kinds of things. And you don't have to worry about licensing. I can tell you about this. A publisher has to agree to that. Of course, but most publishers don't really know this. And if you press them enough, they will agree.
We know this, in my case. Okay, any other questions? Just close it. These are our conclusions. It's not necessary to repeat them. Actually, they came out from the beginning. I think it might be worth mentioning that the material that has already been done is online at dml.cz.
So you may browse the digitized articles there. And you can learn more about the project at project.dml.cz. How many pages do you have?
How many? How much? Right now. Right now, I think it's almost 90,000 pages. Which are displayed, yes? Of course, there is a lot of work in preparation, to be finished next year. Okay, that's it.
Coffee is prepared.