
Status of the Apache ODF Toolkit (incubating)

Video in TIB AV-Portal: Status of the Apache ODF Toolkit (incubating)

Formal Metadata

Status of the Apache ODF Toolkit (incubating)
Title of Series
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date

So I thought, of course, of sex, drugs and rock and roll, but I'm not sure I could get away with that. If I had to summarize it in one thing, it's money. So this might be interesting for you: the P is for the Prototype Fund, and it's about funding open source projects. Let's start with the "but": you have to live in Germany. It's a nice place to live, and yes, you can be from any nation. I was able to win the second round of Prototype Fund funding, which at that time was still only thirty thousand. What is quite funny: there is a group of six young women, I believe, who are handling it. It was their idea. They work with the research ministry and get the money from them, but they sit in between. It is very simple: you just fill in one form with your open source idea and how the idea serves everyone, and it goes through very easily. Within the fifty thousand there is something like two and a half for consulting, for coaching for you. But even thirty thousand was great. I was planning to be here with a friend of mine, but he started working on a fishing app, so I had to do it alone. That is as far as I wanted to go on this. Take a look at the Prototype Fund; it's worth jumping into.
The second thing, and this is good news, not bad news: ODF is about the standard file format. To me, open source alone is not enough. Different software has to be interoperable, has to communicate with each other; otherwise there is still the chance of lock-in. Admittedly, standards are very slow. The ISO process was made for material, for pages that never change, and software needs evolution, right? There is a contradiction here. But you have to keep in mind that ODF, the blueprint of ODF applications like LibreOffice, is very, very important. I have often heard: "Oh, standards, I don't like them. I like open source, that's sufficient." No, I don't think so. I'm pretty sure that the user needs freedom of choice to switch between ODF applications.
So without further ado, some history of the incubating project. I believe it was 2005 (I must admit that is just a good guess) when we came together at Sun Microsystems. It was StarOffice at that time, maybe even OpenOffice already. We thought about bringing together all these software solutions that we had made for the server, these tiny things: "Oh, I just want to unzip the package with the XML and add something." Most of us had something in Java, and we put it all together in one place, and because it was the OpenOffice time, we made it open source as well. IBM had similar code fragments, so we came together. It was later used and pushed as the back end of a web office. You have a browser with HTML, the document that you are editing, and on the server it is basically ODF: it is sent back, unzipped and transformed to HTML, back and forth.
So do you know anything about the toolkit? You might have used it before without knowing. Recently LibreOffice has been running the website for the validator. We have a front end as a JSP that runs out of the box: you just get a WAR file from the project. And you can use it as a standalone version as well.
These are basically the main modules of the ODF project. At the top is the generator, which for me is the most interesting part because it generates the source code from the schema. One of the principles I learned in software development is: the more you generate, the better. Otherwise you have to do the work over and over again, or you make mistakes; it's a horrible lot of work.
Then there is ODFDOM. The DOM, as the name suggests, is like the HTML DOM in the browser: every element has its own object. The advantage is that from the start you have no information loss. You load the full document into the DOM, you can edit it, adjust it and save it back. The idea behind generating it is that the schema is quite complicated (I'll give more details later), so the more you generate, the less the developer has to know about the schema, and the developer can be guided by, let's say, typed classes. There is even an element class for the paragraph, called TextPElement. The disadvantage is that the memory footprint is larger than, let's say, a binary representation with bit-level optimization. But I think that in the toolkit you first have to improve the generation, and later on you can generate other source code, not only Java as now, maybe Rust later, and of course maybe a binary representation, moving away from the DOM.
Beside this, also using the DOM, there is the XSLT Runner, which enables you to run an XSLT script directly on the ODF document without the need to unzip the content, the styles and so on. So those who love XSLT will happily use their scripts directly on the document. The other thing, which we just saw before, is the ODF Validator. Both of these build on ODFDOM. The last module was donated by IBM and is no longer supported by IBM; as soon as they chose to move on to something else, they abandoned it. I marked it in red because I think it could be done better; it was very fast work they did back then. So let's take a closer look at the details of ODFDOM.
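To make the "load, edit, save back" idea concrete, here is a minimal sketch using Python's standard library rather than the toolkit itself. The two namespace URIs are the real ODF ones; the tiny content.xml string and the helper names `append_paragraph` and `paragraphs` are assumptions made up for illustration and are not part of ODFDOM.

```python
import xml.etree.ElementTree as ET

# Real ODF namespace URIs; everything else in this sketch is illustrative.
OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
ET.register_namespace("office", OFFICE)
ET.register_namespace("text", TEXT)

# A hypothetical, heavily reduced content.xml with a single paragraph.
CONTENT_XML = f"""<office:document-content xmlns:office="{OFFICE}" xmlns:text="{TEXT}">
  <office:body><office:text><text:p>Hello</text:p></office:text></office:body>
</office:document-content>"""


def append_paragraph(content_xml, text):
    """Load the XML into a tree, add one text:p at the end, serialize it again."""
    root = ET.fromstring(content_xml)
    body_text = root.find(f"{{{OFFICE}}}body/{{{OFFICE}}}text")
    paragraph = ET.SubElement(body_text, f"{{{TEXT}}}p")
    paragraph.text = text
    return ET.tostring(root, encoding="unicode")


def paragraphs(content_xml):
    """Return the text of every text:p element in document order."""
    root = ET.fromstring(content_xml)
    return [p.text or "" for p in root.iter(f"{{{TEXT}}}p")]


edited = append_paragraph(CONTENT_XML, "World")
print(paragraphs(edited))
```

The sketch works with untyped elements and namespace strings. The point of the generated typed classes mentioned in the talk is precisely that a developer would not have to spell out those names by hand.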
There is a package layer, which takes care of zipping, unzipping and the manifest. It is totally independent and can be used by any other software as well. EPUB, by the way, uses the same ODF 1.1 packaging format. Unfortunately they forked it, for no reason I am aware of (well, maybe they didn't want to; we didn't talk to each other), and they invented their own signatures and encryption and have their own package format, which is nonsense of course. But we didn't have dinner together. The next thing on top of that is, as I said, the generated layer.
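As a rough illustration of what such a package layer deals with, the following sketch builds a tiny ODF-style ZIP package in memory and reads its manifest back, using only Python's standard library. It is not the toolkit's package API. The manifest namespace and the ODT media type are the real ones; the function names `build_package` and `list_manifest` and the placeholder content are assumptions.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"


def build_package(content_xml):
    """Zip a mimetype entry, a manifest and content.xml into one package."""
    manifest = (
        f'<manifest:manifest xmlns:manifest="{MANIFEST_NS}">'
        f'<manifest:file-entry manifest:full-path="/" manifest:media-type="{ODT_MIMETYPE}"/>'
        '<manifest:file-entry manifest:full-path="content.xml" manifest:media-type="text/xml"/>'
        "</manifest:manifest>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        # The mimetype entry goes first and stays uncompressed (ZipInfo defaults to stored).
        package.writestr(zipfile.ZipInfo("mimetype"), ODT_MIMETYPE)
        package.writestr("META-INF/manifest.xml", manifest, compress_type=zipfile.ZIP_DEFLATED)
        package.writestr("content.xml", content_xml, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def list_manifest(package_bytes):
    """Unzip the manifest and map each full-path to its media type."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        root = ET.fromstring(package.read("META-INF/manifest.xml"))
    entries = {}
    for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry"):
        path = entry.get(f"{{{MANIFEST_NS}}}full-path")
        entries[path] = entry.get(f"{{{MANIFEST_NS}}}media-type")
    return entries


package_bytes = build_package("<placeholder/>")
print(list_manifest(package_bytes))
```

Nothing in this layer knows about paragraphs or tables, which is why it can be reused independently of the XML layers above it.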
This generated layer can of course be split again into two areas. One is totally generated: the implementation detail of the XML. The one above, which we call the document API here, is the way the user knows the document. My mom would say: there is a paragraph, there is a table. She doesn't know anything about how it is implemented in XML. And the funny thing is that from the user's perspective many office documents look the same. Take DOCX and ODT: loaded in the same office application, most documents look much the same, but the XML is totally different. So at the user's level of abstraction they are very much the same.
This layer concept can also be found in the specification, which in ODF 1.2 consists of three parts. You have seen the lowest layer here: part three is the package format, which can be used by others as well, as I said. The first part specifies the XML, and the second one is just the formulas for Calc; it might be used in Writer as well, but usually only in Calc. So there is also this separation of concerns, or modularization, given by these three specifications.
As we said earlier, with the first part, the XML, comes the schema, the grammar that tells you what is allowed. And this is quite complex. I'm not sure if you can read it here, but I would say that from a usability perspective it is ugly, because we have about 600 XML elements and about 1300 XML attributes, and this is quite a lot. Some would say that writing an office application with only the paragraph is already quite complicated, and then you add styles and so on; but embracing everything is quite impossible without generating and without making this easier to understand. I have chosen the table here, and you see there are a lot of references and so on. I won't go into detail.
We started this generator in 2015. In the beginning, as a simple first try, we worked with XSLT. Christian Lippka, an excellent former colleague, did some XSLT transformations that read this XML file directly to fill it into Java, which was quite hard work. He did quite a few things, but only for a subset of course. We couldn't use the Java XML binding, because the Sun standard for mapping XML to Java classes only works for W3C XML Schema and not for RelaxNG schemas. Well, the nice thing about standards is that there are so many to choose from, so there is no interoperability, as I said. Instead, and this is what we are currently using, we use two different open source technologies. One is the Multi-Schema Validator from Sun, which takes care of the parsing: it reads the schema, so you don't have to invent or write that yourself, and gives you an internal model. You take this model and fill it into templates, text files, from which you can create anything. We create HTML documentation, there is some Python, I believe, and mainly Java.
All the information we wanted to use was sucked out of this model into lists and maps. When I tried to improve this, I realized that it was quite difficult: I couldn't find things in the lists, and it was very hard to extend. I thought it would be much better if we could take the RelaxNG directly as a graph. Like every XML it is basically a tree, yes, but as soon as you have references, like a style ID pointing to a style, you get cross references, and you are dealing with a graph. And as you might know, with the success of social networks like Facebook, where graph theory is the main focus of the daily work, the research in this area has expanded enormously, and the algorithms and tools for working with graphs have become much, much better.
So what I did, reusing standards here as well, was to say: okay, I want to load the RelaxNG into a graph database. Which graph database do I use? The nice thing is the TinkerPop API. Apache TinkerPop again hides the implementation detail of the graph database, so you can use any graph database, and it has a language called Gremlin, a script language to traverse the graph, which is then translated for whichever graph database you are using. So I feel pretty safe at an interoperable level here again. I put this in the notes: I have stolen this idea from a Chaos Computer Club presentation, where they did source code analysis with graph databases. So I thought: if they can do it with source code, which is much more complex, I can do it with the RelaxNG schema as well.
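The pipeline of "parse the schema, build a model, fill templates" can be sketched in a few lines. The sketch below is an assumption-laden toy, not the toolkit's generator: it uses Python's ElementTree instead of the Multi-Schema Validator, `string.Template` instead of the project's template files, a two-element RelaxNG snippet invented for the example, and a hypothetical base class name `GeneratedElement`. Only the RelaxNG namespace and the naming pattern behind TextPElement are taken as given.

```python
import xml.etree.ElementTree as ET
from string import Template

RNG = "http://relaxng.org/ns/structure/1.0"

# A made-up, tiny RelaxNG fragment; the real ODF schema is far larger.
SCHEMA_SNIPPET = f"""<grammar xmlns="{RNG}">
  <define name="text-p">
    <element name="text:p"><text/></element>
  </define>
  <define name="table-table">
    <element name="table:table">
      <optional><attribute name="table:name"/></optional>
    </element>
  </define>
</grammar>"""

# Text template for one Java class; GeneratedElement is a hypothetical base class.
CLASS_TEMPLATE = Template(
    "public class $class_name extends GeneratedElement {\n"
    '    public static final String ELEMENT_NAME = "$qname";\n'
    "$attribute_lines"
    "}\n"
)


def class_name(qname):
    """Turn 'text:soft-page-break' into 'TextSoftPageBreakElement'."""
    parts = qname.replace(":", "-").split("-")
    return "".join(part.capitalize() for part in parts) + "Element"


def read_model(schema_xml):
    """Collect each element name with the attribute names declared inside it."""
    root = ET.fromstring(schema_xml)
    model = {}
    for element in root.iter(f"{{{RNG}}}element"):
        model[element.get("name")] = [
            attribute.get("name") for attribute in element.iter(f"{{{RNG}}}attribute")
        ]
    return model


def generate(schema_xml):
    """Fill the class template once per element found in the schema."""
    sources = {}
    for qname, attributes in read_model(schema_xml).items():
        lines = "".join(f"    // attribute: {name}\n" for name in attributes)
        sources[class_name(qname)] = CLASS_TEMPLATE.substitute(
            class_name=class_name(qname), qname=qname, attribute_lines=lines
        )
    return sources


sources = generate(SCHEMA_SNIPPET)
print(sources["TableTableElement"])
```

The flat dictionary used as the model here mirrors the "lists and maps" the speaker found hard to extend: it has no notion of references or nesting, which is what motivates moving to a graph.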
With RelaxNG, if I ask anyone here, "Please tell me the minimal document that is possible in ODF", meaning simply go to the root, take all the mandatory elements and put them together, you basically will not know. But this is an easy query for a graph database: start here and give me the minimal document. So I thought I need to reverse-engineer the RelaxNG, or have better tooling to understand it and to control it, and that is the reason why I came up with this.
I started with the Multi-Schema Validator: instead of reading the RelaxNG myself, I build on top of it here as well. I simply dump its memory model into a text file, line by line. Then, just for fun, I wrote an ANTLR grammar from which you generate a parser; you read the dump and map it to GraphML, which is simply a graph format that is quite interoperable, and with this I could visualize a graph for the first time.
I'm speeding up a little bit, but this is the essential idea of why I am doing this: the RelaxNG schema is so big, it is one huge text file, and we want to improve it and work on it. Just as Stefan uses Clang compiler plugins to traverse the C++ source code, which is huge, I want to use a graph database to traverse this tree of RelaxNG to answer my questions in an automated way, and to be able to do refactorings later, because otherwise it is too huge for manual editing. We simply need better tooling to embrace this complexity.
All right, so what I did is ask: please, graph database, give me from table:table all the child elements and everything in between, all the nodes in between such as choice, sequence and so on. What you get is just a picture like a star.
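The "minimal document" question can be illustrated without a graph database. The sketch below walks a small grammar graph held in a Python dictionary and keeps only what is mandatory. The grammar is a made-up toy that borrows a few ODF element names; it is not the real ODF schema, and the node names such as `body-seq` as well as the functions `minimal`, `size` and `render` are assumptions. It also assumes the toy grammar has no cycles, which the real schema does not guarantee.

```python
# Each node: name -> (kind, children). Kinds follow RelaxNG pattern names.
GRAMMAR = {
    "office:document": ("element", ["body-seq"]),
    "body-seq": ("sequence", ["meta-opt", "office:body"]),
    "meta-opt": ("optional", ["office:meta"]),
    "office:meta": ("element", []),
    "office:body": ("element", ["body-choice"]),
    "body-choice": ("choice", ["office:text", "office:spreadsheet"]),
    "office:text": ("element", ["paragraphs"]),
    "paragraphs": ("zeroOrMore", ["text:p"]),
    "text:p": ("element", []),
    "office:spreadsheet": ("element", ["tables"]),
    "tables": ("oneOrMore", ["table:table"]),
    "table:table": ("element", []),
}


def size(trees):
    """Count the elements in a list of (name, content) trees."""
    return sum(1 + size(content) for _, content in trees)


def minimal(name):
    """Return the smallest list of element trees that satisfies the node."""
    kind, children = GRAMMAR[name]
    if kind == "element":
        return [(name, [tree for child in children for tree in minimal(child)])]
    if kind in ("optional", "zeroOrMore"):
        return []  # nothing is required here
    if kind == "oneOrMore":
        return minimal(children[0])  # exactly one occurrence is enough
    if kind == "sequence":
        return [tree for child in children for tree in minimal(child)]
    if kind == "choice":
        return min((minimal(child) for child in children), key=size)
    raise ValueError(f"unknown pattern kind: {kind}")


def render(trees):
    """Serialize the trees as XML-like text for display."""
    return "".join(
        f"<{name}>{render(content)}</{name}>" if content else f"<{name}/>"
        for name, content in trees
    )


print(render(minimal("office:document")))
```

In a graph database the same question would be expressed as a traversal query over the stored schema graph instead of a hand-written recursive function.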
You don't see the details here. This is table:table and all the elements around it, shown in a graph viewer, and the red things are the attributes. So you see there is some structure. I will zoom in a little. Yes, the attributes, and then we have this here, and I will explain it a little. There is a sequence. A sequence with 1 and 2 means there is an order: first you have to use this one, and then the other. At the top there is an element, text:soft-page-break, and after it you can use table:table-row. This here is bloated at the moment. And this here, epsilon, means nothing: the choice is that you have either nothing or this. In other words, it just means it is optional.
So the next step, and that's what I'm currently working on, is refactoring and improving it by exchanging this for "optional", and whenever this name is the same as that one, I remove it as well, just to simplify. Okay, I have little time left, so moving on.
What I'm trying to do now: there are a few things like choice and sequence that I need to generate; that is not yet in the code. Also, when there is a parent, like styles, that has many style children which have an ID, I want to have a map in there; I just want to generate it out of the box. I want to generate as much of this DOM layer as possible, because I don't want to write it over and over again. Another thing is references. The schema says: oh, there is a start of a reference and there is an end of a reference, but it doesn't say that a style ID and a style name are connected, always connected. That is missing information. So the next thing is that I want to annotate and enhance the schema with additional information so I can generate more. And the last thing, the most important reason why I'm doing all of this, is user changes.
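The refactoring step described here, replacing a choice that contains "nothing" by an optional, rests on the fact that in RelaxNG a choice between empty and a pattern is equivalent to an optional pattern. A minimal sketch of that rewrite on a nested-tuple pattern tree follows. The tuple representation, the function name `simplify` and the example pattern (modelled loosely on the speaker's description of table:table, not copied from the schema) are assumptions.

```python
EMPTY = ("empty", [])


def simplify(pattern):
    """Rewrite every choice that offers 'empty' as an 'optional' pattern."""
    label, children = pattern
    children = [simplify(child) for child in children]
    if label == "choice" and EMPTY in children:
        remaining = [child for child in children if child != EMPTY]
        if not remaining:
            return EMPTY  # a choice of nothing but empty is just empty
        inner = remaining[0] if len(remaining) == 1 else ("choice", remaining)
        return ("optional", [inner])
    return (label, children)


# Hypothetical pattern: an optional soft page break followed by a table row.
pattern = (
    "sequence",
    [
        ("choice", [EMPTY, ("text:soft-page-break", [])]),
        ("table:table-row", []),
    ],
)

print(simplify(pattern))
```

Patterns without an empty alternative pass through unchanged, so the rewrite can be applied to the whole schema graph repeatedly without side effects.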
User changes are not specified in the schema. The schema says: oh, you can put in anything, as long as it is valid it is fine. But we users are all doing the same things in our office applications: we add tables, add paragraphs, add characters. This is the high-level document API you saw earlier, the user API, and this is where I need to implement it for collaboration. If we collaborate, exchanging the whole document, which is the only way we have now, no longer works; it is broken. We cannot merge: if I give you a document and you give it back, I cannot merge it. We need changes, like in a Git commit in software. I want to ask you: what have you changed? Give me your changes. So I want to have user changes at the high level, and I want to be able to answer this question.
So my work for the Prototype Fund was... wait a minute, I forgot the slide. The user changes are an implicit standard: it is in our minds, but it is not documented anywhere, it is not written down. We have to start writing it down and have these insert, delete and modify changes for all these user components, which we have to annotate in the schema. So my work for the Prototype Fund is a promise: to put an ODT into the ODF Toolkit, using it as a black box, and have it transformed into this sequence of changes, like a cooking recipe, where you can say: insert the first paragraph, add "Hello World", then as the second insert an image, as the third a table. These are the high-level changes, and the sequence is totally equivalent to the document. The other thing is that it should be able to accept new changes and merge them in, to have a proof of concept of this and to see how it works.
And the new thing here is that I want to generate as much as possible to avoid redundancy. User changes are a de facto standard, not a standard yet. So we need to enhance the RelaxNG to generate this, because otherwise it is too much work. Why should I write it by hand if it is the same for all applications?
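The idea that a document is equivalent to a replayable sequence of insert, delete and modify changes can be sketched as follows. The change format (a dictionary with `op`, `position` and `component`), the component tuples and the function names are all assumptions invented for this example; the talk does not specify a format. Merging concurrent edits from two authors would additionally require adjusting positions, which this sketch deliberately leaves out.

```python
def to_changes(document):
    """Express a document as the sequence of inserts that would recreate it."""
    return [
        {"op": "insert", "position": index, "component": component}
        for index, component in enumerate(document)
    ]


def apply_changes(changes, document=None):
    """Replay changes in order on a copy of the document (empty by default)."""
    result = list(document or [])
    for change in changes:
        position = change["position"]
        if change["op"] == "insert":
            result.insert(position, change["component"])
        elif change["op"] == "delete":
            del result[position]
        elif change["op"] == "modify":
            result[position] = change["component"]
        else:
            raise ValueError(f"unknown operation: {change['op']}")
    return result


# Hypothetical high-level components: (kind, payload).
document = [("paragraph", "Hello World"), ("image", "logo.png"), ("table", "2x2")]

recipe = to_changes(document)
# Changes received later from a collaborator, applied on top of the recipe.
incoming = [
    {"op": "modify", "position": 0, "component": ("paragraph", "Hello ODF")},
    {"op": "delete", "position": 1},
]

print(apply_changes(recipe + incoming))
```

Because the original list is copied before replaying, asking "what have you changed?" reduces to exchanging the `incoming` list rather than the whole document.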
It is much better to have a way to annotate it. How we do that is easy, but unfortunately I'm running out of time. Okay, any questions? Thank you.
Yes, please. Yes, good question. The sequence, by the way: if you and I are working on the same document, we again have branches, and we are again in a graph, like in the Git model. But the graph is because it's the nature recipe