
Why ODF is a better standard than OOXML

Formal Metadata

Number of Parts
287
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Open Document Format (ODF) and Office Open XML (OOXML) are both ISO/IEC open document standards, but while the first is faithful to the definition of a standard, the second is an unfortunate attempt to propagate proprietary formats in a pseudo-standard form, locking users in to Microsoft Office for the foreseeable future. The presentation explains why ODF is a better standard than OOXML.
Transcript: English (auto-generated)
Why is Open Document Format a better standard than Office Open XML, the Microsoft Office proprietary document standard? I will try to explain it in an easy way, because the issue is not that easy to understand, especially if you are not a power user.
If we look at the definition of a standard provided by ISO, it is clear that a standard has to be defined in a way that contributes to its fairness and market relevance, and therefore promotes confidence in its use. This happened for Open Document Format; it did not happen for Office Open XML.
The advantages of standards are also clear. A standard ensures consistency of features, and this benefits users, who get the latest technology and the latest features without reinventing the wheel every time. There are standards that cover many, many topics, and they have been reused and implemented by Open Document Format, while they have been ignored by Office Open XML.
Again, we are talking about the quality of the standard, not the quality of the format, because it is the standard that protects the user. Unfortunately, standards may be hidden from users, as happens in the case of electricity. If you look at these pictures, these are proprietary plugs, but they hide a standard voltage that allows people to plug in their devices without any risk, regardless of location, by means of an adapter. Document format standards are never visible to the user, because the software shows only the surface of the content iceberg. Contents are complex.
Document contents are extremely complex, especially in the latest generation of digital documents, and the software shows on screen only the thinnest layer of these contents; underneath there is an incredible amount of code that reproduces all the different characteristics of the document. If this code is standard, then compatibility and interoperability are also guaranteed and easy.
The importance of standards was not discovered, but was underlined, with the standardization of the HTML format. If HTML were not free and open, today we would not be able to access the incredible resources of the web. And the fact that it was possible to protect and defend the evolution of HTML as a standard made it clear to the industry that open document standards were also possible.

The document paradigm is changing. The analog document was focused on the printed version, retaining the visual aspect was one key asset of these analog documents, and the document was created for others to read. The digital document, which is the future, is focused on exchanging contents, and its value lies in preserving contents over appearance. You can have a slightly different appearance; what is important is that contents are preserved.
Digital documents have to be created for others to add value: by adding our own bits to a document, we contribute to its value. It is therefore important to have a file format which is totally interoperable, and which is exactly the same regardless of the software used to create it. This is only possible with Open Document Format, because it is the only true document standard, and it offers freedom of choice to users. ODF is simple, solid, robust and secure; it is consistent across operating systems; it is based on several existing ISO standards; and it is predictable and consistent in its behavior with respect to contents and formats. If you create an ODF document with two different programs, ODF is so consistent that the document will be quite similar, even if the two programs take completely different approaches to writing the format. And last but not least, ODF is human-readable by normal users.

This is because the philosophy behind ODF was to design a vendor-neutral format almost from scratch, using existing standards when possible. This meant that some software vendors had to tweak their software more than if they had continued to use their original formats, but the advantage for interoperability was so important that it completely justified the effort. So if you have a real standard, you will have a zero day when all documents have to be converted to the standard; so-called backward compatibility is not really possible with a standard. Office Open XML, in fact, has completely different characteristics. It is artificially complex, and as such it is insecure.
It is not consistent across operating systems and platforms. You think you are writing the same document, but in reality you are writing a different document that has the same appearance on screen but creates additional issues for the PC that has to manage it. In addition, it completely ignores many ISO standards and reinvents the wheel when this is absolutely not necessary. And it is also unpredictable and inconsistent.
You create a document, and then you create the same document with a different version of the same software, and the two documents are completely different. Again, you see exactly the same thing on screen, but the document is completely different in terms of code, and this makes it difficult for software to handle it consistently. And last but not least, Office Open XML is not human-readable, even by power users. This is because the philosophy was completely different.
Office Open XML was designed by Microsoft for the Microsoft environment, completely ignoring the need to interoperate with other software environments, and paying no attention to interoperability with other software or with vendor-neutral standards. So the basic difference between the two formats is this: ODF was designed in the early 2000s, for the next 20 to 50 years, to liberate users from the lock-in strategy built into yesterday's and today's proprietary formats and to foster interoperability, while Office Open XML was designed as a pseudo-standard to propagate yesterday's document issues and lock-in for the next 20 to 50 years. Of course, it is easy to understand why this has been done. The market is worth over 25 billion dollars, and by doing this Microsoft is protecting its large chunk of that market.

You see this in the reuse of existing standards. ODF uses every available standard: we can add the Gregorian calendar, the standard for colors, the standard for language codes. Office Open XML implements only the Dublin Core standard, besides XML itself. And even there, it manages XML in a way that turns the design advantages of XML into useless things. If properly developed and managed, XML provides documents of superior quality, which are easy to understand, neutral in terms of time, simple to convert and easy to create. By adding useless complexity, Office Open XML basically kills all the advantages of being based on XML.

Let's take an example. This is the way our brain sees red, and a computer calls it by a number, a standard number. Now let's look at how ODF describes that color: as you can see, three different modules, or three different applications, use the same consistent way of describing the color, the color format and the color number. If you look at Microsoft Office, it uses a completely different syntax for each application, a syntax that in some cases has no obvious meaning: a color tag in Word, the full color tag in Excel and "clr" in PowerPoint.
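To make the color example concrete, here is a minimal Python sketch. The four XML fragments are hand-written for illustration (they are not taken from the slides) and use the namespaces and attribute names these formats commonly use for a solid red: `fo:color` in ODF text properties, `w:color` in Word, `a:srgbClr` in PowerPoint's DrawingML and an ARGB `rgb` attribute in Excel. The `normalize` helper is hypothetical; it only maps each spelling back to one `#rrggbb` value.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for ODF and OOXML Transitional.
STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
A = "http://schemas.openxmlformats.org/drawingml/2006/main"
SML = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"

# Hand-written fragments (illustrative only) expressing the same red.
SAMPLES = {
    "ODF (any module)": f'<style:text-properties xmlns:style="{STYLE}" xmlns:fo="{FO}" fo:color="#ff0000"/>',
    "Word": f'<w:color xmlns:w="{W}" w:val="FF0000"/>',
    "PowerPoint": f'<a:srgbClr xmlns:a="{A}" val="FF0000"/>',
    "Excel": f'<color xmlns="{SML}" rgb="FFFF0000"/>',
}


def normalize(fragment: str) -> str:
    """Hypothetical helper: return the color of one fragment as '#rrggbb'."""
    el = ET.fromstring(fragment)
    fo_color = f"{{{FO}}}color"
    if fo_color in el.attrib:                 # ODF: same attribute everywhere
        return el.attrib[fo_color].lower()
    if el.tag == f"{{{W}}}color":             # Word: namespaced w:val, no '#'
        value = el.attrib[f"{{{W}}}val"]
    elif el.tag == f"{{{A}}}srgbClr":         # PowerPoint: unqualified val
        value = el.attrib["val"]
    elif el.tag == f"{{{SML}}}color":         # Excel: AARRGGBB, drop alpha byte
        value = el.attrib["rgb"][2:]
    else:
        raise ValueError(f"unrecognized color element: {el.tag}")
    return "#" + value.lower()


if __name__ == "__main__":
    for app, xml in SAMPLES.items():
        print(f"{app:18} {normalize(xml)}")
```

Every line prints the same `#ff0000`; the point is how many branches the OOXML side needs. The Excel branch drops the leading alpha byte (`FF`, fully opaque), which is one plausible reading of the extra "F" the speaker mentions next.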
It is really difficult to understand if you don't see the hidden reason, for instance why in Excel you have to add an extra "FF" in front of the color number. This creates additional issues. But this was probably not enough to protect the market, so what Microsoft did was to get ISO to approve an open standard, Office Open XML Strict, which has never been properly implemented by Microsoft Office. On the contrary, 14 years after the announcement of the standard and 13 years after its approval, the default is still Transitional, which is not an ISO standard.
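A rough way to check which flavor a given .docx actually uses is to look at the namespace of its main document part. The sketch below assumes the main part is stored at `word/document.xml` (the usual location, although strictly the package relationships decide this), and that Strict files use the `http://purl.oclc.org/ooxml/...` namespaces while Transitional files use `http://schemas.openxmlformats.org/...`. The names `docx_conformance` and `fake_docx` are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespaces (assumed: Transitional vs Strict).
TRANSITIONAL_W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
STRICT_W = "http://purl.oclc.org/ooxml/wordprocessingml/main"


def docx_conformance(data: bytes) -> str:
    """Classify a .docx (given as bytes) by the namespace of word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    tag = root.tag  # ElementTree form: "{namespace}localname"
    ns = tag[1:tag.index("}")] if tag.startswith("{") else ""
    return {STRICT_W: "strict", TRANSITIONAL_W: "transitional"}.get(ns, "unknown")


def fake_docx(namespace: str) -> bytes:
    """Build a tiny in-memory package for demonstration (not a valid Word file)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml",
                    f'<w:document xmlns:w="{namespace}"><w:body/></w:document>')
    return buf.getvalue()


if __name__ == "__main__":
    print(docx_conformance(fake_docx(STRICT_W)))        # strict
    print(docx_conformance(fake_docx(TRANSITIONAL_W)))  # transitional
```

For a real file, pass its bytes, e.g. `docx_conformance(open("letter.docx", "rb").read())` with a placeholder file name.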
Therefore, all the people who think they are writing a standard document when using Microsoft Office are unfortunately creating a proprietary document. In addition, of course, Office Open XML Strict support is not common to all versions of Microsoft Office. For instance, there are many versions of Microsoft Office for Macintosh that do not support Strict, and different versions of Microsoft Office 365 that do not support Strict. And although Microsoft officially stated in 2007 that Strict would be deployed starting with Office 2010, 10, 11, 12 years later we still see the Transitional, non-standard, fully proprietary format being implemented by Office.

Here are some more examples of poor XML. According to the XML conventions, documents should be human-legible and clear, and the terseness of the markup is not important. The original version of Microsoft Office Transitional, which is still reflected in today's Microsoft Office Transitional, goes exactly in the opposite direction: it makes XML not human-readable and makes it terse, although terseness is not important. Just to give some examples, some words are used without vowels in some tags and with vowels in other tags, and naming conventions are confused and inconsistent, so this adds complexity.

So in the end we have, on one side, the simplicity of ODF, where complexity is low, nonexistent or reduced to the minimum, and XML files are always human-readable; and on the opposite side Office Open XML, where complexity is the highest possible in relation to the technology, and XML files are not human-readable. A byproduct of this hidden complexity is that Microsoft Office documents have been the preferred malware carrier basically since their inception.
This is research made by the German government in 2011; the research was produced by Symantec, and as you can see, all the formats are Microsoft Office formats.
In 2018, Kaspersky Lab announced that 70% of all malware was transported by Office documents, and in 2021 AppRiver, in its global threat report, confirmed that the situation had not changed. The numbers in these studies are slightly different because each study looks for something different, but if you look at the latest one, you see that XLS viruses are first and DOC viruses are third, so a lot of malware is carried by Excel and Word files. And if we look at our comparison of CVEs (Common Vulnerabilities and Exposures) over time, we see that the number of CVEs for LibreOffice, which relate to the ODF format, and the number for Microsoft Office, which relate to the Office Open XML format, have been significantly different throughout the years. So, as I said before, Office Open XML seems to be insecure, in addition to being an interoperability problem. On the other hand, what could you expect from a company that in 2004 released a manual on how to lock in your clients? That company is Microsoft. Thank you for listening so far, and if you have additional questions or want to ask something related to this topic, I'm happy to answer.
I am available for questions. I don't see any on the talk. Let me check. Yes, I see one: Matias Valvakens (sorry if I mispronounce your name) says that, ironically, the ISO ballot form only accepts the legacy .doc and not even .docx when uploading ballot comments, so ISO is not even trying. And yes, unfortunately, the situation is extremely bad in general. ISO has approved ODF but has never considered ODF for its internal use. At the moment, I'm aware of a decision by the US Patent Office: it has decided to use XML for submitting patent descriptions, but has chosen Microsoft Office, so DOCX, as the source of the XML, which is completely illogical given that the XML produced by Word is not human-readable and is full of useless complexity.

There is also a question about how safe ODF is when transporting documents over email. Of course, we can encrypt and sign ODF documents, as Svante answered. But to understand the differences between the two formats, and how much easier it is to tamper with the XML of an Office Open XML file, just create the same file with LibreOffice Writer and with Microsoft Word, save it, change the extension to .zip, decompress the files and look at the content XML.
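The rename-and-unzip experiment can also be done without renaming anything, since both formats are ZIP packages. This standard-library Python sketch prints the entry count and the raw main XML part of each file given on the command line: `content.xml` for an ODF document and `word/document.xml` for a DOCX. Treating `word/document.xml` as the main part is an assumption (it is the usual location, not a guarantee), and the helper names are hypothetical.

```python
import sys
import zipfile


def main_xml_part(names) -> str:
    """Pick the main content part from a list of ZIP entry names."""
    if "content.xml" in names:          # ODF packages: .odt, .ods, .odp
        return "content.xml"
    if "word/document.xml" in names:    # OOXML word-processing package: .docx
        return "word/document.xml"
    raise ValueError("neither content.xml nor word/document.xml found")


def dump(path: str) -> None:
    """Print the number of entries and the raw main XML part of one package."""
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
        part = main_xml_part(names)
        print(f"== {path}: {len(names)} entries, main part {part}")
        print(zf.read(part).decode("utf-8"))


if __name__ == "__main__":
    for path in sys.argv[1:]:
        dump(path)
```

Usage, with placeholder file and script names: `python inspect_package.py letter.odt letter.docx`. The stored XML usually has few line breaks, so piping it through an XML pretty-printer makes the comparison easier to read.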
It is visually easy to see how simple and consistent the structure of an ODF file's XML is, so any injection of code would be easy to detect, and how complex, unreasonably complex for non-technical users, the XML of an Office Open XML file is. Of course, if you look at the content with a developer's eye, the file follows some schemas. The problem is that XML should be human-readable, this XML is completely unreadable for humans, and the majority of users are not technically skilled enough to understand XML that is completely unreadable. The difference also shows, for instance, in the number of lines of XML.
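Counting "lines of XML" needs one caveat: the XML stored inside these packages is often written with few or no line breaks, so a line count only means something after pretty-printing. The sketch below assumes a two-space indent produced by `xml.dom.minidom` and also reports the number of elements, which does not depend on formatting. The `xml_size` name is hypothetical, and the sketch does not reproduce the speaker's figures; it only shows one way to measure.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET


def xml_size(xml_bytes: bytes) -> tuple[int, int]:
    """Return (non-empty lines after pretty-printing, number of elements)."""
    pretty = xml.dom.minidom.parseString(xml_bytes).toprettyxml(indent="  ")
    lines = [line for line in pretty.splitlines() if line.strip()]
    # iter() walks the root and all descendants, so this counts every element.
    elements = sum(1 for _ in ET.fromstring(xml_bytes).iter())
    return len(lines), elements


if __name__ == "__main__":
    # A compact one-line input becomes 5 lines: declaration, <a>, <b/>, <c/>, </a>.
    print(xml_size(b"<a><b/><c/></a>"))
```

Applied to a real package it would be called on the extracted part, e.g. `xml_size(zipfile.ZipFile("report.odt").read("content.xml"))`, where the file name is a placeholder.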
I show a two-page document which is around 200 lines of XML when saved as ODF, and anywhere between 1,000 and 12,000 lines of XML when saved by Microsoft Office as DOCX. The length of the ODF XML has always been the same, plus or minus one or two percent, going down to 190 or up to 210 or 220 lines, a really negligible difference, while I can show you the XML of DOCX files going from 1,000 to 12,000 lines over the years. It is rather easy to understand that this is not consistent, and, especially with the latest technology, injecting malware into such a mess is much easier. The time is expiring, so thank you for listening to my talk. I welcome questions for a short while after the talk if you want to join me in the back room, or if you have doubts, please get in touch with me or with other representatives of the LibreOffice project. We will be happy to provide as much information as we can on the topic. Thank you, and see you at the next talk in the LibreOffice Technology dev room.