Why ODF is a better standard than OOXML
Formal Metadata
Number of Parts | 287
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/56881 (DOI)
FOSDEM 2022, 132 / 287
Transcript: English (auto-generated)
00:39
Why is OpenDocument Format (ODF) a better standard than Office Open XML (OOXML),
00:45
the proprietary Microsoft Office document standard? I will try to explain it in an easy way, because the issue is not that easy to understand, especially if you are not a power user.
01:03
If we look at the definition of a standard provided by ISO, it makes clear that a standard has to be defined in a way that
01:21
contributes to the fairness and market relevance of the standard itself, and therefore promotes confidence in its use. This happened for OpenDocument Format; it did not happen for Office Open XML.
01:41
The advantages of standards are also clear. A standard ensures consistency of features, and this benefits users, who get the latest technology and
02:02
the latest features without reinventing the wheel every time. There are standards that cover many, many topics; they have been reused and implemented by OpenDocument Format, while they have been ignored by Office Open XML.
02:27
Again, we are talking about the quality of the standard, not the quality of the format, because it is the standard that protects the user. Unfortunately, standards may be hidden from users, as happens in the case of electricity.
02:47
If you look at these pictures, these are proprietary plugs, but they hide a standard voltage that allows people to plug in their devices
03:01
without any risk, independently of the location, by means of an adapter. Document format standards are never visible to the user, because the software shows only the surface of the content iceberg.
03:21
Document contents are extremely complex, especially in the latest generation of digital documents, and the software shows on screen only the thinnest layer of these contents;
03:41
underneath there is an incredible amount of code to reproduce all the different characteristics of the document. If this is standard, then compatibility and interoperability are also guaranteed and easy.
04:01
The importance of standards was not discovered but underlined by the standardization of the HTML format. If HTML were not free and open, today
04:20
we would not be able to access the incredible resources of the web. And the fact that it was possible to protect and defend the evolution of HTML as a standard made it clear
04:43
to the industry that it was also possible to have an open document standard. This changes the document paradigm. The analog document was focused on the printed version,
05:02
retaining the visual aspect was its key value, and the document was created for others to read. When we move to the digital document, and the digital document is the future,
05:25
the focus is on exchanging contents, and the value lies in preserving contents over appearance. You can have a slightly different appearance; what is important is that the contents are preserved.
05:41
Documents have to be created for others to add value. By adding our own bits to a document, we contribute to the value of that document. Therefore it is important to have a file format that is totally interoperable,
06:04
that is exactly the same independently of the software used to produce it. This is only possible with OpenDocument Format, because it is the only true document standard, and it offers freedom of choice to users. ODF is simple, solid,
06:27
robust and secure; it is consistent across operating systems, is based on several existing ISO standards, and is predictable and consistent in terms of behavior versus contents and formats. If you create an ODF document
06:46
with two different software packages, ODF is so consistent that the two documents will be quite similar, even if the two packages take a completely different approach to
07:02
writing the format. And last but not least, ODF is human readable by a normal user. This is because the philosophy behind ODF was to design a vendor-neutral format, starting almost from scratch
07:25
and using existing standards when possible. This meant that some software vendors had to change their software more than if they had continued to use their original
07:44
formats, but the advantage for interoperability was so important that it completely justified this effort. So if you have a real standard, you will have a
08:02
zero day when all documents have to be converted to the standard. So-called backward compatibility is not really possible with a standard. And in fact, Office Open XML has completely different characteristics. It is artificially complex and as such is insecure.
08:28
It is not consistent across operating systems and platforms. You think you are writing the same document, but the reality is that you are writing a different document that has the same appearance on screen but
08:46
creates additional issues for the PC that has to manage it. In addition, it completely ignores many ISO standards and reinvents the wheel when this is absolutely not necessary. It is also unpredictable and inconsistent.
09:10
You create a document, and then you create the same document with a different version of the same software, and that document is completely different. Again, it looks exactly the same on screen, but the reality is that the document is completely different
09:28
in terms of code, and this makes it difficult for software to manage it in a consistent way. And last but not least, Office Open XML is not human readable, even by a power user. This is because the philosophy was completely different.
09:49
Office Open XML was designed by Microsoft for the Microsoft environment, completely ignoring the need to interoperate
10:04
with other software environments, and paying no attention to interoperability with other software or with the vendor-neutral standard. So the basic difference between the two formats is that, while ODF was designed
10:30
in the early 2000s for the next 20 to 50 years, to liberate users from the lock-in strategy built into yesterday's and today's proprietary formats
10:44
and to foster interoperability, Office Open XML was designed as a pseudo-standard to propagate yesterday's document issues and lock-in for the next 20 to 50 years. Of course,
11:01
it is easy to understand why this has been done. The value of the market is over 25 billion dollars, and by doing this Microsoft is protecting its large chunk of this 25 billion
11:21
dollar market. You see this in the reuse of existing standards. For ODF, you see the use of every available standard, and we can add the Gregorian calendar, the standard for colors, and the standard for language codes. Office Open XML implements only the
11:47
Dublin Core standard, to be compatible with XML. But even in this area, it handles XML in a way that makes the design advantages of XML useless.
12:09
If properly developed and managed, XML provides documents of superior quality, which are easy to understand, neutral over time, simple to convert and easy to
12:27
create. By adding useless complexity, Office Open XML kills basically all the advantages of
12:41
being based on XML. Let's look at an example. This is how our brain sees red, and the computer identifies it with a standard number. Let's now look at how ODF describes that color:
13:02
as you can see, across three different modules, or three different applications, there is the same consistent way of describing the color, fo:color and the color number. If you look at Microsoft Office, it uses a completely different syntax for each application, a syntax that in some cases has
13:27
no meaning: a W color tag in Word, the full color tag in Excel, and CLR in PowerPoint.
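As an editorial illustration of the color example (the slides themselves are not reproduced in this transcript), the following Python sketch parses one hand-written fragment per format and reduces each to the same six-digit hex value. The fragments are assumptions about what such markup typically looks like (`fo:color` in ODF, `w:color` in Word, `color rgb` in Excel, `a:srgbClr` in PowerPoint), and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by the illustrative fragments below.
FO = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
X = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
A = "http://schemas.openxmlformats.org/drawingml/2006/main"

# Hand-written fragments (assumed, not copied from the talk's slides).
ODF_RED = f'<style:text-properties xmlns:style="{STYLE}" xmlns:fo="{FO}" fo:color="#ff0000"/>'
WORD_RED = f'<w:color xmlns:w="{W}" w:val="FF0000"/>'
EXCEL_RED = f'<color xmlns="{X}" rgb="FFFF0000"/>'
PPT_RED = f'<a:srgbClr xmlns:a="{A}" val="FF0000"/>'


def odf_color(fragment: str) -> str:
    """ODF: the fo:color attribute holds '#rrggbb'."""
    return ET.fromstring(fragment).get(f"{{{FO}}}color").lstrip("#").lower()


def word_color(fragment: str) -> str:
    """Word: the namespaced w:val attribute holds 'RRGGBB'."""
    return ET.fromstring(fragment).get(f"{{{W}}}val").lower()


def excel_color(fragment: str) -> str:
    """Excel: the rgb attribute holds 'AARRGGBB'; drop the alpha prefix."""
    return ET.fromstring(fragment).get("rgb")[-6:].lower()


def ppt_color(fragment: str) -> str:
    """PowerPoint: the val attribute of srgbClr holds 'RRGGBB'."""
    return ET.fromstring(fragment).get("val").lower()


if __name__ == "__main__":
    for name, value in [
        ("ODF", odf_color(ODF_RED)),
        ("Word", word_color(WORD_RED)),
        ("Excel", excel_color(EXCEL_RED)),
        ("PowerPoint", ppt_color(PPT_RED)),
    ]:
        print(name, value)
```

Under these assumptions, one accessor is enough for ODF regardless of the application, while each Office application needs its own element name, attribute name and value layout.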
13:46
It is really difficult to understand if you don't see the hidden reason, and why, for instance, in Excel you have to add FF to the color. This creates additional issues, but it is
14:08
probably not enough to protect the market, so what Microsoft did was to get ISO approval for
14:22
an open standard, approved as Office Open XML Strict, which has never been implemented properly by Microsoft Office. On the contrary,
14:42
14 years after the announcement of the standard and 13 years after its approval, the default is still Transitional, which is not an ISO standard.
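For readers who want to check which variant a given DOCX file uses, a minimal Python sketch follows. It assumes that the root element of `word/document.xml` carries one of the two main WordprocessingML namespaces; the two URIs below are given to the best of the editor's knowledge and should be verified against the specification. The function and helper names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumed namespace URIs for the two conformance classes (verify against the spec).
TRANSITIONAL_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
STRICT_NS = "http://purl.oclc.org/ooxml/wordprocessingml/main"


def docx_conformance(source) -> str:
    """Return 'strict', 'transitional' or 'unknown' for a DOCX path or file object."""
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    # ElementTree encodes the namespace as '{uri}localname' in the tag.
    namespace = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""
    if namespace == STRICT_NS:
        return "strict"
    if namespace == TRANSITIONAL_NS:
        return "transitional"
    return "unknown"


def make_sample(namespace: str) -> io.BytesIO:
    """Build a tiny in-memory stand-in for a DOCX package (demonstration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr(
            "word/document.xml",
            f'<w:document xmlns:w="{namespace}"><w:body/></w:document>',
        )
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(docx_conformance(make_sample(TRANSITIONAL_NS)))
```

With a real file, `docx_conformance("report.docx")` would be called with a path instead of the in-memory sample; the file name here is only a placeholder.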
15:02
Therefore, all the people who think they are writing a standard document when they use Microsoft Office are unfortunately creating a proprietary document. In addition, of course, there is the fact that Office Open XML Strict support is not common to
15:27
all versions of Microsoft Office. For instance, there are many versions of Microsoft Office for Macintosh that do not support Strict, and different versions of Microsoft Office 365 that
15:40
do not support Strict. And although there was an official statement by Microsoft in 2007 saying that Strict would be deployed starting with Office 2010, we are 10, 11, 12 years later
16:01
and we still see this Transitional, non-standard, fully proprietary format being implemented by Office. Here are some more examples, in addition to the previous ones, of poor XML. According to XML conventions, documents should be human legible
16:34
and clear, and the terseness of the markup is not important. The original
16:45
version of Office Open XML Transitional, which is still reflected in today's Transitional, goes in exactly the opposite direction: it makes XML not human readable and
17:03
makes it terse, although terseness is not important. To give some examples, there are words that are written without vowels in some tags and with vowels in other tags,
17:24
and the naming conventions are confused and inconsistent, so this adds complexity.
17:40
So in the end we can say that we have the simplicity of ODF, where the complexity is low, or even non-existent, or reduced to the minimum,
18:02
and XML files are always human readable; and on the opposite side you have Office Open XML, where the complexity is the highest possible in relation to the technology, and XML files
18:22
are not human readable. A byproduct of this hidden complexity is that Microsoft Office documents have been the preferred malware carrier basically since their inception.
18:50
This is research from the German government in 2011, produced by Symantec; as you can see, all the formats are Microsoft Office formats.
19:12
In 2018, Kaspersky Lab announced that 70% of all malware was transported by
19:25
Office documents, and in 2021 AppRiver, in its Global Threat Report, confirmed that the situation had not changed. The numbers in these studies
19:42
are slightly different because each study looks for something different, but if you look at this latest one, you see that XLS viruses come first and DOC viruses third, so a lot of malware is carried by Excel and Word files. And if we look at a comparison of
20:11
CVEs (Common Vulnerabilities and Exposures) over time, we see that the number for
20:21
LibreOffice, which relates to the ODF format, and the number for Microsoft Office, which relates to the Office Open XML format, have been significantly different over the years.
20:40
So, as I said before, Office Open XML also seems to be insecure, in addition to being an issue for interoperability. On the other hand, what could you expect from a company
21:02
that in 2004 released a manual on how to lock in your clients? That company is Microsoft. Thank you for listening so far; if you have additional questions or want to ask something related to this topic, I am happy to answer.
21:26
I am available for questions. I don't see any on the talk. Let me check.
22:14
Yeah, I don't see it. Yes, Matias Valvakens, sorry for how I pronounce your name, says that,
22:40
ironically, the ISO ballot form only accepts the legacy DOC format, and not even DOCX, when uploading
22:47
ballot comments, so ISO is not even trying. And yes, unfortunately the situation is extremely bad in general terms. ISO has approved ODF but has never considered ODF
23:12
for its internal use. At the moment, I am aware of a decision by the US Patent Office:
23:30
it has decided to use XML as the way of submitting
23:47
patent descriptions, but has chosen Microsoft Office, so DOCX, as the source of the XML, which is completely illogical given that the XML produced by Word is not human readable and is
24:12
full of useless complexity. There is also a question about how safe ODF is when transporting documents over email. Of course, we can
24:35
encrypt and sign ODF, as Svante answered, but I think that to understand the differences between
24:45
the two formats, and how much easier it is to screw up the XML of an Office Open XML file, you should just create the same file with LibreOffice
25:05
Writer and with Microsoft Word, save it, change the extension to .zip, decompress the files, and look at the content XML.
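The manual steps just described (rename to .zip, decompress, open the main XML part) can also be sketched in Python, since both formats are ZIP packages. This is a minimal sketch: the main text part is `content.xml` in an ODF text document and `word/document.xml` in a DOCX file, while the function names and the in-memory demo helper are hypothetical.

```python
import io
import zipfile

# Main text part inside each package type.
MAIN_PART = {".odt": "content.xml", ".docx": "word/document.xml"}


def list_parts(source) -> list:
    """Return the names of all files stored in the package."""
    with zipfile.ZipFile(source) as package:
        return package.namelist()


def read_main_xml(source, kind: str) -> str:
    """Return the main XML part of an .odt or .docx package as text."""
    with zipfile.ZipFile(source) as package:
        return package.read(MAIN_PART[kind]).decode("utf-8")


def package_from_parts(parts: dict) -> io.BytesIO:
    """Build a small in-memory ZIP package (demonstration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        for name, text in parts.items():
            package.writestr(name, text)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    demo = package_from_parts({"content.xml": "<demo/>"})
    print(list_parts(demo))
    print(read_main_xml(demo, ".odt"))
```

With real files, `read_main_xml("letter.odt", ".odt")` and `read_main_xml("letter.docx", ".docx")` would return the two XML texts to compare side by side; the file names are placeholders.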
25:21
It is visually easy to see how simple and consistent in structure the XML of an ODF file is, so that any injection of code would be easy to detect, and how complex, and for
25:48
non-technical users how unreasonably complex, the XML of an Office Open XML file is. Of course, if you look at the content with the eye of a developer,
26:09
the file follows some schemas. The problem is that XML should be human readable, and that XML is completely unreadable, and the majority of users
26:27
do not have the technical skills to understand XML that is completely unreadable. The difference also shows, for instance, in the number of lines of XML.
26:47
I show a two-page document that is around 200 lines of XML if saved as ODF
27:02
and anywhere between 1,000 and 12,000 lines of XML if saved by Microsoft Office as DOCX. The fact is that the length of the ODF XML has always been the same,
27:26
plus or minus one or two percent of the number of lines, so going down to 190 and up to 210 or 220, a really negligible difference, while I can show
27:44
you, over the years, the XML of the DOCX file going from 1,000 to 12,000 lines of code. It is rather easy to understand that this is not consistent, and, especially with the latest
28:08
technology, injecting malware into such a mess is definitely easier. And the time is
28:22
expiring, sorry for the bad English, so thank you for listening to my talk. I welcome questions for a short while after the talk, if you want to join
28:43
me in the back room. If you have doubts, please get in touch with me or with other representatives of the LibreOffice project; we will be happy to provide as much information
29:01
as we can on the topic. Thank you, and see you at the next talk
29:22
in the LibreOffice Technology dev room.