Improving SmartArt import in LibreOffice Impress
Formal Metadata
Number of Parts | 561
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/44343 (DOI)
Transcript: English (auto-generated)
00:06
So, this talk will be about improving SmartArt import in LibreOffice Impress. I'm Miklos Vajna of Collabora, and as you will see later, this is a collaboration between Collabora and SUSE.
00:23
I started my LibreOffice activity as a GSoC student, then later worked for SUSE for a while, and now I am at Collabora. Regarding SmartArt, the motivation for this work is that we already had basic support
00:43
for importing SmartArt, typically from PPTX files, but it can appear in XLSX or DOCX as well. The happy path is that a newer PowerPoint typically writes not only the SmartArt,
01:01
which is, as we will see later, a definition of what your content is and what requirements you have on how to lay that content out, but also a fallback, which is a pre-rendered drawing of your SmartArt, and we can do a reasonably good job of importing that into Impress.
01:25
But when the DrawingML fallback is not there, life is much more challenging. We had a number of cases where the output was basically either nothing or some letters rendered on top of each other, and that's basically it.
01:45
So, for some cases we had a terrible result, and this work focuses on improving the rendering result in case there is no DrawingML fallback. It's still possible to show something sensible; it's just a matter of tracking down what part of the SmartArt description is not handled at import time
02:07
and how to fix that, what's the missing implementation piece there. You could say that this is not a real issue, since all of the newer PowerPoint versions write this DrawingML fallback,
02:25
but the problem is that there is a large corpus of legacy documents which are affected by this missing DrawingML fallback problem, and if you only edit such a document with newer PowerPoint versions, as long as you are not editing the SmartArt, this fallback won't be generated.
02:45
Also, I like to believe that importing SmartArt into Impress is just the first step, and it would be much more interesting later to actually allow editing of these SmartArts. And if editing is part of the game, then we definitely need to have this functionality
03:04
to take the SmartArt markup and do our own layout. So that's basically why I think it's a good thing that this is improved.
03:22
As the next step I would like to present a few SmartArt types which are now working much better. The first thing will be the rendering of various list types. For all of these examples, what you see on the left-hand side is the old LibreOffice rendering result,
03:43
and what you see on the right-hand side is the new rendering result in Impress. For the vertical box list, the old rendering result was basically not readable. The new one is close to what PowerPoint will do.
04:01
Then we had a quite similar pre-canned SmartArt type in PowerPoint called vertical tab list. Again, the font color was not correct. Also, if you had enough content in the left column of the picture,
04:23
or the shape, then that was not really readable. The new result is in many cases not perfect either. So the idea is: let's improve this up to the point that at least all the content is readable.
04:41
If there is some interaction between the various shapes of the SmartArt, then it should be clear what the meaning of that relation is. And don't spend like a year on just one type; rather make many more types readable and good enough, so that we can move on to the next type.
05:03
So this is not perfect, but the hope is that for many types, this is now much more readable and much more usable compared to what it was. A prime example of this is the line list, where the lines are still not perfect, but this was just not readable before. And now we see that this is a line list.
05:23
Or in the vertical bracket list case, the actual content on the right-hand side was simply completely missing. Or just improper sizing of these child shapes meant that if you had enough content, it was just not readable.
05:49
Then we have the vertical table list, which was again a case where the size of these child shapes was so incorrect that if you had enough content, much of it was simply cut off.
06:03
So much about these various list types. Another bucket of SmartArt types is the process types. Inside that, the first one is the accent process.
06:20
So again, just due to the incorrect size calculation of these text shapes, previously the content was basically not readable and now it's readable. I would also note that next to the text content, the bullets are no longer missing either.
06:44
And in case you had some more content inside a single text shape, and you had multiple paragraphs with bullets, then the existence of these bullets is very, very important, because that's the only visible separator between the separate paragraphs.
07:00
So just remove the bullets and it's possible that the meaning of your content is just not something people can understand. So these bullets are very important. Then we have one more process type, the continuous block process,
07:22
where again the size of the shape was just so small that if you had enough content then the content was cut off. And the last type I was recently working on is this organization chart, which is pretty complex.
07:41
One metric for the complexity of this chart is the number of XML lines that describe all the constraints for the layout. The previous ones are typically around 300-400 lines. The organization chart markup is around 1200 lines, something like this.
08:03
So it's pretty complex. It just takes a lot of time to simplify these documents up to the point that they are simple enough, yet still interesting, and they demonstrate the problem. On the other hand, the size then becomes manageable, so you can actually debug them efficiently.
08:23
So what you see on this picture is garbage on the left, as always, but on the right you see a strange organization with multiple managers, where some of the managers have no employees, and manager 2 has an assistant and also three different employees.
08:44
There are connector shapes between them, so you have an idea how the organization is actually shaped. This one was very interesting because this chart is constructed by basically two algorithms.
09:08
One algorithm focuses on how you lay out employees next to each other, so it's a horizontal algorithm. And the other algorithm focuses on how you lay out one unit: a manager,
09:24
optionally an assistant, and the employees. So it's a vertical algorithm. Not much of this is documented in the OOXML spec, so your best bet is that you play around with PowerPoint to get an idea of what's possible. You simplify this huge markup to something sensible, one or two hundred lines,
09:46
you try to guess what's actually ignored during import and why you just got this garbage and not the readable chart, and you try to implement the algorithm in a way that it mostly matches what you see with this black-box testing.
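The split into a horizontal part (employees side by side) and a vertical part (manager, optional assistant, then employees) can be illustrated with a toy layout. This is only a sketch under stated assumptions, not the LibreOffice implementation and not PowerPoint's behavior: the `Person` class, the fixed box sizes and the choice to put the assistant in its own row directly under the manager are all hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical fixed sizes; real SmartArt derives these from constraints.
BOX_W, BOX_H, GAP = 100, 40, 20


@dataclass
class Person:
    name: str
    assistant: Optional["Person"] = None
    employees: list = field(default_factory=list)


def unit_width(person):
    """Horizontal part: width of a manager plus all employees side by side."""
    if not person.employees:
        return BOX_W
    total = sum(unit_width(e) for e in person.employees)
    return total + GAP * (len(person.employees) - 1)


def layout_unit(person, left=0, top=0, out=None):
    """Vertical part: manager row, optional assistant row, employee row."""
    out = {} if out is None else out
    width = unit_width(person)
    manager_x = left + (width - BOX_W) // 2  # center the manager over the unit
    out[person.name] = (manager_x, top)
    next_top = top + BOX_H + GAP
    if person.assistant is not None:
        out[person.assistant.name] = (manager_x, next_top)
        next_top += BOX_H + GAP
    x = left
    for employee in person.employees:
        layout_unit(employee, x, next_top, out)
        x += unit_width(employee) + GAP
    return out


if __name__ == "__main__":
    manager = Person("Manager 2", assistant=Person("Assistant"),
                     employees=[Person("E1"), Person("E2"), Person("E3")])
    for name, position in layout_unit(manager).items():
        print(name, position)
```

The recursion mirrors the idea that each manager is a unit of its own, so an employee who is also a manager is laid out with the same two steps.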
10:05
So much about the results; for the rest I would like to share some details about how this is implemented. Before any actual coding, I spent quite some time trying to understand just the concepts around SmartArt.
10:25
I think this was the most challenging part, because inside the OOXML specification there is a reasonable description of individual XML elements, XML attributes and attribute values, but those are the very small details,
10:44
and there is no big-picture overview in the spec. So getting an idea of what the concepts behind this layout are was the most challenging part for me. First, we have some data for the SmartArt,
11:05
which is a hierarchical tree-like data structure, with data points having parents and children and siblings, and we have the layout description. And the layout description is again a tree of what we call layout nodes,
11:25
and this is what defines how the content will end up on the screen. That means that we have layout nodes for all the shapes which are visible on the screen, and also we have layout nodes for these containers. So, if you have three shapes next to each other and you want to render them on a linear path,
11:46
then you need a container that has an algorithm associated with it, and it should say that this should be laid out on a linear path. And then you need a layout node for this container as well. So, everything is a layout node in this layout description.
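The idea that shapes, spacers and containers are all layout nodes, and that a container carries the algorithm, can be sketched as follows. This is a simplified illustration with hypothetical names: the `LayoutNode` class and the algorithm labels `"shape"`, `"space"` and `"linear"` are made up for this sketch and are not the attribute values used in the markup.

```python
from dataclasses import dataclass, field


@dataclass
class LayoutNode:
    name: str
    algorithm: str = "shape"  # hypothetical labels: "shape", "space", "linear"
    width: int = 0
    x: int = 0
    children: list = field(default_factory=list)


def layout_linear(container):
    """Place the children of a linear container left to right."""
    cursor = container.x
    for child in container.children:
        child.x = cursor
        cursor += child.width
    container.width = cursor - container.x
    # Only nodes with the shape algorithm end up visible on the screen.
    return [(c.name, c.x) for c in container.children if c.algorithm == "shape"]


if __name__ == "__main__":
    row = LayoutNode("row", algorithm="linear", children=[
        LayoutNode("A", width=100),
        LayoutNode("gap1", algorithm="space", width=20),
        LayoutNode("B", width=100),
        LayoutNode("gap2", algorithm="space", width=20),
        LayoutNode("C", width=100),
    ])
    print(layout_linear(row), row.width)
```

Here the spacers take part in the layout but produce no visible shape, which is why they still need a layout node of their own.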
12:03
Then once you have the layout nodes, which are either shapes or containers, or placeholders for just spacing between shapes and so on, so you have this layout building block called layout nodes, then you can assign algorithms to these layout nodes,
12:22
where the simplest algorithm is the shape, which means that this layout node should be mapped to a DrawingML shape, and the shape should get some data, some content from the data definition. We will see that in a moment.
12:40
So, you have your layout nodes, then you associate algorithms to these layout nodes, and you organize your layout nodes in a tree, and to make this much more interesting, the layout tree can have nodes which are called atoms,
13:04
which allow dynamic behavior, so similar to XSLT, you can have conditions, for-each cycles, choices and so on. So it's almost a whole programming language, which is a bit scary. So you can have just, let's say,
13:26
two layout nodes in a layout tree, one root node and one child node, and if you have a for-each layout atom between the two, then depending on your data model it is dynamic how many actual shapes end up in the document.
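The effect of a for-each atom between a root node and a child node can be sketched with a small expansion function. This is an illustration only: the classes `DataPoint`, `LayoutNode` and `ForEach` are hypothetical stand-ins, and the real atoms support more than iterating over the direct children of a data point.

```python
from dataclasses import dataclass, field


@dataclass
class DataPoint:
    text: str
    children: list = field(default_factory=list)


@dataclass
class LayoutNode:
    name: str
    children: list = field(default_factory=list)  # LayoutNode or ForEach items


@dataclass
class ForEach:
    template: LayoutNode  # instantiated once per child data point


def expand(node, point):
    """Return (layout node name, text) pairs for the shapes that get created."""
    shapes = [(node.name, point.text)]
    for child in node.children:
        if isinstance(child, ForEach):
            for sub_point in point.children:
                shapes.extend(expand(child.template, sub_point))
        else:
            shapes.extend(expand(child, point))
    return shapes


if __name__ == "__main__":
    layout = LayoutNode("root", [ForEach(LayoutNode("item"))])
    data = DataPoint("doc", [DataPoint("A"), DataPoint("B"), DataPoint("C")])
    print(expand(layout, data))
```

The same two-node layout yields one shape for an empty data model and four for three data points, which is why a container algorithm has to decide how the resulting shapes are arranged.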
13:42
This is why you need some algorithm deciding how these multiple shapes are laid out. So, we had the building block, the layout node, we had the algorithm, we had the layout tree. So, data model mapping is the part that decides how the layout description
14:02
is associated with the content you assign to the SmartArt. This means that typically when you edit your SmartArt on the user interface, then you only edit the data, and all these layout descriptions are hardwired into PowerPoint, and you can create your own custom layout description,
14:24
but the majority of users will just use these roughly 100 pre-made types; I did not count them. So there is a long list of pre-existing types, and users typically work with those. From our point of view, this is very good news:
14:41
we have a fixed set of layout descriptions. Of course we try to solve these layout differences in a generic way, but what users actually care about is really just this fixed set of layout descriptions, and if these are working, then users are typically happy.
15:00
This is a somewhat easier problem compared to handling random input for this layout and expecting that it behaves exactly the same as PowerPoint all the time. Data model mapping decides how to associate the layout nodes with 0, 1 or multiple data nodes.
15:27
Finally, the last thing that really affects the shapes is that for layout nodes with the shape algorithm, you can associate shape properties, and again this is tightly coupled to DrawingML,
15:43
so all the properties are using the DrawingML markup, and later we will see this is why we don't do a direct import of SmartArt into Impress; rather, first we take the SmartArt input, we generate a DrawingML in-memory representation for that,
16:04
and we use the existing DrawingML import to actually map that to our shape model, because this way we can share a lot of code. We have constraints, and this is the most complex part: you could even use a SAT solver to find out what's the optimal value
16:26
for the various requirements to lay out these shapes properly. The good news is that so far this was not necessary: the current implementation is a one-pass layout, so we never position shapes multiple times,
16:41
and so far even this much simpler approach is giving a reasonable result. So constraints are the ones which decide all the properties of all the layout nodes, like what should be the spacing between the shapes, what should be their size, their position,
17:03
the font size of the content inside the shapes, and so on. If you use the automatic text sizing in Impress for normal shapes, this is similar; it's just not a single setting which does something automatically,
17:22
but a long list of properties, and all of these are calculated automatically. Next to constraints there are rules. Rules are used for dealing with situations when you have conflicting requirements. For example, in case you give constraints that make it clear what the size of the shape should be,
17:45
but then you have lots of content inside the shape, you have this conflicting requirement that on one hand you want to have all your content visible, on the other hand you have requirements for the size of the shape, so it should be let's say small, and then rules come into the picture,
18:02
and rules can decide what happens when you have the conflicting requirement. For example, the rule might say that if there is not enough space inside the shape, then you should give up the constraint about the height of the shape, so it will be a very tall one, but at least your content is readable. Or you can decide that the conflict resolution should be,
18:22
that the font size should be decreased, so that it will be very hard to read, but at least all the content will be there. And the very last concept is the text properties: next to the shape properties, you can define all the aspects of the text content appearing inside these shape nodes.
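The two conflict resolutions described above for rules (give up the height constraint, or shrink the font) can be sketched like this. It is a toy model under loud assumptions: the text measurement simply treats every character as half a font size wide and every line as one font size tall, and the rule names `"grow-height"` and `"shrink-font"` are hypothetical labels, not values from the markup.

```python
def text_height(text, width, font_size):
    """Very rough measurement: a character is font_size / 2 wide (assumption)."""
    chars_per_line = max(1, (2 * width) // font_size)
    lines = -(-len(text) // chars_per_line)  # integer ceiling division
    return lines * font_size


def resolve(text, width, height, font_size, rule, min_font=6):
    """Apply one hypothetical rule when the text does not fit the shape.

    Returns the resulting (height, font_size) pair.
    """
    if text_height(text, width, font_size) <= height:
        return height, font_size  # constraints are satisfied, no rule needed
    if rule == "grow-height":
        # Give up the height constraint: a tall shape, but readable content.
        return text_height(text, width, font_size), font_size
    if rule == "shrink-font":
        # Keep the height constraint: smaller text, but all of it is there.
        while font_size > min_font and text_height(text, width, font_size) > height:
            font_size -= 1
        return height, font_size
    raise ValueError(f"unknown rule: {rule}")


if __name__ == "__main__":
    content = "x" * 100
    print(resolve(content, width=100, height=40, font_size=20, rule="grow-height"))
    print(resolve(content, width=100, height=40, font_size=20, rule="shrink-font"))
```

With these toy numbers the first rule keeps the 20 unit font and grows the shape to 200 units, while the second keeps the 40 unit height and ends at a font size of 8.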
18:47
Basically, these are the high-level concepts, and once you have a rough understanding of what these mean, then it makes sense to jump to the reference and actually read about what the individual XML elements, attributes and attribute values are doing.
19:03
So, regarding the actual markup, this is all part of OOXML, with the same benefits and problems as the general OOXML documentation: sometimes something is reasonably documented, sometimes the documentation is completely opaque on the details.
19:21
Inside the PPTX file, we typically have either four or five XML streams, that is, XML files inside this ZIP package. We have the data for the shape; this is the only piece which is actually edited by the users,
19:41
using the PowerPoint user interface, typically. Then we can have shape styles, which are specific to this SmartArt instance; this is defined in the quick style XML, and it is typically not edited by the user. We can have color schemes;
20:02
this is using almost the same markup as document themes, so this is again something that's typically not edited by the users. They just say that they want some dark color, and in practice, perhaps that will be dark blue, and then they want some text on that,
20:22
so it should be a text color, and it will be white, so that it's actually readable on dark blue. Then there is the layout XML, which is the container for the constraints. This is the one that you read the most when you try to improve this code,
20:43
and again, this is fixed: you choose that you want a vertical bracket list, and then you get a fixed layout, and whatever you do with your content, the layout definition is not changing. Optionally, there is the drawing stream,
21:01
which is a pre-calculated DrawingML group shape, and if it's there, then we can easily just import that into Impress. So much about the markup; speaking about how this looks in the LibreOffice code base,
21:25
all this SmartArt handling is inside the DrawingML import, and there is a subdirectory for diagrams, which is just another name for SmartArt. Probably SmartArt is some product name, and diagram is the alias used in the specification,
21:41
but they are the same. At the moment, all of this layout is happening at PPTX import time, or OOXML import time, so if you want to test how this behaves, then you need to open a file, and you will see the result, and you tweak the code,
22:01
and again, you re-import. So it's unlike Writer layout, where you open a document, actually edit the document, and see how the layout reacts to your keystrokes. This happens at import time, and as mentioned, this is a two-step approach:
22:22
first, we parse the SmartArt markup and produce a tree of these DrawingML shape objects, and then later we give this DrawingML shape tree to the DrawingML importer,
22:40
and that one will invoke the necessary UNO API, so that the actual shape objects are created. So that's the high-level overview of the actual implementation. Then, regarding how this is tested, the easiest way is an integration test,
23:02
so you load the document into Impress, and then you can use the UNO API to find out the various properties of the shapes. Given that this happens at import time, all the layout result is actually part of the document model once the layout has finished, so you can just use the UNO API,
23:20
which is available for macros as well, to assert the size, position or content of the various shapes. Given that all the layout is done with these containers around the shape nodes, there can be quite a deep hierarchy of group shapes:
23:43
the top container is always a group shape, but until you actually find the shape that has the text, it can happen that you have four or five group shapes inside each other, till you actually reach your content. But the benefit of this is that there is
24:01
a more or less one-to-one mapping between the original layout tree and the resulting document model, and then you can reason about whether this is a correct mapping or not. So this is somewhat helpful for debugging, and the users typically care about the resulting layout,
24:20
so they probably won't ungroup all the group shapes in four or five stages. I have no numbers on what the state was when I started working on this, but today we have almost 30 tests, that is, 30 loaded documents with different SmartArt setups
24:40
and loads of asserts. So when I start working on a new type, I add one test document for that type, hopefully one that's complex enough, and then as I incrementally try to improve the layout so that it produces something sane, I keep adding new asserts inside that single test function,
25:02
so there are loads of asserts compared to just the number of loaded documents. Hopefully that's a good trade-off between a very long-running make check versus uncovered code. And the good news is that, again, this is feature work, at least technically, so you can do all your changes
25:23
with the matching test coverage. So at least for what was presented here as a result, all the improvements were in the form of a behavior change plus a test change: no change without a matching test case. So, that's basically it;
25:41
thanks to our partner SUSE, who sponsored part of this work; this is why it was possible that I worked on this, and that mostly concludes my presentation. Thanks! Are there any questions?
26:04
Yes, in the front row? Yes, at the moment we read this, and if you write back to PPTX, then we actually remember the markup,
26:21
so it's not lost, but editing at the moment is not possible. Although, as you saw from the architecture, the mapping from the SmartArt to actual shapes is a separate block, so in the future, if we have lots of time and motivation,
26:40
it would be possible to move this from the PPTX import to the actual Impress core, and then in the long run this would also allow editing. There are loads of problems to be solved first, but we are heading in that direction, yes. And another question in the back.
27:05
The current state of the export is that if you just import your SmartArt and you save back to PPTX, then all this SmartArt markup is remembered, but you can't edit your shape and still preserve your SmartArt declarative description
27:24
at the same time.
27:45
I don't have numbers on that and it's also an interesting question how you measure that. What I know is that I'm still aware of some problems which are interesting for SUSE, and we focus on our customers because that's how we get our paycheck,
28:00
so I still plan to continue working on this, and then let's see what will be the next step. Okay, thanks for listening again.