Writer Content Controls -- what happened in the past half year

Formal Metadata

Title
Writer Content Controls -- what happened in the past half year
Title of Series
Number of Parts
542
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
LibreOffice has been capable of handling form filling in Writer for a while already. In the meantime, the competition introduced Structured Document Tags, which have been their default since 2007, and our fields and form shapes model them poorly. Writer Content Controls are meant to handle this third type of form filling well. Some recent developments also happened in this area in the past half year: PDF export, combo boxes, titles and aliases. Come and see how this now starts to work in Writer, what still needs doing and how you can help.
Transcript: English(auto-generated)
So this talk is a follow-up to the one I gave at the LibreOffice conference last year in September, which was about content controls in Writer in general. Some of the follow-up work was expected and some of it was more of a surprise. A couple of incremental improvements appeared in the past half year, so it seemed like a good idea to give an overview of where we are compared to where we were in September.
For those who don't know me, I'm Miklos Vajna, I'm from Hungary. I used to be very much involved in the Writer RTF import/export, not so much these days, but I still focus on Writer, and I work for Collabora.
For the scope of this talk, we are talking about rich text content controls, which are for form filling. We used to have input fields in Writer, where you can provide some placeholder text and mark that this is the place in the document where you can type when you fill in a form. One big limitation was that this was built on fields, and fields can't have formatting, so it was really just for plain text. Where Writer really shines is rich text, so we want something that provides rich text, and that's where rich text content controls come in. The ECMA specification calls these structured document tags, but it's really the same thing,
their user interface calls these content controls, so we also call them content controls.
The way it's structured is this: once you have paragraph text, you can have multiple text portions inside it. Let's say you have some normal text, then some bold text, and then normal text again. Then we split the paragraph text into three portions: the normal one, the bold one, and again the normal one. For fields, the restriction was that you can't have multiple text portions inside the part that is to be filled in. Content controls support this, so you can have multiple portions, although they are limited to a single paragraph, at least the inline content controls that I'm talking about. So you can't have a content control starting at some random point in the document and ending at some other random point. Perhaps you know that you can do that with bookmarks: one might start inside a table and end outside the table, and field marks can do the same thing. Content controls are intentionally limited to a single paragraph. We enforce that when you create them, we enforce it when you edit them, and we enforce it when exporting to DOCX and ODT, so this is an invariant that we want to maintain.
Another complexity is that nesting is possible. When you look at how we write this in XML, XML elements naturally support nesting, and we call this well-formed nesting: the outer content control starts, then the inner one starts, and then it's a requirement that the inner one finishes before the outer one finishes. So you can do nesting, but you can't do start one, start two, finish the first, and then finish the second, similar to what you know from HTML, for example. We want to support this setup where you can do nesting, but not in a random order.
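To make the two rules concrete (well-formed nesting, and start and end in the same paragraph), here is a minimal sketch in Python. It is not how Writer implements this; the `Marker` type, its fields and the `is_well_formed` helper are hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    kind: str       # "start" or "end"
    control: str    # hypothetical identifier of the content control
    paragraph: int  # index of the paragraph the marker sits in


def is_well_formed(markers):
    """Return True if the markers nest properly and stay within one paragraph."""
    open_controls = []  # stack of start markers that are not closed yet
    for marker in markers:
        if marker.kind == "start":
            open_controls.append(marker)
        elif marker.kind == "end":
            if not open_controls:
                return False  # an end without a matching start
            start = open_controls.pop()
            if start.control != marker.control:
                return False  # the inner control must finish first
            if start.paragraph != marker.paragraph:
                return False  # a control may not span paragraphs
        else:
            raise ValueError(f"unknown marker kind: {marker.kind}")
    return not open_controls  # everything opened must be closed


# Outer starts, inner starts, inner finishes, outer finishes: allowed.
nested = [Marker("start", "outer", 0), Marker("start", "inner", 0),
          Marker("end", "inner", 0), Marker("end", "outer", 0)]
# Start one, start two, finish one, finish two: rejected.
overlapping = [Marker("start", "one", 0), Marker("start", "two", 0),
               Marker("end", "one", 0), Marker("end", "two", 0)]
```

With these sample lists, `is_well_formed(nested)` accepts the input and `is_well_formed(overlapping)` rejects it, which is the same rule an XML parser applies to element nesting.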
You can also include multiple text portions, but not at random positions: the start and the end have the constraint that they are in a single paragraph. If you have fields, then fields typically have some kind of instruction text, like commands, and then there is the field result. Content controls are more like annotations on a piece of text: you have a start and an end, and you can have a bunch of properties on top of that. As we'll see, you can give it a title, you can give it an alias, you can define the type, and so on.
Rich text is the simplest type, where you just say that you can fill in something here. If the task is to provide your one-line comment on this presentation, then you say it was really bad, like really, really, so you select "really" and mark it bold, because it was really that bad. But you can't have multiple paragraphs, so you can't write a novel on how bad it was. So that was rich text. Somehow the picture is missing the top pixels,
so you miss the whole point, but I will explain what you should see there. I think the Word user interface calls this a title, but in the markup we call it an alias; it's the same thing. The point is that you have some complex form, and you are supposed to fill in the date, the date, the date, and finally the date. Of course that means something: you are registering your company, and there was a date when you created the company, a date when you filed the papers for it, a date when the first employee was hired, and so on. But the form just says "the date", so it's very confusing to fill in.
What content controls can provide is that when you enter the content control, there is a small pop-up, similar to when you edit headers and you get the name of the page style, and so on. So you get a tooltip explaining what exactly you are filling in, which might be helpful. Let's say the text would normally just say that you need to enter information about the birth data, but that means they want the birth location and the birth date. Then, when you enter the content control, it can give you this hint, so that it's a bit easier. The output, the filled-in form, might conform to what some regulation expects, but when you try to fill it in as a mortal, you are actually able to do so, because you don't need to look up pages of documentation on how to fill in that form: you go to the form and you get enough context information that you can just do it. These aliases and tags were initially missing on the Writer side, and now we support them. Then one other problem was this:
I mentioned that you can have multiple text portions inside a single content control. What you see on the screenshot above is that we have an X character, then a line break (so we kind of hack around it: it's technically still a single paragraph, but you see multiple lines), and then this even defines a tab stop, followed by a tab portion. So it's technically still a single paragraph, but you can see at least three different text portions. We used to take each and every text portion and emit a PDF widget for it. So in this case, when you exported this to PDF and wanted to fill in the PDF form, you got three different widgets, which is nonsense; this was not the intention. So now, even if the placeholder text consists of multiple portions, we take the bounding rectangle of the content control and emit just a single widget, as the user would probably expect.
Then another thing: the primary use case we had in mind was that you create some editable Writer document, and at the very end you export it to PDF, and the actual form filling will probably happen in some PDF reader. But you might also have a slightly different workflow, where you mark most of the document as read-only, and then the editable document is handed out to users, who actually fill in the form in Writer. Now the trouble is that if we made the document read-only, then you couldn't fill in the form, because filling it in changes these content controls, and they are part of the document, and the whole document is read-only, so the content control is also read-only. This was working with input fields before; they had various problems, but this bit was working: they knew that they are an exception from this general read-only rule, so it was possible to fill in input fields. Now we do the same, so you can have this setup where the whole document is read-only, but the content controls can still be edited.
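As a rough illustration of that exception, the decision can be sketched in Python as below. This is only a model of the rule described in the talk, not Writer's code: the function name, the use of character offsets, and the list of `(start, end)` ranges are all assumptions made for the example.

```python
def is_editable(offset, document_read_only, control_ranges):
    """Decide whether the character at `offset` may be edited.

    `control_ranges` is a list of (start, end) character offsets covered by
    content controls, with `end` exclusive (a hypothetical representation).
    """
    if not document_read_only:
        return True  # a normal document is editable everywhere
    # In a read-only document, only text inside a content control is editable.
    return any(start <= offset < end for start, end in control_ranges)
```

So with one content control covering offsets 10 to 20 in a read-only document, offset 12 is editable and offset 5 is not.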
Another thing was this: if you look at what Word provides for VBA, if you want to manipulate these content controls, they have an understanding of what the list of content controls in the document is. This can be very handy if you want to have some macro that automatically processes the already filled-in document. There are other ways to do that, but one way is to write a macro that extracts all the filled-in results from the document, and for that it can just go to the first content control, the second one, query how many content controls there are in the document, and so on. But on the Writer side this is really just formatting on a paragraph, so you would have to scan the entire document to find out if you have any content controls at all. So we don't have this random access to content controls, or rather, we did not. Initially we ignored this VBA problem, but when Justin was trying to build a VBA compat layer on top of content controls, he found that there is no random access to content controls, so you can't do this without scanning the whole document, which can be very slow. So this was not great.
We discovered that footnotes actually already provide this. A footnote is also a kind of formatting on some piece of paragraph text, and there is a manager that tracks footnotes as they are created and deleted, so you can quickly get a list of all the footnotes in the document. So why can't we do the same for content controls? And yes, you can do that. So now there is StarBasic access, or actually UNO API access, which is then visible in Basic, and there is also a VBA compat layer on top of that. You can query how many content controls you have, and if you fill in these aliases, you can even say that you want to jump to the birth date content control, without saying that it's the third one, so if you insert something in between, it won't break. So this manager provides what's necessary here.
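The idea of such a manager can be sketched in a few lines of Python. This is a toy model only: the class and method names are hypothetical and are not the UNO or VBA API, and the aliases and texts are made-up sample values.

```python
class ContentControl:
    def __init__(self, alias, text=""):
        self.alias = alias  # the title/alias shown to the user
        self.text = text    # the filled-in result


class ContentControlManager:
    """Hypothetical registry that tracks controls as they are created and deleted."""

    def __init__(self):
        self._controls = []  # kept in document order

    def insert(self, index, control):
        self._controls.insert(index, control)

    def remove(self, control):
        self._controls.remove(control)

    def __len__(self):
        return len(self._controls)  # "how many content controls do I have?"

    def __getitem__(self, index):
        return self._controls[index]  # random access by position

    def by_alias(self, alias):
        # Lookup by alias keeps working when positions shift.
        for control in self._controls:
            if control.alias == alias:
                return control
        return None


manager = ContentControlManager()
manager.insert(0, ContentControl("birth place", "Budapest"))
manager.insert(1, ContentControl("birth date", "2000-01-01"))
# Inserting something in between moves "birth date" from index 1 to index 2.
manager.insert(1, ContentControl("nationality", "Hungarian"))
```

A macro written against `manager[1]` would now read the wrong control, while `manager.by_alias("birth date")` still finds the right one, and neither needs to scan the document text.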
Another thing was that initially, when I was adding drop-downs, I wanted to incrementally extend what's available in rich text content controls. So the idea was that if there are list items for a content control, then it's probably a drop-down. But there is complexity there: you can have drop-downs and you can have combo boxes, and you can't tell which one it is just from the list items; if it has list items, it might be either. It also turns out that it's valid to have a drop-down with no items, so that's what you see there. Now that's working: we explicitly track whether it's a combo box or a drop-down, and you can have both types. We enforce whether you can only choose one item from a complete list, or whether you can also have free-form text there. And if some existing document for whatever reason has no list items, then we don't break that, and we don't implicitly turn it into a rich text content control just because it has no list items.
I think this is the last one. Hossein was doing lots of testing on content controls, and of course the first thing he tried was some Persian text,
and of course it was breaking. I think it had three pieces. One was the positioning: for RTL text, you now have the drop-down arrow on the left-hand side, which means that if you take the position of the whole bounding rectangle, you need to shift it to the left to get the correct position. So fixing up the position based on the direction of the text frame, or of the paragraph, was one thing. Then there was also what you render inside the bounding rectangle for the arrow button and for its frame, which needed fixing. The last thing is that if you see a button, you might have the silly idea to click on it, and you expect that something happens. So we need to do hit testing to decide if you clicked on the button or not, and we need to handle this correctly for LTR versus RTL. That's now all working.
So this is it. There was some polish since the LibreOffice conference. The feature set is still meant to map one-to-one to the Word feature set. We try to save this to ODF fully, without any loss. You can export it to PDF. There are these various types; you can see a few of them there. Basically, more properties are now added, there are some small editing improvements, and it's a little bit easier to script this now. So that's what we have. Thanks for listening.