Improved coverage analysis for LibreOffice's CI
Formal Metadata
Number of Parts: 287
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/56876 (DOI)
FOSDEM 2022 (136 / 287)
Transcript: English (auto-generated)
00:32
Hey everyone, welcome to this talk, which is about improving coverage analysis for LibreOffice
00:40
in the continuous integration platform that we're running. This is a joint project by three people: Linus, Svante and myself. It is funded by the Prototype Fund, a German open source funding program run by the
01:05
Open Knowledge Foundation, which is itself sponsored by the Federal Ministry of Education and Research, and we're very grateful for that support. Great, so what is this all about? For LibreOffice
01:22
and our continuous integration platform, we'd first of all like to develop some glue code to integrate different data providers, so that we can tap into this rather rich ecosystem of tools. Once we've done that, we'll get lots of nice, shiny new tools, and we will
01:46
at least try to integrate some of them into LibreOffice's existing Jenkins instance, provided the project is fine with that. Doing that would probably create incentives, or would at least enable the project to
02:06
easily tweak things so that incentives can be generated for QA and developers to do the right thing. And last but not least, we would love to provide an automated means
02:20
for newcomers especially to locate features in the LibreOffice code. This is a very frequent question: "I'd like to figure out why some RTF thing is not working" or "I'd like to add a feature to a filter; where do I find the code?" It'd be great to have something that provides that answer without human interaction, because that's always a strain on
02:48
mental bandwidth. Okay, so that's us. There's also a contact page and a project page for this.
03:03
If you look at the PDF version of the slides in the FOSDEM schedule, you'll also find our email addresses there. First of all, what is the very first problem, problem number zero?
03:24
We have lots of nice tools for programming languages. For something like LibreOffice, we usually have a number of programming languages that we need tools for, for example for coverage analysis. For C++ alone you've got at least two tools for doing coverage analysis,
03:48
and both of them generate data in their own backend format; the common denominator is usually some static HTML report pages. Same story for Python, same story for Java.
04:05
So you've got programming languages and coverage tools, and to add insult to injury you've got a number of CI systems, like Jenkins, GitHub Actions, Travis CI,
04:21
whatever box you've got there. So if you can only use the intersection of your programming language, your coverage tool and your CI system, you sometimes end up pretty empty-handed. We'd like to solve that problem the computer science way, which is to provide integration
04:46
APIs between them: have a shared API for coverage information, and then provide glue code, or adapters, for programming-language analysis tools and also for CI systems.
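A minimal sketch of that shared-API idea, in Python and under loudly stated assumptions: one tool-neutral coverage model, with readers (one per coverage tool) and writers (one per CI system or report format) registered against it, so that n tools and m CI targets need n + m adapters rather than n × m converters. The `toy` input format, the `summary` output format, the registry names and the file paths are all made up for illustration; this is not the project's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class FileCoverage:
    """Tool-neutral coverage for one source file: line number -> hit count."""
    path: str
    lines: Dict[int, int] = field(default_factory=dict)

    def covered(self) -> int:
        return sum(1 for hits in self.lines.values() if hits > 0)


# A report maps each source path to its FileCoverage.
Report = Dict[str, FileCoverage]

# Hypothetical adapter registries: readers turn a tool's output into a Report,
# writers turn a Report into whatever a CI system or report page consumes.
READERS: Dict[str, Callable[[str], Report]] = {}
WRITERS: Dict[str, Callable[[Report], str]] = {}


def register(table: dict, name: str):
    """Decorator that files an adapter under `name` in the given registry."""
    def decorator(fn):
        table[name] = fn
        return fn
    return decorator


def convert(data: str, source_format: str, target_format: str) -> str:
    """Any registered reader can feed any registered writer via the shared model."""
    return WRITERS[target_format](READERS[source_format](data))


@register(READERS, "toy")
def read_toy(data: str) -> Report:
    """Made-up input format: one 'path:line:hits' entry per line."""
    report: Report = {}
    for entry in data.splitlines():
        if not entry.strip():
            continue
        path, line, hits = entry.rsplit(":", 2)
        report.setdefault(path, FileCoverage(path)).lines[int(line)] = int(hits)
    return report


@register(WRITERS, "summary")
def write_summary(report: Report) -> str:
    """Made-up output format: covered/total lines and a percentage per file."""
    rows = []
    for path in sorted(report):
        fc = report[path]
        total = len(fc.lines)
        pct = 100.0 * fc.covered() / total if total else 0.0
        rows.append(f"{path}: {fc.covered()}/{total} lines ({pct:.0f}%)")
    return "\n".join(rows)


if __name__ == "__main__":
    print(convert("sw/a.cxx:1:3\nsw/a.cxx:2:0\nvcl/b.cxx:5:1", "toy", "summary"))
```

Running it prints `sw/a.cxx: 1/2 lines (50%)` followed by `vcl/b.cxx: 1/1 lines (100%)`. Supporting another coverage tool or another CI target would then mean registering one more reader or one more writer, not touching the existing ones.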
05:04
For C++ we've got lcov for GCC, we've got llvm-cov, we've got gcov, and we've got gcovr, another tool that also parses that data but
05:22
can optionally generate XML output, etc. And on the CI side, what seems to be the lingua franca is Cobertura XML; there are lots of plugins for it, so converting everything to that seems like a good first approach. We're also planning
05:46
a rather abstract API for that, like the language server, which solves exactly the same problem: you have N editors and M programming languages, and you want syntax highlighting and source code analysis. The way to make this feasible is an interface layer, which is
06:07
the language server. So you only provide your syntax highlighter once, and then all the editors using the language server can suddenly highlight your language. That's the plan here as well.
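To make the "convert everything to Cobertura XML" step concrete, here is a rough Python sketch that reads the line-coverage records of an LCOV tracefile (`SF:` starts a source file, `DA:<line>,<hits>` gives a line's hit count, `end_of_record` closes the file) and writes a reduced Cobertura-style XML document. It ignores function and branch records and emits only a subset of the attributes that real Cobertura reports carry; whether a particular Jenkins plugin accepts this reduced form is an assumption that would need testing, and the sample path is made up.

```python
import xml.etree.ElementTree as ET
from typing import Dict

Coverage = Dict[str, Dict[int, int]]  # source path -> {line number: hit count}


def parse_lcov(text: str) -> Coverage:
    """Read the SF/DA/end_of_record subset of an LCOV tracefile."""
    files: Coverage = {}
    current = None
    for raw in text.splitlines():
        record = raw.strip()
        if record.startswith("SF:"):
            current = files.setdefault(record[3:], {})
        elif record.startswith("DA:") and current is not None:
            # DA:<line>,<hits>[,<checksum>]; repeated entries for a line are summed
            fields = record[3:].split(",")
            line_no, hits = int(fields[0]), int(fields[1])
            current[line_no] = current.get(line_no, 0) + hits
        elif record == "end_of_record":
            current = None
    return files


def _rate(hits) -> str:
    """Fraction of lines with a non-zero hit count, formatted like Cobertura."""
    hits = list(hits)
    covered = sum(1 for h in hits if h > 0)
    return f"{covered / len(hits):.4f}" if hits else "0.0000"


def to_cobertura(files: Coverage) -> str:
    """Emit a reduced Cobertura-style report: one <class> element per source file."""
    all_hits = [h for lines in files.values() for h in lines.values()]
    root = ET.Element("coverage", {
        "line-rate": _rate(all_hits),
        "branch-rate": "0",
        "lines-covered": str(sum(1 for h in all_hits if h > 0)),
        "lines-valid": str(len(all_hits)),
        "version": "0",
        "timestamp": "0",
    })
    packages = ET.SubElement(root, "packages")
    package = ET.SubElement(packages, "package", {
        "name": "all", "line-rate": root.get("line-rate"),
        "branch-rate": "0", "complexity": "0"})
    classes = ET.SubElement(package, "classes")
    for path in sorted(files):
        lines = files[path]
        cls = ET.SubElement(classes, "class", {
            "name": path, "filename": path, "line-rate": _rate(lines.values()),
            "branch-rate": "0", "complexity": "0"})
        ET.SubElement(cls, "methods")
        line_elems = ET.SubElement(cls, "lines")
        for number in sorted(lines):
            ET.SubElement(line_elems, "line",
                          {"number": str(number), "hits": str(lines[number])})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = "TN:\nSF:sw/source/core/a.cxx\nDA:1,4\nDA:2,0\nLF:2\nLH:1\nend_of_record\n"
    print(to_cobertura(parse_lcov(sample)))
```

Keeping the parser and the writer as separate functions mirrors the adapter idea: the intermediate dictionary is the shared model, and either side can be swapped independently.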
06:23
Okay, so let's look at what the Jenkins ecosystem has to offer; that's what we're most interested in, because LibreOffice already runs Jenkins as its CI tool. For coverage there's JaCoCo, which does something quite similar to
06:45
lcov, but it's nicely integrated in Jenkins, so you get a report for the build right there on the job page. But it kind of sucks, because JaCoCo is Java only. So, next one up,
07:01
the Cobertura plugin I mentioned: it takes this XML file that is somehow produced by your build, which can be Java, which can be C++, and again renders some nice static report out of that,
07:20
which is kind of okay, but again, for every tool and every analysis you'd like to run, you would need to write a Jenkins plugin, and then a GitHub Actions plugin, etc. Much better is the Code Coverage API, which is another plugin, but one level up,
07:47
taking a number of input formats, eating those and then producing reports from them. So it's kind of tool agnostic: it just takes rather generic coverage data in a number of formats
08:02
and then produces nice reports out of that, which is much, much closer to what I think we need, so that would be the model we'd go for. The last drawback is that this is again only for Jenkins, so the last step is missing here, but it's
08:23
getting two-thirds of the way. It eats Cobertura, it eats JaCoCo and it eats llvm-cov files, and it has some nice extension points if you want to extend it with your own format. As for more features, the API plugin also has some nice drill-down lists, so lists of files
08:49
and their coverage, and, I think this is really sexy, a map where the size is the lines of code and the color is the coverage percentage, the share of lines covered, so that
09:06
should give a really nice overview of the code base, and you can drill down to where perhaps a bit more test coverage would be great. Okay, so a few more tools that I came across; this is mostly Java, but there are sometimes
09:27
C++ equivalents. For SpotBugs, we could probably run clang-tidy there, or something similar, or cppcheck.
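For the C++ side, a rough sketch of what such a job step could do with cppcheck's XML output (the version 2 format that `cppcheck --xml` writes to stderr): count findings per severity, restricted to the files a patch touches, so a reviewer sees only what the change itself affects. The list of touched files is assumed to come from elsewhere (for example the change's file list on Gerrit), the paths are assumed to already match cppcheck's `file` attributes, and the sample report below is hand-written.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, Iterable


def summarize_cppcheck(xml_text: str, touched: Iterable[str]) -> Dict[str, Counter]:
    """Count cppcheck findings per severity for each touched file.

    A finding is counted once for every touched file among its <location>
    elements; findings without any location are skipped.
    """
    touched_set = set(touched)
    summary: Dict[str, Counter] = {}
    root = ET.fromstring(xml_text)
    for error in root.iter("error"):
        files = {loc.get("file") for loc in error.iter("location")}
        for path in sorted(files & touched_set):
            summary.setdefault(path, Counter())[error.get("severity", "unknown")] += 1
    return summary


def format_summary(summary: Dict[str, Counter]) -> str:
    """One line per file, e.g. 'sw/a.cxx: error=1'."""
    return "\n".join(
        f"{path}: " + ", ".join(f"{sev}={n}" for sev, n in sorted(counts.items()))
        for path, counts in sorted(summary.items()))


if __name__ == "__main__":
    report = """<results version="2"><cppcheck version="2.6"/><errors>
      <error id="nullPointer" severity="error" msg="m" verbose="v">
        <location file="sw/a.cxx" line="3" column="5"/></error>
      <error id="unusedVariable" severity="style" msg="m" verbose="v">
        <location file="vcl/b.cxx" line="9" column="1"/></error>
    </errors></results>"""
    print(format_summary(summarize_cppcheck(report, ["sw/a.cxx"])))
```

With only `sw/a.cxx` touched, this prints `sw/a.cxx: error=1`; the style finding in `vcl/b.cxx` is left out because the patch did not touch that file.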
09:40
It would always be nice to have that not as a separate tool but right there on the Jenkins job page: you have a patch on Gerrit, and it gets you a nice report on how this patch is looking, rather than, one week later, you might or might not get some
10:03
warning from a tinderbox, or you might actually go to some cppcheck page and see how the code is looking. Checkstyle: yeah, I'm not sure, clang-format probably, but clang-format is a hard no right now, so we run it as a pre-commit hook. But possibly we could have
10:33
some hints there, for things where there's no clear right or wrong but some preference from the project, and then you still get something nice here. PMD is Java only;
10:46
possibly we could run something like ASan there, depending on the computational cost. CPD, the copy-paste detector, sounds very useful to have: if there's new code being
11:02
introduced that copies a lot, that would be bad. We could also run additional things, basically everything that people in the community are running and have some script for, like a spell checker on comments or other things. We could run these in CI
11:22
and get this right there before things get merged, so we wouldn't have this down-the-road fixing anymore. Right, so, creating incentives: we really should get more into the habit of
11:45
automating what is possible for reviews. I think we started to do that and then we stopped, and there's still a lot of fixing up after the fact, after things got merged,
12:01
and I think we can iteratively improve on that, beyond this coverage thing that I started with, which is also really nice for creating incentives, like increasing the coverage, or even finding the places that have weak coverage and then
12:22
having easy hacks to match people to work on that. Another nice way of using incentives, with nice metrics on the CI page, is not to block
12:40
things from getting in, especially when there's no clear yes or no, no clear right or wrong answer, but to nudge developers by suggesting changes. So if there's a nice report that is accessible right in front of your eyes, but it's still a judgment call, you can still
13:03
override it, but more often than not you would probably take that suggestion and act on it. And beyond that, if there's a clear project preference, for example
13:23
we had this for the German comment translation, where there were lots of German comments: the moment you get metrics, you create incentives for people to move those metrics in the right direction. So that's also a nice way for the project leadership to set direction
13:47
and to create action, and it's pretty easy to do that in a CI system. For coverage that's already there, and those metrics are easy; I mean, those graphs are
14:06
just part of Jenkins, the Jenkins ecosystem, so it's not particularly hard to create them for any kind of metric. Then you can see before the patch and after the patch; usually you want to get certain things down and other things up, and as long
14:24
as the trajectory is the right one, you get full marks on that. Then you can encourage people by reporting on it, like 20 patches with full marks by developer A, and
14:42
then have some nice report in the weekly meeting. The bottom line: we should try much harder to put things early in the pipeline, and to move things that get fixed up after the fact earlier into the
15:04
development workflow. And if you already have scripts for something, like, I don't know, spell checking, coverage, cppcheck, other things, ASan, UBSan and crash testing,
15:21
at least try, if it is computationally possible, to get them earlier into the contribution chain, and then obviously encourage more automation. Okay, that was that. We will try to get as much as possible done with the project funding that we
15:45
have. There's one killer feature that I'd love to see happening; we will see how feasible it is. It's not a thing that will just fall from the sky: it needs
16:00
technical work to make it happen, and it probably also needs some work to create dedicated tests. That's a feature map, aka a code cognitive map. So again, the underlying problem that we're trying to fix here is the question
16:23
"Where's feature XY?" Like, where's the feature that renders beveled borders, or the shadows, or where's the filter code that imports the bold attribute in ODF?
16:40
We already get most of the control flow for a feature, at least the static control flow, from the coverage. So you run a test, just one test, not all of them, and you get coverage, and that's the code that's been touched
17:05
or triggered by that specific test. So you have a document that has something in it, and you get the result. Then you have a second document that changes a tiny little bit, like just
17:25
adding an image. You do the same: you load it, you generate the coverage, and you have a second coverage result. Now if you subtract the two, you will find the code that is solely
17:40
used by the diff, the new thing that you added. So if you have a simple document and you make text bold, then load that and diff it against the non-bold document, you'll probably find the code that is somehow involved in making this
18:03
load and render and display the bold text. That's pretty simplistic; for something as complex as LibreOffice it's not clear how well that will work. Still, having tests that
18:24
really test features in an orthogonal way, one by one, is something that we mostly lack right now. Our testing covers very, very complex things: usually we have
18:41
a bug, then this bug gets fixed, and the bug fix gets tested, and that usually mixes many, many things in one document, so it's not very orthogonal. I would think more than 90 percent of our documents are largely triggering the same code,
19:01
and then there are very, very subtle differences that we're actually testing. But this rather orthogonal unit-testing approach, where we really try to trigger independent things in the code base, we have very little of. So we'll see if this even works, but it would create a nice incentive to have more of that kind of test in the code base.
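The subtraction idea for locating a feature can be sketched in a few lines of Python: treat each coverage run as the set of (file, line) pairs executed at least once, subtract the baseline document's set from the feature document's set, and rank files by how many feature-only lines they contain. The report format (source path mapped to per-line hit counts, as an LCOV parser might produce) and the file names in the example are assumptions for illustration; as the talk says, it is not clear how well this works on real LibreOffice documents.

```python
from typing import Dict, List, Set, Tuple

Report = Dict[str, Dict[int, int]]  # source path -> {line number: hit count}


def executed(report: Report) -> Set[Tuple[str, int]]:
    """All (path, line) pairs that were hit at least once."""
    return {(path, line) for path, lines in report.items()
            for line, hits in lines.items() if hits > 0}


def feature_lines(baseline: Report, with_feature: Report) -> Dict[str, List[int]]:
    """Lines executed only when the feature is present (set difference)."""
    only_feature = executed(with_feature) - executed(baseline)
    by_file: Dict[str, List[int]] = {}
    for path, line in sorted(only_feature):
        by_file.setdefault(path, []).append(line)
    return by_file


def rank_files(diff: Dict[str, List[int]]) -> List[Tuple[str, int]]:
    """Files ordered by number of feature-only lines, most first."""
    return sorted(((path, len(lines)) for path, lines in diff.items()),
                  key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical coverage of loading a plain document vs. the same text made bold.
    plain = {"sw/text.cxx": {10: 1, 11: 1}, "vcl/font.cxx": {5: 2, 6: 0, 7: 0}}
    bold = {"sw/text.cxx": {10: 1, 11: 1, 12: 3}, "vcl/font.cxx": {5: 2, 6: 1, 7: 1}}
    diff = feature_lines(plain, bold)
    print(diff)              # {'sw/text.cxx': [12], 'vcl/font.cxx': [6, 7]}
    print(rank_files(diff))  # [('vcl/font.cxx', 2), ('sw/text.cxx', 1)]
```

Using plain set difference means a line that runs for both documents is dropped even if the bold document hits it far more often; comparing hit counts would be one possible refinement. Collecting such results per test document is one way a feature map could be generated automatically.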
19:28
So, yeah, what you then get, or what we hope we can provide, is at least
19:41
some idea of a code cognitive map: areas of the code where we know what they do, and we can keep that map, or generate it more or less automatically, by running tests.
20:00
Documentation is nice, but it frequently gets stale, or it's not very verbose. For something like LibreOffice with 10 million lines of code, you still need so much context information, so much experience, so many tricks:
20:26
what to grep for, where to look, what to try, where to put your breakpoint to figure something out. Getting something more automated that gets us a tiny little bit closer to being able to ask a query or a search question in natural language and
20:47
being pointed at the code, or at the file that does something, with some probability, would be really quite nice, especially for the mentoring that we're doing,
21:01
and I've been doing quite a bit of that. Very often it's this question that you see five hours later in IRC: you were not there, you couldn't answer it, and the person who was asking has moved on because there was no answer. It would just be tremendously lovely to have a chance of pointing people at such a search engine,
21:25
but we will see whether that has any chance of working at all. In any case, we're at the end of the talk, and I'd love to discuss with you what you think about those ideas. Take them with a grain of salt, and take them as suggestions
21:43
or as an offer. We'd love to at least do this tools improvement and add more to the actual CI run; we'd love to do that, and we'd love to prototype it. Just let us know whether that's a good idea or not, and how you like it. Okay,
22:06
thanks a lot, and I'm looking forward to the questions. Okay, so it's possible that we're live
22:39
now with questions. The first one here was the suggestion that we could have those
22:48
feature descriptions cross-application, which is possible if it's file-format based. So, okay, so possibly we're live now with the questions, right? So that would be limited to
23:13
anything based on a file format, but that's definitely something
23:20
that Svante started and has been thinking about for many years. Maybe you'd like to comment on that a bit, Svante? Okay, so there's a link in the... I wonder if we can
23:42
somehow add that, or maybe we can add it to the slide material, which you find on the talk page. I can update that after the talk with the link to the OASIS TC, and there's the start of a test suite. All right, let me check the chat. So maybe
24:08
a while... There are more questions coming up, so, sort of this, quote:
24:26
Let me try screen sharing. So, for example, cppcheck: there have been some long discussions, and we had some daily job running there, but if you do that in Jenkins,
24:41
it's actually pretty easy. There are plugins for that; you can add it as part of the job, you get a nice overview here, you can drill down, and it runs pretty fast. So then you have this little widget here, and you get it nicely populated. You
25:04
can drill down here; you can, let's say, focus on files or types or issues, and it's all right in front of you. Same for... we did this
25:21
for the ODF Toolkit, so we have a few more plugins enabled here, for example the coverage report that I was referring to in the slides. That is populating here, and this very nice map, where you can drill down. It looks pretty red,
25:44
but the upside is that it should create incentives for people to increase coverage. But Svante, you wanted to say something? Yes, sorry, I just had the audio from the two plugins talking to each other here, from the two rooms,
26:03
but Michael figured it out, so sorry for that. I heard you in canon, so please repeat what you asked earlier, because I couldn't understand a word; they were sending the audio signals back and forth, sorry for that. So maybe you can
26:22
expand a little bit on this cross-application test suite that Joss was referring to? Okay, okay, that's a good point. With the ODF Toolkit, from my talk yesterday, we have this, I would call it, high-level ODF
26:44
API, for people who don't know ODF XML: they know there's a table, there's a paragraph, there's an image in the document. Yes, there's a delay here, do you hear me? Okay, I just continued talking too quickly. So,
27:07
we separated... by adding these things, and yes, we try to separate this into test files, let's put it this way, and by these operations or changes we can add and
27:23
subtract... yeah, subtract features, and in the TC we try to do the same thing for new changes. Yeah, it looks a bit scary here, so I will try to make it short. Sorry for the confusion. Okay, so maybe back to some demo things. What I
27:44
was also referring to in the slides is the copy-paste detector. That's also almost free: it needs setting up, obviously, and it's not free in the sense that it has computational costs, but it's a plugin. So this one is, again, the
28:05
ODF Toolkit here, but we can run this just the same on LibreOffice; it just needs a source checkout. It can be a separate job, or, and perhaps that's more
28:20
acceptable for the project, it can even be a separate Jenkins instance. That would still integrate with Gerrit, so you would still get a comment there in Gerrit, which would help reviewers evaluate the change. Possibly it would create the same incentives, but it wouldn't block, like when you're waiting for the +1 so that you can merge your last-minute
28:44
feature; it wouldn't block that. On the other hand, CPD and cppcheck are pretty lightweight in terms of computational costs. And I think we're at the end of the session here; let me check if there are any last-minute questions. No. So thank
29:08
you very much, and we'd love to continue talking about the usefulness and how to apply this. Thank you all, talk to you, bye everyone, enjoy FOSDEM!