QA tools for FOSS distributions
Formal Metadata
Title | QA tools for FOSS distributions
Title of Series | FOSDEM 2012
Part | 23 / 84
Number of Parts | 84
Author | Abate, Pietro
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/40046 (DOI)
Production Year | 2012
Transcript: English (auto-generated)
00:01
Hello everybody. My name is Pietro Abate. I'm talking here for the Mancoosi project, a project that concluded about six months ago. It's about trying to manage the complexity of open source
00:21
distributions and in particular component-based systems. There are other people working with me on this project: Ralf Treinen, Jaap Boender, Stefano Zacchiroli, Roberto Di Cosmo, Jérôme Vouillon, Yessim Boussade and other people in Paris.
00:44
So, why did we come over here? The Mancoosi project has been academically a really successful project. It was rated very well by the European community; everybody clapped hands, and we're very happy about the job we've done there.
01:05
But one main goal of this project was to reach out to the developer community, to integrate our work into open source distributions, and to make a difference. After all, this is a European project that has been funded with your money, and
01:22
it would be nice to actually benefit from its results. So, the Mancoosi project finished about six months ago, and we developed a bunch of tools that I'm going to present today. I'm not going to go into the specifics of every tool and every small algorithm. There is a lot of
01:46
academic blah blah published on our website. Today, what I'm going to present is just how you can use and how you can benefit from these tools. So, what we wanted to do was to
02:00
basically work on four different aspects of software distributions. One, we wanted to detect those packages that were broken, not installable. This is already part of an old project, the predecessor of the Mancoosi project, called the EDOS project.
02:20
Probably many people here know the utility edos-distcheck that is in Debian, Mandriva, Caixa Mágica and other distributions. A novel tool that we developed recently detects those packages that need manual intervention.
02:42
Their metadata is outdated, and there is no other way of fixing such a package than to change its metadata. So this is basically a subset of the broken packages in the first category. Then, we wanted to try to detect those packages that
03:01
are somehow important in the distribution. Looking just at the metadata, these are the packages that, if changed, are going to break or affect a large number of other packages. Just by looking at the dependency web, this is difficult to see. So, we try to
03:21
shed a bit of light on this problem and untangle this web. The last point: we wanted to guess what happens in the future if we upgrade a package to a newer version. So,
03:41
this goes in the direction of predicting what is going to happen when we upgrade just one component or a class of components. So, the first tool that I'm going to present is distcheck.
04:02
Distcheck has a bit of history: it was developed for the EDOS project back in 2005, and the first version was a standalone tool written by Jérôme Vouillon. Then it was integrated in the
04:23
dose suite. Dose is an OCaml library that is meant to study dependencies in software distributions, and nowadays component-based systems in general;
04:40
it's more general, even though here we look at software distributions in particular. In 2006, we uploaded edos-distcheck to many distributions, and it has been integrated in many QA tools so far. Then in 2009, we started the rewriting
05:03
of the dose suite, with distcheck as part of it. So, what distcheck does is simply detect which packages cannot be installed. How do we do it? We encode the entire distribution as a SAT problem.
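To give a flavour of that encoding, here is a brute-force sketch over a toy repository. Package names and the dictionary layout are hypothetical, and the real tool hands clauses to a dedicated SAT solver rather than enumerating subsets:

```python
from itertools import combinations

# Toy universe, hypothetical names.  "depends" is a conjunction of
# disjunctions (each inner list means "one of these"); "conflicts"
# lists packages that cannot be co-installed.
REPO = {
    "a": {"depends": [["b"], ["c", "d"]], "conflicts": []},
    "b": {"depends": [], "conflicts": ["d"]},
    "c": {"depends": [], "conflicts": []},
    "d": {"depends": [], "conflicts": []},
    "e": {"depends": [["missing"]], "conflicts": []},  # broken: dep absent
}

def consistent(install):
    """True if every dependency alternative is met and no conflict holds."""
    for p in install:
        if any(not set(alt) & install for alt in REPO[p]["depends"]):
            return False
        if set(REPO[p]["conflicts"]) & install:
            return False
    return True

def installable(pkg):
    """The SAT question, answered naively: does any subset of the
    repository form a consistent installation containing pkg?"""
    for r in range(1, len(REPO) + 1):
        for subset in combinations(REPO, r):
            s = set(subset)
            if pkg in s and consistent(s):
                return True
    return False

broken = [p for p in REPO if not installable(p)]
```

Distcheck answers the same question for tens of thousands of packages in seconds precisely because a SAT solver replaces this exponential enumeration.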
05:21
We use a bit of logic and a specific SAT solver that has been written for this purpose; it's very fast and allows us to answer this question in a matter of a few seconds. It accepts different input formats. So, as I was saying, it's not just for one distribution,
05:41
but we can talk with Debian, RPM, and now also the Eclipse plugin system, which is another component-based system, plus other formalisms that we're going to add in the future. It handles compressed archives, so we don't have to bother about
06:03
bzip2 or gzip or anything else. It uses CUDF. CUDF is a generic format that tries to be an interchange format for all the metadata of different distributions. What we do is translate the different inputs
06:22
into CUDF, and then we use CUDF for the analysis. This allows us to be more modular and to add backends as they come. One goal of distcheck was to be easily scriptable, so we chose YAML as the output format:
06:42
a nice format that can be interpreted by a human and easily parsed by a machine. And it's at least twice as fast as its predecessor, with still a bit of room for improvement. This is very good: if you have ten
07:03
architectures and five releases, you don't want to spend half a day testing whether your packages are broken or not. So, the output of distcheck looks more or less like this. You have a list of packages, with
07:23
a report for every broken package, and at the end it gives you a few summary numbers: the total number of packages that you analyzed; the number of broken packages, packages that are not installable; and
07:41
the foreground and background packages. You can specify, for example, a universe of about 20,000 binary packages but analyze only 10 of them; it's a nice way to select which packages you want to focus on.
08:02
If we go deeper, I'll give you two examples of how a package can be broken. In this case, a package from an RPM repository is broken. We have various pieces of information, and the reason this package is broken is that there is a dependency that is not satisfied, basically because that package is not in
08:26
the archive at the moment. Another example is a bit more complicated: it's a conflict. We say that the gforge-web-apache package cannot be installed because there is a conflict between these two packages, and on the other hand
08:46
we have the chain of dependencies, because when you have a conflict between two packages, the conflict might not be direct on the package you're looking at; it can be hidden deep in the dependency tree.
09:03
Our tool shows you how to reach that package: there is a conflict between the package you're trying to analyze and python2.6-minimal, deep in the dependency tree. So, as I said, it's
09:25
easily scriptable: if you want to load the output of distcheck, you can extract all the information very easily with a few lines of Python. And if you wonder whether it's actually useful: for example, this is the main component of Ubuntu,
09:46
where one package is not installable because it depends on a package that is outside of it: while it's meant to be self-contained, the package depends on fglrx, which is a non-free component.
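As a sketch of that scripting, here is a stdlib-only extraction from a distcheck-style report. The embedded YAML is illustrative, with field names modelled on the report described above; with PyYAML installed you would simply load the whole file instead:

```python
# Minimal extraction from a distcheck-style YAML report.  The field
# names below follow the report shown in the talk; treat them as
# illustrative rather than the exact current dose output.
REPORT = """\
report:
 -
  package: gforge-web-apache
  status: broken
 -
  package: libfoo-dev
  status: broken
total-packages: 20000
broken-packages: 2
"""

def broken_packages(text):
    """Collect names of packages whose status is 'broken'."""
    broken, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("package:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("status:") and "broken" in line and current:
            broken.append(current)
            current = None
    return broken
```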
10:01
So, you can get some nice results. Now, a demo. I've prepared a few commands, and I'm going to run a few of them, just because they are nice and fast; for a few others I'm only going to show you
10:24
the result itself. Here, what I'm going to do is run distcheck. I'm going to ask it to give me all the packages that are broken, to give me explanations, to show me the progress, and
10:41
to ask it to analyze what's on my machine at the moment. And if I cut and paste this and run it, it takes about five or six seconds.
11:00
At least, it did this morning. So what it does: it initializes the solver and then starts spitting out all the packages that are broken. As you might notice, today there are a lot of them: about 300 broken packages in Debian unstable.
11:22
If you don't ask it to print everything on screen, maybe it's a bit faster. Okay, I love demoing automatic tools because they don't require too much effort from me. So, where is it used? It's used in Emdebian to check
11:44
the Emdebian toolchain. It's used internally by the Mandriva QA team and the Caixa Mágica QA team. It's the engine of Debian Weather, which you might have heard of, maintained by our team. And I recently discovered that it's used in Simple-CDD, a small utility to create custom Debian installation
12:07
CDs, and in five conflicts, another small tool written by us. So,
12:20
something else about problem detection is buildcheck. With distcheck, we're only looking at the binary packages, but the problem can be extended to source packages as well: we want to make sure, before actually trying to build a package,
12:42
that we're going to be able to build it. Otherwise, we might set up a virtual machine and download all the packages, just to find out that one package was missing and all this time was wasted. So, what we do is encode the build dependencies as normal dependencies, and then we run the same
13:07
tool on this encoding. That lets us discover whether a source package can be built or not. This is an old application as well, from the EDOS project.
13:21
It runs a bit faster than edos-buildcheck. It's just a wrapper around distcheck, and it has the same YAML output as distcheck. So, an example of this tool: it's very similar to what I've shown you before.
13:43
Now, you see that certain packages have a small prefix, "source:" followed by the name of the package, which tells you that this is a source package. What we have done is check whether all the build dependencies of this source package are actually
14:03
satisfied. Here we found that this package cannot be built, because its build dependencies include two packages that are in conflict with each other. This is one of the small problems that we can find when
14:23
compiling packages. So, the second demo, as you might have guessed, is not going to be too exciting either. It's about this one.
14:40
So, here, oops, here I'm using all the binary packages in Debian unstable, trying to recompile all the source packages in Debian unstable, looking at only one architecture, and I want to know the
15:00
broken packages and have some kind of explanation. Now, this one is a bit slower: before, we were analyzing about 30,000 or 35,000 packages; here we are analyzing, I think, about 50,000 packages. I forgot the progress option and
15:23
stuff. So we analyze the source file, and then we analyze the binary packages, and then we encode the
15:42
build dependencies. So, here we analyze about 36,000 packages, and there are a lot of broken packages up and down.
16:02
Outdated packages. The first two tools were already known, and it's easy to figure out what they do: they just look for broken packages. Outdated looks for the subset of broken packages that require manual intervention, and I'm going a bit
16:23
deeper here to explain what manual intervention means. A package is broken for basically one of two reasons. Either it's a transient problem: it depends on a package that is not available, or on a package that is not installable at the moment.
16:40
But we can also have a non-transient problem: the package itself needs its metadata fixed to be installable again, because maybe it depends on a package that is never going to be in the distribution
17:04
again, because this package has become outdated and a new version of the package has already been uploaded to the distribution. So, I have to change the metadata in order to fix the problem. I'm going to give you a couple of examples. For example, here
17:20
we're focusing on the package foo, which depends on bar version 2. We ask the question: is this going to be installable in the future? The answer is clearly yes. We don't have to do anything to foo; we just need bar to be upgraded to version 2 and foo is going to be installable again. This often happens when we upload a new package that depends, for example, on a package
17:45
that is in Debian experimental, and we need to wait for it to get into Debian unstable. So there is a small delay, and we don't have to do anything to foo; we just need to wait for bar version 2 to appear.
18:06
If we change the situation a bit, foo now depends on bar at a version strictly less than 2. So it doesn't matter what bar becomes in the future: version 3, version 4, version 5,
18:21
foo is never going to be installable, because I need something smaller than 2. In this case, I need to change the metadata to make sure that foo is going to be installable again. Now, a more complex example:
18:41
we look at foo, and we have packages bar and baz. The dependencies are a bit more complex, and the question we ask is always the same: is foo going to be installable in the future? Surely it's not installable now, because it depends on either baz version
19:00
2.5 or bar version 2.3, and on bar and baz. But in the future, we can think of a situation where foo becomes installable without requiring any modification to foo's own metadata. Because if we
19:21
go to a future where baz is at version 2.5 and without any conflict, because in the future packages can evolve in any way. So we can imagine that baz in version 2.5 dropped the conflict. We are taking an over-approximation here; packages are sometimes a bit more conservative. And if
19:49
we get a version of bar in between 2.6 and 3, then this constraint here is going to be satisfied. This is another situation where, even with a
20:05
complex constraint, we can figure out what is going to happen in the future. However, if we look at a situation like this, now the package foo requires manual intervention.
20:20
Here we have two possibilities: we choose either baz or bar. If we choose baz version 2.5, this is going to select a package that is in conflict with bar 2.5, and that is in conflict with the constraint up here.
20:42
If we choose bar 2.3, then this is not possible either: with bar 2.3, I have to choose something less than 2.3 here, but in the repository I have version 2.5.
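The distinction the examples draw can be boiled down to a tiny decision function. This is a deliberately simplified sketch with numeric versions (real tools compare full Debian version strings): a lower-bounded constraint can always be satisfied by some future upload, while an upper-bounded one is repairable only if the repository has not already passed the bound.

```python
# A dependency is repairable by upgrades alone only if some version at
# or above what the repository already offers can satisfy it.
def fixable_by_upgrade(op, bound, newest_available):
    """Can any future version v >= newest_available satisfy (op bound)?"""
    if op == ">=":
        return True                      # a big-enough future version always works
    if op == "<<":                       # strictly-less-than, Debian style
        return newest_available < bound  # only if such a version can still appear
    raise ValueError("unknown operator: " + op)

# foo depends on bar (>= 2): fixable, just wait for a newer bar.
# foo depends on bar (<< 2) while bar is already at 2.5: only a
# metadata change can help, no future upload of bar will satisfy it.
```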
21:06
So, these are the kinds of strange situations we can bump into. And we developed a tool to identify all these packages, like foo, where
21:25
the package needs its dependencies modified in order to be installable again. So here, oops, here again
21:47
I'm going to run this tool, and it's going to take a bit longer. It basically uses the same engine as distcheck, but this time,
22:05
since I have to consider all possible futures, all possible ways these packages can evolve, I need to consider a much larger universe; in this case it's going to be about 80,000 packages.
22:22
I also make sure that I consider source clusters, so I don't let packages advance one by one but in clusters, because we don't want to end up in a situation where, for example, I have a version of
22:43
GNOME 3 and a version of GNOME 2 in my distribution. I want either GNOME 2 or the whole GNOME 3 cluster together. This is going to take a bit of time.
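The clustering step can be sketched as a simple group-by on the source field; the package names here are illustrative:

```python
from itertools import groupby

# Grouping binary packages into source clusters, so versions advance
# per cluster rather than per package.
packages = [
    {"package": "gnome-shell", "source": "gnome-shell", "version": "3.2"},
    {"package": "nautilus", "source": "nautilus", "version": "3.2"},
    {"package": "libnautilus-extension", "source": "nautilus", "version": "3.2"},
]

def source_clusters(pkgs):
    """Map each source package to the list of binaries built from it."""
    key = lambda p: p["source"]
    return {src: [p["package"] for p in grp]
            for src, grp in groupby(sorted(pkgs, key=key), key=key)}

clusters = source_clusters(packages)
```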
23:10
We're going to come back to that. So, now I've been talking about these three tools that are meant to
23:22
detect broken packages, flag them to maintainers, and help release distributions where every package can be installed. Now we're moving on to something that is more about trying to figure out how
23:46
the distribution is going to evolve: what the important packages are, and what they will break in the future if they evolve in a certain way. The first thing I want to talk about is the concept of strong dependencies. We already talked about it
24:06
I think about two years ago, and it came back a while ago: I was at linux.conf.au, and one of the release managers of Ubuntu gave a presentation that was exactly about this.
24:22
So I raised my hand and said: we have the solution to your problems. Since this seems relevant to them, I wanted to come back to it and explain what it is all about. The idea is to understand which packages are important for our distributions.
24:41
The notion of importance is: if I have a bug in a package, which other packages are going to be impacted by this bug? It's about understanding how a bug spreads in my distribution, or, if I upgrade one package,
25:02
which other packages are going to be impacted by this upgrade. So, if I look only at the web of dependencies, this is a bit
25:20
too coarse and too complex for a human to analyze. What we wanted was a measure of the packages that are going to be affected by a change. So, we defined the notion of strong dependencies and
25:41
the notion of impact set. The impact set is the set of packages that are potentially affected by changes in a given package. It's a measure of how sensitive the package is and how careful we need to be when touching it.
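Taking these definitions literally gives a brute-force sketch on a toy repository (hypothetical names; the real implementation uses the SAT machinery, not subset enumeration): q is a strong dependency of p when every consistent installation of p contains q, and the impact set inverts that relation.

```python
from itertools import combinations

# Tiny toy repository: "depends" is a conjunction of alternatives.
REPO = {
    "app":  {"depends": [["liba", "libb"], ["core"]], "conflicts": []},
    "liba": {"depends": [["core"]], "conflicts": []},
    "libb": {"depends": [["core"]], "conflicts": []},
    "core": {"depends": [], "conflicts": []},
}

def consistent(install):
    """Every dependency alternative met, no conflicts inside the set."""
    for p in install:
        if any(not set(alt) & install for alt in REPO[p]["depends"]):
            return False
        if set(REPO[p]["conflicts"]) & install:
            return False
    return True

def strong_deps(pkg):
    """Intersection of all consistent installations containing pkg."""
    common = None
    for r in range(1, len(REPO) + 1):
        for sub in combinations(REPO, r):
            s = set(sub)
            if pkg in s and consistent(s):
                common = s if common is None else common & s
    return (common or set()) - {pkg}

def impact_set(pkg):
    """Every package that strongly depends on pkg."""
    return {p for p in REPO if pkg in strong_deps(p)}
```

Here "app" can pick either of two libraries, so neither library is a strong dependency, but "core" is always pulled in, and its impact set contains everything above it.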
26:02
As I said, the direct dependencies are too little, and the transitive closure of all dependencies is too much. We need a stronger definition. The definition of
26:20
strong dependencies is actually very easy: we say that P strongly depends on Q if it is not possible to install P without installing Q. Let's consider this small graph. Here we have packages that have either a disjunctive dependency or a direct dependency. And, in this case,
26:46
there is a conflict between Easy and George. And here at the bottom, we have Charlie and Dog, which both depend on Fox. What we want to
27:02
do is understand which are the important dependencies, the strong dependencies, and which are all the others. In this case, George is a conjunctive dependency: in order to install Able, I must install George, because I really need it.
27:25
But if I install George, which is in conflict with Easy, that disjunction can never be satisfied through Easy, and this implies that Baker must be installed all the time. And at the same time, since Able
27:40
depends on Charlie or Dog, and both of them depend on Fox, Fox must be installed all the time. So we end up in a situation where, in order to install Able, we always need Baker, George, and Fox, no matter what. This
28:01
gives us a way of finding out which packages are really, really important. And then we end up with a table with the list of packages
28:21
where we show the impact set for each package. When we compared the impact set to the direct dependencies, we found that the direct dependencies didn't tell us a lot: in the first five, we have packages that maybe don't have that many direct dependencies,
28:46
but a lot of packages depend on them. So if we touch them, the ones really at the bottom of our web of dependencies, we are going to break a lot of packages, because these dependencies are going to be
29:02
propagated upward. This gives us a nice way to say: if I migrate dpkg to a newer version, I'm probably going to break about 13,000 packages. So at least these are very important packages. If I'm going to
29:21
upgrade it to a newer version, maybe I need to check that all the packages that strongly depend on it are going to be compatible with this migration, with this upgrade. Another
29:43
thing on the strong dependency side: we also have a notion, which I'm not going to present in depth today, of domination. If you look at the dependency web of a distribution, you can kind of
30:03
think: what if I run something like Google's PageRank algorithm on these dependencies? This is going to give us the packages on which most of the other packages depend, and it's going to show these kinds of hubs everywhere.
30:23
What we've done is use a similar notion based on strong dependencies: we ran a graph flow algorithm that was able to find the important packages even if they were hidden under other packages. So imagine
30:42
there is a very, very important package on which everybody depends, but this one itself depends on another one. This means there is one package hidden down there that is not detected by our previous analysis.
31:02
This dominator algorithm also allows us to untangle these kinds of hidden packages and find out a bit more about the structure. The last tool that I'm going to present is challenged packages.
31:23
Here we're going to try to predict what is going to happen, and since it's a prediction about the future, it's going to be an over-approximation. I'm not going to go deep into how we did it, I don't have enough time today, but
31:45
what we wanted was a bit more than what we had with strong dependencies. With strong dependencies we had the notion of impact set, to understand
32:01
which packages were important. Now we want to answer this question: what if we upgrade the package P from a version V to a future version, and how many packages are going to be affected by this upgrade? In particular,
32:22
the main problem is which future versions to consider, because I can't consider all of them; the future is infinite. I have to find specific versions that, in relation to my
32:41
repository today, are going to be the most important tomorrow. As I said, I wanted to do this only by looking at the current repository; I'm not going to use other kinds of sources to predict how the packages are going to be upgraded. And
33:01
I also want to be consistent with the clustering information: packages are not upgraded in isolation, but always in a cluster. All packages that stem from the same source package are going to be upgraded together. And
33:23
this tool, challenged, is at the moment very Debian-specific for this reason: we did the clustering analysis on the Debian packages. I'm sure it can also be done on RPM packages. The idea is that in Debian, packages from the same source
33:46
probably have the same version. This is not a rule written anywhere, but if they have the same versions, they are going to evolve together. If they don't have the same versions, then all bets are off, because we can't actually predict how these packages move together.
34:05
What we found by analyzing source clusters is that most of them move together to a newer version. For some source packages you get different sub-clusters, for example with different epochs. For some source packages, the binary packages have versions that evolve in completely different directions, and in that case
34:28
we really can't say how these packages and versions are going to evolve; each moves in isolation. Now, this challenged algorithm takes some time. It's not something that you want to run casually on your machine; it's something that
34:46
distributions want to run once a day. It takes about 35 to 40 minutes, a bit too much for a demo here. But I can give you an idea of how packages are going to evolve, and this is particularly useful if you have a point-wise distribution
35:04
and you want to modify only specific packages. Then once in a while you run this algorithm to see: I'm at this point in time with these packages, and I want to
35:21
migrate this component of my distribution; what is going to happen when I do that? As I said, we have source packages with versions, and then we have target versions. A target version captures this idea: what is going to happen when I upgrade to a version that is bigger than a given one?
35:45
The first four results here are for Debian Squeeze. We see that in Debian Squeeze, one of the packages that
36:02
emerges as most problematic dependency-wise when upgraded is this one, and it is for many different reasons: the dependencies on the Perl package are very tight. So if we upgrade from this version to a version that is in between
36:24
10.5, 10.2, 10.12, I break 2,652 packages. If I upgrade to a version in between here, I break as many. But if I upgrade to a version in between this one, then I break a bit fewer.
36:45
This basically tells me that upgrading to a specific version is going to break a certain number of packages, but not all upgrades are the same.
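The check behind these numbers can be illustrated with a toy Python fragment. The real distcheck-style tools solve full installability (conjunctions, alternatives, and conflicts, via a SAT-like encoding), and Debian version comparison is more involved than a numeric tuple compare; this sketch only checks direct versioned dependencies, which is enough to show why different target versions break different sets of packages:

```python
# Toy illustration of "what breaks if I upgrade package P to version V":
# substitute the new version and see which direct versioned constraints
# on P no longer hold.

def satisfies(version, op, bound):
    # Compare dotted versions numerically (real tools implement the full
    # Debian version comparison algorithm, including epochs and revisions).
    v = tuple(int(x) for x in version.split("."))
    b = tuple(int(x) for x in bound.split("."))
    return {"<<": v < b, ">=": v >= b, "=": v == b}[op]

def broken_by_upgrade(pkg, new_version, repository):
    """Names of packages whose direct constraint on `pkg` fails at `new_version`."""
    broken = []
    for other in repository:
        for (name, op, bound) in other.get("depends", []):
            if name == pkg and op and not satisfies(new_version, op, bound):
                broken.append(other["name"])
    return broken

repo = [
    {"name": "app1", "depends": [("glibc", ">=", "2.7")]},
    {"name": "app2", "depends": [("glibc", "<<", "2.12")]},
]
print(broken_by_upgrade("glibc", "2.13", repo))  # ['app2']
```

Upgrading glibc past 2.12 breaks `app2` but not `app1`, while an upgrade that stays below 2.12 breaks neither: the same package, different target versions, different breakage.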
37:04
Python is another one; there are many different examples. If you take glibc and migrate to something larger than
37:20
2.12, you're going to break about 400 packages. If you migrate to something in between this version and 2.12, you're going to break a bit fewer packages, and so on and so forth. This gives us a way of peeking into the future and trying to understand what is going to happen if we migrate to a specific version. So when
37:42
you're actually going to change a component of your distribution and you run this algorithm, you should be careful about where you're going to end up with your migration.
38:01
One last tool that I'm going to mention. Actually, I'm not really going to present it, as it's a bit outside the scope of this presentation. It's the Coinst tool, which has been developed recently by one member of our group, and
38:25
it's a tool to compute the installability kernel of a distribution. What are installability kernels? We want to understand how packages can be co-installed together. In particular, we found out that in Debian
38:43
about 80% of the packages can be installed together. If you look for the largest installation of Debian packages, you can probably install about 30,000 packages together in Debian unstable. But there are a number of packages that can't coexist.
39:04
What this tool does is identify these classes of packages that cannot coexist. For example, think about mail agents: you can't install sendmail and Postfix at the same time, because
39:24
the structure of conflicts doesn't allow you to do that. This is a simple example with two packages, or actually a family of packages, that cannot coexist. But if you look carefully at the distribution, there are not that many such families of packages that cannot coexist.
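A very rough way to surface such families is to build an undirected graph from declared conflicts and take its connected components. The Python sketch below only shows this structural idea; the real Coinst tool computes a semantically quotiented co-installability graph, which is considerably more precise, and the package names here are just the mail-agent example from the talk:

```python
# Sketch: group packages involved in conflicts into "families" by taking
# connected components of the conflict graph. This over-groups compared
# to Coinst's real analysis, but shows how few such families there are.

from collections import defaultdict

def conflict_families(conflicts):
    """Connected components of the conflict graph (input: list of name pairs)."""
    graph = defaultdict(set)
    for a, b in conflicts:
        graph[a].add(b)
        graph[b].add(a)
    seen, families = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:                      # depth-first traversal
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        families.append(sorted(component))
    return families

print(conflict_families([("sendmail", "postfix"), ("exim4", "postfix")]))
# [['exim4', 'postfix', 'sendmail']]
```

All three mail agents end up in one family: any installation can contain at most one of them.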
39:47
And this tool allows you to identify these classes of packages and gives you a simple graph for analyzing how
40:02
the structure of conflicts actually works in your distribution. These are all kinds of hidden structures that we are trying to let surface, to allow people to look a bit at the general picture.
40:21
It's been written by Jérôme Vouillon, and if you're interested there is a website with a lot of explanations and very nice graphs. So my presentation finishes here; I think it's been about 45 minutes, and I'll be very happy to take questions.
40:55
Yeah, the question was about how deep we go in the dependency graph. We consider the entire dependency graph.
41:04
We consider the entire distribution: give us one Packages file, that is our universe, and we consider all the packages. With our tools, I think we have analyzed universes of up to
41:23
100,000 packages, so I think we are in a good position to scale. The question was how long it takes: for 30,000 packages it's about five seconds, and for 80,000 packages it's about two minutes.
41:46
Thank you very much. Yes. Do any of the tools that you developed
42:03
have the capability of working with packages other than Debian's? Yes: distcheck accepts Debian Packages files, HDlist and synthesis files, CUDF files (that is, the interchange format), and Eclipse files.
42:24
Once upon a time I also had a filter for BSD, but it has not been rewritten. As for all these tools: every year we promise a release. This year we are actually very, very close.
42:49
We are so close that there is a release of the source code on our website, we are so close that there are packages in Debian experimental, and we are so close that Ralf is working right now to
43:07
finish up the Debian packages, which should be up sometime next week. This is a way to put a bit of pressure on ourselves, because we say we're going to release these tools and then we never do. Thank you very much.