
QA tools for FOSS distributions


Formal Metadata

Title
QA tools for FOSS distributions
Title of Series
Number of Parts
84
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Production Year: 2012

Content Metadata

Subject Area
Genre
Abstract
FOSS distributions are increasingly under pressure to deliver stable releases that include the most up-to-date upstream software. Time-based release strategies have exacerbated this problem, putting even more pressure on QA teams. The recently concluded Mancoosi project has developed a number of tools to automatically analyse large package collections for a range of problems, from installability checks to speculative analysis of software repositories. In this talk I'll present four command line tools to identify and correct potential problems as soon as possible during the release cycle.
Transcript: English (auto-generated)
Hello everybody. My name is Pietro Abate. I'm talking here for the Mancoosi project, a project that concluded about six months ago. It's a project about trying to manage the complexity of open source distributions, and in particular of component-based systems. There are other people working with me on this project: Ralf Treinen, Jaap Boender, Stefano Zacchiroli, Roberto Di Cosmo, Jérôme Vouillon, Yessim Boussade and other people in Paris.
So, why did we come here? The Mancoosi project has been academically a really successful project. It has been very well rated by the European community, everybody clapped their hands, and we're very happy about the job we've done over there. It's a bit of... But one main goal of this project was to reach out to the developer community, to try to integrate our work into open source distributions and to try to make a difference. After all, this is a European project that has been funded with your money, and it would be nice to actually benefit from the results of this project. So, the Mancoosi project finished about six months ago and we developed a bunch of tools that today I'm going to present. I'm not going to go into the specifics of every tool and every small algorithm; there is a lot of academic blah blah published on our website. Today, what I'm going to present is just how you can use and how you can benefit from these tools.

So, what we wanted to do was basically to work on four different aspects of software distributions. First, we wanted to detect the packages that are broken, not installable. This is already part of an old project, the predecessor of the Mancoosi project, called the EDOS project. Probably many people here know about the utility edos-distcheck that is in Debian, Mandriva, Caixa Mágica and other distributions. Another, more novel tool that we developed recently detects the packages that need manual intervention, because their metadata is outdated and there is no other way of fixing the package than to change its metadata. So, this is basically a subset of the broken packages in the first category. Then we wanted to go a step further and try to detect the packages that are somehow important in the distribution. Just looking at the metadata, these are the packages that, if changed, are going to break or affect a large number of other packages. Just by looking at the dependency web, this is a bit difficult, so we try to shed a bit of light on this problem and untangle this web. The last point is that we wanted to see, to guess, what happens in the future if we upgrade a package to a newer version. So, this goes in the direction of predicting what is going to happen when we upgrade just one component or a class of components. The first tool that I'm going to present is distcheck.
Distcheck has a bit of history, because it was developed for the EDOS project back in 2005, and the first version was a standalone tool written by Jérôme Vouillon. Then it was integrated into the dose suite, an OCaml library that is meant to study dependencies in software distributions, and nowadays, more generally, in component-based systems, even though here we look at software distributions in particular. In 2006 we uploaded edos-distcheck to many distributions, and it has been integrated into many QA tools so far. Then in 2009 we started rewriting the dose suite, and distcheck as part of it. So, distcheck is just a tool to detect which packages cannot be installed. How do we do it? We encode the entire distribution into a SAT solver.
We use a bit of logic and a specific SAT solver that has been written for this purpose, which is very fast and allows us to answer this question in a matter of a few seconds. It accepts different input formats; as I was saying, it's not just for one distribution: it can talk with Debian, RPM, and now we can also look at the Eclipse plugin system, which is another component-based system, and other formalisms that we're going to add in the future. It handles compressed archives, so we don't have to bother about bzip2 or gzip or anything else. It uses CUDF: CUDF is a generic format that tries to be an interchange format for all the metadata in different distributions. What we do is translate the different inputs into CUDF, and then we use CUDF for the analysis. This allows us to be more modular and to add backends as they come. One goal of distcheck was to be easily scriptable, so we chose YAML as the output format: it's a nice format that can be interpreted by a human and easily parsed by a machine. And it's at least twice as fast as its predecessor, with still a bit of room for improvement. This is very good: if you have ten architectures and five releases, you don't want to spend half a day testing whether your packages are broken or not. The output of distcheck more or less looks like this: you get a report for every broken package, and at the end a few summary numbers: the total number of packages that you analyzed, the number of broken packages (packages that are not installable), and the foreground and background packages. You can specify, for example, about 20,000 binary packages as background but analyze only 10 of them; it's a nice way to select which packages we want to focus on.
If we go deeper, I'll give you two examples of how a package can be broken. In the first case, a package from an RPM repository is broken: we get various pieces of information, and the reason why this package is broken is that there is a dependency that is not satisfied, basically because that package is not in the archive at the moment. Another example is a bit more complicated: it's an example of a conflict. We say that the gforge-web-apache package cannot be installed because there is a conflict between these two packages, and alongside that we get the chain of dependencies, because when there is a conflict between two packages, it might not be a direct conflict with the package that you're looking at; it can be hidden deep in the dependency tree. Our tool shows you how to reach that package, and that there is a conflict between the package you're trying to analyze and python2.6-minimal, deep in the dependency tree. As I said, it's easily scriptable: if you want to load the output of distcheck, you can extract all the information very easily with a few lines of Python. And if you wonder whether it's actually useful: for example, this is the core of Ubuntu, where one package is not installable because it depends on a package that is outside the core; while the core is meant to be self-contained, it depends on fglrx, which is a non-free component.
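Since the report is YAML, scraping it really does take only a few lines. Here is a minimal, stdlib-only sketch; the field names (`package:`, `status:`) and the report layout are inferred from the output shown in the talk, not taken from distcheck's documented schema:

```python
# Hypothetical distcheck-style YAML report, inlined for the example.
SAMPLE = """\
report:
 -
  package: gforge-web-apache
  status: broken
 -
  package: coreutils
  status: ok
total-packages: 2
broken-packages: 1
"""

def broken_packages(text):
    """Collect the names of packages whose status is 'broken'.
    A line-based scan is enough for this flat layout; a real script
    would use a YAML parser instead."""
    broken, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("package:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("status:") and line.split(":", 1)[1].strip() == "broken":
            broken.append(current)
    return broken

print(broken_packages(SAMPLE))  # → ['gforge-web-apache']
```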
So you can get some nice results. Now, a demo. I've prepared a few commands and I'm going to run a few of them, just because they are nice and fast; for a few others I'll just show you the result. Here, what I'm going to do is run distcheck: I'm going to ask it to give me all the packages that are broken, to give me an explanation, to show me the progress, and to analyze what's on my machine at the moment. If I cut and paste this and run it, it takes about five, six seconds. At least, it did this morning. So what it does: it initializes the solver, and then it starts spitting out all the packages that are broken. As you might notice, today there are a lot of broken packages: about 300 broken packages in Debian unstable.
If you don't ask it to print everything on screen, maybe it's a bit faster. Okay, I love to demo automatic tools because they don't require too much effort from me. So, where is it used? It's used in Emdebian to check the Emdebian toolchain. It's internally used by the Mandriva QA team and the Caixa Mágica QA team. It's the engine of Debian Weather, which you might have heard of and which is maintained by our team. And I recently discovered that it's used in Simple-CDD, a small utility to create custom Debian installation CDs, and in five conflicts, another small tool written by us.

Something else about problem detection is buildcheck. With distcheck we were only looking at the binary packages, but this problem can be extended to the source packages as well: we want to make sure, before actually trying to build a package, that we're going to be able to build it. Otherwise we may set up a virtual machine and download all the packages, just to find out that one package was missing, and all that time was wasted. So what we do is encode the build dependencies as normal dependencies, and then we run the same tool on this encoding. This allows us to discover whether a package can be built or not. This is an old application as well, from the EDOS project. It runs a bit faster than edos-builddebcheck; it's just a wrapper around distcheck, and it has the same output as distcheck: YAML, etc. An example of this tool: it's very similar to what I've shown you before. You see that certain packages have a small prefix, "src:", before the name of the package, which tells you that this is a source package; what we've done is check whether all the build dependencies of this source package are actually satisfiable. Here we found that this package cannot be built because its build dependencies include two packages that are in conflict with each other. This is one of the small problems we can find when compiling packages. So, the second demo, as you might have guessed, is not going to be too exciting either. It's about this one: here I'm using all the binary packages in Debian unstable, trying to recompile all the source packages in Debian unstable, looking at only one architecture, and I want to know the broken packages and have some kind of explanation. Now, this one is a bit slower, because before we were analyzing about 30,000-35,000 packages, and here we are analyzing, I think, about 50,000 packages. I forgot the progress flag and such. So we analyze the source file, then the binary packages, and then we encode the build dependencies. Here we analyzed about 36,000 packages, and there are a lot of broken packages up and down.
Outdated packages. The first two tools were applications that were already known, and it's easy to figure out what they do: they just look at broken packages. Outdated packages looks for a subset of the broken packages, those that require manual intervention, and I'm going a bit deeper here to explain what manual intervention means. A package is broken for basically one of two reasons. Either it's a transient problem: it depends on a package that is not available, or on a package that is not installable. Or it's a non-transient problem: these are the cases where the package itself needs to have its metadata fixed to be installable again, because maybe it depends on a package that is never going to be in the distribution again, since that package has become outdated and a new version has already been uploaded to the distribution. So I have to change the metadata in order to fix the problem.

I'm going to give you a couple of examples. Here we're focusing on the package foo, which depends on the package bar, version 2. We ask the question: is this going to be installable in the future? The answer is clearly yes. We don't have to do anything to foo: we just need to upgrade bar to version 2 and foo is going to be installable again. This often happens when we upload a new package that depends, for example, on a package in Debian experimental, and we need to wait for the new package to get into Debian unstable. There is a small delay, but we don't have to do anything to foo; we just need to wait for bar version 2 to appear.

If we change the situation a bit, foo now depends on a version of bar that is less than 2. Then it doesn't matter what bar becomes in the future (version 3, version 4, version 5): foo is never going to be installable, because it needs something smaller than 2. In this case I need to change the metadata to make sure that foo is going to be installable again.

Now a more complex example: we look at foo, and we have the packages bar and baz. The dependencies are a bit more complex, and the question we ask is always the same: is foo going to be installable in the future? Surely it's not installable now, because it depends on either baz version 2.5 or bar version 2.3, and on both bar and baz. But in the future we can imagine a situation where foo can be installed without requiring any modification to foo's metadata. If we go to a future where baz is at version 2.5 and without any conflict (in the future, packages can evolve in any way, so we can imagine that baz in version 2.5 dropped the conflict; we are taking an over-approximation here, packages are sometimes a bit more conservative), and we get a version of bar between 2.6 and 3, then this constraint is going to be satisfied. So this is another situation where, even with a complex constraint, we can figure out that things will work out in the future. However, in a situation like this, the package foo requires manual intervention. Because now we have two possibilities: either we choose baz or we choose bar. If we choose baz version 2.5, this selects a package that is in conflict with bar 2.5, and that conflicts with the constraint up here. If we choose bar 2.3, this is not possible either, because then I have to choose something that is less than 2.3 here, but in the repository I have version 2.5.
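The "just wait" versus "needs a metadata fix" distinction can be caricatured in a few lines. This is a toy illustration of the reasoning, not Mancoosi's algorithm: versions are plain integers and each dependency is a single constraint, both my simplifications:

```python
def satisfiable_in_some_future(constraint, current_version):
    """Decide whether a dependency constraint on a package can ever be met,
    given that the package's version only moves forward from current_version.
    A '>=' bound is always eventually satisfiable (just wait for the upload);
    a '<' bound is a dead end once the current version has reached it."""
    op, bound = constraint
    if op == ">=":
        return True
    if op == "<":
        return current_version < bound
    raise ValueError("unsupported operator: " + op)

# foo depends on bar (>= 2) while bar is at 1: transient, just wait.
print(satisfiable_in_some_future((">=", 2), 1))  # → True
# foo depends on bar (< 2) while bar is already at 2: foo needs a metadata fix.
print(satisfiable_in_some_future(("<", 2), 2))   # → False
```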
These are the kinds of strange situations we can bump into, and we developed a tool to identify all these packages, namely the foos: packages whose dependencies need to be modified in order to be installable again. So here, again, I'm going to run this tool, and it is going to take a bit longer. It basically uses the same engine as distcheck, but this time, since I have to consider all possible futures, all possible ways these packages can evolve, I need to consider a much larger universe; in this case it's going to be about 80,000 packages. I also make sure to consider the source clusters, so I don't let packages advance one by one, but in clusters, because we don't want to end up in a situation where, for example, I have a version of GNOME 3 and a version of GNOME 2 in my distribution. I want either GNOME 2, or the whole GNOME 3 cluster together. This is going to take a bit of time.
We'll come back to that. So far I've been talking about three tools that are meant to detect broken packages, to flag broken packages to maintainers, and to help release distributions where every package can be installed. Now we're moving on to something that is more about trying to figure out how the distribution is going to evolve: what the important packages are, how packages can evolve, and what they will break in the future if they evolve in a certain way. The first thing I want to talk about is the concept of strong dependencies. We already talked about it, I think, about two years ago, and it came back a while ago: I was at linux.conf.au in Australia, and one of the release managers of Ubuntu gave a presentation that was exactly about this.
So I raised my hand and said: we have the solution to your problems. Since this seems relevant to them, I wanted to come back to it and explain what it is all about. The idea is to try to understand which packages are important for our distributions. The notion of importance is: if I have a bug in a package, which other packages are going to be impacted by this bug? It's about understanding how a bug spreads in my distribution, or, if I upgrade one package, which other packages are going to be impacted by this upgrade. If I look only at the web of dependencies, this is a bit too coarse, and a bit too complex for a human to analyze. What we wanted was a measure of the packages that are going to be affected by a change. So we define the notion of strong dependencies and the notion of impact set. The impact set of a package is the set of packages that are potentially affected by changes in that package; it's a measure of how sensitive the package is and how careful we need to be when touching it. As I said, the direct dependencies give too little; the transitive closure of all dependencies gives too much. We need a stronger definition.

The definition of strong dependencies is actually very easy: we say that P strongly depends on Q if it is not possible to install P without installing Q. Let's consider this small graph. Here we have packages with either a disjunctive dependency or a direct dependency; there is a conflict between Easy and George; and at the bottom we have Charlie and Dog, which both depend on Fox. What we want to do is understand which are the important, strong dependencies, and which are all the others. In this case, George is a conjunctive dependency: in order to install Able, I must install George, because I really need it. But if I install George, which is in conflict with Easy, that disjunct can never be satisfied, and this implies that Baker must be installed all the time. At the same time, since Able depends on Charlie or Dog, and both of them depend on Fox, Fox must be installed all the time. So we end up in a situation where, in order to install Able, we always need Baker, George, and Fox, no matter what. This gives us a way to find out which packages are really, really important.

We then end up with a table listing the impact set for each package. When we compared the impact set to the direct dependencies, we found that the direct dependencies didn't tell us much. In the first five, we have packages that maybe don't have that many direct dependencies, but a lot of packages depend on them. If we touch these packages that sit at the very bottom of our web of dependencies, we are going to break a lot of packages, because these dependencies propagate upward. This gives us a nice way to say: if I migrate dpkg to a newer version, probably I'm going to affect about 13,000 packages. So these are very important packages, and if I'm going to upgrade one of them to a newer version, maybe I need to check that all the packages that strongly depend on it are going to be compatible with this migration, with this upgrade.

That's the strong dependency side. We also have the notion of domination, which I'm not going to present today. If you look at the dependency web of a distribution, you could think: what if I run a Google PageRank-style algorithm on these dependencies? That would give you the packages on which most of the other packages depend, and show you these kinds of hubs everywhere. What we have done instead is use a similar notion based on strong dependencies: we run a flow-graph algorithm that is able to find the important packages even when they are hidden under other packages. Imagine there is a very, very important package on which everybody depends, but that package itself depends on another one: this means there is a package hidden down there that is not detected by our previous analysis.
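The strong-dependency definition is simple enough to brute-force on the talk's small example. The sketch below reconstructs the Able/Baker/Charlie/Dog/Easy/Fox/George graph; the exact dependency edges are inferred from the spoken description, so treat them as illustrative:

```python
from itertools import combinations

# Toy repository: each DEPENDS entry is a list of disjunctions
# (each inner set must be intersected by the installation).
PACKAGES = {"able", "baker", "charlie", "dog", "easy", "fox", "george"}
DEPENDS = {
    "able": [{"george"}, {"baker", "easy"}, {"charlie", "dog"}],
    "charlie": [{"fox"}],
    "dog": [{"fox"}],
}
CONFLICTS = {frozenset({"easy", "george"})}

def consistent(install):
    """An installation is consistent if every dependency disjunction of
    every installed package is met, and no conflicting pair co-exists."""
    for pkg in install:
        for disjunction in DEPENDS.get(pkg, []):
            if not (disjunction & install):
                return False
    return all(not c <= install for c in CONFLICTS)

def strong_deps(p):
    """P strongly depends on Q iff Q appears in EVERY consistent
    installation that contains P: intersect them all (brute force)."""
    installs = [set(s)
                for r in range(len(PACKAGES) + 1)
                for s in combinations(PACKAGES, r)
                if p in s and consistent(set(s))]
    return sorted(set.intersection(*installs) - {p})

print(strong_deps("able"))  # → ['baker', 'fox', 'george']
```

Charlie and Dog do not show up: the disjunction lets either of them satisfy Able, exactly as in the talk's reasoning.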
This dominator algorithm also allows us to untangle these kinds of hidden packages and find out a bit more about the structure. The last tool that I'm going to present is challenged packages.
Here we're going to try to predict what is going to happen if we upgrade, and since it's a prediction about the future, it's going to be an over-approximation. I'm not going to go deep into how we did it, I don't have enough time today, but what we wanted was a bit more than what we had with strong dependencies. With strong dependencies we had this notion of impact set, to understand which packages were important. Now we want to answer this question: what if we upgrade the package P from version V to a future version? How many packages are going to be affected by this upgrade? In particular, the main problem is which future versions to consider, because I can't consider all of them; the future is infinite. I have to find the specific versions that, in relation to my repository today, are going to be the most important tomorrow. As I said, I want to do this only by looking at the current repository; I'm not going to use other kinds of sources to predict how the packages are going to be upgraded. I also want to be consistent with the clustering information: packages are not upgraded by themselves in isolation, but always in a cluster, so all packages that stem from the same source package are going to be upgraded together. This challenged tool is, at the moment, very Debian-specific for this reason, because we did the clustering analysis on the Debian packages; I'm sure it can be done on RPM packages as well. The idea is that in Debian, packages from the same source probably have the same version. This is not a rule anywhere, but if they have the same versions, they are going to evolve together; if they don't have the same versions, all bets are off, because we can't actually predict how these packages move together.
What we found by analyzing source clusters is that most of them move to a newer version all together. In some, a single source package yields different sub-clusters, for example with different epochs. In others, a source package has binary packages whose versions evolve in completely different directions, and in that case we really can't say how these packages and their versions are going to evolve; each moves in isolation. This challenge algorithm takes some time. It's not something you want to run on your machine all the time; it's something a distribution would run once a day. It takes about 35 to 40 minutes, a bit too much for a demo here, but I can give you the idea of how packages are going to evolve. This is particularly useful if you have a point-wise distribution.
And you want to modify only specific packages. Then once in a while you run this algorithm to see: I'm here at this point in time with these packages, I want to migrate this component of my distribution, and what is going to happen when I do that? So, as I said, we have source packages with versions, and then we have target versions. A target version captures this idea: what is going to happen when I upgrade to any version bigger than this one?
In this case the first four entries are for Debian Squeeze, and we see that in Debian Squeeze one of the packages that emerges as most problematic dependency-wise when upgraded is Perl: the dependencies on the Perl package are very tight, for many different reasons. If I upgrade from the current version to a version in the first interval shown, I break 2,652 packages. If I upgrade to a version in the second interval, I break just as many. But if I upgrade to a version in the third interval, I break a bit fewer. This basically tells me that upgrading to a specific version is going to break a number of packages, but not all upgrades are the same.
Python is another example; there are many. If you take glibc and migrate to something larger than 2.12, I'm going to break about 400 packages. If I migrate to something between the current version and 2.12, I break a bit fewer, and so on and so forth. This gives us a way of peeking into the future and trying to understand what is going to happen if we migrate to a specific version. So when you are actually going to change a component of your distribution and you run this algorithm, you can be careful about where you're going to end up with your migration.
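The core of the challenge idea can be sketched as follows. This is a hedged toy model, not the dose-challenged implementation: versions are plain integers rather than Debian version strings, and the reverse dependencies are hypothetical. The interesting target versions are exactly the bounds mentioned in the reverse dependencies' constraints, and each candidate upgrade breaks the dependents whose constraints it no longer satisfies:

```python
# Hypothetical reverse dependencies on one package: (dependent, operator, bound)
revdeps = [
    ("pkg-a", "<<", 11),   # requires a version strictly below 11
    ("pkg-b", "<<", 11),
    ("pkg-c", "<<", 12),
    ("pkg-d", ">=", 10),   # satisfied by any newer version
]

# Constraint operators (Debian-style << and >=) on integer versions.
OPS = {"<<": lambda v, b: v < b, ">=": lambda v, b: v >= b}

def broken_by(target):
    """Dependents whose constraint fails when we upgrade to `target`."""
    return [dep for dep, op, bound in revdeps if not OPS[op](target, bound)]

# Discriminant target versions: one candidate per bound that appears
# in some constraint -- versions between two bounds behave identically.
for target in sorted({bound for _, _, bound in revdeps}):
    print(f"upgrade to {target}: breaks {broken_by(target)}")
```

This is why, in the Perl and glibc examples above, breakage counts change only when the target version crosses one of these interval boundaries.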
There is one last tool I'm going to mention, though not really present, as it's a bit outside the scope of this presentation: the coinst tool, which has been developed recently by one member of our group. It's a tool to compute the installability kernel of a distribution. What are installability kernels? We want to understand how packages can be co-installed together. In particular, we found that in Debian about 80% of the packages can be installed together: if you look for the largest installation of Debian packages, you can probably install about 30,000 packages of Debian unstable all at once. But there are a number of packages that can't coexist. What this tool does is identify the classes of packages that cannot coexist. For example, think about the mail agents: you can't install sendmail and Postfix at the same time, because the structure of conflicts doesn't allow it. This is a simple example of two packages, or actually a family of packages, that cannot coexist. But if you look carefully at the distribution, there are not that many of these families.
This tool allows you to identify these classes of packages and gives you a simple graph for analyzing how the structure of conflicts actually works in your distribution. These are all hidden structures that we are trying to let surface, to allow people to look a bit at the general picture.
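A much-simplified sketch of surfacing this conflict structure (coinst's actual kernel computation is considerably more involved, and the package set here is hypothetical): build the conflict graph and list its connected components. In this toy model, packages that appear in no conflict are co-installable with everything else:

```python
from collections import defaultdict

packages = {"sendmail", "postfix", "exim4", "vim", "emacs"}
conflicts = [("sendmail", "postfix"), ("sendmail", "exim4"),
             ("postfix", "exim4")]   # mail agents exclude one another

# Undirected conflict graph.
graph = defaultdict(set)
for a, b in conflicts:
    graph[a].add(b)
    graph[b].add(a)

def conflict_components():
    """Connected components of the conflict graph: each one is a
    family of packages entangled by mutual exclusions."""
    seen, components = set(), []
    for p in sorted(graph):
        if p in seen:
            continue
        stack, comp = [p], set()
        while stack:
            q = stack.pop()
            if q in comp:
                continue
            comp.add(q)
            stack.extend(graph[q] - comp)
        seen |= comp
        components.append(comp)
    return components

print("conflict families:", conflict_components())
print("conflict-free:", sorted(packages - set(graph)))
```

The mail agents come out as one family, while vim and emacs sit outside any conflict, which is the kind of picture the talk describes: a distribution has surprisingly few of these families.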
It's been written by Jérôme Vouillon, and if you're interested there is a website with a lot of explanations and very nice graphs. My presentation finishes here, I think at about 45 minutes, and I'll be very happy to take questions.
Yes, the question was about how deep we go in the dependency graph. We consider the entire dependency graph, the entire distribution: give me one Packages file, that is our universe, and we consider all the packages in it. With our tools I think we have analyzed universes of up to 100,000 packages, so I think we are in a good position to scale. The question was about how long it takes: for 30,000 packages it's about five seconds; for 80,000 packages it's about two minutes.
Thank you very much. Yes? Do any of the tools that you developed have the capability of working with packages other than Debian's? Yes: distcheck accepts Debian Packages files, HDlist and synthesis files, CUDF, which is our interchange format, and Eclipse metadata. Once upon a time I also had a filter for BSD, but it has not been rewritten. As for all these tools: every year we promise a release. This year we are actually very, very close.
We are so close that there is a release of the source code on our website, so close that there are packages in Debian experimental, and so close that Ralf is working right now to finish the Debian packages, which should be up sometime next week. This is a way to put a bit of pressure on ourselves: we keep saying we're going to release these tools, and then we never do it. Thank you very much.