
Reproducible Builds - where do we want to go tomorrow?


Formal Metadata

Title
Reproducible Builds - where do we want to go tomorrow?
Number of Parts
47
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Content Metadata

Abstract
We've made lots of progress, but we are still far from our goal of changing the (software) world. A status report on Reproducible Builds, which enable everyone to verify that a given binary is made from the source it is claimed to be made from, by enabling anyone to create bit-by-bit identical binaries.
Transcript: English (auto-generated)
Okay, hi, my name is Holger Levsen, and I'll talk about reproducible builds: the status, and where we actually want to be. So this is some blah about me. More important is that these are all the people in Debian who worked on this. I'm just one of them, and there are many more people outside of Debian also working on this. First, I'd like to know a bit about you: who has already seen a talk about reproducible builds?
Okay, so half the audience or something. Who has contributed to these efforts? Okay, some people. And who has used reproducible builds as a user? In other words, who has reproduced something which they were using? So very few people, but a few.
So, the motivation for doing this. Free software is great: we can modify it, share it, use it, pass it on. But that's all about source code, and we use binaries. We have to believe that the binaries come from the source, because there's no way to really be sure; you cannot prove it. And I don't want to believe that. I want to be sure, I want to know that it's true. I'll only explain the problem very briefly here,
and this talk from three years ago by Mike Perry and Seth Schoen explains in great detail why reproducible builds are useful. I just have a few examples. CVE-2002-0083 was a remote exploit in sshd, and the difference was one bit in the binary. The mistake was a `>` comparison which should have been `>=`, and the difference is one bit out of 500 kilobytes. So if you just look at the bits, you will not see it. They also had a live demo with a kernel module,
which modified the kernel in memory but not on disk. So if you inspect the code, it looks correct, but when you compile it, something else gets compiled. Also, it's really hard to protect computers which are connected to the internet all the time,
especially against someone with physical access, who can modify stuff in memory, and you cannot really protect yourself well. And how much do you pay your admin? One of the easiest ways is probably just to bribe somebody and subvert the system that way.
There are also legal challenges. There could be a legal requirement where the state says: you have to put this backdoor into the binary, or you are not allowed to do business here. As I said, this is explained much better in that other talk.
There's also this white paper from a CIA conference where the CIA described how they would theoretically backdoor an SDK to compromise the code built with it. 2015 was when this paper was discovered. And in 2014 or 2015 there was the XcodeGhost vulnerability, where somebody backdoored an SDK for iOS and put it on servers which were faster to reach from China.
So many Chinese developers downloaded that trojaned SDK, and then there were 20 or 30 million compromised applications in the wild, all built from good source code. So our solution is that anyone can always independently generate bit-by-bit identical binaries from a given source. That is what reproducible builds are about: bit-by-bit identical. I used to have a demo where we built a Debian package five times, and a year ago you would get five different checksums.
If you build it now, you get the same checksum five times. That is really it. We also say that everything the build produces, including documentation and data files, should be reproducible, because we just want to look at the results and not say we exclude these parts or these bits because they are not important. Everything matters, everything should be identical. This also works with RPM packages by now; RPM has been fixed as well. Signed RPMs are a bit more complicated: you build the RPM, then you attach a signature to it and put it into the RPM again. But if you want to rebuild that, you just take the same signature, put it into your rebuilt RPM, and you get the same RPM. The signature will match because the data does match. And we think this should become the norm.
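The signed-RPM case works because the signature covers only the package data. A rebuilder can copy the published signature onto their own rebuilt data, and the result is byte-identical exactly when the data is. The sketch below illustrates that idea in Python under loud assumptions. The container format is made up, and an HMAC stands in for the real OpenPGP signature. Neither reflects the actual RPM file layout, and all function names are hypothetical.

```python
import hashlib
import hmac

# Stand-in signing key. A real RPM signature is an OpenPGP signature made with a
# private key and checked with a public key; HMAC is used here only to keep the
# sketch self-contained.
KEY = b"hypothetical-signing-key"


def sign(payload: bytes) -> bytes:
    return hmac.new(KEY, payload, hashlib.sha256).digest()


def pack(payload: bytes, signature: bytes) -> bytes:
    # Hypothetical container: 4-byte big-endian signature length, signature, payload.
    return len(signature).to_bytes(4, "big") + signature + payload


def unpack(package: bytes):
    n = int.from_bytes(package[:4], "big")
    return package[4:4 + n], package[4 + n:]


def reattach_signature(published: bytes, rebuilt_payload: bytes) -> bytes:
    """Copy the published signature onto an independently rebuilt payload."""
    signature, _ = unpack(published)
    return pack(rebuilt_payload, signature)


if __name__ == "__main__":
    payload = b"identical build output"
    published = pack(payload, sign(payload))
    rebuilt = reattach_signature(published, b"identical build output")
    print(rebuilt == published)  # True, because the rebuilt payload is identical
```

The rebuilder never needs the signing key. The signature only has to be carried over, and it still verifies because it was computed over identical bytes.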
So we really want to change the meaning of free software, so that it only counts as proper free software if it's reproducible. It's like a quality norm: it's all still free software, but it's crappy free software if it's not reproducible. Surely, it's just one link in the chain for secure software.
There's the whole software lifecycle management, covering what code you write and where you put it; all of that also matters if you want to write secure software. But this is a critical link, because it links the sources to the binaries and vice versa.
The problem with randomness is that you can never be sure, as in the old XKCD joke. The problem with reproducible builds is a bit different: it's a lot of effort to prove only a very small part of the secure software picture.
But it's still a critical part, because all the other effort you put into the source code is worthless if you can never be sure that the binaries you're running really come from that source. There are also benefits beyond security. With our testing we found lots of subtle QA bugs where the software built differently with a different locale, timing issues, all sorts of things. We discovered lots of strange errors. Google does reproducible builds to save time and money: they have everything in one big source repository, and it simply builds faster because most results can be cached.
Smaller deltas are also possible, so updates can be smaller; I think Fedora does this. Another side effect is a meaningful diff between two source code versions: if you only change one area of the source code, everything else should stay the same, so you can diff the results much better.
To give a bit of history: in 2011, Bitcoin was the first project to do reproducible builds. At that time Bitcoin was worth 4 billion; I think Bitcoin now has a market capitalization of 1,000 billion. They wanted to be sure that nobody could distribute binaries claiming to come from the Bitcoin developers with a backdoor in them that takes all the bitcoins away. So they wanted to assure Bitcoin users that their client is reproducible. Then Tor did the same with Tor Browser in 2013.
Debian also started in 2013, but really we started in 2014, and this year we managed to get it into Debian Policy, which now says packages should be reproducible. Also in 2014 came the Core Infrastructure Initiative, which at the moment pays my bills, or rather, I bill them and that pays my bills. Other projects got involved: FreeBSD, coreboot, LEDE, openSUSE and NetBSD all got involved last year. And Tails, just last week or the week before, released their first reproducible Tails ISO image.
This year we also learned that in 1992 Cygnus released the GNU toolchain for nine architectures in a bit-by-bit reproducible way. But everybody forgot about it: we have been working on this since 2013 and only discovered it this year.
So what I hope we've achieved by now is that nobody will forget about reproducible builds anymore. I'm not sure, but maybe, hopefully. So this is the progress in stretch: green are the reproducible packages, orange the unreproducible ones, and red the ones failing in other ways. So we are at 94% reproducible and we got it into Debian Policy, yay. But I now call this a somewhat misleading success, because there's still a long way to go: we are lacking the infrastructure.
I will explain that in detail in a moment. It will probably take until 2021 for Debian Policy to say packages must be reproducible and for us to get close to 100%. And 6% is still a lot if you're talking about 25,000 source packages. That's, what is that? 3,000 source packages or something? No, less, it's 2,700. But anyway, that's still many packages. The Debian developer community really supports this, but there are still some hard corner cases.
We cannot say packages must be reproducible and then delay the next release for five years because there are 10 packages which are not reproducible, so that's still difficult. Also, and I hope I'm wrong, I only see two other big or relevant projects with a similar commitment: Tails and Tor. But for them, a small how-to is sufficient. If you want to reproduce Tor Browser, you build it this way and you get this binary at the end. If you rebuild Tails, you build it that way and get that binary. But for 25,000 packages, it's way more complicated.
You need infrastructure and so on. This commitment of the projects is the other part, which I'll explain later. So in theory we are at 94% of packages being buildable reproducibly.
But we are lacking the infrastructure to distribute all the hashes: 25,000 hashes multiplied by 10 architectures. And users who want to reproduce them need tools, and this is all missing. Debian is the most advanced distro here; the others haven't even started.
So if you think reproducible builds will be there soon: yes, maybe, if other communities do the same. So we need to keep doing what we've been doing, we need to do more, and we need more people and more communities to join.
And yeah, we did the first 90% in 90% of the time, and then we may need another 90% of the time for the last 10%. So, what have we done? We have this web page, reproducible-builds.org, which has how-tos. There's a mailing list and IRC channels, and the web page documents common problems. We wrote diffoscope, which examines differences in depth, recursively. You give it two objects to compare, for example two .deb packages.
It takes the two .deb files and finds the tar archives inside them. In a tar archive there are many files; maybe there's a PDF in there which contains an image. diffoscope goes in recursively and shows the differences in the smallest object it can find. It produces HTML or plain-text output, and it's now available in every major distribution.
It's also on PyPI, it works on the BSDs, and it's really, really cool. If you haven't looked at it, give it a try. You can also just go to try.diffoscope.org and upload two objects: two RPMs, two ISOs, two text files, two anything.
The HTML result looks roughly like this, showing exactly where the difference between the two things is. But diffoscope is just for debugging, for finding out why something is unreproducible.
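diffoscope itself understands dozens of formats. The following is only a toy Python illustration of the recursive idea described above, limited to (possibly nested) tar archives. It descends into any member that is itself a tar archive and reports the innermost paths that differ. The function names are hypothetical and this is not diffoscope's code.

```python
import io
import tarfile


def _open_tar(data: bytes):
    """Return a TarFile if data parses as a (possibly compressed) tar, else None."""
    try:
        return tarfile.open(fileobj=io.BytesIO(data))
    except tarfile.ReadError:
        return None


def _files(tar: tarfile.TarFile) -> dict:
    """Map member name -> contents for every regular file in the archive."""
    return {m.name: tar.extractfile(m).read() for m in tar.getmembers() if m.isfile()}


def recursive_diff(a: bytes, b: bytes, path: str = "") -> list:
    """Return the innermost paths at which two blobs differ (toy diffoscope)."""
    if a == b:
        return []
    here = path or "<top>"
    ta, tb = _open_tar(a), _open_tar(b)
    if ta is None or tb is None:
        return [here]  # not both archives: report the difference at this level
    fa, fb = _files(ta), _files(tb)
    diffs = []
    for name in sorted(fa.keys() | fb.keys()):
        sub = f"{path}/{name}" if path else name
        if name not in fa or name not in fb:
            diffs.append(sub)  # present on one side only
        else:
            diffs.extend(recursive_diff(fa[name], fb[name], sub))
    # Same member contents but different archive bytes: tar metadata differs.
    return diffs or [here + " (metadata)"]


def make_tar(files: dict) -> bytes:
    """Build an uncompressed tar in memory with default (fixed) metadata."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    inner_a = make_tar({"readme.txt": b"built on Monday"})
    inner_b = make_tar({"readme.txt": b"built on Tuesday"})
    outer_a = make_tar({"data.txt": b"same", "inner.tar": inner_a})
    outer_b = make_tar({"data.txt": b"same", "inner.tar": inner_b})
    print(recursive_diff(outer_a, outer_b))  # ['inner.tar/readme.txt']
```

Reporting the innermost differing member, rather than "the two outer archives differ", is what makes this kind of output useful for debugging.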
If you just want to know whether something is reproducible or not, you compare the hashes and that's it. We also built tests.reproducible-builds.org, which mostly tests Debian: all three releases, including stable.
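Checking whether a rebuild is reproducible really is just a hash comparison. The Python sketch below compares two build output trees file by file. The directory layout and function names are hypothetical. An empty result means every artifact is bit-for-bit identical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in chunks so large artifacts need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def digests(root) -> dict:
    """Map each file's path relative to root -> its SHA-256."""
    root = Path(root)
    return {p.relative_to(root).as_posix(): sha256_file(p)
            for p in root.rglob("*") if p.is_file()}


def compare_builds(dir_a, dir_b) -> list:
    """Return relative paths whose hashes differ or that exist on one side only."""
    a, b = digests(dir_a), digests(dir_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
        for root, stamp in ((first, b"Mon"), (second, b"Tue")):
            Path(root, "doc").mkdir()
            Path(root, "hello").write_bytes(b"same binary")
            Path(root, "doc", "index.html").write_bytes(b"built on " + stamp)
        print(compare_builds(first, second))  # ['doc/index.html']
```

In the usage example, an embedded build date in the documentation is enough to make the whole build unreproducible. This matches the rule that everything the build produces counts.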
We're doing this on four architectures, amd64, i386, arm64 and armhf (those are the hardware sponsors). But we're also testing LEDE and the BSDs. We did test Arch Linux and Fedora, but those tests bit-rotted because nobody was looking at the results, and then we basically stopped doing them. There are 40 people working on the setup. The LEDE tests are well maintained, and the NetBSD and FreeBSD tests are in good shape too. And we apply variations when we test; that is one important thing. For the variations we apply,
we do the first build with one set of settings and the second build with another. We vary the time zone by more than a day, the locale, the user ID, the file system, and the CPU type if we can, to get as much variation as possible so that we can test what will happen in the wild. Because in the wild anybody will rebuild, and they have strange hardware, so we add as much variation as we can. We think there will be even more variation in the wild, but we hope to catch most of it.
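The core loop of such a test setup is to run the same build command under deliberately different environments and compare output hashes. This is a minimal Python illustration, not how the Jenkins setup or reprotest is implemented. The variation values and function names are assumptions, and "build output" here is simply the command's stdout.

```python
import hashlib
import os
import subprocess
import sys

# Two hypothetical environments, loosely modelled on the variations in the talk.
# The TZ values are 26 hours apart (UTC-12 vs UTC+14); real setups also vary
# the user, file system, CPU type and more.
VARIATIONS = [
    {"TZ": "Etc/GMT+12", "LANG": "C.UTF-8", "LC_ALL": "C.UTF-8"},
    {"TZ": "Etc/GMT-14", "LANG": "fr_CH.UTF-8", "LC_ALL": "fr_CH.UTF-8"},
]


def build_digest(cmd: list, extra_env: dict) -> str:
    """Run the build command with extra_env applied and hash what it prints."""
    env = {**os.environ, **extra_env}
    result = subprocess.run(cmd, env=env, capture_output=True, check=True)
    return hashlib.sha256(result.stdout).hexdigest()


def reproducible_under_variations(cmd: list) -> bool:
    """True if every varied environment yields the same output hash."""
    return len({build_digest(cmd, v) for v in VARIATIONS}) == 1


if __name__ == "__main__":
    stable = [sys.executable, "-c", "print('hello')"]
    leaky = [sys.executable, "-c", "import os; print(os.environ['TZ'])"]
    print(reproducible_under_variations(stable))  # True
    print(reproducible_under_variations(leaky))   # False
```

The "leaky" command stands in for a build that embeds its environment, which the variations expose immediately.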
The most common problems we found are timestamps. Timestamps, timestamps and timestamps. And time zones, really a lot of time zones. If you unzip a zip archive from 1980, your local time zone will be applied, so if you do this in your code, you need to normalize the time zone first and then unzip. Same with locales. The build path gets embedded, and there are lots of small issues which affect only five or ten packages each. Lunar gave a talk at the CCC Camp in 2015 with many examples of how to avoid these, such as calling gzip with -n, and other common things you need to do. We also came up with SOURCE_DATE_EPOCH, which is defined as the time of the last modification of the source code.
Sometimes it is useful to include a timestamp, just not a meaningless one like the build time. The source code modification time is better, because it doesn't change: it is deterministic and meaningful. In Debian we take it from the last debian/changelog entry.
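SOURCE_DATE_EPOCH is an integer number of seconds since the Unix epoch, passed to tools via the environment. The Python sketch below derives such a value from the newest debian/changelog entry. It then shows how a tool could use the value for an embedded date and for a gzip header timestamp. The changelog parsing is deliberately simplified and is not dpkg's implementation. The function names and the sample changelog are hypothetical.

```python
import gzip
import os
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def epoch_from_changelog(changelog_text: str) -> int:
    """Derive a timestamp from the first (newest) debian/changelog trailer line.

    Trailer lines look like " -- Name <email>  Mon, 02 Oct 2017 12:00:00 +0200".
    Simplified illustration only; dpkg does this properly.
    """
    for line in changelog_text.splitlines():
        if line.startswith(" -- "):
            date_part = line.split(">", 1)[1].strip()
            return int(parsedate_to_datetime(date_part).timestamp())
    raise ValueError("no changelog trailer line found")


def build_timestamp() -> int:
    """Timestamp a build tool should embed: SOURCE_DATE_EPOCH if set, else now."""
    value = os.environ.get("SOURCE_DATE_EPOCH")
    if value is None:
        return int(time.time())  # fallback only; this is not reproducible
    return int(value)  # a malformed value raises ValueError instead of being ignored


def build_date_string() -> str:
    # Format in UTC so the builder's time zone cannot leak into the output.
    ts = build_timestamp()
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")


def compress(data: bytes) -> bytes:
    # Use the source date, not the current time, as the gzip header mtime.
    return gzip.compress(data, mtime=build_timestamp())


if __name__ == "__main__":
    changelog = ("hello (1.0-1) unstable; urgency=medium\n\n  * Initial release.\n\n"
                 " -- Jane Doe <jane@example.org>  Mon, 02 Oct 2017 12:00:00 +0200\n")
    os.environ["SOURCE_DATE_EPOCH"] = str(epoch_from_changelog(changelog))
    print(build_date_string())  # 2017-10-02 10:00:00 UTC
```

Two builds of the same source now embed the same date and produce identical gzip headers, whenever and wherever they run.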
In RPM it could come from the spec file, or whatever. SOURCE_DATE_EPOCH has now been adopted by dpkg and RPM, and GCC supports it. Lots of tools, I think 40 or more, support SOURCE_DATE_EPOCH and will use it instead of the current date when you build software with them. We also wrote two more tools. strip-nondeterminism removes some known useless timestamps from PNGs and other file types, normalizing them. And reprotest.
reprotest is a tool which does what our Jenkins test setup does, but on your local machine: you use reprotest to build something locally, it applies variations, and then hopefully the results are the same. reprotest now also has a mode where it applies these variations and then reduces them to find the variation which is causing the unreproducibility. It does several builds with different variations, and you can see: okay, if I vary this, then my build is different, so I need to look at the code which causes this.
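The narrowing-down mode can be sketched in a few lines under simplifying assumptions. The build is modelled as a Python function of the set of enabled variations, and each variation is tested on its own against an unvaried reference build. This is a hypothetical illustration of the idea, not reprotest's algorithm, and the variation names are illustrative rather than reprotest's option names.

```python
from typing import Callable, FrozenSet, List

# Illustrative variation names (not reprotest's actual option names).
VARIATIONS = ["time_zone", "locale", "build_path", "user_id"]

Build = Callable[[FrozenSet[str]], bytes]


def find_culprits(build: Build, variations: List[str]) -> List[str]:
    """Return the variations that change the output when applied on their own."""
    reference = build(frozenset())
    if build(frozenset(variations)) == reference:
        return []  # reproducible even with every variation enabled
    return [v for v in variations if build(frozenset({v})) != reference]


def toy_build(enabled: FrozenSet[str]) -> bytes:
    """Hypothetical build that (wrongly) embeds its build directory."""
    build_dir = "/tmp/build-2" if "build_path" in enabled else "/tmp/build-1"
    return f"binary compiled in {build_dir}".encode()


if __name__ == "__main__":
    print(find_culprits(toy_build, VARIATIONS))  # ['build_path']
```

Testing one variation at a time misses problems that only appear when two variations combine, so a real tool has to handle that case as well.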
So please do give reprotest a try, especially if you're not using Debian. We really want it to work everywhere: on the BSDs, on other Linuxes, on macOS. Please give it a try. Now, the Debian status. Let's first start with Go, because that's shorter, and this is a Go conference after all. Go binaries are bit-by-bit reproducible, yay, but not when the build path is varied. That is quite a common problem, and one that is also very easy to fix.
You just rebuild in the same directory and you get the same result. But Michael Stapelberg also wrote a patch for Go, this one, which is in Debian main, with which you can vary the build path and the result is still the same. For this we came up with a second specification, BUILD_PATH_PREFIX_MAP. It describes an environment variable that build tools use to exchange information about the build-time file system layout, so that they can generate reproducible output in which all embedded paths are independent of that layout.
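BUILD_PATH_PREFIX_MAP carries a list of `dst=src` pairs. A tool rewrites any embedded path that starts with `src` so that it starts with `dst` instead. The Python sketch below follows our reading of the draft specification: pairs are separated by `:`, the characters `%`, `=` and `:` are encoded as `%#`, `%+` and `%.`, and the right-most matching pair wins. Check those details against the published spec before relying on them. The function names and example paths are made up.

```python
import re
from typing import List, Tuple

_DECODE = {"#": "%", "+": "=", ".": ":"}


def _decode(text: str) -> str:
    """Undo the percent-encoding (%# -> %, %+ -> =, %. -> :) in a single pass."""
    if re.search(r"%(?![#+.])", text):
        raise ValueError(f"invalid escape in {text!r}")
    return re.sub(r"%([#+.])", lambda m: _DECODE[m.group(1)], text)


def parse_prefix_map(value: str) -> List[Tuple[str, str]]:
    """Parse 'dst=src:dst=src...' into a list of (dst, src) pairs."""
    pairs = []
    for item in value.split(":"):
        if not item:
            continue  # skip empty items
        parts = item.split("=")
        if len(parts) != 2:
            raise ValueError(f"malformed pair {item!r}")
        pairs.append((_decode(parts[0]), _decode(parts[1])))
    return pairs


def map_path(path: str, pairs: List[Tuple[str, str]]) -> str:
    """Rewrite path using the right-most pair whose source is a prefix of it."""
    for dst, src in reversed(pairs):
        if path.startswith(src):
            return dst + path[len(src):]
    return path


if __name__ == "__main__":
    pairs = parse_prefix_map("/usr/src/hello-1.0=/home/alice/secret-project/hello")
    print(map_path("/home/alice/secret-project/hello/src/main.c", pairs))
    # /usr/src/hello-1.0/src/main.c
```

Because the mapping is applied inside the tools, the build can run in any directory. The private path under /home never reaches the binary.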
That is nice in theory. Our biggest problem at the moment is GCC, which also embeds the path: we have a patch for it, but the GCC maintainers are not happy with the implementation, and we are discussing it with them. The other problem with the build path is that the workaround is so simple: just rebuild in the same deterministic path. But we want to enable users to rebuild, and also, why should the path be embedded in the binary at all? It should not be there. It can also have privacy implications.
If you build in something like /home/projects/blah/secret-project, you don't want that leaked into the binary. So when we test Debian unstable, we vary the build path, which gives worse results for unstable. When we build Debian testing, we don't vary it, because if we want a reproducible Debian now or in two years, we will just say: build in a deterministic path, something under /tmp named after the source package and its version. So this is Debian unstable, and at this point we introduced the build path variation, and reproducibility went from 90 to 70 percent.
By now we've caught up: we are at 86 percent again, while in February this year we were at 78 percent, so we have already fixed lots of things. Another thing we have for Debian: you can just go to this URL and see the package status for all 25,000 packages. We also have 49 package sets, like build-essential, the base packages, all KDE packages or Java packages, in case you just want to look at a small part of the archive.
We also have notes: a simple YAML repository where we take notes about certain issue classes and the packages affected by them. We have over 6,000 notes now. At the moment it's Debian-only, but we want to do it cross-distro, because many issues are the same in different distributions. One other thing we came up with, which is central to our concept,
are these .buildinfo files. A buildinfo file describes the sources with their checksums, the dependencies and the environment needed to recreate the build, and the result. The idea is that a user can take a buildinfo file and have all the information needed to recreate the binaries from the sources.
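Debian's .buildinfo files are deb822-style text. Among other fields, they record checksums of the build results, for example in a Checksums-Sha256 field. The Python below is a simplified, hypothetical sketch of the user-side check: it reads that one field and verifies locally rebuilt artifacts against it. It ignores signatures and all other fields, and the package name in the test data is made up.

```python
import hashlib
from typing import Dict, List, Tuple


def parse_checksums_sha256(buildinfo: str) -> Dict[str, Tuple[str, int]]:
    """Extract {filename: (sha256, size)} from the Checksums-Sha256 field.

    Simplified: assumes an unsigned deb822-style file, where the field's entries
    are continuation lines starting with whitespace.
    """
    entries = {}
    in_field = False
    for line in buildinfo.splitlines():
        if line.startswith("Checksums-Sha256:"):
            in_field = True
        elif in_field and line[:1] in (" ", "\t"):
            digest, size, name = line.split()
            entries[name] = (digest, int(size))
        elif in_field:
            break  # the next field starts, so this one is complete
    return entries


def verify_rebuild(buildinfo: str, artifacts: Dict[str, bytes]) -> List[str]:
    """Return the artifacts whose rebuilt bytes do not match the recorded checksums."""
    mismatches = []
    for name, (digest, size) in parse_checksums_sha256(buildinfo).items():
        data = artifacts.get(name)
        if data is None or len(data) != size or hashlib.sha256(data).hexdigest() != digest:
            mismatches.append(name)
    return sorted(mismatches)
```

A missing artifact counts as a mismatch, so an empty result means every recorded file was rebuilt bit for bit.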
This part is defined and working. What we're lacking is the infrastructure to distribute these buildinfo files. For other, non-Debian distros this is not as clearly defined. In our tests we always rebuild at the same time, so the build environment is basically the same; with a buildinfo file, you can recreate the same build environment later. We've also filed over 2,000 bugs about reproducibility issues in Debian. I don't know how many of them went upstream, I hope one third, but I lack the exact numbers. As I said in the beginning, this is also just the proof of concept for the stretch case. All the changes are in stretch.
The source code of stretch is 94% reproducible, but because of the way Debian releases, without full archive rebuilds, maybe only 20% of the stretch binaries are reproducible. That's really a Debian problem, nothing for you to worry about if you're not into Debian. The other problem is that we don't distribute the buildinfo files yet; they are only accessible to Debian developers. So, as I said, we are there in theory but not in practice. But in practice, other parties like Canonical could take stretch or unstable, rebuild it, and release an Ubuntu which would be 94% reproducible if they rebuilt everything.
Debian 10, buster, the next release, will be partly reproducible in 2019, which is still some time away. And as I said, Policy now says packages should be reproducible.
We hope that for the release after next, bullseye, in 2021, Debian Policy will say packages must be reproducible. And even then, if, just as an example, LibreOffice is not reproducible, I guess we will release with an unreproducible LibreOffice because we need it. It will probably not be LibreOffice but some other important package. There are 200 key packages which are unreproducible, and they are still there. So yeah, by now it's pretty obvious that there are many people
from Debian in this project, but we care about free software in general. We write weekly reports, a blog post every week; we're at number 130 now. We've held two summits so far, where people from 25 projects met for three days to discuss, brainstorm and make roadmaps. We'll have another one in two weeks in Berlin, from Tuesday to Thursday; if you want to join, please talk to me. We also take part in Google Summer of Code and Outreachy, where we mentor people. Now, the status of the non-Debian world.
I will skip the BSDs and Arch Linux. F-Droid is also interesting, and so is LEDE, but I will not say much more about them. The funny thing is NetBSD and FreeBSD: FreeBSD's base system was at 99% reproducible for a long time, and then NetBSD reached 100% first, which was really funny. But that's only for the base system, not the ports system. There are other projects too. Bazel is a build tool from Google which aims at reproducible builds. And there's Ducible, a tool for Windows, so you can do reproducible builds for Windows.
A small detour: commercial reproducible software. We have medical devices in our bodies, we have arms, we have nuclear power plants, and they all run crappy software; nobody knows what's in there. But for gambling machines, the state in Germany and France demands reproducible builds.
Value-added tax. Anyway. Bernhard Wiedemann started working on reproducible openSUSE in 2016, and these are his results from the 1st of October. He didn't give a percentage, but something like 93.7% of the SUSE packages are reproducible. These are his main sources of nondeterminism: Javadoc, LaTeX, Mono and Qt, so basically all documentation. We haven't included his SUSE results in the website yet.
But we want to do this so that it's easier to compare. And Bernard also created this Disso patches get repo where he's actively sending patches upstream. He actually has been looking at many Debian packages where the Debian maintainers didn't send them upstream. He sent them upstream.
And we joined him out there. And so in RPM in general, RPMs respect source state epoch. Human DNF can be used to recreate environments. There's DiffoScope. And the science RPM thing is also solved.
So the technological foundations are there. And Bernhard is there, doing a really awesome job at SUSE. The problem is that Bernhard is one of only a few people in the SUSE or Fedora world who are working on reproducible builds. So please help Bernhard.
So there's no wide community commitment to this, or management commitment. There are no buildinfo files and no tools to use them. And of course there's no user tooling yet.
And this is not limited to the RPM world; it is actually the case almost everywhere. Debian also has no user tooling. Debian has community commitment, but no user tooling; that's what we're waiting for. So far we've mostly worked on making reproducible builds possible.
But we need to keep testing in the future, because every new release can introduce new unreproducibilities, so we have to test constantly to find them. And we need tools, infrastructure and policies for reproducible builds to become meaningful and used in practice, so that users can really verify that what we're saying is true. So we want to distribute these buildinfo files. And we want to enable people to do rebuilds, for which they need the checksums. So we need to distribute the checksums,
and not much work has been done on that yet. It will be different for different projects, because they have different distribution mechanisms. And we don't really know yet who should sign these buildinfo files.
Individual developers? Or do we want big rebuilders like the CCC, NASA or the NSA, Deutsche Bank, the Russian army? Then you can pick whom you trust when they rebuild it.
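To make the "whom do you trust" question concrete, here is a minimal sketch of one possible client-side policy. It is not existing Debian or RPM tooling; the `Attestation` record, the `decide` function, the rebuilder names and the placeholder checksums are all hypothetical, and it assumes the signatures on each rebuilder's statement have already been checked. The user picks a set of trusted rebuilders and a threshold, and the package is accepted only if enough of them reproduced the same checksum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attestation:
    """Hypothetical record: 'this rebuilder rebuilt package@version and got this SHA-256'.

    Signature verification is assumed to have happened before this point.
    """
    rebuilder: str
    package: str
    version: str
    sha256: str


def decide(package, version, local_sha256, attestations, trusted, threshold):
    """Apply a simple trust policy to rebuilder attestations.

    Returns "mismatch" if any trusted rebuilder reported a different checksum,
    "ok" if at least `threshold` distinct trusted rebuilders reproduced
    `local_sha256`, and "insufficient" otherwise.
    """
    matches, mismatches = set(), set()
    for a in attestations:
        # Skip rebuilders the user does not trust and statements about other packages.
        if a.rebuilder not in trusted or (a.package, a.version) != (package, version):
            continue
        if a.sha256 == local_sha256:
            matches.add(a.rebuilder)
        else:
            mismatches.add(a.rebuilder)
    if mismatches:
        # One possible answer to "what do you do if one doesn't match?": refuse.
        return "mismatch"
    return "ok" if len(matches) >= threshold else "insufficient"


if __name__ == "__main__":
    # Placeholder checksums; real ones would be 64-hex-digit SHA-256 values.
    atts = [
        Attestation("rebuilder-a", "hello", "2.10-1", "aa11"),
        Attestation("rebuilder-b", "hello", "2.10-1", "aa11"),
        Attestation("rebuilder-c", "hello", "2.10-1", "bb22"),
    ]
    # rebuilder-c is not trusted here, so its differing checksum is ignored.
    print(decide("hello", "2.10-1", "aa11", atts, {"rebuilder-a", "rebuilder-b"}, 2))  # ok
    # Trusting rebuilder-c as well turns its disagreement into a hard stop.
    print(decide("hello", "2.10-1", "aa11", atts,
                 {"rebuilder-a", "rebuilder-b", "rebuilder-c"}, 2))  # mismatch
```

Refusing on any single trusted mismatch is just one design choice; a different policy might only warn, or require a majority, which is exactly the open question raised in the talk.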
None of this is sorted out yet, and somebody needs to work on it. Maybe I will really go to the CCC and ask them, hey, can you set up a machine and rebuild Debian? Because I want this done outside of the project. And then we need user tools: "Do you really want to install this unreproducible software?"
Or do you want to rebuild it before you install it? Nobody has done that yet. Maybe you rebuild it, and if it matches, then you install it. And how many checksums do you need before you call a package reproducible? And what do you do if one doesn't match?
Maybe the Russian army or the NSA wants to subvert you. What do you do then? If you want to get involved as a software developer: stop using build dates, please. Use SOURCE_DATE_EPOCH. Really, that's it.
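For a developer, "use SOURCE_DATE_EPOCH" means: wherever a build step would embed the current time, read the `SOURCE_DATE_EPOCH` environment variable (an integer count of seconds since the Unix epoch) and use that instead. The sketch below shows the idea in a hypothetical build helper; the function names are illustrative, and raising an error on a malformed value (rather than silently falling back to the clock) is a deliberate choice here.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(environ=None):
    """Timestamp (seconds since the Unix epoch) to embed in build output.

    Uses SOURCE_DATE_EPOCH when it is set, so rebuilding the same source gives
    the same bytes; only falls back to the current clock when it is unset.
    """
    env = os.environ if environ is None else environ
    value = env.get("SOURCE_DATE_EPOCH")
    if value is None:
        return int(time.time())
    if not value.isdigit():
        # Fail loudly instead of quietly reintroducing a varying timestamp.
        raise ValueError(f"malformed SOURCE_DATE_EPOCH: {value!r}")
    return int(value)


def build_date_string(environ=None):
    # Always format in UTC so the build machine's timezone cannot leak in either.
    ts = build_timestamp(environ)
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
```

With `SOURCE_DATE_EPOCH=86400`, `build_date_string()` yields `1970-01-02 00:00:00 UTC` no matter when or on which machine the build runs. Build tools that already honor the variable (the talk mentions RPM) set it once for the whole build, typically from the date of the last source change.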
Attend the summit in Berlin if you want to. Form your own reproducible builds team. It's really fun, and you learn a lot, because you look at a lot of software. The best way to get started is to just build something twice,
look at the results with diffoscope, and then try to fix it. Is there still time for questions? These are the resources.
We actually have two IRC channels: #reproducible-builds and #debian-reproducible. You can go to either of them. We have mailing lists, a general one and several others. We have a Twitter feed, and there are these two talks, which are also really worth watching.
Do you have questions? You talked about the lack of user tooling. How do you see that? Should it be integrated into the existing package managers, or be separate clients or other software?
What do you think is missing there? I think it should be in the existing tools. For the Debian case, "Do you really want to install this?", we have a patch for apt, but we lack patches for DNF and other tools. For the BSDs, it's also unclear how to do that. LEDE is reproducible, coreboot is reproducible,
but the user tools to simply check all that don't exist. So it's all theoretical at the moment. Three years ago, nobody believed it; many people didn't believe it was possible. Now many people think it's possible, but to really verify it, you need to do a lot of manual steps.
Not many manual steps per package, but some. If you want thousands of packages installed on a machine, then you need to do many manual steps. Quick question.
Are the checksums per package or per artifact within the package? Will different files in the package have their own individual checksums, or is there one single checksum for the whole package? In the Debian case, it's one checksum per binary package. Okay. Was there any consideration of per-file checksums within the package?
No, because we really want the whole thing. A package can consist of 10,000 files, and we don't want to say these files don't matter and these do, because then you need to evaluate what matters and why. We just say the whole thing needs to match. My reasoning was that if there were per-file checksums,
say the binaries had one checksum and the documentation files had their own, then maybe you could share them between distros and consolidate the effort, so that not every distro has to build its own tooling and figure out checksum distribution and so on.
The binaries will differ between distributions, because they use different libraries, so that doesn't work. Just a thought.
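The "one checksum per binary package" answer, combined with the earlier advice to build something twice, can be sketched as follows. This is an illustration, not Debian's actual tooling: `package_sha256` and `builds_match` are hypothetical helpers that hash each package file as a whole, so a single differing byte anywhere (binary, documentation or metadata) makes the comparison fail.

```python
import hashlib


def package_sha256(path, chunk_size=1 << 16):
    """SHA-256 over the entire package file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(first, second):
    """True only if two builds are bit-for-bit identical as whole packages.

    There is deliberately no per-file notion of "this part doesn't matter":
    the whole package either matches or it doesn't.
    """
    return package_sha256(first) == package_sha256(second)
```

In use, you would build the same source twice into two directories (for example `build1/` and `build2/`, hypothetical paths) and call `builds_match` on the two resulting `.deb` files. If it returns `False`, running `diffoscope` on the two files shows where they differ, which is the starting point for a fix.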
If you want to talk to me later, that's difficult, because sadly I'll only be at the conference for one more hour. So either ask me now or use email or IRC.
Just curious: have you seen the kernel version make any difference in the build outputs? Very rarely. It should not happen, but it sometimes does. For example, if there are new kernel features, then of course it happens. Sometimes software just embeds the kernel version.
Or they write a log and then include the log in the artifact, and then the kernel version is in there. Have you reached out to the Yocto Project? They try to isolate their build environments for every package they build,
and I think they might be interested in getting reproducibility into that. Yes, we're in contact with them. About the kernel thing: we often find cases where it does matter, and then it's a bug. For example, optimizing the code at build time for the build machine is usually a bug, because you want runtime detection
of the CPU features. Okay, thank you.