Reproducible Builds for Debian

How can we enable multiple parties to verify that a binary package in a distribution like Debian was built from a given source without tampering? With free software, anyone can inspect the source code for malicious flaws. But most distributions provide binary packages to their users. We would like users to be able to verify that no flaws were introduced during the build process. The idea of "deterministic" or "reproducible" builds is to enable anyone to reproduce byte-for-byte identical binary packages from a given source. A research effort towards reproducible builds for Debian started last summer. After several small tweaks to core Debian tools, a massive rebuild in September of 5000+ source packages produced identical binaries for 24% of them. The process uncovered challenges both in reproducing the build environment and in the build processes themselves. We will review them, along with possible solutions and what remains to be done.
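The verification idea in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tooling used in the Debian experiment: `build_artifact` is a hypothetical stand-in for a real package build, and gzip is used because its header stores a modification time, which is one well-known way a build timestamp leaks into the output. The names `sha256_hex`, `build_artifact` and `is_reproducible` are invented for this example.

```python
import gzip
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a build artifact as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_artifact(source: bytes, build_time: int) -> bytes:
    """Hypothetical stand-in for a package build.

    The gzip header records a modification time, so the output depends on
    `build_time` as well as on the source bytes.
    """
    return gzip.compress(source, compresslevel=9, mtime=build_time)


def is_reproducible(first: bytes, second: bytes) -> bool:
    """Two builds match only if their digests (and so their bytes) agree."""
    return sha256_hex(first) == sha256_hex(second)


if __name__ == "__main__":
    source = b"int main(void) { return 0; }\n"

    # Two builders who each embed their own clock value.
    print(is_reproducible(build_artifact(source, 1), build_artifact(source, 2)))

    # Two builders who agree on a fixed timestamp.
    print(is_reproducible(build_artifact(source, 0), build_artifact(source, 0)))
```

Comparing digests rather than raw files means independent parties only need to publish a short hash to confirm that they obtained the same binary from the same source.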
but going bank of the spot and this is the distribution of the group think things you want enabling the removal of this problem so I not Our using the quality of the generated in the locker in 2007 part of these days I'm very active in the holiday and the meaning of that balance and all the things you want to do you today I'm going to talk about you know these people feel about the the the nano and that's what I did it right but it's also a huge project that would need operation from political distributions and the others are willing to to jump into 1 of the most what happened so what are what are reproducible How would for the deals with the idea is that you to pull people using different computers in different places so cute you problem saying so that each on the real part this and get it back exactly the same battery packages of this book divided into article by saying that text and why do we want to do that we want I mean my main focus on that you couldn't targeted at the top of the way the way in which the you are malware on 1 computer and on 1 million to give you that 1 computer is the 1 of the the proper during build all of the did you even but then you get you get our window waking that malware into a package that we'll be sold in many many many different boxes like this then you know that all participation division based you and it's going to be to you actually figure that there is a model that is really when you are acutely ordered we do that I'm part the source was with especially given that we do we recover and we have good so we are all the part of the the actual binary and prove that the binary matches the most part we think part is reported in the field we can get some of the people at the utterance that actually that that the only thing that I love to provide themselves and is also interestingly like some some new sometimes you want your you have a reproducible we'll use that if the 1st time RealTravel forgot through our our you know like include 
the signals into the battery package then you can get being built after a while and that throughout the value of the assets of the initial to and also for the i this is useful because the motor off I don't fall fall into the American going full packages that on different from different architectures of but you need to have the all that of energy that is being there were also of interest fall 4 of them for me this this is my you reproduce the wheels by river actually what the what the meaning that might might vary in size of the project the which is also what that being lied non-neutral part of that have been calling me might me going people on it you have reproducible we'll be here that they were being lied money and people were I you to know that uh so good there seeing where matching and so my period build on that uh and uh we know all about project
each of the top of bundle which is the main you wait used for the field in a reversible
manner this means that we have multiple the lockers being different area of holding the same old thing puncturing the result that this is not a new idea it's all the new idea of while at the start of the project I get any help from the time of you our we said that it was only like yeah I actually we had this discussion in 2007 Lexington ago and in that year how much she was you unintelligible we didn't have this year that he but he suggested that we should actually that have of the integral over permissible deals and then for reaction was not stupid suggested glide you Williams for example didn't realize and we were you had to manage our who said you like you you you you you so yeah but so how well aware that an electrical and yet the we might on the topological we need that for the year and then the last thing is that that the the inference in this with the letter and we made up of the future I small meaning and how active right like for the people all came know and all interested in future projects and so we discussed what was the right approach and how we could do you think that the very problem and it was like 45 minutes we about I will in that you are actually the cost that we engage well I we we have all of the product that is there I mean being following something happening something what that and I think that a word like from 2007 well and people variant adapted from Latin here's suggestions to you that's what it was like the the I think think that seems to have the need for political will so actually 3 things need get a simple on the very 1st thing that you recall the Miller and are we the idea that you we called our like the compiler of the rest of this is that you you are recalled what was used initially to from the then when you want to reproduce you have awaits you actually read that initial development environment In the last part of that you have to build systems that do not catch years think that it should not capture from the environment that you 
do not have variations that not like side to the source of the recording the building the actually funding and we don't have a problem for where have we have already idea on how to do there is some people are this building program which actually true recall the environment that provide information binary packages in light of what it is but I mean the problem is not really hotter than a matter of discussing where the information that was like the mountains are reporting to build environment is also I mean we still need some more and you have actually a proper street I actually on field that but we have so this that is called a snapshot of regional research we also need uh we have an archive of every single battery packages is that there were around through the archives for several years so if the real environment requires this version of this you can get it from that that the idea that the question is how do we need to store the name of the package in a version of what we need to store the actual binary of the latter part of what you just need to solve the direction and the name of the package because we have that and you could also we want people to be able to reproduce it will change the in the army and then you have this thing that is more like a hole in the various like variations that coming inside like from the build system that of themselves so thanks there's not captured by the current time and we'll have the tolerant will called many different things and I think that's the mold are to to to quickly right around the time of the fire them all the time the traditional brought by fall is all the time that of way that I did was made why they need you so we can we can get that we need to get the fact that they did not publicize their top people of height is almost like part of the job they also also signed up which was of time is not useful information and
and the like the 2nd time when we try to reproduce then we have the entirety of the period with the environment we have right in the nineties and Javadoc right stands there belonging to the comments that is that is hidden in the interest of time so if you want the proper really at all know there's not time yet to find convincing our use case fostering at the time of the the of being the a binary package it followed that not what you're interested and you need to know which version of the software which like that either no reason to think he had to be a part of the time I can build an old version now and then I get like a new interval the the you can get 2 questions later utterance pair him on also wanting to accommodate things fall in that some don't use them on things that don't matter which where which directory that will happen and 1 of the creators that all thoughts like the symbol for all of the files you know and and they have like this still like to go to go you do all your name and here it's in there that include the bill that we should also like intervals variation in the final again we don't need that information when you have a battery package source is lost and that no point entering an arbitrary power being divided I want think in the final part of the problem so when you read data on directory sometimes you really like ABC what sometimes you can get the depending on the all the the initially they have been written all the houses in there all so the sum of the thing that like for example you know that although not from the this this directory and then you get a different answer by because God had all the all the only thing for all that users are for which this data's like you just I thought the would for instance that I have think we will call it art for example if you if you are like in this field goal there you have an accented letters like French things than that but in the end and you get the 1st local the OK so I knew I have to part the old so 
that's a variation of energy here but it was part of that some people think that and that in the light of day you name it through the name of of the of the problem 1 way to solve all the variations the part this is the way we do follow that of other people are and I j in overlay becoming and other projects using the when you do this you have to know the and are you introducing can all use the same username you think don't that aren't using name might all these regions by using the him and the other thing you can use the school leaving time and that time is algae fuel library that we like to call to get time of day on the became likes to promote it our recently you get you know that's the last edition was Europe where you can actually we call that the file I where we've every single get of the initial could and there are like this and that itself and then we played at the next time so I mean you can see that works but we decided at the bottom of the back of that we we take the novel that you with the and we try to do things right and I think that the idea that we're going to patch regained you computed the toolchain in capsules in the of life we should add logical option that would be like that of men at the time that maybe and you had spiritual systems that they were not meant as an option to do that we not because of all these little changes that we made our packages was reproducibility we've of having the need to you the EM that you can use any of deviance there and just run script like you always decades and reported our reading so have result with some experience uh I'm using you in this is from of the things which is usually used to you allow to scale up here I provide a real life to try it out in the last little that well known when you try to remove ceiling I tried really building which the decision to use so we had this is not 1st sector I know it's not unlikely that the bone and also the dates went in by that it's for which would
be helpful but in are and so the idea is that we're going to be on and then revealed the reason that it in the our we we get Lorettalorna part of how we do that we do so we know that it's like it there and we'll talk that the sum that field which means we set of pictures with microsoft word we sell the build and then we do that again like reception center at the of various forms of the only small difference we do between the 2 real this that we pass the time spent all of those fields here are the PDG through an environmental I'm just to say OK but that it was real that that in the sense that in this that we have to write it turns out of the data that we have time and we have deal that will not different from the from field to the power the changing of the meaning of the of the book but it's still life time is the goal of things like the harvest from i've seen so far and to perform the experiments we did would like we free if those packages in the true what is DTG there we get a single time so far every file in the occupied part that of like having multiple calls to get out there we only have like a single time for every of and we reuse the very same time there are these things coming from the Roman viable dpkg I will also invite you to its solving the current of all of the world we always it that there was a change to get on the cover which you know we define the history of this should be a part of the novel that takes care of the home and we used to get to the main goal of the past that you get think including throughout the book symbol and then we also need to change because the data is not at told you what we've indirect strings what follows a power but the you get in the cell we have another option is to that your the signifier and we also have the notes of which has been revealed trees at your option which is management is enabled deterministic archives which means that they dog is going to drop the right yeah that only true but it is pretty small head and we 
so I mean that's the thing is that they are writing but like I say which was yesterday evening we had like on 5 of 5 of them so that it is a 3 thousand 196 out of the I think we're interested by the 1 where actually and for those who don't realize that 52 per cent on the 1st try with it was 24 actually believe that the laws of so that a steady was thinking for the ball I you know the on some of their life for those who are like so that they properly have financial you there is about into the power and all the factors that I tried to sort out you know which 1 like why there was some it's spelling out of there that was from 10 name to midnight yesterday sorry for the for all things that and the money that the all ideologies match which means some problem with the field that that gets included in view of the simple 1 where ever did it was written by year that initially there I mean that's really room for cooperation because this is not a specific problem you know I that so on knowing that it was only right of that there were like more than 100 on top of which so times in that John the use of the advantages I have also and you got the right the right way that pH the packages we had like that and then in embedded in the registry I think we should just get rid of that there was some room documents and user probably times that's also in the conditions of our some weird thing with the title and what the price of sometimes with the the variance in different orders but those things the things you think are some of the language in the contagion that so that I
don't know how to deal with you on there are like some that will also times that of men it also find them on the top of the problem of all of this with a view I like beginning that we control the form not that much more which you 2 % you made the policies that that's what I when I the that goes I mean with the issues that I did that are not for not the this is the thing that other distributions were also interested in that are there was a blog on the on the blog posts in the security logs from that you got you what is your bills for figure out how I know that you that something that is called jogan period that is being moved hold of some kind of a prediction of interest interacting with the jails and needs so as I also very very interesting you doing that in the limit so as suggest to you that I and I don't write them all distributions we get interested in that began like in the stream you quite ample removed and that's from archives your life to show that using different views of the of the of the of the of that's about thank you a microphone here so we're going to do any questions or Professor Mike around so that the you know 1 video of fruit improvement you know so there's a 1st question the early lead new unique so you're not 1 question what so please check the wiki page talks about prior periods about that also made the surprise of that is so of the beginning of to produce and use of sorts of compromising models is jury to get some of whom served by the race and so so this issue were operating the 2 of to some of the event but whose meetings this year because of a single view program was also provided that we don't have here that's that's why it's harder than to do something with them so it probably in the to avoid it that way harder to get into fashion what what what think certificate and that the the so so how confident are you that our own any compromised attitudes in snapshot that at some point hopefully will be able to adjust the bootstrap everything but 
that's that's a different issue I mean we need at some point you're going to have the principal will be all of them this walking and then we can now like ensuring the structure of the walls right I mean book culture environment could we kill 2 birds with 1 stone with the when we're dealing with the debug symbols because 1 of the problem so episode when debugging kind of installed in lots of the decode packages and then I have all of them the symbols or all over the place and I have to just to and this substitutes GDP we could kill that problem at the same time as making reproducible builds up all the different things that why did that it was written by a rare that initially the because the at had the police that so that it is getting at you you thought that of the and so the main goal of the what the are because then you can just extract the 2nd thing to me right the right place for functioning work so I think we should begin to and so it's a cubature you pulled is if you were considered to no 1 but the type the of some signature everywhere non-local due to assure that everything is pretty secure workers who believes you have much having this C over where before meeting of because if you all some of the log number from time to be sure that the bills libraries of this and signature of the Brewer there having no we assume that each of the initial deal was done with the gene well power point fall federal thought beginning well controlled and have the same on lecture at the freshman some sort of your system works it's everything on your local system is made using 2 reviews were approaches will build snow and we found out that it is not affected and I think Alice is the understood the Turing is being sure that some of the requests almost everywhere the word of by the residual from but which will to boost their properties they like the real system for the whole binary for each of us
but this is below the roots of libraries would have to be sure the pools and bizarre of tools and signature religion which are which are in short but trust in the system for you sure but it's an integral part of the the when we when we talk about security component you could be like so it's like important you have in mind the costs then it attracts 0 right this is for it is 1 that that these are the cost of providing 1 system that into the local system is less than the cost of primates in the millions of years and it's the same life compromising on our more than the package is a problem it is it's disappearing because of the war recognition of irony but we really I mean it's a greater role for each other every packet that gets reports all the all the something out of it provides phonotactic everyone of with it so that it is we're going to get into sequel to predict how can you cross to component that that's also a different that stress try and some people argue that I'm going to get into the cell and it was a teacher I think this is the whole which I think is part of the and on involving you're building you know and then the protein system but you so during the build process and we actually have a job out situations looking but all signed by a private in that all and bundled into a whole system image and my whole problem is how can I bring repressible builds if I cannot show my private key so what's the solution to that many of their ideal you probably and you have an income there is a there right people reproduce everything example that but you have to do that you able to revise that we only have that that's all you want but run thinking it is you know something I don't think you could work but that they know what you have to use special tools that you on so I wondered if you've done any analysis of what proportion of an actual installed system you how far could
you get if you only choose to install reproducible packages all we need we know India being able to have a like a built-in contains only reproducible packages that I thought a logs at midnight yesterday right so this is the yet but there is nothing questions about which you have magic that is almost you are right at the query that would take a lot of baggage and you have a lot of OK because of all of the building that would be the i cover virus relativistic effects that would offer you have already tried to the tomb embedded systems like arm or something like this as the final there are many problems and CPU so stocks and you won't sold to different pilots on different machines was the same source in the same room in same environment so we build on a successful sure I don't know original archaic going to work I mean we have we have like a lot of which I have a lot of people the traffic so I hope we will all have a right to do so it at the and planning how 1 day to have been pretty cations and process that to build very excited that they can do is sit down and think it is present published in any correspond to with the suspect and his so something we can discuss ingredients for a minute things in the are binary this the idea of binary of the when added in the upper uploaded package Indian will both the source here and the battery for that picture picture of you know our problem is that sometimes that full e and a new not reuse of another the alignment so they have a battery pack is that no 1 can reveal an articulatory personal life with my and identical revealed that reveal from some of the readings invites you to prevent that offers to
uploading binary back packages that all I just think that schools and then then have building and the battery but people you that if we do that people are not going to stand that it is at all the which you know what I want to do we this is that a lot of would not have flowed divided it it would just to read the text of the not that of being you know the changes and then the building will reveal any changes that it matches and then we have to trust that at the you know the the the right the heading compromise all the what the system of and 1 of the that is that's all i so that was but that's not the question but it and and so uh how reading these small structures but be and um do you have a rough estimate when event c use boots but photo books so other any barriers so an equal attributes independence from I have quite pursuing discussion we didn't know was about the use of a car I'm not sure is interested in you this at all so I have the things we know really pressure and to you don't think it was a variety of what I think is a gradient that we can do about that is on the theory that I find that and then like OK so that the thing is is modeled with someone and you're probably you have patches that do something this year devoted to influence quality you right this year I do it there is another language I tried to attract 1 divided ultimately these 2 hot but getting what we think that it part of the reason she wants the area in which you go we keep we have U that's actually in the older than that of in reported so far of the reversible this project and every single I think Atlanta I said you can you think that that is going to intervene in the political life and that you that the rejecting suffice that by problem with like this but yeah so I mean yeah yeah so he goes to cool cool or ask you questions in order you remove the half life gets the so can we have we 1 particular reproducible problem that I'm not entirely sure is solvable because of what we have is we have 
when we build our like images or a ISO images these are coming from previous prebuilt packages really the only thing being done as being turned into an ISO image and 1 of the things that we've identified every once a while to always pushing on the edges of what you can fit on a disk of we discovered that this study a drive geometry problem we here we're actually turn creating the iso image the same exact set of packages with the with the the timestamp freaked out was still sometimes producer-price package is 1 sector of larger because of because of the because the because of the way the drive the the other sectors on the disk along a line up so how do you work around them to do that we actually have to find like out if you going to be a collective all over India have I mean we have the timing yeah there is a very off something at the end of the year I mean around and Hey and what's the stadium release you focusing your efforts on this at the general and if they don't what the not having the the what where the process of getting things like that until the use you get at me and it is that you can't use the system you want and then you put a change to the policy but you won't be changing probably changes in policies before you get like that and then identify the when names the it going to the park and then also the the usual what I'm going to have to get political reproduce all of these on a large scale before it used to be like the full time is not I think raising during dual like yeah this small offices and you dont and problem in our people in the world unless someone working full on this for a year and what we and some of the question how do use toward the end of the build environment settings and you said that you you wanted to restore them either in the source like a under so I had like few so called in the about lot which is the only thing of binary packages and that it is with the text of this is that description of but we're going taking the size of I mean 
and as Debian people will know where you talk about this kind of thing in that about people from other distros with is that like a mailing list or something that they should sign up to this wiki page that that Semel man we have a reproducibility domains that value our societies that in the very top of the reef debates about the happy to see more people from
other distros use that and we extend idea of problem it is but I don't have yeah a there but if we had not like discussions here maybe we have something always of having them at the end of the very low traffic and that I think it's OK if I want to be on the right you can considering that you can't hear that they're considering you can see who hold everything reviewed reproducible like for example non-free will live you like a section or attack the you can make cannot believe what we can without really going to be fully reproducible from bombing victim irreversible harmful to get all the political change a cave but
assuming there will be a option packages which can be you are reduced to Southern states will there be a system that you can choose a system just gets packages you Rican Bouchard repeat usable where created that part and we can use things like that but I hope that we get everything I don't the reason that we call come come to that because go viral told takes much longer to get everything reviews to do and in the meantime you could there have a system which is already know on the has this uh that objectives picture I mean it's only the beginning if you want you to come back the end of the tunnel all right with that's just it suggestion to the other guy and you could have a little reproducible archive like the Montreal at some point of contention it's going here and it's a lot of it I it is easier to Christian upscale changes in a small fraction and discussing registered on the heart and to prove that here we can as people just to prove to you moved to we're both for them assume that those 2 children will we need 1 experiment the civil public question is the people experience or Europe afraid of discovering a new item factors are adjusted to tune into good concerned because you made some assumptions is brought from but we can you can also print from elsewhere the platform we use is as of the this on the way way that the package is not permitted to that the movement so in that's the part that is formed by you want of people to make extant kind molecules so to build to you know machine directly so I don't see a solution levels that's right for the um you components and I'll have on the way that you could almost all the you have external Middleton's economics or something like that that you will that the host machine you can protect level right but you can put the source code at not been set for being the 1st to the
and we have about 5 more minutes, so
probably about 2 maybe 3 more questions depending on how efficient the questions are of the uh so you said you could do you do with the fact that so if you put the that the rubric to produce you it's maybe some people we start using it then 1 to contributing to fix the less likely to depend on and you would get the more
the real president the forward up well we're not doing well on that day I mean we have patches so again the question
about how far we are like a that is a very real because those the the my situation right here by having result is that I want to discuss again we do have kind of at the beginning of lot of what is the problem that you get into the community you can be we have a disagreement is how want to run it by that be anything that can be said that we can't have this factored into the air and that the digital will have that like this then I'm going to stop before it I mean you can't go to the against that we all have our but the idea now is that to get more people involved like hold the reports they need to be thought of as they call the all of other than that actually that can already be written and submitted and for me I think to be accepted for mental like the sheeple want you have gone the job by like the jobs for the people want to have found out that if you want to have a lot of people want to have come to the people want to have some other software on account of software analysis of where all of the PhD on all the online right now reproducible and it should be the 1 that you know that we can do about whether or not and I think you also updated every once in the face of the community the media not pass the test that they want to do and into the and we expect any specific problems in respect to the kernel prejudice were just another package so the good thing is that some people quite external to the distribution of work in trying to right there will be a goal that have particle manifold and begin walking on reproducing the tunnel and I have a few patches all can already part of of will be ulterior which demonstrated that the that that that been had that you know what that yet but I have negatives that contain our didn't sold just security and that can be reproduced in the environment 1 more forward on integrated what I think are the keys consultations interesting in this out the for all project called the optimal with the proof for the next few days Due to the remote and