
SBOM contents for embedded system images


Formal Metadata

Title: SBOM contents for embedded system images
Subtitle: Open discussion
Alternative Title: Discussion on SBOM contents
Number of Parts: 542
License: CC Attribution 2.0 Belgium. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract: An open, moderated discussion on different aspects of SBOMs, especially oriented towards embedded system images. Audience participation is expected and encouraged!
Transcript: English (auto-generated)
Welcome. So to break the flow of presentations a bit, there's now going to be a discussion, so not just a presentation, but a discussion. But I'm still going to give a short presentation
first to create some context. First, who am I? Can you get a microphone first? No, you need to speak louder. I need to speak louder, okay. No worries. So I'm an embedded software architect. This discussion is also mostly focused on the embedded
aspects. I'm working on Linux OS integration as a consultant for dozens of customers. And I'm also a maintainer of the Buildroot project, which has a team of four maintainers,
or five, depending on how you count. And that's actually the context from which I come. I mean, from which I give this presentation. I don't actually care about SBOMs. It's just something that needs to be done. And so, yeah, there we go.
Because maybe not everybody is familiar with it, I'll give a quick overview of what an embedded Linux build system is. So basically, it takes a lot of sources: some open source, coming from the internet, and some in-house components, which you obtain in various ways. Sometimes these in-house components are going to be binaries as well. The embedded build system takes all of that, together with the configuration, and produces a number of artifacts. One thing to note is that the number of artifacts is really small.
So we are talking about maybe five files or something like that. It's not like when you create an operating system that you have all these files that you need to keep track of. So from my point of view, as a maintainer of such an embedded Linux build system, the problem is actually quite simple. We know what the inputs are, we have just a few outputs,
and we can just say, okay, these outputs are generated from these inputs. So that's actually what we do in Buildroot. We don't have SPDX at the moment, we don't have anything complicated. We just have a list of packages with the package name and version,
where it comes from, the source URL, also the tarballs themselves and the hashes for checking the tarball, the patches which are applied to them, the licenses, and the dependencies, so the build dependencies, so what other packages were used to generate
this particular package. And then the assumption is that all of this together goes into your target image, so there's no distinction of what is used for which particular output. There's also a list of files per package, which you can use to reconstruct a more fine-grained mapping. And then there's what I think is the actual thing you want to have: the CVE information. So the two things which I think are needed are the licenses, which are part of this top part, and the vulnerabilities, the CVE information.
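For illustration, a minimal sketch of consuming the kind of per-package manifest described here; it assumes Buildroot's `make legal-info` output with a `manifest.csv`, whose exact column names may differ between Buildroot versions:

```python
# Sketch: read the per-package manifest that "make legal-info" emits.
# Column names here are assumptions and may vary by Buildroot version.
import csv

with open("output/legal-info/manifest.csv", newline="") as f:
    for row in csv.DictReader(f):
        # One row per package: name, version, license, source tarball, site.
        print(row["PACKAGE"], row["VERSION"], row["LICENSE"],
              row["SOURCE ARCHIVE"], row["SOURCE SITE"])
```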
So there's a separate tool that extracts that, and it uses CPE IDs to relate our package name and version to what is in the CVE database. Now when you do this,
this is of course not reproducible, because it uses the CVE information as it exists at a certain point in time, and new CVEs are created all the time, so it's something that you have to rerun regularly.
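As a hedged sketch of what such a lookup can look like, this queries the public NVD CVE API (v2.0) for one CPE ID; the CPE string is only an example, and a real tool would batch requests and respect rate limits:

```python
# Sketch: ask the NVD CVE API which CVEs match a given CPE ID.
import json
import urllib.parse
import urllib.request

cpe = "cpe:2.3:a:openssl:openssl:3.0.8:*:*:*:*:*:*:*"  # example CPE only
url = ("https://services.nvd.nist.gov/rest/json/cves/2.0?cpeName="
       + urllib.parse.quote(cpe, safe=""))
with urllib.request.urlopen(url) as resp:
    data = json.load(resp)
for item in data.get("vulnerabilities", []):
    print(item["cve"]["id"])  # prints matching CVE identifiers
```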
So as I said, it's very simple. There are a lot of things which are missing, and my question is basically whether this is something relevant to work on. One thing that is missing is external files that you supply yourself: basically all the configuration which I mentioned here, this part. The assumption is that as a user you know what these files are and you can inject them yourself. Same with the Buildroot source itself. We could make a tarball of the Buildroot source, but we don't really see the point. Then we come to more important things, that's vendored dependencies.
SBOMs are used for two purposes, basically. One purpose is license information, and the second purpose is vulnerability tracking. Now if you have a vendored dependency in some package, we just see the top package that vendors it in, and not the actual vendored dependency. So we don't have the package name and version of that. This used to be not much of a problem, because not many people were vendoring. But now you have Go stuff, Rust stuff, NPM stuff, which brings in all these dependencies, and they're kind of invisible to us.
We also have everything in one file, not spread out over dozens of SPDX files. Is this good, is this bad? I don't know. We don't use the SPDX format. And we have information only at the package level, not at the individual source file level. So
our inputs are basically tarballs, not C files. And as I mentioned before, we don't have mapping of source to target files. So that brings us to the discussion points. For me the most important thing to discuss is
why are we doing this, who are the consumers, and how is this information going to be used? Because that kind of determines what should be used as input as well. If you look at, for instance, the SPDX specification, it doesn't really say whether you have to look at a source file or whether you can treat the tarball as a source. It just says, okay, there is a relationship there. I'm going to give you the microphone. I'm going to start standing, because I suspect I'll be talking. And I'll come up here with you. So, sorry, back to the question, you got me confused now. SPDX: individual source files or tarballs?
Individual source files or a tarball: a tarball is just a file, and you can use SPDX at any level. So if you just want to look at the package level and say, hey, it's this tarball file, that's fine. That works. You do not need to take it down to the source file. And with a minimal set of fields, all the concepts you had up there, you should be able to express right now with what we've got in SPDX today, without any trouble. So you would basically put a package there with the metadata that you want to keep at the higher level, and you point maybe to a file that has a hash. Simple enough. Yep. So, I see another remark there.
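A minimal sketch of what that could look like in SPDX 2.3 tag-value, treating the tarball as the package artifact without analyzing individual files; the package data and the hash are made up:

```python
# Sketch: a package-level SPDX 2.3 tag-value document (all values are dummies).
doc = """\
SPDXVersion: SPDX-2.3
DataLicense: CC0-1.0
SPDXID: SPDXRef-DOCUMENT
DocumentName: example-image-sbom
DocumentNamespace: https://example.com/sboms/example-image

PackageName: zlib
SPDXID: SPDXRef-Package-zlib
PackageVersion: 1.2.13
PackageDownloadLocation: https://zlib.net/zlib-1.2.13.tar.xz
PackageChecksum: SHA256: 0000000000000000000000000000000000000000000000000000000000000000
FilesAnalyzed: false
PackageLicenseConcluded: Zlib
"""
print(doc)
```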
So the remark, or question, if I understood correctly, is: an SBOM is basically a hierarchy of dependencies, but you can flatten it to just have input and output. For an embedded build system, I think it's enough to only have this flat one, without a hierarchy, because the hierarchy is difficult to determine for the embedded build system. And I don't think it holds useful information, unless there is anybody who can say there is actually useful information in the hierarchy.
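For what the flat variant could look like: a sketch relating the final image directly to each input package using SPDX's GENERATED_FROM relationship type, with no hierarchy in between; the SPDX IDs are made up:

```python
# Sketch: a "flat" SBOM expressed purely as image-to-input relationships.
flat = """\
Relationship: SPDXRef-Image-rootfs GENERATED_FROM SPDXRef-Package-busybox
Relationship: SPDXRef-Image-rootfs GENERATED_FROM SPDXRef-Package-zlib
Relationship: SPDXRef-Image-rootfs GENERATED_FROM SPDXRef-Package-openssl
"""
print(flat)
```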
Yeah, I'm going to try to speak about that a little bit later. But yeah, there's a ton of uses for having a structured SBOM. If you saw, for example, the Siemens use case where they enrich SBOMs: you need to have that structure to know, when the enrichment happens, where the extra information is going to go. Also, if you want to compose an SBOM by taking pieces from another one, like my friend Ivana here, who has been doing really great work on composing SBOMs: if you want to compose an SBOM by taking pieces from one and moving that data to another one, you need to have that structure. And there are several use cases where you need the structure. But then I wonder... I mean, if you compose SBOMs, there is supposedly
also a corresponding composition of the binaries themselves that the SBOMs describe, right? Because in the end, an SBOM is a description of a binary. But the binaries can be repackaged,
for example. You can have a binary as the product of a build, and then you ship it, and it gets repackaged. So indeed, if you're going to repackage stuff, then this is relevant. I'm surprised that there are use cases for repackaging stuff. There's a question from the chat:
what about handling patches? Good question. So, what we currently have in Buildroot is that patches are just one of the sources, and they're described as one of the sources.
Or, well, they're included in the tree as one of the sources. Is there anything else to say about patches? There's also a specific relationship for patches, because it's a modification rather than an actual source.
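A sketch of that in SPDX 2.x tag-value, using the PATCH_APPLIED relationship type to tie a patch file to the package it modifies; the file name, hash, and SPDX IDs are made up:

```python
# Sketch: link a patch file to the package it modifies in SPDX tag-value.
patch_rel = """\
FileName: ./patches/0001-fix-cross-build.patch
SPDXID: SPDXRef-File-patch1
FileChecksum: SHA1: 0000000000000000000000000000000000000000

Relationship: SPDXRef-File-patch1 PATCH_APPLIED SPDXRef-Package-curl
"""
print(patch_rel)
```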
What about naming? I think there was one more remark about patches. Yes, I was the one in the chat. The thing is, if you have a curl, and a curl that you have patched, it's not the same curl. There could be other vulnerabilities in your distribution than in the original one from GitHub or another one from VST. Yeah, so indeed, it's essential that you track patches. I mean, you definitely have to record that what you are using is not curl version X,
but curl version X plus patches, and then also which patches those are. That's a new URL, which takes us to the naming problem. Yeah, so you've got a naming problem: you say OpenSSL, but is it OpenSSL? The OpenSSL? Or is it an OpenSSL wrapper, in Rust or whatever? Or is it an OpenSSL that someone has patched, or modified, or built in a particular way? There are so many options. So the remark was: if you say this is OpenSSL, or even OpenSSL version X, that doesn't necessarily uniquely identify it, because it can be patched, or it can be built with certain options, so the information about how it's patched and how it's built has to be recorded as well. So do you capture that as part of Buildroot?
Implicitly, yes, but not explicitly. It's captured because we have the baseline, which is basically identified by a CPE ID, which is the upstream version, let's say, or, well, the version as published by the maintainer of the package, and then the patches are recorded separately. The configuration, as I mentioned before, we don't really record in Buildroot; that's up to the users themselves to record. So, I mean, there's definitely room for improvement there.
Yeah, this is a problem with anything that's built from source in this way: the name and version aren't unique. This is why we have to have hashes. Right, yeah, exactly. This is the reason we have hashes. You have to check the hash, and that's why you need the build information, because just because it says OpenSSL 1.1.2 doesn't mean that the piece with your security vulnerability was actually even compiled into the code, right? Yeah, so the remark was that... what was the remark? The package and version information... Yeah, package and version information is not enough; it is not unique, yeah.
So it might not even be the same between two people that built the same thing, because of the options, right? Yeah, because of configuration, yeah. And the solution for that is hashing... Yeah, you need to hash the outputs. Yeah, hash the outputs, but then the thing is, by simply hashing the outputs you have an identifier of something, but you already have the output, which you could hash, so it doesn't give you any information. Right, that's why you need the build... You actually need the build information itself. Although, even there, the usefulness is a bit limited, because in the end it goes to the CVE database, and in the CVE database you don't have this information anyway. Well, you may sometimes have it in an informal way in the description, but you definitely don't have it in a formal way, saying: if configured with X, then... So unless something changes there as well, I don't think there's much use
in recording it formally. I mean, it's important to record it for manual analysis, but since there is, on the other side, no formal recording of it, I'm not sure if it makes sense to record it formally. So what we do for the CVEs is: we put the CVE in, and then just which ones we've patched, so that if you go look it up, you know, I don't need to worry about this one, but if there are any new ones... Yeah, that's basically what is done in Buildroot as well, but not in an SPDX format.
The CPE ID is a field, an external reference, that you can associate with the package. Yeah, so the remark is: in SPDX? In SPDX, there are external references, and you can associate a CPE with a specific package. You can also associate a purl with that same package, and if you want to put both of them there, you can. It's flexible there, and because of the time scales of vulnerabilities and so forth, what you want to record at a point in time, whether something's patched or not, and other things like that, should all be expressible. The question is, you know, do people have tools that are semantically accessing it right now?
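A sketch of a package carrying both identifiers as SPDX external references; the CPE and purl values are illustrative:

```python
# Sketch: one SPDX package with both a CPE and a purl external reference.
ext_refs = """\
PackageName: curl
SPDXID: SPDXRef-Package-curl
PackageVersion: 8.0.1
PackageDownloadLocation: https://curl.se/download/curl-8.0.1.tar.xz
ExternalRef: SECURITY cpe23Type cpe:2.3:a:haxx:curl:8.0.1:*:*:*:*:*:*:*
ExternalRef: PACKAGE-MANAGER purl pkg:generic/curl@8.0.1
"""
print(ext_refs)
```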
Yeah, so one of the things about consumer tools is that we are actually seeing tools emerge. In fact, I know of two off the top of my head that are basically consuming SBOMs and matching them to vulnerabilities. So that takes care of the monitoring over time, because there are two different time cycles: there's what's known at build time, and then there's what's known in the field over time. The two projects I'm aware of are Daggerboard, and the other one is the one that's sitting in the SPDX repo that looks up vulnerabilities: you basically feed it an SBOM in SPDX, and it will go and query the databases for vulnerabilities. Yeah, the spdx-to-osv tool, yeah. So there are tools out there that are emerging, and I think we'll be seeing more and more in the coming years.
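As a sketch of the kind of lookup such tools perform, this queries the public OSV API for one package version; the package name and ecosystem are only example values:

```python
# Sketch: query OSV (v1 API) for known vulnerabilities of one package version.
import json
import urllib.request

query = {"version": "8.0.1",
         "package": {"name": "curl", "ecosystem": "OSS-Fuzz"}}  # example values
req = urllib.request.Request(
    "https://api.osv.dev/v1/query",
    data=json.dumps(query).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.load(resp)
for vuln in result.get("vulns", []):
    print(vuln["id"], vuln.get("summary", ""))
```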
Yeah, maybe as a reaction to that. My intuitive reaction is: yeah, but we also have a tool that generates this information already. I mean, as part of Buildroot, you can just run that tool again five years later, and you get that information. But there's a caveat where I think it's actually useful to have the build information formally recorded, and that's basically the same thing archaeologists do: you don't know what techniques are going to emerge later, and the information can be useful then. So if you build something now, you should record all the information that can be recorded now. The other little thing to add is that this use case is also very important in the high-assurance world, where you are being asked to attest exactly which vulnerabilities you know about. That's an audit case, or a high-assurance case, and some places will want to have that. Okay, I would like to move on to a different subject, which I mentioned here.
That is vendored dependencies, because I think from the point of view of an embedded build system, that's an important thing to solve. We actually have multiple kinds of vendored dependencies, so I'll first give an intro. We have some vendored dependencies which are directly included in the source code. For instance, Tomlip is a good example: it's a library that is meant to be vendored in, and so, yeah, people just copy it into their source code. That's really difficult to trace. Then there are git submodules. You clone a git tree; that is the information you have as a build system, that's the information you have in your build metadata. But if there are submodules referenced from there, that information is not part of the metadata of the build system. Then there are Cargo, Go, and NPM modules, obviously. And then there are some cases where the build system itself vendors things in. For instance, in OpenEmbedded, you have the kernel meta, which is kind of vendored in, and, I mean, I don't know if it's taken into account for the SPDX output, but you need to take special action there. That's the important thing: it's not using the normal path for taking in sources. So yeah, my question to the audience is: how can we deal with these vendored dependencies?
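For the git-submodule case, at least enumerating what a source tree vendors in is mechanical, since .gitmodules is INI-style; a sketch follows (the pinned commits still have to be read from the git tree itself, for example with git ls-tree, since .gitmodules only carries paths and URLs):

```python
# Sketch: list the submodules a source tree vendors in via .gitmodules.
import configparser

cfg = configparser.ConfigParser()
cfg.read(".gitmodules")
for section in cfg.sections():   # e.g. section 'submodule "third_party/zlib"'
    print(cfg[section]["path"], cfg[section]["url"])
```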
You had a remark? A question? An additional question. Okay, sorry. This is actually a problem beyond SBOM generation, because if you're trying to do air-gapped builds, these are huge problems in general that we've encountered. So ideally, we could download all of these sources and archive them, and then tell the tool to go pull them from the archive, which is often difficult, if not impossible. So yeah, it'd be nice if tools like Cargo and Go... I think Cargo is not too bad, but Go and NPM were pretty bad, last we checked, at being able to do that kind of thing, and fixing that would make this a lot simpler too. Yeah, so the thing that I want to add to that is that getting the sources is not the difficult part. If it's just for licensing, that's doable. But for the supply chain, you want to know provenance, and that's the hard part.
I think for all these things, unless the upstream itself gives some provenance information, preferably in a formal format, it's really hard for us as a build system to go and look for it. For Cargo and Go, it's actually doable to do it on our side, because you have a lock file which gives the exact information, not in an easily consumable format, but the information is there. For NPM, well, of course, it's NPM. The information is there, but you don't have version numbers; you have Git hashes, which are not something that you will be able to check against the CVE database. And to some extent, that's also the case for Cargo and Go, because there, too, dependencies can be specified by Git hash. And the ones directly included in the sources are, as usual, hopeless.
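For the Cargo case specifically, a sketch of lifting name, version, and checksum out of a Cargo.lock, which pins every transitive crate; it assumes Python 3.11+ for tomllib:

```python
# Sketch: extract pinned crate info from Cargo.lock for an SBOM.
import tomllib

with open("Cargo.lock", "rb") as f:
    lock = tomllib.load(f)
for pkg in lock.get("package", []):
    # Registry crates carry a checksum; git dependencies do not.
    print(pkg["name"], pkg["version"], pkg.get("checksum", "(no checksum)"))
```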
Any other remarks? Well, just some perspective. One thing to keep in mind about those dependencies that get vendored in, and that you need to capture in the SBOM, is that you can see them in two ways. One is from the dependency-list angle, and in that case, just having the name and version, and ideally the hashes, of those dependencies is enough. The other angle is if you're trying to actually inventory all of the files that get vendored in. So depending on the use case of the SBOM, you may want to capture one or the other, or both. Just for the dependencies' sake, it may be the case, depending on the build system of course, that you only need the list of dependencies without the actual file information. Yeah, the thing is, the file information is easy to get.
It's the list of dependencies which is difficult. Picking up on what Adolfo just said: a lot of build dependencies just say "latest", or are not explicit, on the view that people say, I want to stay up to date patch-wise, because we're told we've got to keep everything patched. But for making things dependable, it's the worst thing, because it's going to change. So yeah, I'm just trying to stir up debate. But actually, I think that is a lot of it, and people are generally lazy: I'll just pull in OpenSSL, I don't care what version, I'll just pick up the latest, because it'll be version X today and version Y next week, and I'm not bothered about that; I don't want the admin overhead. So how do we handle that? Because, with the growth of ecosystems, that is basically what people find very convenient.
Yeah, so I don't think we have an answer to that. I think we can take one last question, the question you had, and then we have to stop. The question would be: how would you handle vendored dependencies that come as binaries? Yeah, the question is how you handle vendored dependencies that come as binaries. I guess you record what you know. That's the perfect answer. Actually, this is probably the answer in general to any question: you record what you know. And you want to record it in as full a way as possible, because, like for these Cargo and Go dependencies, our source store actually has the lock file, which has the exact information about the dependencies. It's just not something that you want to go and crawl through afterwards to reconstruct.
There's one more remark on the offline build thing.
To solve the offline build problem, what you can do is cut the build into two phases: a first phase where you just do the downloads, and then you have a download directory which you expose to the second phase, where you do the actual processing. And there are two problems with that which I don't think are solved in either Yocto or Buildroot. The first one is that, to do the downloads, you need some tools which you have to build: to do the Cargo downloads, you have to build Cargo first. But that means you can't completely separate your download step from your build step, because you need to build something to download something. And the second issue... I forgot what the second issue was. You need your tools to support that. So you can maybe do something like a download step, a build step, a download step, a build step, but it's getting complicated. We're out of time. Yeah, we're out of time. Thank you.