
A standard BOM for Siemens


Formal Metadata

Title
A standard BOM for Siemens
Title of Series
Number of Parts
542
Author
Contributors
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Multiple teams at Siemens evaluated several SBOM formats available on the internet. All of them decided to use CycloneDX with some custom extensions. This talk is about the BOM format itself, why we decided to use CycloneDX as a base, and what the goals for this BOM are.
Transcript: English(auto-generated)
Let's see empty spaces. Good. Very good. All right. The whole team, we have now a trio. But they will be sharing a single mic, so yeah. Good job. All right.
Thank you very much. Welcome to our talk on standard BOM. We are here to share with you some of our experiences that we've had introducing a common SBOM format at a large company. And we also hope to get into a discussion with you about your experiences and maybe things
that you notice that we've missed or that we should or could do better. So all three of us, I must say, the thing is called Standard BOM. That's just our name for a CycloneDX format. So we are not reinventing the wheel. It's not like we've invented a format or something.
And we're also not selling anything. It's just sharing experience and talking to you. All three of us are from Siemens. So I feel I need to say a few words about the company. Siemens is a technology company. So you can buy small things like a thermostat for your smart building or if you need a whole train or a power plant.
So I mean, a power plant is nice. And also things in between like medical devices, magnetic resonance tomography systems, or if you're equipping your factory, then you can buy factory equipment. So Siemens has also been around for some time. Just recently, we've celebrated the 175th birthday of the company.
So it's changed a couple times over the years. And traditionally, of course, there has always been a focus on hardware. But in recent, well, decades, I could say, software has become increasingly important. So now of the 50K R&D employees,
we have a sizable portion of software developers. I couldn't find out exactly what the portion is, but I'm quite certain it's in the five digits. And growing, certainly. And since there's no, in a company like that, there's no one technology stack, so we're basically using everything, I should say.
And that growing importance of software, of course, leads us directly to software bills of material. You're all aware of the legislation that's upcoming, mostly, in the form of executive order and CRA and so on.
So SBOMs are getting more and more important. And I don't want to explain SBOMs. That's just, you know, you all know the stuff on the slide. I just want to stress one thing, and that's generating an SBOM for a software product is not something that can be done manually. It must be the result of an automated process, okay?
So there's just no way to reliably do that manually. And one of the things that we realized is that an SBOM is always created with a particular use case in mind. Even if you're not thinking of a use case while you're doing it, then you're still implementing
whatever's in your head at that point. So it's always, the concrete SBOM document is always intended for a particular use case. Just to give some examples that we are dealing with, one would be license compliance. So we want to make sure that we follow all the obligations from open source software licenses. That's very important because OSS software
is used extensively at Siemens. We use many components, and we also publish them. So if you go to github.com/siemens, then you will find some of them. And if any one of you does that right now, then be sure to also click on the badges on top of the page. They link to other places on GitHub that have Siemens open source software. It's not all consolidated into one.
Anyway, license compliance also requires us to have source code available because that must be scanned. Individual source files might be licensed under a different license than the main project and so on. And that's a particular requirement of that use case. So the SBOM will look different
compared to, for instance, the security vulnerability monitoring use case, also very common. Source code is not so important. It's important for finding the vulnerabilities, but not so much for monitoring them. But you need different metadata, such as CPE information. CPEs are used to look up the vulnerability in the corresponding databases, so that's critical.
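To make the metadata difference between the two use cases concrete, here is a minimal Python sketch of a CycloneDX-style component entry that carries a CPE, plus a check that flags components lacking one. The component values and the helper name `components_without_cpe` are assumptions for illustration and are not taken from the talk.

```python
# Illustrative component entry in CycloneDX JSON style.
# The concrete name, version and identifiers are assumed example values.
component = {
    "type": "library",
    "group": "org.apache.logging.log4j",
    "name": "log4j-core",
    "version": "2.14.1",
    "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
    "cpe": "cpe:2.3:a:apache:log4j:2.14.1:*:*:*:*:*:*:*",
}


def components_without_cpe(components):
    """Return the names of components that have no CPE set.

    Hypothetical helper: a vulnerability-monitoring SBOM could be
    checked this way before it is handed to a monitoring system.
    """
    return [c.get("name", "<unnamed>") for c in components if not c.get("cpe")]
```

A license-compliance SBOM would more likely be checked for source references instead, which is why the same product can end up with differently shaped SBOMs.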
And also, you might want to include build tools, test frameworks and so on, since they might also be vulnerable. Both of those use cases are internal use cases. So we generate the SBOM for us, use it with our systems and processes,
but we don't share it outside of the company. In the third case, regulatory, that would be, again, another use case where we are required sometimes, due to the new legislation, to publish the SBOM. And then, of course, we must be sure to include certain fields in that SBOM about every component that are required by that regulation.
And we will not normally put much more into the SBOM than we are strictly required to do, because, you know, that's for regulatory purposes, and we don't want to open up an attack surface for, you know, just people who want to bitch about some information being wrong or something.
So that's, I mean, that's just a realistic thing that's going to happen, and you'll see later that this is relevant, you know, those SBOMs being created for different use cases. Because when you're creating an SBOM for your concrete product, you're actually solving something of a puzzle. So you have all kinds of pieces
that must fit together to get the final SBOM. Imagine you're shipping, well, something simple like a front-end container with an Angular application in it. So maybe you have npm to ask for dependencies. That's the easy part, because it's under your control.
But then you also have, let's say, an Nginx in the container, which has an SBOM, or consists of some components, and it's in, let's say, a Debian Linux, and that has, I don't know, 100 or so open source components as well.
And, well, sometimes you're lucky, and you work with partners or different, in a company like Siemens, you have all kinds of different business units that produce components and give you SBOMs. Those SBOMs might have all the data that you need, or they might not. Imagine that people only gave you the SBOM because they're required to by the regulators,
the third use case. Then it would probably not be enough. For instance, license information is something that's not even required by the NTIA for a public SBOM. They just want to know what component it is. They don't want to know much metadata. So that's something you then need to enrich.
You would probably need to have back-end systems to enrich your SBOMs and arrive at the final SBOM while you're solving this puzzle. So now I've talked a lot about the SBOMs in general, and let's look at some more detail with Alex.
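The enrichment step described here can be sketched in Python as follows. This is only an illustration under assumptions: the back-end is stood in for by a plain dictionary keyed by package URL, and the function name `enrich_licenses` is hypothetical. The talk does not describe how the real back-end systems are queried.

```python
def enrich_licenses(bom, license_db):
    """Fill in missing license entries from a lookup keyed by purl.

    `bom` is a CycloneDX-style dict, `license_db` maps purl -> SPDX
    license id. Components that already have licenses are left alone.
    Returns the number of components that were enriched.
    """
    enriched = 0
    for comp in bom.get("components", []):
        if comp.get("licenses"):
            continue  # supplier already provided license data
        license_id = license_db.get(comp.get("purl"))
        if license_id:
            comp["licenses"] = [{"license": {"id": license_id}}]
            enriched += 1
    return enriched
```

Components that cannot be resolved stay untouched, so a later step can still report them as open puzzle pieces.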
Yeah, thank you. So as we already mentioned, one goal that we have is to take you through the process of how we adopted a common SBOM format within the company, and what some of the challenges and major pain points were that we detected as part of that. So of course, at first, you look at the requirements that you actually have, and usually to do that, you look at the process and the people involved.
That's a good idea, even when you're trying to solve a technical problem. So what we considered initially early on, I mean, you've seen our product portfolio. We do everything from hardware to software as a service, so every team at Siemens is different, which for us immediately meant that there is probably no silver bullet that works for all of them.
So there wasn't going to be a single automation approach that we could push onto people. Instead, we needed to provide an ecosystem. So that was realization number two, right? So we need a common set of tools, but not everybody is going to use every tool. But the goal here was to simplify the actual SBOM generation and allow people
to feed that data, because that's the background that we come from, into our OSS compliance and commercial license compliance tooling, and to enable developers to actually use that as part of their builds. And from the get-go, we were pretty clear on that, either becoming inner source within the company, or potentially also open source.
We will comment on that a bit later. And then of course, you can't always optimize for the edge case. So there will be teams within Siemens that use tools that nobody else apart from them uses. But even then, we wanted to enable them to also use the format by at least having a set of libraries that they could include.
So currently we offer these for Java, Python, and .NET, and that definitely covers a lot of the different teams that we have, and similarly, that is provided as inner source today. Yeah, so one valid question that you can of course ask is why do we care so much about our SBOMs
in the first place, and why do we care about them being accurate? There are more reasons than the two I'm going to talk about, but generally these are the main two for us. So one is security, right? So it's not that long ago, actually less than one and a half years, that Log4Shell hit, or if you think back to SolarWinds, it's important to actually know the products
that you consume, so the dependencies that your own products have, and for that purpose, an SBOM is exactly what you need, right? So we want to be able to identify vulnerable components as quickly as possible, so if a zero-day hits, it's not necessarily a good idea to start investigating
which product uses a vulnerable component at that point, because then that delays the process, and obviously you can only start with the mitigation once you have the full picture of what you actually need to mitigate. The other part is something that is more of a legal topic, so compliance, license compliance specifically, right? So a failure to comply with license terms
of third-party components is something that can trigger litigation. Litigation is something that is very time-consuming and expensive, and our lawyers would rather do other things. So it's important for us to also make sure that this part doesn't happen, and one thing to also be aware of is, generally speaking,
at least from our experience, the larger the company, the larger the compensation claims that people will sue you over. So if you have a GPL violation, then suddenly we're talking about millions of dollars. And the worst case, which as far as I know is something that is probably a bit specific to German copyright law, so it can actually happen that if a GPL violation, for example, is detected,
you can get slapped with an injunction and you are prohibited from shipping the affected product until that is resolved, which for us, if you imagine that something like that happened with a Linux kernel version where we have a driver with a GPL violation or whatever, for us, that would be a big deal. So it needs to be avoided just from a business perspective
for both scenarios, and then even beyond that, of course, less tangible, but still, both of these things, they will land you in the news and you will not get the good kind of publicity. So they are actually a PR nightmare, and that's why we want to get them right. We want to be good citizens. Our BOMs need to be accurate. Yeah, another challenge that we detected early on,
because of course, even our embedded hardware colleagues by now, they have figured out that containerization can help them with certain use cases, right? So we also need to make sure that our containers are OSS compliant, and there we have a special challenge in generating accurate SBOMs. So SBOM creation, which is what this chart here
pretty much shows, it lies on the spectrum. So what developers, of course, like to do is they like to consume public images from Docker Hub or other public sources. That's very low effort for them. They can just pull them. They don't need to create them themselves, but they also don't know what's in it. So you have low effort on the developer's side, but we also have very low certainty.
So creating an accurate SBOM is insanely difficult, and in some cases, I guess we can actually conclude it's impossible. Yeah, and then the further you move to the left, the more effort is actually involved in building the container, but at the same time, you have increasing certainty about its contents. The pathological case on the other side, of course,
is that you build every image yourself. We use a lot of different images, so maybe you don't want every team to build their own, and so the next best thing that we've arrived at is sort of having these known base images that get shared within the company, or we consume upstream base images that already have an SBOM that we trust,
which is, of course, a major asterisk there. So you also need to be able to trust the SBOM. It's not enough for it to be there, and then there, creating those images is much higher effort, but you also have a much higher degree of certainty. So that's something that we realized, and that's something that we try to put in practice. Yeah, so, I mean, these are the challenges, right?
Obviously, the conclusion then was we need the common format to facilitate all of that, and we need to build an ecosystem around it. So that's what we did. We called it Standard BOM. We have an internal page, landing page, for the format. So if you try to navigate to that right now, it will not do anything for you, but the reason we are showing it
is because that domain pretty much tells you this isn't just a side project that we started. It actually is, yeah, one of the main subdomains within the company. So we already have a lot of teams using it, and yeah, it's growing. So we are picking it up. We are now actively looking into ways we can make some of this available upstream again,
and in fact, we already have, so I contributed the CycloneDX support to ScanCode Toolkit a while ago. But yeah, so we're still figuring some of that out. Yeah, so Tom has already preempted that a bit. What is Standard BOM? At its core, it's CycloneDX 1.4. The special caveat is,
so, or maybe I quickly need to explain what CycloneDX is. So it's an OWASP format, and it prides itself on being lightweight and composable. You can add extensions to it, and so for us, that flexibility was really appealing. One limitation that we already put on it for our internal use, which is probably a bit controversial,
but we did it because we prefer it, we only use the JSON flavor. We don't care about the XML. Once you start dealing with large XML documents, you have to worry about things like, yeah, vulnerabilities in your parser, and we don't want to deal with that. With JSON, they are much rarer. Generally. They're not impossible, but they're rarer. So, of course, using JSON makes it
pretty much programming-language agnostic, because every language I know of, except maybe COBOL, has a JSON parser, and probably COBOL does too, I just don't know about it. And then also, the benefit that this flexible format had for us on top of that is, it's independent of the source ecosystem. So we have all these different tech stacks
within the company. They are all supported. There are upstream tools to create BOMs. In those cases where those aren't good enough, and up to snuff for what we need, we wrote our own. And another benefit that it has, it's independent of the consumer. But, and there's a caveat here, right? So it's important to keep in mind,
even though they are independent of the consumer, as Thomas already mentioned, usually you create it with a special use case in mind. So if you submit an SBOM for software clearing, maybe you want to also put a statement of intent alongside it to say, yeah, this is mainly for software clearing purposes,
don't use it for vulnerability scanning. Because if it contains references to the source packages, there's actually a high possibility that your actual product, because the binary doesn't have the source, isn't affected. So that's a statement of intent that we support through something that we call profiles. So that's metadata in the BOM. And yeah, that was also a valuable addition
from our perspective. Yeah, so that's pretty much what I have to say about it. And now to get into the nitty-gritty details, I will hand over to Thomas. Thanks. So, well, we're using CycloneDX, so do we do something special? A little bit.
But at the end, we still use CycloneDX. So every one of our Standard BOMs is a 100% CycloneDX BOM. And this is what we really like to emphasize. And because we are consumers, so we heard in the morning a lot of people create SBOMs. So on one side, we also create SBOMs,
on the other side, we are also the consumers. So we need to ensure that we understand all the information, whoever created it. So we just needed some additional set of rules or of guidelines. So for example, we decided that we want to have the components as a flat list. We don't want the hierarchical structure in the CycloneDX SBOM as it is,
but we still have the dependency information because it's just in another place. Another thing is that we found out we need some additional properties. And well, if you tell your developers, just add something, they will add it anywhere under any name.
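The flat-list rule mentioned a moment ago can be illustrated with a short Python sketch. It assumes a CycloneDX-style JSON document in which components may nest further `components`, and it leaves the separate top-level `dependencies` section untouched, since that is where the dependency graph lives. The function name `flatten_components` is an assumption for illustration, not a name from the Siemens tooling.

```python
def flatten_components(components):
    """Turn a nested component tree into a flat list.

    Each component is copied without its nested "components" key;
    the children are appended right after their parent.
    """
    flat = []
    for comp in components:
        children = comp.get("components", [])
        flat.append({k: v for k, v in comp.items() if k != "components"})
        flat.extend(flatten_components(children))
    return flat


nested = [
    {"name": "app", "bom-ref": "app", "components": [
        {"name": "lib-a", "bom-ref": "lib-a", "components": [
            {"name": "lib-b", "bom-ref": "lib-b"},
        ]},
    ]},
]
# The dependency graph is kept separately and refers to bom-ref values.
dependencies = [
    {"ref": "app", "dependsOn": ["lib-a"]},
    {"ref": "lib-a", "dependsOn": ["lib-b"]},
]
flat = flatten_components(nested)
```

Because the graph refers to components by `bom-ref`, flattening the list does not lose the dependency information.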
So CycloneDX offers properties, these are just key-value pairs. So we talked to the CycloneDX guys and they said, okay, you could reserve a namespace. So this is what we did, we provided a taxonomy and now we have "siemens:" followed by whatever, to clearly describe, this is one of our properties. So this is maybe something that our developers should use.
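As a rough illustration of namespaced properties, the following Python sketch writes and checks CycloneDX-style `properties` entries, which are name/value pairs. The `siemens:` prefix follows what is said in the talk; the concrete property key `direct` and the helper names are assumptions made for this example.

```python
NAMESPACE = "siemens:"  # reserved prefix as described in the talk


def set_property(component, key, value):
    """Set a namespaced property, replacing an existing one of the same name."""
    name = NAMESPACE + key
    props = component.setdefault("properties", [])
    for prop in props:
        if prop["name"] == name:
            prop["value"] = str(value)
            return
    props.append({"name": name, "value": str(value)})


def foreign_properties(component):
    """Return property names that do not use the reserved namespace."""
    return [p["name"] for p in component.get("properties", [])
            if not p["name"].startswith(NAMESPACE)]


comp = {"name": "lib-a", "version": "1.0.0"}
set_property(comp, "direct", "true")   # "direct" is a hypothetical key
set_property(comp, "direct", "false")  # overwrites instead of duplicating
```

A check like `foreign_properties` is one way a consumer could spot entries that developers added under arbitrary names.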
The next thing, because the three of us come from the license compliance side, is that we require the source code. We require the source code because this is what we scan for licenses, for IPR issues, export control, all those things, whatever. So we needed to find a way to express
where can we find the source code. So it could be a local file, it could be the upstream location, but we have a way to describe it in CycloneDX. And then the next thing is that the best case would be if the development teams pack all of this together.
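One way such a source reference could be expressed is through CycloneDX `externalReferences`. The Python sketch below is an assumption about the shape, not the documented Standard BOM convention: it treats references of type `distribution` or `vcs` as source locations, and the URLs and the helper name `source_locations` are made up for illustration.

```python
SOURCE_REF_TYPES = ("distribution", "vcs")  # assumed choice for this sketch


def source_locations(component):
    """Return the URLs of external references that may point to source code."""
    return [ref["url"] for ref in component.get("externalReferences", [])
            if ref.get("type") in SOURCE_REF_TYPES and ref.get("url")]


comp = {
    "name": "lib-a",
    "version": "1.0.0",
    "externalReferences": [
        # Upstream download location (example URL, not a real package).
        {"type": "distribution", "url": "https://example.com/lib-a-1.0.0-sources.tar.gz"},
        # Local file shipped together with the SBOM.
        {"type": "distribution", "url": "file:sources/lib-a-1.0.0-sources.tar.gz"},
        {"type": "website", "url": "https://example.com/lib-a"},
    ],
}
```

A component for which `source_locations` returns an empty list would be a candidate for the source-finding tools mentioned later in the talk.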
So the source code, maybe also the binaries, and the SBOM, and this is then something that they ship to our backends. And then we have all the information that we need. So just to give you an overview, I know it's small on the screen. So you see a lot of standard CycloneDX properties.
You also see the license, for example. And, but we have some other information. We have the source code, we have the information about the website, which is still standard CycloneDX, but we also want to know, okay, is it the direct dependency or not? Sometimes we need this, sometimes not.
We would like to know what kind of language this is. We add, if we find such information in the first scan, something about third-party notices or copyright statements. Just a short example of how this would look. Now, maybe for a better understanding, again, what do we do when we talk about this SBOM?
We use it as an input. So we have the developers. The developers commit their code. Many of them do it to a central GitLab instance that we have, which is called code.siemens.com. And there they run their continuous integration, continuous deployment runs. So this is where our tools kick in.
They, as part of CI, they use scanners, either from us or from CycloneDX or created by themselves to create at least the first version of an SBOM. And if the SBOM maybe does not contain all the information, then we have additional tools,
maybe to find source code or to guess where the source code might be, to download the source code, or if we have different kinds of ecosystems, to merge SBOMs, because let's say you have a container, you have maybe a scan for the front end with NPM components, for Java components, for .NET, and the underlying operating system.
So we want to combine it into one big SBOM. And yeah, we also have some kind of validator. And then this is something that can get forwarded to our back ends. So we have different kinds of back ends, but one of them that has already been talked about is SW360, with the scanner FOSSology.
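The merge and validation steps can be sketched in Python as below. This is a simplified illustration, not the Siemens tooling: components are deduplicated by package URL (falling back to name and version), the `dependencies` sections are deliberately left out to keep the example short, and the validator only checks a couple of fields. The function names `merge_boms` and `validate` are assumptions.

```python
def merge_boms(boms):
    """Merge the component lists of several CycloneDX-style documents."""
    merged = {"bomFormat": "CycloneDX", "specVersion": "1.4",
              "version": 1, "components": []}
    seen = set()
    for bom in boms:
        for comp in bom.get("components", []):
            # Prefer the purl as identity; fall back to (name, version).
            key = comp.get("purl") or (comp.get("name"), comp.get("version"))
            if key in seen:
                continue
            seen.add(key)
            merged["components"].append(comp)
    return merged


def validate(bom):
    """Return a list of human-readable problems; empty means no findings."""
    problems = []
    if bom.get("bomFormat") != "CycloneDX":
        problems.append("bomFormat is not CycloneDX")
    for comp in bom.get("components", []):
        name = comp.get("name")
        if not name:
            problems.append("component without a name")
        elif not comp.get("version"):
            problems.append(f"component '{name}' has no version")
    return problems
```

In a container scenario, the inputs would be the per-ecosystem results (npm, Java, .NET, operating system packages), and the merged document is what gets validated and forwarded.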
So again, we use this information, store it, let's say, in SW360, and then someone else pulls it out of SW360, does a scan with FOSSology to determine what the licenses are, what the copyrights are. So the detailed information, where this information is found,
we don't need it here. We need it down here. But then it's created by FOSSology, and probably it's SPDX. It might not necessarily be CycloneDX. But again, our focus is here on the input. What Siemens then does, we have a look at the single component, determine the licenses, the copyrights, the obligations,
as we also heard from the French colleagues. Our legal team has a look at it. And then at the last step, we do something what we call product clearing. That is, we take a look at all the components, all the licenses, all the obligations in the context of that very product. And then we do a final check if everything fits.
Because you may know that if we have an embedded product, there may be another situation than if we have a cloud backend or a cloud frontend. Now, this is maybe the way that it took to get to a good SBOM.
So we think it's not an easy way. And we are not yet done. We shared our experiences, our opinions, our approach on what we did or what we would like to do. And now really, we are here also to hear your comments on that. So parts of the things have been upstreamed,
are available as open source, but is this a use case that you would also be interested in? Is it something where you would say, we should do more open sourcing of our tools? And then the interesting question is, well, if there is already a CycloneDX Gradle scanner or pip scanner, do we want to have another one
or should we find a way to combine it? It's up to what the open source community would like to have. So I guess we have five minutes left. On one side, you see the key takeaways
from our presentation. I don't want to go to all of them again because maybe you have questions. There's a question from the chat right now. Question, how do you generate and track SBOMs for multiple language projects? On your introductory slide, you will mention lots of programming languages
being used at Siemens. Yes, so they are separate scanners to create. Ah, the question is, oops, I did it. The question is, how do we generate SBOMs for multiple languages or for multiple ecosystems? Yes, we have different scanners, so on, no, on that.
So here, some of the scanners that we created on our own. Now, if we don't have a matching scanner, we tell the people, look for CycloneDX scanners. If there is no scanner like that, then well, these are developers, they can do it by themselves. And then you merge the results?
Yes, yes. Yeah, the small, major parties. Yes, yes. At the end, it depends on the use case, whether we process them separately or not, but we have the way to merge them. More? Yeah, just a quick comment on that. Can you, yeah, thank you. So we merged because what we found is,
at build time, these separate build tools, so whether it's Gradle or the Go compiler or whatever, they have a lot more information than just doing static analysis with some other tool like ScanCode. So occasionally, for specific use cases, we prefer to go through build plugins that have all that build metadata to get the full picture.
And so that's why we actually have that modular approach. Right, Siemens is a big organization. What have you done to your supply chain? Because I think you've got lots of things coming into Siemens. What are you doing with the components that are coming that have SBOMs or don't have SBOMs? Have you changed the way you are reacting with your downstream supply chain?
We hope. Question in. Ah, what do we do with all the suppliers? Do we, just rephrase it, do we hope that they have SBOMs? And the question is, yes, we hope, but we don't expect it. So are you generating SBOMs on, essentially, in Siemens, components?
Yes. Yes, okay. Yes. Yeah. So that is something that is actively being worked on to comply with the executive order and so on, yeah. The one in the back. What has made you choose CycloneDX over SPDX? Yeah, so the question is, and we anticipated it because we already had that conversation
with Kate at the Fringe event. So why did we choose CycloneDX over SPDX, right? So it was partly because that's what all of us already knew. So that was the first point of contact. And the other reason is that we got going on it a lot more quickly. So it's lightweight.
You can start at a very low level and then build on top from there. Whereas, so our subjective experience, I'd like to say, it might be different for somebody else, but so the SPDX spec is quite daunting in its depth. And we didn't need all of the features. So understanding the spec fully wasn't in scope for us. Yeah, sure. I would like to add one thing about the SPDX
versus CycloneDX thing. I mean, Siemens is relatively large and there's lots of different parts in it, right? That's probably as it is in most companies of that size. And we discovered that we had already started with CycloneDX individually before we came together
to solve this centrally, right? And then once you discover that on an important question like that, you're already almost aligned, right? Then you don't open that can of worms again to choose the best format. That's kind of a, well, realistic approach. How do you scan containers? How do we scan containers?
Yeah, so I'd like to say that's still very much an ongoing field of research internally, but I can give you, so I do believe that to give you the full picture, we should maybe talk afterwards, that won't fit into the Q&A session, but we have a combined approach there as well. So we use stuff like ScanCode.io, Tern,
all these other static scanners, Syft actually to get us started. But then once you start digging deeper, of course, that's not the scope of the tool. It needs to be fast. And then we need to aggregate that. And that's the biggest challenge, of course. So reconciling all those different scan results. And so if somebody is doing active work on that, I'd be happy to talk to you.
Thomas, Thomas, maybe one last question. How do you make sure that that's not in the profile? You say I have code, I have dependency scan. How do you make sure that what actually the dependency scan actually matches the code? Yeah, so the question was how do we make sure that the dependency scan is complete?
Well, I mean, it would be snake oil to say that we can always be sure because we're not going to be sure, but we have a best effort approach that has been tested against lots of images. And occasionally, people will actually come in after the fact and verify the results. And based on those findings, we will improve.
And that's one other aspect to that. And we kind of mentioned it on the containers slide when we were at that point. It depends a little bit on what you're scanning, right? So if it's in your source ecosystem, then I can, as a developer, I can be reasonably sure that the SBOM is complete. If I take a random container from the internet,
then that's very difficult. Thank you very much, that's all the time. Thank you. Thank you.
Thank you.