Best practices for securely consuming open source in Python
Formal Metadata
Number of Parts | 131
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers | 10.5446/69408 (DOI)
Transcript: English (auto-generated)
00:04
So, who am I? I'm Ciara Carey, I work in Solution Engineering at Cloudsmith. It's a cloud-native artifact management platform. I worked in Developer Relations at Cloudsmith before that, so I only recently moved to Solution Engineering.
00:20
And I worked as a developer for over 10 years in security apps, printing apps and vision system products. And I got into the whole world of supply chain security when I joined Cloudsmith three years ago. It's because your artifact repositories are so important to your software supply chain.
00:42
They host build artifacts, they deal with all those signatures and they pull in dependencies from public repositories like PyPI. And they also have a lot of information about how the artifacts were built. Yeah, and so this is how I got on the topic of researching supply chain security.
01:02
So today we're going to talk about the importance of securing open source, common supply chain attack vectors, regulations that are coming soon from the EU in particular, strategies to mitigate open source risk. And we're going to talk about a framework
01:21
for securely consuming open source, it's S2C2F. And then I'll talk about tooling to help you implement that. So there's a lot, sorry about this. It'll be a skip, hop and a jump. So you often see this diagram, it's from the SLSA website,
01:41
it's another framework on the software supply chain. When we talk about the software supply chain, it's always in the same breath as securing your open source. And that's because open source is such a major part of your software supply chain. But your software supply chain in general
02:00
is like all the processes, all the tools and code and packages and dependencies that go into developing code. And of course, a huge part of that is your open source dependencies. Over 80% of software contains open source.
02:21
I think there's like half a million packages available on PyPI. So a massive part of securing the software supply chain requires securing this open source. Open source is so positive. Without projects like Python, like TensorFlow, Django, or outside of Python,
02:43
Kubernetes, Debian, NGINX, innovation would be painfully slow. And proprietary software is not necessarily more secure than open source, but attackers can target vulnerabilities in open source as they know they'll have a big impact. That's where they go if you want to get malware
03:02
or you want to attack software, you go where all the software is. It's the open source. So supply chain attacks have been growing exponentially over the last few years. And vulnerabilities in packages are a popular initial vector
03:20
behind many exploits. The European Union Agency for Cybersecurity, ENISA, said the number one threat this year was supply chain compromise of software dependencies. And automation and the whole vastness of open source is making attacks on open source code
03:42
repositories harder to fight. So I'm going to mention a few big attacks and then go into the different types of attacks. So the big one was actually not an open source attack, but it really got people thinking about supply chain security.
04:01
It was SolarWinds. It was discovered in December 2020. It involved malware inserted into the build process of the Orion software. It was a networking software that loads of big companies used, and like around 18,000 customers,
04:20
including major corporations and government agencies, installed the compromised software. This incident showed how a single breach can have widespread, severe consequences. Outside of Python, the XZ backdoor was very recent. It's a widely used open source library.
04:43
Basically, a new maintainer joined the team, and actually they kind of pressurized the single maintainer into adding them to the list of maintainers.
05:00
And they put in a backdoor. Luckily it was caught, like amazingly, it was caught by a developer in Microsoft. He noticed that there was a slight delay compared to how long he expected XZ to take to complete. And he looked at the last few commits
05:22
and found what was actually a backdoor, but luckily it didn't get into the major Linux distributions. Also, there have been attacks on PyPI recently: they had to fend off a supply chain attack and they had to stop new users and projects being added.
05:44
And there's been similar attacks on NPM. And yeah, so be careful. So attacks on open source can target how you consume your open source via public repositories like PyPI or Conda or NPM
06:02
or Maven Central. They can also compromise the build process and the distribution process. And the most common source of attacks is targeting critical security vulnerabilities in open source.
06:22
So over the years, there's been a few attacks on PyPI. There's been multiple incidents where malicious actors have managed to upload malware. These incidents exposed users to potential malware and security risks. And why do they do this? It's because they target the open source
06:41
because that's the easiest way to run arbitrary code. They're exploiting where the developers are most active. And of these attacks, typosquatting would be a big one. It's where you upload a malicious package that has a very similar name to another, popular package.
07:00
So actual attacks in the past have targeted urllib3. There was some slight change to the name. And I think it was to do with casing, so it was an upper case L, that kind of thing. Also requests versus request.
07:21
It was uploaded to PyPI mimicking the popular requests library. And this package was designed to execute code that would open a reverse shell, giving attackers control over the victim's machine. Also setuptools is another package that had been attacked with typosquatting.
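As an illustration of the typosquatting idea described above, here is a minimal sketch of a pre-install check that flags names that are close to, but not exactly, a package you meant to install. The `KNOWN_PACKAGES` list, the `possible_typosquat` helper and the 0.85 similarity cutoff are assumptions made for this example, not something from the talk or from any particular tool.

```python
import difflib
import re

# Hypothetical list of names the team actually intends to install.
KNOWN_PACKAGES = ["requests", "urllib3", "setuptools", "flask", "django"]


def normalize(name: str) -> str:
    """Lowercase and collapse runs of '-', '_' and '.' into a single '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def possible_typosquat(candidate, known=KNOWN_PACKAGES, cutoff=0.85):
    """Return the known name that `candidate` closely resembles, or None.

    An exact match after normalization is fine; a near miss is suspicious.
    """
    cand = normalize(candidate)
    names = [normalize(k) for k in known]
    if cand in names:
        return None
    close = difflib.get_close_matches(cand, names, n=1, cutoff=cutoff)
    return close[0] if close else None
```

With this sketch, `possible_typosquat("request")` points back at `requests`, while an exact name such as `Requests` passes. A string-similarity check like this is only a first filter; it will not catch every lookalike name.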
07:45
Another type of attack is dependency confusion. This mostly came to light through a security researcher called Alex Birsan. I'm not sure if it was actually an attack in the wild. So this researcher demonstrated
08:02
a dependency confusion attack. And it mainly targets enterprises that use a private repository or artifact management like Cloudsmith. So you would push a package to PyPI using names that you would kind of guess, or you would find information about their in-house dependencies.
08:20
You push them to a public repository like PyPI. And when they build, they would bring in that package from PyPI because it's earlier in the list of prioritized public repositories. So it wouldn't use the private in-house package.
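The dependency confusion scenario can be sketched as a toy resolver. This is a simplified model, assuming a resolver that simply takes the highest version it can find on any index, which is one way the public package can win; the index contents, the package name `acme-billing` and both resolver functions are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical index contents: index name -> {package name: [versions]}.
INDEXES: Dict[str, Dict[str, List[str]]] = {
    "private": {"acme-billing": ["1.2.0", "1.3.0"]},
    "public": {"acme-billing": ["99.0.0"], "flask": ["3.0.0"]},
}


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '1.3.0' into (1, 3, 0); enough for this toy example only."""
    return tuple(int(part) for part in version.split("."))


def naive_resolve(name: str) -> Tuple[str, str]:
    """Pick the highest version found on any index (the risky behaviour)."""
    candidates = [
        (version, index)
        for index, packages in INDEXES.items()
        for version in packages.get(name, [])
    ]
    version, index = max(candidates, key=lambda c: version_key(c[0]))
    return index, version


def private_first_resolve(name: str) -> Tuple[str, str]:
    """Use the private index whenever it knows the name; fall back otherwise."""
    for index in ("private", "public"):
        versions = INDEXES[index].get(name)
        if versions:
            return index, max(versions, key=version_key)
    raise LookupError(f"{name} not found on any index")
```

In this model `naive_resolve("acme-billing")` selects the attacker's public upload, while `private_first_resolve` keeps the in-house package and still reaches the public index for names the private one does not hold.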
08:41
It would use the public one. You can mitigate against this using repository ordering, that kind of thing. Another type of attack is starjacking. It's when you copy the metadata of a known good package and you kind of spoof the popularity of a package.
09:01
And really, they're getting malware. They think it's an OK package. Oh, look, these well-known maintainers are on the team. It's got loads of stars, but really, they just copied the metadata. Another type of attack would be, this is sort of the XZ attack, you know, where maintainer ownership changes.
09:22
Maybe it's changed hands to a malicious actor. In the Python ecosystem, this has actually happened. So the event-stream incident, this is a while ago, 2018 or something like that. Oh, this is actually JavaScript. This is in JavaScript, sorry.
09:41
And it was where the maintainer changed ownership and there was malicious code inserted. Another one is the Colorama package. It totally handed control over to an attacker who, again, inserted malware. And also, maintainer credentials can be phished or breached.
10:06
And there's been a number of incidents of this in Python. The ctx and phpass packages incident, 2018, where they were compromised when their credentials were stolen
10:20
and used, again, to steal SSH and GPG keys from user systems. And another incident was, well, just in 2020, PyPI disclosed that credentials for thousands of accounts were potentially exposed due to a third-party breach.
10:41
Although the breach did not directly involve PyPI's infrastructure, it led to a recommendation for affected users to reset their passwords. And outside of that, there's been DDoS attacks on PyPI itself in July 2020. And so that affects your life. You're expecting your package to be available on PyPI
11:01
and your build breaks because PyPI is down. That's another type of attack. I love talking about those attacks, but really, probably the most usual way that bad actors get into your supply chain
11:24
is with vulnerabilities in open source. They're a major threat. Log4Shell was the big one that happened in the Java environment, and then also in OpenSSL, Heartbleed in 2014 was a huge one.
11:40
It was like a CVE score of 10. And they allowed remote code execution. And actually, you might have thought, oh, OpenSSL probably won't affect Python packages, but actually it did affect Python. For example, urllib3, I think, was affected by it
12:00
and they actually had to push a patch to correct it because it relied on OpenSSL. So vulnerabilities have a really long tail. Like for example, with that OpenSSL package, with Heartbleed, it was like 10 years after the patch was released.
12:24
You could still see that the vulnerable OpenSSL was available, was still being used, and was used as a conduit for attacks. So the reason why these vulnerabilities are still attackable even after a patch is released
12:44
is because patching vulnerabilities in open source dependencies might not be a priority for your development teams. And often people don't know what open source they're consuming. Maybe they're not consuming the dependency directly,
13:00
but it's a transitive dependency, a dependency of a dependency. And so they're not even aware that they're affected. But don't worry, we're not going to sit on our laurels. There's been a response. There's been a response from the industry, from governments, and that included,
13:20
I was really impressed by the response by the US government. After the SolarWinds attacks, President Biden's administration issued an executive order to enhance cybersecurity for the software supply chain. It had a mandate for organizations selling to the federal government to provide a software bill of materials.
13:41
It's like an ingredient list in a standard format, an SBOM. And they put a lot of work into defining the SBOM with the NTIA. And after the Log4Shell vulnerability, this was a vulnerability, it was a 10 out of 10 remote code execution that affected Java.
14:01
They convened all the stakeholders from open source, from big tech, to create a 10-point plan to bolster the security of open source software. They also put a lot of money behind it, I think it's like 150 million over two years, to push the widespread adoption of SBOMs and improve tooling and training,
14:22
and understand what critical systems are affected. And I liked how they didn't turn on open source, they didn't go, oh no, open source is bad, everybody just don't use it. They understood that actually open source was integral to developing software,
14:41
and they put their resources into bolstering it and bolstering the packages that are particularly important to critical systems. OpenSSF was integral to this response, it's like a cross-party forum for a collaborative effort to improve security in open source software.
15:01
You should join, they have a Slack channel, they have a lot of meetings, a lot of working groups. If you're interested, you should take a look at it, I've gotten lots of information from it, it's a really active community, different working groups for repositories, for SLSA, different frameworks, for Sigstore, it's really great.
15:24
And also from PyPI, I've really admired how they've responded to it. They've increased their security, they have included SBOMs in their formats,
15:41
they're trying to integrate with Sigstore, they've improved authentication for their maintainers with 2FA, with trusted publishing. I've been really impressed with Dustin Ingram's response, I haven't talked to him directly,
16:03
I was in some of the working groups, I was listening, I'm more like a listener for OpenSSF working groups, rather than, but how he's so, he's so empathic with the community and trying to work on the most impactful project.
16:26
So I've really been impressed with their response. So if you look at their blog, under security you can see all the stuff that they've been doing over the last few years. One of the things I'm going to point out
16:41
is that they've introduced trusted publishing recently, that's OIDC, OpenID Connect, so that maintainers don't need to have their credentials stored in their CI/CD. Normally, this is like a classic avenue of attack where you try to get the API keys, maybe they were stolen,
17:02
maybe they were exposed in GitHub history or something like this, so by using ephemeral tokens for OIDC you're completely removing that attack vector. They've also enforced 2FA for, I think, the top 100.
17:21
I'm not sure what it is. Different public repositories that have adopted this have slightly different ways of doing it, but they've enforced 2FA for, say, like the top packages or the top maintainers. And they've hired a safety and security engineer, I think it's Mike Fiedler, and a support specialist.
17:41
I think that's just recently been filled, so they would be the people responding to, like, I need to reset my 2FA. This takes manpower. You need someone to actually reset stuff. And they're also constantly removing malware from PyPI. And they're a big presence in OpenSSF.
18:07
So there's also been a big response from the EU. There's the EU's Cyber Resilience Act. It formalises good software security practices, such as automatic security fixes
18:21
and standardised vulnerability reporting that companies are expected to do. So it applies to all hardware and software used in the European Union. It's going to come into effect in 2027. That seems like a long way off, but I think it's going to come faster than you think.
18:41
Companies have to own the entire product, including the open source that they use. They can't just say, this is my code. They have to incorporate the open source when they consider vulnerabilities in their product. So we don't know exactly what standards we'll have to comply with for the CRA right now,
19:03
but there are existing standards that would probably be a good indication of what is going to come. So we're going to talk about them later. There's the Secure Supply Chain Consumption Framework, S2C2F, and there's also ones like SLSA, Supply-chain Levels for Software Artifacts,
19:21
that probably will give you a good idea of what to expect when organizations are developing software. And this applies to any software that's used in the EU. So you might say, oh, well, I'm a US company, but if your products are used within the EU, it will also affect you.
19:40
So securing your open source software is crucial. The role of open source in your software supply chains is so important. What are we saying now? 90% of your actual code base is open source
20:01
or something. It gets higher and higher. The amount of in-house code is smaller and smaller. So it's like an extensive attack surface. The rise of supply chain attacks and their implications, it's just such a popular attack vector behind many exploits. And compliance regulations are coming down the line
20:23
in the EU, we're talking about the CRA and also the NIS2 directive. And in the White House, they have other regulations to do with software supply chain security. So how can organizations securely consume open source?
20:40
So a successful program starts with the people. It's about dedicated teams that are focused on this and policies that help you automatically consume open source securely. So we're talking about automation. We need policies and processes,
21:02
visibility into what you're using. What are your dependencies? What are their dependencies? And we need a way to track your vulnerabilities and prioritize the most important ones. So there are a few best practices. I think that SLSA is well-known
21:21
and it mostly focuses on the build process. So your CI/CD and all that. And we're going to be talking today about S2C2F, the Secure Supply Chain Consumption Framework.
21:41
So it was created by Microsoft, by Adrian Diglio. It was donated to the OpenSSF. It's basically just a practical guide to knowing what open source you're using, preventing the consumption of vulnerable packages, and having an efficient patch management process.
22:01
So it has eight practices that organizations should use and four levels where you get more and more secure as you move up the levels. So the practices are ingest, consume your open source from one location controlled by your organization. So this would be like basically using artifact management
22:20
is a big part of this process. This is why my company is so interested in supply chain security. Inventory, so create a list of all your open source dependencies. So that might be your requirements file, a lock file, and then eventually SBOM will be like that
22:41
in a standardized format. So that will drive innovation. Not innovation, automation. Update, so you're updating your known vulnerabilities in a timely manner. Enforce, so force developers
23:00
to adopt secure practices. Audit, audit that developers are consuming through the approved ingestion method. So they're not like, they're using package managers and they're using your artifact manager, your private artifact registry. You're not going directly to the public repositories.
23:22
Scan, for malware and for vulnerabilities, using automation where possible. And then rebuild open source on trusted infrastructure. This is sort of for the higher levels. I think it's not really realistic for most projects.
23:42
And fix upstream. Again, this is more of a higher level practice. So this would be for maybe a critical project, maybe you're in a bank or something like that where you would have the ability to fork and fix your code
24:00
where necessary for a temporary fix but you're always trying to go back to the upstream. So it would only be a temporary thing. So let's talk about level one, with its focus on ingestion. So a big problem with organizations is that many organizations don't know
24:20
what open source is used or where it's used in their system. So knowing your open source usage is a critical baseline that everyone needs to establish in order to detect and fix issues to reduce supply chain risk. So we're talking about using package managers, using artifact management like Cloudsmith,
24:41
but there's others like Nexus. Scan for known vulnerabilities, scan for software licenses, have an inventory of your open source, hopefully use SBOMs, software bills of materials, and for level one, you're just required to do manual updates.
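As a rough illustration of the inventory step, the sketch below lists the distributions installed in the current Python environment and wraps them in a minimal CycloneDX-style document. It uses only the standard library's `importlib.metadata`; the `build_inventory` name is hypothetical, and the output is a simplified subset of fields rather than a complete, validated SBOM. A dedicated SBOM generator would normally be used in practice.

```python
import re
from importlib import metadata


def build_inventory() -> dict:
    """Collect installed distributions into a minimal CycloneDX-style dict."""
    components = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if not name or not version:  # skip broken or partial installs
            continue
        # Package URL with a normalized name (lowercase, '-' as separator).
        purl_name = re.sub(r"[-_.]+", "-", name).lower()
        components.append(
            {
                "type": "library",
                "name": name,
                "version": version,
                "purl": f"pkg:pypi/{purl_name}@{version}",
            }
        )
    components.sort(key=lambda c: c["name"].lower())
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": components,
    }
```

Calling `json.dumps(build_inventory(), indent=2)` would give a file you can archive per build. Note that this lists everything installed, so direct and transitive dependencies both show up, which is the visibility the inventory practice is after.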
25:04
Level two is trying to reduce the mean time to remediation for vulnerabilities. So you would have an incident response plan, you would focus on identifying known vulnerabilities in open source.
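To make the idea of identifying known vulnerabilities concrete, here is a toy matcher that compares pinned versions against advisory data. Everything in it is an assumption for illustration: the `ADVISORIES` table, the package name `examplelib` and the advisory id are made up, and real scanners pull from vulnerability databases and handle far richer version ranges.

```python
from typing import Dict, List, Tuple

# Hypothetical advisory data: package -> [(first fixed version, advisory id)].
ADVISORIES: Dict[str, List[Tuple[str, str]]] = {
    "examplelib": [("2.4.1", "EXAMPLE-2024-0001")],
}


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '2.4.1' into (2, 4, 1); only handles plain dotted integers."""
    return tuple(int(part) for part in version.split("."))


def find_vulnerable(pins: Dict[str, str]) -> List[Tuple[str, str, str, str]]:
    """Return (package, pinned, advisory id, fixed version) for each hit."""
    findings = []
    for name, pinned in sorted(pins.items()):
        for fixed_in, advisory_id in ADVISORIES.get(name.lower(), []):
            # Anything older than the first fixed release is still affected.
            if version_key(pinned) < version_key(fixed_in):
                findings.append((name, pinned, advisory_id, fixed_in))
    return findings
```

Each finding carries the fixed version alongside the advisory, since the point at this level is shortening the time from "we know" to "we upgraded".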
25:22
And so you're using tools like scanning and you're using automatic update tools like Dependabot. And for level three, it's malware defense and zero-day detection stuff. So the emphasis turns to providing provenance and proactive security measures.
25:41
So you're proxying, caching your open source internally. You're not going directly to your public repositories. You're doing security reviews and also you're enforcing and verifying signatures and you're enforcing consumption from a curated feed.
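A curated feed amounts to a policy check at ingestion time. The sketch below shows one possible shape for such a check, assuming three illustrative rule types: a deny list, a set of approved licenses, and a maximum acceptable vulnerability severity. The `Policy` and `Package` classes and the `violations` function are hypothetical and do not reflect how any particular artifact manager implements policies.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Policy:
    denied_packages: Set[str] = field(default_factory=set)
    allowed_licenses: Set[str] = field(default_factory=set)
    max_severity: float = 7.0  # highest acceptable vulnerability score


@dataclass
class Package:
    name: str
    license: str
    worst_severity: float = 0.0  # 0.0 means no known vulnerabilities


def violations(pkg: Package, policy: Policy) -> List[str]:
    """Return the reasons a package should be blocked; empty means allowed."""
    reasons = []
    if pkg.name.lower() in policy.denied_packages:
        reasons.append("package is on the deny list")
    if pkg.license not in policy.allowed_licenses:
        reasons.append(f"license {pkg.license!r} is not approved")
    if pkg.worst_severity > policy.max_severity:
        reasons.append(
            f"severity {pkg.worst_severity} exceeds {policy.max_severity}"
        )
    return reasons
```

Returning a list of reasons rather than a boolean makes it easier to tell a developer why a package was blocked, which matters when the check runs automatically in front of every install.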
26:02
So you might have a blacklist or whitelist. And the advanced threat defense, I kind of just jump over this because some of this seems unrealistic. So the level is considered aspirational in some cases as it's difficult to implement at scale.
26:21
It involves rebuilding open source on trusted infrastructure and also, in extreme cases when a critical vulnerability is discovered, organizations will have the capability to provide a temporary fix, while always trying to put it back to the upstream. This will involve forking the code, implementing the fix
26:40
and then releasing this version on a temporary basis. And also validating SBOMs of open source consumed, rebuilding open source on trusted infrastructure, signing rebuilt open source, generating SBOMs for rebuilds. So there's a lot here that might be aspirational
27:01
when you're talking about scale. So we talked about what they expect. So what tooling do you actually need to implement S2C2F? So level one, so you need to check before you include your dependencies.
27:20
So this would help against typosquatting. You're using maybe a scorecard or different tools to see or types of navigator where you can actually check what's the quality of the package. So Scorecard is another one from OpenSSF and it kind of uses stuff
27:43
to check how lively the project is, how many maintainers it has. If they respond quickly to PRs or GitHub actions or GitHub issues, do they have MFA enabled and are you accidentally installing the wrong dependency?
28:02
So a big part of level one is that you're using package managers. So in Python, there's just so many. Poetry, PDM, Hatch, and that's just for PyPI. I don't know, there's more. Conda has its own package manager,
28:21
but package managers increase your security. Package managers search for, download, install, and configure packages from a repository in a defined way and can help you keep track of your packages. But the package manager story, I was saying, is not so simple for Python because there's so much choice.
28:42
Pip is the one I'd be most familiar with. So another part of level one is to use lock files. So ensure your dependencies are pinned to a specific known good version. So these files should contain strict requirements
29:01
to qualify as lock files. And you can use tools like pip-tools, whose command will resolve the dependencies and generate a requirements.txt file that includes both direct and transitive dependencies, all pinned to specific versions. Does anybody use pip-tools here? Yeah? Cool.
29:23
So for example, this is a requirements file which is unpinned. We don't know what version of Flask it's going to use. We don't know what transitive dependencies are used. And using pip-tools, it's generated this file
29:41
from the previous requirements file. Well, it was a requirements.in or something. So you can see here, it's showing the exact version and it's also showing the transitive dependencies and telling you why they're there in the little comment section.
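Since the slides are not part of the transcript, here is a small sketch of the difference being described, written as a check that could run in CI: it reports any requirement line that is not pinned to one exact version. The `unpinned_requirements` helper and the sample file contents (including the version numbers) are illustrative assumptions; the "# via" comments imitate the style of annotation the speaker refers to.

```python
import re
from typing import List

# Matches "name==1.2.3", with optional extras such as "requests[socks]==2.31.0".
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[^=\s*]+$")


def unpinned_requirements(text: str) -> List[str]:
    """Return requirement lines that are not pinned to one exact version."""
    problems = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        line = line.rstrip("\\").strip()  # drop line-continuation backslashes
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        requirement = line.split(";", 1)[0].strip()  # ignore env markers
        if not PINNED.match(requirement):
            problems.append(requirement)
    return problems


# Illustrative inputs: a loose file and a compiled, fully pinned one.
LOOSE = "flask\nrequests>=2.0\n"
LOCKED = (
    "flask==3.0.0\n"
    "    # via -r requirements.in\n"
    "werkzeug==3.0.1\n"
    "    # via flask\n"
)
```

Here `unpinned_requirements(LOOSE)` reports both lines, while `unpinned_requirements(LOCKED)` returns an empty list. The check is deliberately strict: ranges and wildcard pins such as `==4.*` are also reported.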
30:05
Another part of level one is to bring all your open source under your control. So this is where your artifact management comes in. So you're ingesting all your packages, including your open source, through artifact management. And this ensures that you have a central location for your open source packages,
30:20
it's a single point to initiate policy enforcement, to conduct scans and to implement security controls on your open source. So, but in order to do that, you're using package managers and you're proxying and caching your public repositories through your artifact management.
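One way to audit that installs really go through the central repository is to look for index options in requirements files that point anywhere else. The sketch below assumes a hypothetical internal hostname, `packages.example.internal`, and only inspects pip's `--index-url`, `--extra-index-url` and `-i` options; it is an illustration of the audit idea, not a complete control.

```python
from typing import List
from urllib.parse import urlparse

# Hypothetical hostname of the organisation's private artifact repository.
APPROVED_HOSTS = {"packages.example.internal"}


def unapproved_indexes(requirements_text: str, approved=APPROVED_HOSTS) -> List[str]:
    """Return index URLs in a requirements file that bypass approved hosts."""
    flagged = []
    for raw in requirements_text.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop trailing comments
        for option in ("--index-url", "--extra-index-url", "-i"):
            if line.startswith(option):
                # Accept both "--index-url URL" and "--index-url=URL".
                url = line[len(option):].lstrip(" =").strip()
                if urlparse(url).hostname not in approved:
                    flagged.append(url)
                break
    return flagged
```

A file that lists the internal repository plus `--extra-index-url https://pypi.org/simple` would have the second URL flagged, because it lets installs reach the public index directly instead of through the proxy.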
30:41
Do many people here at work actually go directly to PyPI or your public repos? Or do... Hello, can you hear me? I'm so sorry for interrupting you, but it's 11.52. I think if you could just wrap up, it will be amazing. Sorry again. Oh, I'm so sorry. Yeah, thank you.
31:00
Okay. Yes, thank you so much. Yeah, no problem. So for level two we're talking about scanning, we're talking about SBOMs and then, yeah, we're talking about using Dependabot, and then for level three, we expect you to enforce policies. So you're saying,
31:21
you're enforcing policies on the licenses you're using, the level of vulnerabilities that are acceptable and then you're blocking packages that fall outside of this. We're also talking about signing everything. I had a thing on how signing in Python isn't so easy. You know, they've recently dropped PGP
31:40
because it wasn't useful, basically. And it looks like they're going to go towards Sigstore, which NPM has also adopted. You can see there's a package called sigstore-python, which seems to be where PyPI is going and pip is going with signing packages.
32:01
So great, let's just go to the end. OK, so control all your open source with artifact management, policy management, update and pin your... Update as quickly as possible, but pin your dependencies, use automation, use package managers, use lock files and stay informed.
32:22
I like the Open Source Security Podcast with Kurt and Josh and Talk Python has a great episode on this exact topic. So, yes, so that's it. We've talked about the threats, the regulations coming down the line, how to... Best practices and the tooling
32:40
to implement the best practices. Any questions?