Meditations on First Deployment: A Practical Guide to Responsible Development
Formal Metadata
Title | Meditations on First Deployment: A Practical Guide to Responsible Development
Title of Series | EuroPython 2020
Number of Parts | 130
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers | 10.5446/49914 (DOI)
Language | English
EuroPython 2020 | 89 / 130
Transcript: English(auto-generated)
00:06
Now we are coming to the first keynote today, held by Alejandro Saucedo. Alejandro is Chief Scientist at the Institute for Ethical AI and Machine Learning. He leads the development of industry standards on machine learning bias, and does many, many more things.
00:26
I'm sure he will tell you after his keynote. I met Alejandro for the first time at PyCon Belarus in Minsk, many, many years ago, actually. A long time ago, before the COVID times, when it was still an in-person conference, and I'm looking forward to his keynote.
00:46
Thank you very much and welcome, Alejandro. Thank you very much, Martin. It's a pleasure to be here, and I'm really excited to dive into a very important topic. Today we're going to be covering meditations on first deployment, a practical guide to responsible development.
01:06
I'm quite excited because this topic is actually getting quite a broad spotlight, which is quite important in the current ecosystem. Just as a heads up, my Twitter is on the top right, and on the bottom left I have added throughout the presentation
01:23
a set of XKCD artwork in case people want to distract themselves into another area. So let's dive straight into it. And basically what we're going to be covering is best practices.
01:41
But just before diving into it, a little bit about myself. My name is Alejandro Saucedo. I am Engineering Director at Seldon Technologies, an open source project that focuses on the deployment of machine learning systems. I'm the Chief Scientist at the Institute for Ethical AI and Machine Learning, where we focus on pretty much highly technical research on standards and best practices for the development of AI systems.
02:06
And I'm a Member-at-Large at the ACM. So I'm more than happy to take any questions as we go, in the Q&A or on the Discord, as well as after the talk itself. So let's talk about programming.
02:21
And I think we all can agree that there is magic in programming. It's one of the few areas where you can wake up with an idea and have a prototype by the end of the day or the weekend. And that is pretty amazing. And, you know, we all know that software is eating the world.
02:41
We are aware of what the wonders of the world are as of today. But we can be sure that the wonders of the world of the future will be running Python in some way or another, and generally software. And what that really alludes to is that critical infrastructure in itself is now increasingly dependent on running software, right?
03:05
Software that we write on a day to day basis. And regardless of how many abstractions we add on the software, there is always going to be impact, which is human, right? At an individual level, at a societal level, it's always going to be human.
03:22
And there is a piece in here. We all know that we have deadlines to meet. We have to meet all of the sprint points and make sure that we develop towards all the product releases. A lot of the time there's this conversation of urgency versus best practice.
03:40
But ultimately it's a question of urgency where relevant, and best practice, right? Because at the end, the impact of a bad solution can be much, much worse than no solution at all. And right now, we have seen a lot of high-profile cases where this has basically been the case.
04:04
So it's very important to make sure that this is taken into consideration. Right. The impact of a bad solution can be much worse than no solution at all. And I think from that perspective, it of course boils down to: with great power comes great responsibility.
04:23
Right. And the question is, well, where does that responsibility reside? Whereabouts does it actually go? Who is responsible for that? And the answer is that it ultimately sits at multiple levels. Right. There is the individual practitioner. Right. This is us, the developers. This requires technology best practices, making sure that we use the most relevant tools,
04:43
that we have the right competency in the field that we're acting upon, and that we're aware of our professional responsibility to make sure that we do this as well as we possibly can as an individual or as a professional in the field. But then there's also the team slash delivery process.
05:00
Right. It's ultimately the cross-functional interaction between not only the individual, but also the different people themselves. Right. The people that you work with, the people that you interact with throughout the delivery of projects, throughout the delivery of software. And then there is the broader level, which is the organizational or departmental responsibility.
05:23
Right. Making sure that the right high-level principles are in place, that the governing structure is in place, that aligned objectives are in place, and that the escalation structure is in place for communication. And this is relevant specifically on software projects. Right. Even though we're talking about code, this really reflects onto the humans that are really making it possible.
05:44
And one thing to really emphasize is that this professional responsibility can be broken down. And I like to break it down as follows. Right. So let's take what I refer to as the ethics-empowerment matrix. Right. What this means is, on the ethics axis, whether the
06:01
individual wants to do good, and on the empowerment axis, whether they know how to. Right. And from that perspective, ethical and empowered is where you want to be. Right. This is really where you want to be: you want to do good and you know how to. Right. Well, then there's a large set of not just individuals, but situations that people find themselves in where
06:24
they want to do good, but perhaps, not having the right tools, end up with undesired outcomes. Right. And we have seen that countless times, where you can't always assume malice when there may be malpractice. Right. And there can be the best intentions, because an individual being ethical doesn't mean that the whole compound is also going to be ethical.
06:46
Right. And I think economies in themselves can show that as they optimize towards things that are not specifically ethics. Right. So from that perspective, this is one thing to remember. The second piece is the lower left, which is, you know, people that are unethical and they are empowered.
07:03
Right. And that's the type of individual that needs to be ensured to follow best practices, through standards, through regulation, through frameworks. And the bottom right, unethical and unempowered: I think, you know, we don't have to worry about that.
07:20
But the key thing about this is that, you know, we've talked about professional responsibility as an individual, but it's also important to make sure that we take into consideration that these challenges go beyond the algorithms and more specifically, large ethical considerations cannot fall on the shoulders of a single developer. And this is very important because in order to solve human problems, we need human solutions.
07:43
Right. Even though they are software, even though they are code, they will require programming expertise, but also domain expertise, policy expertise, cross-functional skill sets from various different individuals that need to work together. And it is an end-to-end approach. Right. Of course, you need to make sure that you
08:02
have the right high-level principles and guidelines to ensure that the organization in itself is aligned, that the team in itself is aligned, that the project is aligned. Then from there, you need to have the right standards, the right industry standards, the right code standards, even the right regulatory frameworks. And then from there, it doesn't really stop once you actually set the rules.
08:22
Right. Because the key thing is that you can have all the roundtables you want. You can have all the discussions you want. You can set all of the principles you want. And you can agree that discrimination is bad. You can agree that the harm is bad. But if you don't have the underlying infrastructure to ensure that that can be introduced and operated and monitored, then it's going to be useless.
08:45
And we're going to dive into that in a little bit more detail. In terms of terminology, what we just covered is ethics and principles. Right. And often these get thrown around, and it has even gotten to a point where there's some hype. But what is powerful about this is the underlying meaning, the underlying foundation that really builds up why it's useful.
09:05
Right. And ethics themselves are the moral principles that govern a person's behavior or the activity that they're doing. Principles themselves are fundamental truths that you can set to serve as the foundation of a system of belief or behavior, which is relevant for, again, an activity. And as a practitioner, why is this relevant to me? Why do I need to think about these sorts of things?
09:28
Because there are already standards, there are already code standards, there are already things that I can take and follow. And the reality is that, as a developer, you're going to be dealing with new technologies or situations where there may just not be enough examples, especially when there's emerging technology.
09:44
You will not just have a playbook that tells you exactly how to act, especially when you're dealing with the intersection of software and humans. It's going to be important that you are able to have an internal and also organizational or team framework to make decisions, to make hard decisions and to make sure that you have the right touch points to involve the right not only experts,
10:08
but domain insights to be able to make those decisions. Ultimately, it goes beyond the algorithms themselves. Now, the question here, we talked about ethics, but then there's also the question of whose ethics?
10:22
What are we talking about? Like this is not only from the level of my ethics versus your ethics or my approach versus your approach, but this is also from a higher level. Right. In the context of philosophical foundations, there are multiple schools of thought, Western philosophy, Eastern philosophy.
10:43
And from that perspective, there are a lot of differences and nuances when it comes to the underlying concepts of the individual, the meaning of good, what is righteous, continuity, et cetera, et cetera. One thing that is worth emphasizing is that I'm not mentioning this
11:02
difference in philosophical foundations as something to make assumptions from; a philosophical foundation of an entire culture doesn't reflect the current geopolitical or political ecosystem. However, this is something that is important to understand, because understanding underlying knowledge allows us to reach a higher level of empathy
11:27
that will allow us to come to more powerful and deeper agreements. And this is key when it comes to setting even code standards, because we have to engage with people from all around the world, all different backgrounds, people that will be coming from very different perspectives, in order for you to come to an agreement.
11:45
There may even be discussions where both are discussing the same thing, but just seeing it from a different perspective, assuming that they're actually discussing something very different. So it's very important to ensure that there's that understanding from the foundations to make sure that there are higher level alignments.
12:01
And then from that perspective, the good thing is that we have a lot of resources at our disposal, where we can actually leverage a large amount of different components. Right. As a practitioner, as an individual, there is the ACM's Code of Ethics and Professional Conduct. You know, it is a very, very sensible read.
12:20
You know, if you look at it, you're probably going to say, oh, yeah, OK, well, that's it. Right. Following it would make sense, would make a project more efficient, right, and would make interactions better. And then at the institute, we put together also a set of principles that are specifically focused on the development of machine learning systems. And we're going to dive a little bit more into that as a case study of how we approach this.
12:43
Right. So that's what we're going to be delving into. But this is the key thing: there's resources at our disposal. And you can go deeper. Right. You can go into the philosophical foundations from people that spent their entire lives researching this, and it's still useful. You can read Plato's Republic and it's like listening to a podcast today. Right. So from that perspective, it's about asking, as a developer, because I know
13:06
that, like all of us, we love our craft and we are continuously looking to extend it. It's not just about extending it by learning new frameworks and new languages, but also making it broader in regards to what other things will make me a better contributor to this system that is ultimately human.
13:24
Right. So I think that's a very important thing. Now, others might say, well, how is ethics relevant to business? Right. Maybe it's just going to introduce red tape and make everything inefficient. But the reality is that principles are good for business. Ethics are good for business, and good for software.
13:41
If you read the principles that are set by the ACM, you know: contribute to society and human well-being, avoid harm, be honest and trustworthy, and then on the right, strive to achieve high quality, maintain high standards. I mean, you can't read this and say, hey, a project or a piece of software is going to end up worse. If anything, you know, it's going to end up definitely much better if you actually follow them.
14:03
Right. So I think that's something that is quite important. Setting the right things up front is going to help with all of those. And, you know, from that perspective, we touched the higher level. Right. The ethics and the principles. Right. So let's assume we've set the higher pieces that we want to sort of follow or align on at a higher level.
14:21
Well, then we go one level deeper. This is industry or code standards. And I know those are very, very different things, but you'll understand why I actually refer to them on the same slide. What is a standard? Right. A standard is basically a repeatable, harmonized, agreed, documented way of doing something. Right. From that perspective, it's just basically writing something down and having people follow that specific way of doing stuff.
14:45
Right. And what are industry standards good for? An example of one is the Wi-Fi standard. Right. Before there was an actual standard that was set for all of the providers to follow, everybody was in the wild, wild west, each actually providing their own way of delivering this wireless connectivity.
15:03
And you can imagine that that was probably complete hell. So from the code perspective, that could be aligned to the Python language standards. Right. From that perspective, it's a set of individuals that are contributing to setting this. And then you may be thinking, well, why would I want to have somebody else telling me what to do?
15:22
But the key thing here is that standards are set by you. Right. And standards are used by you. Interestingly enough, like open source, industry standards have been developed in a similar way. Right. People, volunteers, tend to gather together with different expertise and contribute to actually putting together and aligning what should be followed for a specific practice.
15:45
And that is what then gets published as a standard. And the cool thing here is that you actually can get involved. Right. There are standards that are being developed on an industry level for code, for security, even for cloud-native standards, for ethics and AI, for general project governance, and even for programming languages, et cetera, et cetera.
16:05
And I think you can check out some of the ongoing standards that are being led with the IEEE on ethics, you know, the World Wide Web sort of standards. So you can actually get involved in these sorts of things. If you're interested, you know, do jump in. These are working groups that meet voluntarily every Thursday.
16:24
You know, ultimately they all do this because they want to improve a specific way of doing things. So this is actually quite cool, because it has a lot of the same, or at least similar, dynamics as open source development. And then we're going to go one level deeper. Right. Because we already said the same.
16:43
We already said you can have all the principles you want, but if you don't have the underlying foundation to implement and monitor them, they're useless. Right. So from that perspective, I guess in this audience we are all aware, and we all agree, that open source is now the backbone for critical infrastructure that runs highly important parts of our society.
17:04
So it is very important to make sure that this field is aligned with some of the higher-level components, and that those implementations allow us to ensure that those higher-level principles are being implemented.
17:20
Right. Because we can no longer have code on one side and principles on the other. And I think this has never really been the case. But from that perspective, now that code runs higher, more important critical infrastructure, it is even more important. Right. Like data science now runs on open source. You know, our desktops, you know, running Linux, and that has been for a long time.
17:41
So it is critically important now, and growing in importance. And that sets the scene for what I refer to as open source as policy. And the reason why I say open source as policy is because right now, actually, if you have a look at some of the data or technology related regulations, like GDPR in Europe, for example, those regulations are actually dictating some of the
18:06
requirements of how data should be stored, how data should be protected. But ultimately, the people developing those systems are really the practitioners that are, you know, adding the issues on GitHub, contributing all those PRs.
18:22
So right now, we're at a stage where the leaders of these projects, the actual leads of those projects, are more than critical, and we need to make sure that they're involved in the development of the guiding rules for our societies. Right. And from that perspective, that emphasizes even more the importance of software
18:42
and software developers and general practitioners, in humanity in general, in society in general. So it is our professional responsibility to make sure that we can actually have that interaction, so that the principles and guidelines and the underlying open source foundations are fully aligned. And the cool thing, again: open source is something that you yourself can advance, can contribute to,
19:06
can lead; you can get involved in the design, development, and use of the open source projects themselves, through software foundations like the Software Freedom Conservancy, the Linux Foundation, or the Apache Foundation. And it's actually quite interesting. One of the executive directors of the Software Freedom Conservancy has a
19:25
very interesting story where she basically emphasizes that one day she asked for the source code for her pacemaker or, I guess, a specific sort of like device that she says makes her a cyborg lawyer.
19:41
But she was denied access to that source code. And from that perspective, that opens questions: you know, not only the infrastructure that runs society, but also the infrastructure that is going to run humans, that in itself is critical enough that it not only needs to be open source, but needs to be aligned with the higher-level regulations and rules and foundations of society.
20:03
But this is the key thing. You can get involved and I encourage you to get involved. Use this chance, use this conference. You have a lot of open source contributors, of open source authors. Reach out to me if you are interested to get started. You know, I'll point you to a bunch of good first issues on GitHub.
20:21
I'll point you to the documentation of several projects so that you can actually just read it. It can be as small as just submitting a fix for a typo. Right. That's already contributing. And that's very, very useful. So from that perspective, I encourage you to get started. Right. Just take the leap, because not only is it very empowering, it's also really fun.
20:42
Right. I mean, that's something that is actually quite cool. The communities are really awesome. And they do amazing things, like what the organizers of EuroPython are doing, you know, bringing together like-minded individuals and just letting them loose to share all of their knowledge with each other. And I think that's absolutely amazing. So let's take a side note. We've touched upon regulation quite a lot. And I know that, you
21:02
know, that may not be something that you're interested in. Or maybe you are. Maybe you are. I mean, I think it's an interesting thing. The only side note that I want to add is that a lot of individuals, like me, pose the point saying, well, you know, regulation may a lot of times be red tape, or regulation may hinder innovation.
21:24
But the reality is that we can all agree, you know, bad regulation is bad, full stop. But having good regulation can actually be a catalyst for innovation. Why? Because it not only enforces best practices, it mitigates bad actors. Right. And that is very important, because ultimately you can't have a society
21:43
that is going to be innovative and efficient if there's no society. Right. You need to make sure that the people carrying out those practices are really thinking of the individual that they're building for, thinking of the implications, and that ultimately it's not about just making sure that you introduce all this red
22:01
tape and all of this abstract thinking for every single movement that you do. It's about making sure that you can assess the impact that your actual solution has. Right. The amount of process that developing a prototype would have to go through is very different to the amount of process that would be required for the deployment of a critical piece of software infrastructure in a signaling system for the railway industry.
22:26
Right. I mean, that is the key thing, right. It is proportionate to the impact that it is going to have. And that's something that is also important, because it's not just about introducing regulation for the sake of it, introducing standards for the sake of it, introducing red tape for the sake of it; it's about making sure that it is the right fit for the right impact.
22:44
So that's basically that. Now, we can all agree, you know, software has massive traction and potential in a lot of areas. Right. Internet services, machine learning, automation, cognitive infrastructure. But, you know, when you're running around with a hammer, everything may look like a nail.
23:01
Right. And it's important to also realize that not all problems in the world can be solved with software. Right. So there is a small subset that will be solved with software, but largely a lot of the issues, especially when it comes to organizational issues, may just be human problems, and they can just be addressed with human solutions.
23:21
So it's a matter of not only knowing how to develop software, but also, OK, let me rephrase that: knowing how not to use software. Right. So basically, knowing not to use it when you don't have to. Right. So that's the key thing.
23:41
And the key thing there is because, you know, we all know we're in the challenge of our generation, both from the societal impact and the economic impact. We've seen a lot of sort of attempts to just solve it with apps and websites. And from that perspective, a lot of them have actually helped. And there has been a pretty substantial impact from the open source traction and some of the open research that has been done there.
24:06
But from that perspective, largely a lot of that is ultimately a human problem. A website is not just going to solve the entire thing. It's going to be a combination of all of this. And I think that is one of the important things. And one of the things to emphasize here is that this challenge of our generation, you know, again, may not be the last one.
24:22
There are things to leverage and things to learn, to make sure we can harness the power of software for the best, and to develop human solutions with code that augment what can be done with the software itself.
24:42
Right. So again, just to drive that point home: ensure the right solution before tackling a problem. We should identify how much of something is actually a software problem, or even how much of it is actually a problem at all, before developing the solution. Because often the approach is: first I have a solution, and then I run around trying to find a problem to fit it into.
25:04
Right. And we want to avoid that. So, we've covered a very high-level overview of some of the key components. What I now want to give you is an insight into how we approached it at the institute, specifically for a subset of software, which in this case is machine learning.
25:24
Right. And more specifically, the development of large-scale machine learning systems, which in turn are still microservices that leverage infrastructure, that churn out logs and metrics collected with Elasticsearch and Prometheus. So ultimately it's all software, specific to a domain, and I want to show you how we addressed it.
25:44
And I want to give you a case study of what it looks like when best practices are not used in the best way possible. The way we did it was by first looking at the biggest challenges in production machine learning.
26:01
And production machine learning is hard because you have a lot of specialized hardware, a lot of complex dependency graphs, and a lot of compliance. Then you need to ensure the reproducibility of components: when you deploy something with a potentially non-deterministic state, you want to be able to re-spin it up
26:20
and rerun it with similar results, to be able to track experiments or to support auditability practices. So, very similar to the general challenges you would find in software, but these are some of the biggest challenges here specifically. From that, what we abstracted is a set of key principles that a practitioner,
26:43
a delivery individual, someone writing software, doing product management, or otherwise involved in the delivery of a machine learning system, can follow. These are: human augmentation, making sure it isn't just about removing the human entirely from the equation; bias evaluation,
27:01
which is not making sure there is no algorithmic bias, but that there is a way to mitigate a large amount of biases. Similar to cybersecurity, which we will touch upon more, you cannot avoid impact 100 percent, but you can mitigate the risk; likewise you can't remove biases, but you can mitigate undesired biases.
27:21
Because the whole purpose of machine learning is to discriminate towards the right answer, and what that even means is ultimately based on the representativeness of the distribution of the data. So it's about removing undesired biases. Then explainability by justification, having the right level of explainability in your system,
27:43
proportionate, again, to the impact; reproducibility of infrastructure; a displacement strategy, to make sure that you can retrain and redeploy; practical statistical metrics; and trust through privacy and security. Very high level, and I think you can all see how these could fit general solutions, not only machine learning, but this is specifically for machine learning systems.
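The reproducibility principle above can be made concrete at the smallest possible scale by pinning random seeds, so that a run can be re-spun with the same results. A minimal sketch, not from the talk; the function name and seed value are illustrative:

```python
# Illustrative sketch: pin randomness so an "experiment" can be re-spun
# and rerun with identical results. Uses only the standard library.
import random

def seeded_shuffle(items, seed=42):
    """Shuffle deterministically so two runs see the same data order."""
    rng = random.Random(seed)  # isolated RNG; does not touch global state
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled

run_a = seeded_shuffle(range(10))
run_b = seeded_shuffle(range(10))
# With the same seed, the two runs see identical data order, which is the
# bare minimum for tracking experiments reproducibly.
```

Real training pipelines need the same discipline applied to every source of non-determinism (framework seeds, data order, hardware kernels), but the principle is the same.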
28:04
Now, what we did was take those principles and abstract them into a practical set of standards: what we refer to as a procurement framework. This is basically what an organization would use when evaluating a supplier, asking the
28:22
questions that would allow them to verify that a supplier is not just selling snake oil. And this is important because we realized that getting everyone in a field to agree on what is best practice is not as easy as one would imagine. So we did the converse, which was much easier: aligning on what everybody agrees is bad practice.
28:47
What is a red flag? What can you check, such that if somebody is not complying, or currently has a red flag, you can be sure that it is bad practice? That allowed us to approach it bottom-up, a much more pragmatic approach.
29:03
So you can see that from those principles we've been able to break things down into practical benchmarks: the same principles, such as explainability by justification, broken down into a checklist. What does that checklist look like? Similar to a cybersecurity questionnaire, you can actually say: okay, the supplier doesn't have infrastructure to version different machine learning models.
29:26
Version control, similar to software: git commit, git push. Not exactly, but just to get the idea. Or: a protocol to evaluate whether a machine learning model requires domain expertise. Does it require
29:41
experts to come and assess it? Does the supplier have the capability to perform development across different environments? This is all quite standard; we all talk about it and it's intuitive, but it's about actually setting it down so that people can follow it. And this is quite important, especially in emerging technology: machine learning is one of many emerging technology waves that we have seen.
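The red-flag item about having no infrastructure to version machine learning models can, in its simplest form, be addressed by storing each trained artifact under a content hash, git-style. A hypothetical minimal sketch; the function and registry here are made up for illustration, and a real setup would use a model registry and object storage:

```python
# Toy model versioning: address serialized model artifacts by content
# hash, so identical artifacts share a version and any change gets a new id.
import hashlib

def version_model(artifact_bytes, registry):
    """Store a serialized model under its content hash; return the id."""
    model_id = hashlib.sha256(artifact_bytes).hexdigest()[:12]
    registry[model_id] = artifact_bytes  # in practice: object storage
    return model_id

registry = {}
v1 = version_model(b"serialized-model-weights-v1", registry)
v2 = version_model(b"serialized-model-weights-v2", registry)
# Different weights produce different version ids; re-registering the
# same bytes yields the same id, so every deployment is traceable.
```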
30:04
And it's just yet another; there is going to be another one, and another one after that. Every single time a new technology comes in, there is the opportunity to know the established best practices, to fit the technology into them, and to develop the relevant new best practices for it.
30:21
So that's how we approached it at this next level, the standards. Now we're going to go one level deeper, into the code. What we did is create a large production machine learning list. It has hundreds of different tools that are relevant for best practice in machine learning.
30:40
Again, there are tons of these lists, known as awesome lists. If you're looking into a field, I would recommend them as a great way to find the right fit of solutions. But again, it's the multiple levels that need to engage together: the right principles, the right practices or standards, and then the right tools and processes.
31:03
Right. So this is the same thing we just talked about. And there is another list we also put together, about high-level guidelines, if you're curious and want to check it out; that's something you can dive into if you have more interest. So, we've now looked at how we tackled it from a high-level perspective. Now let's look
31:22
at a practical example of what it looks like if we were to not follow best practices. We're going to take a specific algorithm, in this case, use it, deploy it, and see what happens when you're not following best practices, and then what it looks like when you are. This is, again, a very specific case, but I wanted
31:44
to avoid leaving you in this presentation with only very high-level concepts and abstract thoughts; I wanted to give you something very specific. We're going to dive into these principles: bias evaluation, explainability, security, human in the loop, and practical metrics, but more than anything into bias evaluation and explainability. So let's have a look at what this is.
32:07
And I'm going to start from scratch. Let's say you are a software team that has been assigned the task of building a system to automate a process that currently looks as follows.
32:23
There is a process where a domain expert goes through applications for loans and either approves or rejects them. That's what they are looking to automate. And let's say the business heard about this machine learning thing and wants to use all the shiny tools. So this
32:43
team, let's assume, gets tasked with automating this end-to-end process, where loans are submitted and a response is given: approved or rejected. That's it, very simple. Let's dive into how it was done.
33:03
So the traditional data science process looks as follows. We get some training data, and offline we convert this data into a form where we can actually train a machine learning model. Let's take the model as a black box: it takes an input of data and tries to predict an answer. That's basically what it does.
33:24
In this case, it would take the loans and try to predict an answer, until it is able to learn as closely as possible the distribution of that data, the patterns in that data. We would rinse and repeat until we are happy with the accuracy of the model.
33:41
Once we have done that, we can persist the model and put it in production. That means deploying it as a microservice that listens for new incoming data in live production, similar to any API: it would receive a new loan and provide whether it thinks the loan should be approved or rejected, based on how it was trained.
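The train, persist, and deploy flow just described can be sketched end to end. This is a deliberately toy stand-in, assuming a trivial majority-class "model" and pickle for persistence; a real system would train an actual estimator and serve it behind a web framework:

```python
# Toy version of the flow: train offline, persist the artifact, then
# "serve" it to answer new incoming loans.
import pickle
from collections import Counter

def train(labels):
    """'Train': learn nothing but the most common historical answer."""
    return {"answer": Counter(labels).most_common(1)[0][0]}

def predict(model, loan):
    """What the serving microservice would do for each incoming loan."""
    return model["answer"]

# Offline: train on historical decisions, then persist the artifact.
# (pickle is fine for this in-process demo; never unpickle untrusted data.)
artifact = pickle.dumps(train(["rejected", "rejected", "approved"]))

# Online: the service loads the artifact and answers a new loan.
model = pickle.loads(artifact)
decision = predict(model, {"age": 35, "education": "Bachelors"})
```

The toy "model" also foreshadows the failure mode discussed next: trained on mostly rejected loans, it answers "rejected" for everyone.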
34:02
This is, at a very high level, the general flow. I know I'm oversimplifying it, but this is for the sake of intuition. We're going to look first at the training of the model. How did the team approach it? They went to the business and asked for some example loans and the example answers.
34:25
Whether each loan was historically approved or rejected. You can see that the fields of the form include age, working class, education, education number of years, etc., and at the right, whether the loan was approved or rejected. So they took, let's say, 8,000 rows, took
34:45
this, and just fed it into a model they copy-pasted from Stack Overflow, because they saw that that's what was going to work. And they found that on their first run they achieved 99% accuracy. So on their
35:01
first run, on, let's say, a Friday afternoon, they trained it once and got 99% accuracy. So there is a question: is it time for production? I would ask the audience to raise your hand if you think it should go to production, but I'll just assume that everybody is raising their hand.
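A toy calculation, with illustrative numbers rather than the team's actual data, shows why a 99% first-run accuracy deserves suspicion: on a history where 99% of loans were rejected, a model that rejects everything scores 99% while being useless.

```python
# The accuracy paradox on an imbalanced data set, with made-up numbers.
labels = ["rejected"] * 99 + ["approved"] * 1   # heavily imbalanced history

predictions = ["rejected"] * len(labels)        # "model" that always rejects

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
# accuracy is 0.99, yet every approvable loan is wrongly rejected.

# Recall on the "approved" class exposes the failure:
approved_hits = sum(p == y == "approved" for p, y in zip(predictions, labels))
approved_recall = approved_hits / labels.count("approved")
# approved_recall is 0.0: the model never approves anyone.
```

This is why class-aware metrics (recall, precision, F1 per class), not raw accuracy, are the ones to check before shipping.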
35:24
So they push it to production and, lo and behold, it's a disaster. What they see with production data, and I'm going to explain what this means, is that the model was rejecting everything. Loans that were expected to be approved were rejected; everything was rejected. They had basically trained a model that rejects everything.
35:45
So when they compared the data they used to train against the data in production, they saw a massive difference: on one side, the numbers of rejected and approved loans they used to train; on the other, the numbers of approved and rejected loans in production, which were supposed to be much more in line.
36:13
So, all in all, what they realized is that in production there
36:23
were loans that were expected to be approved, while the data they trained on consisted mostly of loans that were actually rejected. The model was simply trained with an imbalanced data set. That's one of the key points. And if they were to analyze further, one of the things to
36:42
understand is that it's not all about creating complete balance, making everything perfectly even, because that is not the objective. If you look deeper into the breakdown of predictions, you will see that the numbers are not really equivalent.
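That kind of prediction breakdown can be checked with a few lines. A hedged sketch, with made-up field names and rows, computing the approval rate per group of a sensitive attribute:

```python
# Per-group breakdown of predicted decisions: a large gap between groups
# is a signal to investigate, not automatically a verdict. Rows are invented.
from collections import defaultdict

rows = [
    {"sex": "male",   "decision": "approved"},
    {"sex": "male",   "decision": "approved"},
    {"sex": "male",   "decision": "rejected"},
    {"sex": "female", "decision": "approved"},
    {"sex": "female", "decision": "rejected"},
    {"sex": "female", "decision": "rejected"},
]

def approval_rates(rows, group_key="sex"):
    """Return the fraction of approved decisions per group."""
    approved = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        total[row[group_key]] += 1
        approved[row[group_key]] += row["decision"] == "approved"
    return {g: approved[g] / total[g] for g in total}

rates = approval_rates(rows)
# Here males are approved twice as often as females: a gap worth investigating.
```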
37:04
You have a much higher number of predictions approved for male than for female, so there is an imbalance there as well. But the key thing is not just a push to make sure
37:20
everything is balanced; it's to follow the best practices that fit your use case: is this the right distribution for our challenge? And from that perspective, it's about extending the initial process we proposed at the beginning. Instead of just getting data, training the model, persisting it, and deploying it, we would
37:45
introduce a step to analyze and clean the data that came in. Then we need an assessment of the metrics being used to evaluate the model, to check that they actually align with the use case.
38:06
And then, once the model is deployed, we make sure the model itself is monitored, because once you deploy a model, that's when its life cycle begins. If there is concept drift, you want to make sure it can be identified.
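The monitoring step can be sketched as a simple drift check: compare the approval rate seen live against the rate seen at training time and raise an alert when the gap grows. The threshold and rates below are illustrative assumptions, not values from the talk:

```python
# Toy concept-drift check: flag when the live approval rate moves far
# away from what the model saw during training.
def drift_alert(train_rate, live_decisions, threshold=0.2):
    """Return True when the live approval rate drifts beyond threshold."""
    live_rate = sum(d == "approved" for d in live_decisions) / len(live_decisions)
    return abs(live_rate - train_rate) > threshold

# Training saw 50% approvals; production suddenly approves almost nothing.
alert = drift_alert(0.5, ["rejected"] * 9 + ["approved"] * 1)
```

Production systems compare full feature and prediction distributions (e.g. with statistical distance tests), but the shape of the check is the same: a baseline, a live window, and a threshold.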
38:25
So again, it's about bringing in the right domain expertise, the right high-level individuals, and then the right tools. And in this case there are tools to aid your development: up-sampling, down-sampling.
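Up-sampling, one of the tools just mentioned, can be sketched with the standard library alone; libraries such as imbalanced-learn provide production-grade versions. The function below, invented for illustration, duplicates minority-class rows at random until the classes are balanced:

```python
# Random up-sampling of the minority class, standard library only.
import random

def upsample_minority(rows, label_key="decision", seed=0):
    """Duplicate minority-class rows until all classes are the same size."""
    rng = random.Random(seed)  # seeded for reproducibility
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced

rows = [{"decision": "rejected"}] * 8 + [{"decision": "approved"}] * 2
balanced = upsample_minority(rows)
# Both classes now contribute 8 rows each.
```

As the talk stresses, blind balancing is not the goal; this only makes sense once you've decided the balanced distribution is the right one for the use case.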
38:42
You can take correlation into account, because it's not all about removing specific features; you can follow better scoring metrics; and you can use specific algorithms to understand the importance of each of your features. There are a lot of techniques, and I don't want
39:03
to delve into too much detail in this domain; I just wanted to give you a high-level overview of an example. So with that, we've covered a lot of simple examples. We have been able to dive into the impact of software development.
39:23
We've covered some of the responsibilities that we have as individuals and organizations. We talked about several things like, you know, ethics and principles, industry and code standards, finding the right solution for the right problem and also getting started into open source. And then we had a practical deep dive of how we ourselves were able to implement this specifically in context of machine learning systems.
39:48
So with that, I want to close off. I also want to give a shout-out, of course, to the great XKCD for always generating amazing artwork and keeping engaged anyone who may have been a bit bored with all of those different pieces.
40:04
So I'll pause there and pass it back over to you, Martin. Thank you so much. And thank you again for joining EuroPython. Thank you very much for the talk.
40:22
We do have some questions. Don't forget, you can use the Q&A button; it's in Zoom at the bottom right, and you can ask questions there. So we have a first question, from Matthew: could you expand a bit more on the open sourcing of policies and guidelines?
40:44
Are the open source communities collectively determining these in a democratic manner, or should that democratic process not also involve broader society, not just developers? Well, that is a fantastic question. Excellent question.
41:01
So the answer to that, and you already alluded to part of it, is that it should most definitely involve the relevant stakeholders at each level. I 100 percent agree: broader society will ultimately be the people that get impacted. And this must be developed in a way that is, of course, not only
41:22
democratic, but that ensures the right domain experts are involved throughout the process. So I definitely agree that should be a prime concern when either leading or contributing to some of those working groups. Now, regarding the point about open sourcing policies and guidelines, what I was
41:44
referring to was not so much open sourcing the policies, because the policies themselves are public, but rather open source contributors and open source leaders being involved in the conversations that already exist about developing regulation.
42:06
Regulation is being developed day to day. Things like GDPR involved multiple experts coming together to create those guidelines and regulations. My emphasis, and my call to action, is that it is more than critical for open source developers to be involved
42:24
in those conversations, because they are key to the systems that are ultimately going to be enforcing those regulations. So, a really good question. We have a second question; it's from an anonymous person.
42:45
So how do you find cool open source projects to contribute to as a newbie? They have only contributed to documentation so far and want to be able to do more. That's an awesome question. What I would say is that the best
43:02
way to start contributing is, as you already alluded to, with documentation. But just by being a user, you are already bought into the product itself. And if you find something that is broken, or something that could be a potential improvement, that in itself is something you can propose.
43:24
And if it is a bug, you can even try jumping in to address it, because writing tests or fixing bugs tends to be a good way into the code base itself.
43:40
And ultimately, even proposing to write unit tests and improve the testing is often one of the best ways in, because you can be at least a little more comfortable that you are not breaking the code base, except potentially for that test failing, and you will be getting into the depths of the code base.
44:04
I would definitely recommend that, and reach out to me on Discord; I'll point you to a set of good first issues. We do have more questions here in the Q&A, actually two more, but unfortunately we have run out of time. No problem, though: you can take these questions to the chat, in the channel for this talk, first deployment.
44:29
So to get there, in Discord just press command K, enter "deployment", and the first search result will be this channel. You can go there and ask your questions.
44:44
And I'm sure Alejandro will answer them in the next couple of minutes. So thank you very much again, Alejandro, and hopefully we'll see you again next year in Dublin, if everything works out, or maybe online; we will see.
45:02
Thank you so much, Martin. It's a pleasure. So we're closing the keynote session and in five minutes or four minutes, we will start with the next sessions. And don't forget to go to the Discord channels to ask your wonderful questions.