
Synergize AI and Domain Expertise - Explainability Check with Python


Formal Metadata

Title
Synergize AI and Domain Expertise - Explainability Check with Python
Subtitle
Build trust, transparency and confidence between models and decision makers
Title of Series
Number of Parts
112
Author
Contributors
License
CC Attribution - NonCommercial - ShareAlike 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared, also in adapted form, only under the conditions of this license.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
We will go through the Why? How? and What? of Model Explainability to build consistent, robust and trustworthy models. We explore the inability of complex models to deliver meaningful insights, cause-effect relationships and inter-connected effects within data and how explainers can empower decision makers with more than just predictions. We evaluate an intuitive game-theory based algorithm, SHAP, with a working implementation in Python. We will also pin-point intersections necessary with domain experts with 2 practical industry applications to facilitate further exploration.
Transcript: English (auto-generated)
So guys, I'm sorry, but not everyone understands mathematics, data science, AI, and we shouldn't expect them to. And that's the fundamental idea behind my talk today. So, good afternoon. I want to welcome you all to my talk.
And we're gonna be talking about synergizing AI and domain expertise. I'd love to present this talk as a discussion of thoughts and ideas, so feel free to interrupt me at any point during my talk if you have any questions, right? I'd also like to sincerely thank the EuroPython organizing community for putting together such an amazing event, because this is my first time at an event like this and I'm really appreciating the bonds and networks I'm building here, right? So, okay, I'll start with the inspiration and the vision that I've had for the last two years, and that's what's written on the slide there: to build trust, transparency, and confidence between models and decision makers, right? It's a fundamental gap that exists today between machine intelligence and human intelligence, people who are experienced in their domain, who've been doing this for the last 20, 30, 40 years. They know their way around things, and they're not comfortable when a machine makes a decision that is not clearly tied to the problem statement or to the environment in which the problem has existed over time. And that's what I like to do. I like to bridge those gaps between human and machine intelligence. I work as an AI scientist for Polymerize. We are a product-based startup based out of Singapore. We've built a next-generation AI-powered platform to insanely accelerate material science research and R&D across the globe. I'm a Pythonista, as we all are, using data science and machine learning to solve data-based problems and enable data-driven decisions. Previously, I've built a fully powered AI platform, right now in production, to solve for marketing attribution
for all of you who do not know what attribution is. If I spend some amount in a particular marketing channel, I wanna know how much return I got from that channel. That specifically is called marketing attribution. And once I know what my attribution is, I'd really like to plan for the next quarter and optimize my spends to maximize my ROI in the next quarter.
So that's what we've achieved. We've got 1.5x ROIs, which is amazing. Over these two years, working on marketing as well as material science, I've grown my interest in the following core areas. The first of those is explainable AI, because that has helped me a lot
in communicating between business decision-makers and my team of AI scientists, machine learning engineers and data scientists. I've also worked with a lot of tabular neural networks. You can consider material science and marketing. They don't have a lot of images to work with, so not ConvNets or RNNs, but mostly focused on tabular neural networks. And I've developed design thinking strategies,
the ability to checkpoint models, and also try to find loopholes in all of your neural networks. I've also done some reinforcement learning, optimization to exploit the learned patterns by the models, and finally to deploy models in production, scalable distributed training and tuning.
Cool. Moving on to the next slide, this is my experience and my experiments with truth in the domain. So there are a lot of situations which involve high-stakes decisions, right? You want to do your marketing planning for the next six months. Once you initialize that plan, you're going all in.
It's like a poker game. You're going all in, and you cannot back off in the middle. And the other problem is, unlike poker, the results are not immediately received. They're gonna be received in days, weeks, and sometimes even months, timelines where if you are late, you might miss the entire opportunity, or you might be in a situation where you can't back off.
So the business users are risk averse because they cannot give a machine the responsibility of their entire business, which is scary, but also interesting for us designers of these algorithms. So let's take an example in material science. Recently, we were developing a material for one of our clients based out of Japan.
And they have a five-stage process of developing a material. And in those five stages, the properties of the ultimate materials are evaluated almost seven times. But the decision they take in the first stage over the course of six months is going to actually impact the material that they develop.
And every individual experiment they do is worth about 50,000 US dollars. So they cannot go wrong with the understanding and analysis of the entire material they're developing. The same example that I gave for marketing works well here as well. So what do we do? Let's bring an expert. He's been trained at this for 30 years. He's done PhDs, done masters,
he's read the field, he has experience. He meets 500 other people with similar problem-solving capabilities, and he gives you a suggestion. Is that always efficient? Not so much. Is that consistent? Yes. But is that revolutionary for the business in an environment that's ever-changing every day?
Supposedly no. So what do we do? We bring in this expert and we just let the things work the way they do. But now we wanna make a change. We wanna make sure that machines try to combine and acknowledge the learning from the domain experts and then develop things together in synergy
so the decisions are not made just by individuals and not just by machines, but by a combination of the two. So let's find a way, which is explainable AI. Starting with Simon Sinek's favorite line, let's start with Why, okay? So if you look at the curve here, you can see there's a performance ranking on the Y axis
and an explainability ranking on the X axis. If you look at a model, let's say a linear regression model, it's a super, super explainable model. You know how much each input variable impacts the output. Does it have a positive or a negative impact? So quantitative and qualitative impact. And that's exactly what we need to know from the model.
It makes a prediction, but it also tells us how it makes a prediction. But on the other end, there's a neural network, right? And the moment we try to understand a neural network, we're all puzzled because there are so many variables. There are biases, there are weights, there are all kinds of different layers, and we have no clue at each layer how things are being impacted. That's when we need explainable AI. There are also other models in the middle, but trust me, most of these models are pretty difficult to explain. And even if you get close, there might be some variation that you would have never known of, and the entire explanation might just fail. I've also used a lot of explainable AI in the last two or three years
just to make sure that when I'm tuning models and I got five of my top best models, usually what we do is we create an ensemble. We take the average prediction from all the five models, and that is a robust prediction of my output. But when you look at those situations where these five models
have very different explanations, then you can be pretty sure that a domain expert can filter out and separate the really good models from the ones that are just lucky, or have been overfit, or have some serious programming flaw in them, right? Moving to the next slide. Interpretability versus explainability. So on the left,
you're gonna see a linear regression model and a simple decision tree model. When you look at these models, the parameters and the model summary will really tell you what the model's doing inside. You can look under the hood of these models, and you'll be fairly sure about how they're making a prediction. But when you look at the models on the right, the first one is a random forest classifier.
It's a combination of multiple decision trees. They're split at different levels. They have different densities, different depths, different ways of getting from the input to the output. And on the right, you obviously have a neural network, which is way more complicated than we can actually interpret. And you can see the difference: interpretation is where the models can actually explain themselves.
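To make that concrete, here is a minimal sketch, assuming scikit-learn and a toy generated dataset (the feature names are just placeholders, not the speaker's actual data), of what "looking under the hood" means for those intrinsically interpretable models:

```python
# A minimal sketch of inspecting intrinsically interpretable models with
# scikit-learn: coefficients for linear regression, printed rules for a small
# decision tree.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor, export_text

X_toy, y_toy = make_regression(n_samples=200, n_features=4, noise=0.1, random_state=0)
feature_names = [f"x{i}" for i in range(X_toy.shape[1])]

lin = LinearRegression().fit(X_toy, y_toy)
# Each coefficient gives the quantitative and qualitative (sign) impact of one input.
for name, coef in zip(feature_names, lin.coef_):
    print(f"{name}: {coef:+.3f}")

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_toy, y_toy)
# A small tree can literally print the rules it follows to reach a prediction.
print(export_text(tree, feature_names=feature_names))
```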
The models talk about how they're doing it themselves. Explainable models are the models that need an algorithm, a statistical backing, to tell us how they are making a prediction, and that communication is what explainable AI is getting us into. Before moving forward to the more advanced section, let me talk about a few fundamental principles. There are two interesting ideas, correlation and causation. A lot of people, and I've seen a lot of high-end engineers, often get confused between the two. Causation is what the businesses are looking for. They want to know the levers that are impacting their business output, their KPIs. But what we usually see with a lot of models is not causation, it's something called correlation. You would have heard about the statistical measure; it's just a way to represent the statistical relationship between two variables. And a side note: correlation only captures linear relationships.
So if you have a nonlinear relationship, like a hyperbolic curve or a parabolic curve or an exponential curve, correlation is not able to detect the statistical similarity between those two variables. So how do we move from correlation, which is a statistical property of the existing data, to causation, which is the real learning
behind how inputs of the business are impacting the output of the business? That's where Explainable AI comes in handy. I'm sorry, I forgot the example. So a lack of RAM would cause the phone to freeze, but when the phone freezes and text messages don't work, those are correlated properties.
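Before I finish that phone example, here is a tiny sketch of the earlier nonlinearity point, assuming NumPy and SciPy: a variable that depends perfectly on another, but nonlinearly, can still show almost zero correlation.

```python
# y is a deterministic function of x, but the relationship is parabolic
# (nonlinear and non-monotonic), so the Pearson correlation comes out near zero.
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 1000)
y = x ** 2                      # perfect dependence, zero linear trend

r, _ = pearsonr(x, y)
print(f"Pearson r for y = x^2: {r:.3f}")    # close to 0 despite perfect dependence

rho, _ = spearmanr(x, y)
print(f"Spearman rho: {rho:.3f}")           # rank correlation also struggles here,
                                            # because the relationship is not monotonic
```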
Text messages might not be working because of an app failure or because of any of the other 10 reasons that all of us engineers know about. But whenever the phone runs out of RAM, a lot of things stop, because the lack of RAM has caused them to stop. Moving forward. I'm happy to take any questions in the middle, because I'm discussing and jumping across multiple topics, so if there are any, just let me know. Okay, coming to the second important fundamental principle, and a problem that explainable AI is going to solve: interconnected effects. Let me give you an example. If you look at the image on the top, there is a chemical reaction that I am trying to run.
If I run it with a catalyst, certain products come out. It's a multistage process. And if I run it without a catalyst, a different set of chemicals is created. So the addition of that catalyst significantly changes the output of the chemical reaction, right? Now, the catalyst also has to be added in a certain quantity to satisfy the reaction's requirement. So it's an interconnected effect between the input variables. It's not just one input. It's multiple inputs acting together, and in sufficient quantity, for the ultimate result to take place. That is what we call an interconnected effect. When you go to a marketing situation, let's say I advertise on Instagram, on Facebook, on Snapchat, on LinkedIn, and in multiple other places. There's always a sweet spot in marketing: if your customer observes you across multiple places, they might have a higher chance of conversion than if they see you over and over in a single space. When I market to my consumers on Instagram too many times, they might just get irritated. They might not like the brand. But if they see me everywhere in the environment they are present in, they might be more willing to make a purchase, right? So interconnected effects are exactly the kind of situations where explainable AI really works, right? I've given the examples. Validating these experiments with experiential learning, which is learning from the domain experts, and then checkpointing the models by matching those understandings, is what really drives value for the businesses.
Okay, let me give you a very interesting case study. So anyone who's ever gone through some amount of material science or chemical engineering, they know that tensile strength and elongation at break are two different properties of a material. The way they work with temperature is the relationship that you see here.
So if you increase temperature, your tensile strength goes down. And if you have a lower temperature while making the material, your elongation at break is low. So what we really want to achieve for great-quality materials that we use in cars and in manufacturing units is a balance of tensile strength and elongation at break. Now, when you look at this chart, it actually is exactly represented by the explainable AI chart that you see below. SHAP, an algorithm that we're gonna be talking about soon, gives you just the idea you need from the top curve. And this is what one of our engineers, one of our scientists, actually validated when we started pushing explainable AI. Let's look at the chart below, right? So temperature for elongation has a curve which starts with a lot of blue scatter points and has a lot of red and pink scatter points towards the end. If you look at the x-axis, the left-hand side is a negative impact and the right-hand side is a positive impact. And if you also look at the color scale, you will see blue represents low and red represents high. So blue represents low values of temperature, red represents high values.
And the impact on the final property, let's say tensile strength or elongation, is on the curve. So if you observe carefully temperature for elongation, when it has a low value on the blue scale, it has a negative impact on elongation, which is exactly represented on the chart above.
And if you look at temperature for tensile strength, the red color, the high value of temperature gives a negative impact on tensile strength. This is what we are going to talk about. This is what's interesting. This is where a synergy can be created. And a domain expert can tell you that yes, the learning of the model is real. It's not a fluke.
And I think this is going to work in a real-life situation, right? So scientific literature, years of experience, and domain knowledge are going to work in synergy. And examples across domains have already been observed. I've tried it in four different projects. I've consulted for a lot of companies on explainable AI because of my research and talks at multiple places. And they've really liked it as an addition, not as a feature, but as an idea, right? Okay, so what are model explainers? A model explainer is an algorithm that works with the model and the data set to explain what the model has learned and how it makes its predictions, right? So SHAP is one of those algorithms that can work with a black box. A black box is basically any function. You put in an input, you get an output. So the algorithm SHAP works as an explainer to derive learnings from these black box models. Below that, you will also see something called model-specific and model-agnostic explainers. These are categories: if you want to work with a specific kind of model, you need a specific explainer. That's called model-specific. And model-agnostic is where the explainer treats the model as a function, and it's going to work as long as you have a function that can do a .predict, right? If you look at the chart below, the original image is that of a watch being held by a human hand. And if you look at the right, plain gradients can only tell you so much. You see that white area that's just showing roughly which part of the image matters. But if you look at a specific algorithm called integrated gradients, which is curated for deep learning models, you will see that it could actually detect the entire watch, the clock, in a much better way. So you know which part of the image
is impacting your prediction that the image is of a watch, right? Okay, we can start with local explanations. So these explanation algorithms can be of two kinds. They can give you a local explanation, or they can give you a global explanation. Local is per record in the data set: you know every input and how it impacted that prediction. So you can see that for the Boston housing data set, LSTAT contributed 5.79, RM contributed minus 2.17, NOX minus 0.73, and so on and so forth for the different features. And you can also see f(x), which is the final prediction of 24.019. So we start with an average, which is the model's learning that an average housing price is, let's say, $24,000. And then every single input from the house's characteristics adds some amount of value. So you know which individual variables are contributing negatively and which ones are contributing positively, right?
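Written out, that local explanation is just the additivity property behind SHAP: the model's average prediction (the base value) plus the per-feature contributions for this one record reconstructs its prediction,

$$ f(x) \;=\; \mathbb{E}[f(X)] + \sum_{i=1}^{M} \phi_i(x), $$

where the $\phi_i(x)$ are the contributions shown on the slide, such as +5.79 for LSTAT and -2.17 for RM, and $f(x)$ is the final prediction of 24.019.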
Looking at the global explanation, this is what quantifies the impact of the inputs, right? So you see again, LSTAT is the highest-impact input variable, and the variables are also sorted in order. But there's something interesting that I want all of you to observe and answer me quickly here. Do you notice something interesting in the first chart? Anything that strikes you? Look closely at all the input variables. Do you find something interesting? Yeah, so the last one says 'sum of 4 other features'.
So what did the algorithm just do? This is going to be a hint towards feature engineering as well. So if you look at the explainable algorithms and you look at the input variables and how they're causing impact, you'll be able to notice that there are certain features that are adding absolutely no value. So it's an amazing tool for feature engineering as well.
On the bottom, you see the fully explained chart that I mentioned: low to high values of the inputs, and the impact each is causing towards the negative or the positive side, sorted by the strength of the impact. So these are charts that explainable AI can give you. And once you learn how to interpret them and communicate them to businesses, they get amazing insights, remarkable discoveries that they can eventually plan into the business and then add more value. There's another interesting chart that I want to take you guys through. So there's an interaction and dependence plot. So this is not just the output prediction and its explanation. It's how two variables, say performance and sales, interact with each other. So a smaller value of performance with a higher value of sales, or a higher value of performance with a smaller value of sales, might not achieve the full agreement of the business. You might want to have a balance of performance
as well as sales, and that might be the sweet spot. So when you're looking for new candidates, new customers, you want to evaluate their performance and sales from this dependence point of view so it can directly add a lot more value to your business. Okay, coming to the algorithm. A very simplistic standard one.
I don't want to go into the most complicated algorithms, because that's not the intention of this talk. It's just to get you excited about the future of AI explainability, right? So SHAP, SHapley Additive exPlanations, which is the long form of SHAP. It's a game-theory-based, intuitive approach towards explanation. A SHAP value is the average marginal contribution of an input feature across all coalitions. Coalitions, I would say, are just all the possible combinations of features. So let's say I have three ingredients in my chemical formulation. There's a polymer, there's a capping agent, and there's a diisocyanate. I form all possible coalitions where I switch off some ingredients. Out of the three, I can take just one, two of them, or all three of them, in different combinations. And you can imagine it will be a very high number of combinations that I'll have to evaluate. And once you include the quantity of each, the quantity of polymer, the quantity of capping agent, and the quantity of diisocyanate, it's going to blow up the entire space of combinations. But once we have all of these combinations and we pass them through our model so that the model gives predictions, I can find out the average marginal contribution of each input feature to my final output.
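In code, that brute-force averaging over coalitions looks roughly like this. It's a toy sketch: the value function is made up as a stand-in for "run the model on this subset of ingredients", and the shap library itself uses much smarter approximations than enumerating everything.

```python
# Toy, brute-force Shapley values over all coalitions of three ingredients.
from itertools import combinations
from math import factorial

features = ["polymer", "capping_agent", "diisocyanate"]

def value(coalition):
    # Hypothetical "model output" for a subset of active ingredients.
    scores = {"polymer": 4.0, "capping_agent": 1.5, "diisocyanate": 2.0}
    v = sum(scores[f] for f in coalition)
    if "polymer" in coalition and "diisocyanate" in coalition:
        v += 1.0   # a made-up interaction: these two work better together
    return v

def shapley(feature, features, value):
    n = len(features)
    others = [f for f in features if f != feature]
    phi = 0.0
    # Average the marginal contribution of `feature` over every coalition S of
    # the other features, weighted by |S|! * (n - |S| - 1)! / n!
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            phi += weight * (value(set(subset) | {feature}) - value(set(subset)))
    return phi

for f in features:
    print(f, round(shapley(f, features, value), 3))

# Efficiency property: the Shapley values add up to v(everything) - v(nothing).
total = sum(shapley(f, features, value) for f in features)
print("sum of Shapley values:", round(total, 3), "vs", value(set(features)) - value(set()))
```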
And because of the average, I will know the overall impact. And because of the individual coalitions, I will know the local explanation for any one experiment. When I combine all of this, I will be able to look at the explanation curves that I showed five minutes ago. At Polymerize, we've written a very interesting white paper on this. You can find the link at the bottom. It's polymerize.io slash white paper. We've written an interesting white paper for all the material scientists, but I think it will also add a lot of value to the Pythonistas here. Okay, so I'll just quickly show you guys some code so you know how easy it is, and we all love Python and open source.
So I import the necessary libraries. These are all standard machine learning libraries: scikit-learn datasets, model selection, ensemble, everything else. I get the data set. I get my X and y split, the input and the output. And then I do my training and testing split, which is just to look at the real performance of my model, which is out-of-sample testing. I also build a random forest model. I fit my model, I make a prediction, and I look at my R2 and MAPE scores. MAPE is mean absolute percentage error. It's a very interesting metric that we've been using at Polymerize. And I see a great performance.
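The code on the slides isn't reproduced here verbatim, but a minimal sketch of the same steps could look like this. I'm using scikit-learn's California housing data as a stand-in, since the Boston housing loader has been removed from recent scikit-learn releases.

```python
# Minimal sketch of the training steps described above (not the exact slide code).
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_percentage_error, r2_score
from sklearn.model_selection import train_test_split

data = fetch_california_housing(as_frame=True)
X, y = data.data, data.target                      # inputs and output

# train/test split for out-of-sample evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

pred = model.predict(X_test)
print("R2:  ", r2_score(y_test, pred))
print("MAPE:", mean_absolute_percentage_error(y_test, pred))
```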
Then pip install shap, nothing very unusual. We all know how this works. So pip install shap, import shap. And these are the only lines that you need to write. So explainer equals shap.Explainer. This is going to take the model, which is the random forest regressor, and the input data, which is capital X here. The SHAP values are going to be calculated from the explainer using the input data. I've added check_additivity=False because with certain models, the floating-point precision sometimes doesn't match up. And then you can look at the SHAP waterfall plot. This is a local explanation. So you can see I've just pinpointed the first experiment that I conducted. So I see all the variations, the impact caused, on the local level. I can also do a beeswarm plot on the SHAP values. I can actually filter the data points that I want to do explanations on. So a beeswarm plot gives me the global explanations, the ones that we saw on the slide deck. I can also look at the bar chart, which gives me the absolute impact.
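Roughly, those explainer lines, continuing from the previous sketch (so model, X_train and X_test are the random forest and the data defined above). The plot calls mirror the waterfall, beeswarm, bar and dependence charts from the slides; "MedInc" is just one column of the stand-in dataset.

```python
# Roughly the explainer code described above. Install first with: pip install shap
import shap

# A small background sample keeps the explainer fast.
background = X_train.sample(200, random_state=0)
explainer = shap.Explainer(model, background)

# SHAP values for the test set; relax the additivity check, since floating-point
# precision can make the strict check fail for some models.
shap_values = explainer(X_test, check_additivity=False)

# Local explanation for one record (the waterfall chart from the slides).
shap.plots.waterfall(shap_values[0])

# Global explanations: beeswarm (signed impact, colored by feature value)
# and bar (mean absolute impact, no direction).
shap.plots.beeswarm(shap_values)
shap.plots.bar(shap_values)

# Dependence-style view for a single feature, colored by another feature.
shap.plots.scatter(shap_values[:, "MedInc"], color=shap_values)
```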
So it does not have a direction, positive or negative, but it gives the absolute impact. And also, something very interesting is that I can convert these and automate them into a domain-expertise synergy. So let's say when I talk to my domain experts, they tell me how certain inputs are supposed to behave, and with all of that, I can actually create an automated script which, post-explanation, can tell me how well the model's learnings match those ideas, right? I'd also like to show you something interesting if I can. If I run everything, you'll see the problem with machine learning models with this example. If I run this the first time, let's look at the bar chart at the bottom. It takes some time. This is actually showing you the number of iterations, the permutations it has gone through. You see LSTAT is at the top, right? For this particular run, right? Now the second run. Now you see LSTAT is the highest-impacting factor in one of the runs, and in the other one it is RM, right? So these kinds of interesting insights and discoveries are easily found with explainable AI. And I'm running over time, so I'd like to take some Q&A before I have to leave.
Thank you very much. If you have questions, please come to the microphone so that people watching from home can also understand the question. Any questions?
Yeah, I was just wondering, random forest will give you feature importances. Did you do a comparison between the feature importances and the Shapley values? And have you ever had a case where the two didn't agree? Yes, and I would love to show you that example. If I just add a random variable to the data and I train the random forest regressor on it, the random forest will actually rank it second or third in importance when you do a feature importance, even though you know it's random, right? So I did that experiment, and it was actually a part of my talk where I was gonna show how feature importance can actually fool us, because feature importance is actually driven by the changes, the variance, in the input data. It does not actually understand the data set in the way an explainer does. So with the feature importance of a random forest regressor, you can just add a random variable to the data and you will see it gets second or third rank in terms of importance. It's not gonna work.
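A sketch of the experiment described in this answer, continuing from the earlier sketches (so X, y and shap are already defined and imported): add a pure-noise column, retrain, and compare impurity-based feature importances with mean absolute SHAP values. The exact ranking of the noise column will vary from run to run, so treat this as illustrative rather than a guaranteed result.

```python
# Add a random column that carries no signal, then compare two importance notions.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_noisy = X.copy()
X_noisy["random_noise"] = rng.normal(size=len(X_noisy))   # pure noise

noisy_model = RandomForestRegressor(n_estimators=200, random_state=42).fit(X_noisy, y)

# Impurity-based importances can rank continuous noise surprisingly high,
# because a high-cardinality noise column still gets used for many splits.
importances = pd.Series(noisy_model.feature_importances_, index=X_noisy.columns)
print(importances.sort_values(ascending=False))

# Mean absolute SHAP values tend to push the noise column towards the bottom.
sample = X_noisy.sample(1000, random_state=0)
explainer = shap.Explainer(noisy_model, sample)
sv = explainer(sample, check_additivity=False)
mean_abs_shap = pd.Series(np.abs(sv.values).mean(axis=0), index=X_noisy.columns)
print(mean_abs_shap.sort_values(ascending=False))
```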
Thank you, any other questions? No, then I'd like to thank you. Thank you, thank you so much, everyone.