Production Machine Learning Monitoring: Outliers, Drift, Explainers & Statistical Performance

License: CC Attribution 2.0 Belgium
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Abstract
The lifecycle of a machine learning model only begins once it's in production. In this talk we provide a practical deep dive into the best practices, principles, patterns and techniques around production monitoring of machine learning models. We will cover standard microservice monitoring techniques applied to deployed machine learning models, as well as more advanced paradigms to monitor machine learning models through concept drift, outlier detection and explainability. We'll dive into a hands-on example, where we will train an image classification machine learning model from scratch, deploy it as a microservice in Kubernetes, and introduce advanced monitoring components as architectural patterns with hands-on examples. These monitoring techniques will include AI explainers, outlier detectors, concept drift detectors and adversarial detectors. We will also cover high-level architectural patterns that abstract these complex and advanced monitoring techniques into infrastructural components that enable scale, introducing the standardised interfaces required to enable monitoring across hundreds or thousands of heterogeneous machine learning models.
Transcript: English (auto-generated)
Today we're going to be covering production machine learning monitoring principles, patterns and techniques. It's going to be a theory-based session together with a set of hands-on examples. There's quite a lot to cover in this presentation, so we're going to have to rush through quite a few key pieces. So, a bit about myself: my name is Alejandro Saucedo, I am Engineering Director at Seldon, Chief Scientist at the Institute for Ethical AI, and a Member-at-Large at the ACM.
To tell you a bit more about Seldon, we are an open source machine learning deployment company; we built one of the most popular Kubernetes-based machine learning deployment frameworks, and we're going to be using Seldon Core for the examples today.
The Institute is a research center that focuses on developing standards and tools for the responsible development and operation of machine learning systems, and we're part of the Linux Foundation, which allows us to contribute from a very practical perspective. Today we're going to be covering some of the motivations for why we should care about ML monitoring, some of the principles to achieve efficient and reliable monitoring, some key patterns that have been abstracted for the machine learning world, and then a set of hands-on examples that we're going to be switching over to.
So, the slides can be found at this link, and throughout the presentation you will see a set of links below the slides where you'll be able to test the open source examples yourselves. So let's set the scene. We are all aware that, even without the machine learning context, production systems, and more specifically production machine learning systems, are hard. We interact with contexts that involve thousands of machine learning models, so you can imagine the heterogeneity, from specialized hardware to complex dependency graphs for data and data flow, compliance requirements, reproducibility demands, lineage, metadata, etc. I actually have a talk online that you can check out specifically in this area of machine learning operations.
And I guess it's now common knowledge that the lifecycle of the model doesn't finish once it's deployed; if anything, it only begins once it's fully trained. It's deployed, it's potentially retrained, it's probably superseded by a better-performing model, it's promoted to different environments.
So ultimately, there's quite a lot of time that passes once the model has been deployed, and quite a lot of best practices that need to be involved throughout its lifecycle. More specifically, what we're going to be looking at today is taking a very simple machine learning use case, since we're going to be covering fairly complex terminology around the machine learning space. We're going to be sending some predictions to this model, which is going to be deployed as a microservice, to see what can be monitored. We're going to be sending feedback to this model to be able to get some more advanced statistical monitoring.
And then we're going to be delving into more specific terminology like explainability as an architectural pattern, outlier detection, and drift detection. And as I mentioned, we're going to be taking the hello world of machine learning, the Iris classifier; it's going to be a simple underlying machine learning model, so that we can focus on the overarching terminology that we're going to be using. Specifically, we're going to have an input, which is an array of four numbers (floats), and then an output, which is basically one of three classes.
That's basically what we're going to be interacting with. We're going to be interacting with this as a black box in a way, primarily because we care mostly about the inputs and outputs, and this is why the premise is set this way. In order for us to be able to deploy this model, we are assuming that there has already been a data science process to identify the best-performing model and the best data distribution to use to train it. In this case, we're going to just be taking a simple model, training it, and then taking that artifact and deploying it, right? What that really looks like is, specifically, getting the Iris data set, then getting a train/test split, and then training a model, which in this case is just a logistic regression or random forest. It doesn't really matter for our specific use case, and it then allows us to actually perform inference: on an unseen data point, we would get the prediction.
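(A minimal sketch of that training step, assuming scikit-learn's bundled Iris dataset and a logistic regression; as noted above, the exact model choice doesn't matter.)

# Minimal sketch of the training step described above, using scikit-learn's Iris dataset.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Inference on an unseen data point: four floats in, one of three classes out.
print(model.predict([[5.1, 3.5, 1.4, 0.2]]))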
Then we would be able to actually export and persist that model, in this case as a pickle. This is the pickle that we're going to be deploying, and ultimately this is what we're going to be using throughout this presentation. The way that we're going to be containerizing and deploying it is using Seldon. We're going to be able to leverage the artifact directly, or to create a Python wrapper that then becomes a fully fledged microservice using these tools. So it converts either an artifact or a Python class into an actual microservice, where the inputs and outputs are the data that you send to this model for a prediction: the input would be an input array, and the prediction would be one of these three classes (a rough sketch of such a wrapper follows below). But now let's see what the anatomy of a production-deployed, end-to-end machine learning stack looks like. In this case, you have the training data, the artifact store (basically the persistence of the state of your production environment), and the inference data.
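(Here is the rough sketch of the Python wrapper mentioned a moment ago. It follows Seldon Core's convention of a plain class exposing a predict() method; the file name, artifact path and pickle format are illustrative.)

# Model.py: sketch of a Seldon-style Python wrapper around the pickled Iris model.
import pickle

class Model:
    def __init__(self):
        # Load the artifact that was exported and persisted during training.
        with open("model.pkl", "rb") as f:
            self._clf = pickle.load(f)

    def predict(self, X, features_names=None):
        # X is the array of four floats; the return value is the predicted class.
        return self._clf.predict(X)

(Wrapped like this, the class can be containerized and served as the microservice discussed in the rest of the talk.)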
For the first step, you have your experimentation. This is basically the model training, your hyperparameter tuning. It uses the input training data, and it creates an artifact, right? This is what we just did: we exported an artifact. Then you're able to deploy that programmatically, right? That means some continuous integration process would be in charge of creating that artifact, putting it in an object store and deploying it, so that once you deploy it, it can go into your respective development and production environments as a real-time model or a batch model.
So we're going to be using Seldon Core to enable us to deploy this model into a Kubernetes cluster. And then every single piece of input and output data, as you would with any other microservice, would be stored in Elasticsearch and Prometheus, or whichever metrics and logs data stores you use, which ultimately allows you not just to have persistence, but also to be able to use that data again as training data.
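(As an illustration of that deployment step, here is a sketch of a SeldonDeployment custom resource, written as a Python dict mirroring the YAML you would apply to the cluster; the names and the modelUri bucket are placeholders, and the exact fields can vary across Seldon Core versions.)

# Sketch of a SeldonDeployment resource for the pickled Iris model.
seldon_deployment = {
    "apiVersion": "machinelearning.seldon.io/v1",
    "kind": "SeldonDeployment",
    "metadata": {"name": "iris-model", "namespace": "seldon"},
    "spec": {
        "predictors": [
            {
                "name": "default",
                "replicas": 1,
                "graph": {
                    "name": "classifier",
                    # Prepackaged scikit-learn server pointed at the exported artifact.
                    "implementation": "SKLEARN_SERVER",
                    "modelUri": "gs://my-bucket/iris-model",  # placeholder artifact location
                },
            }
        ]
    },
}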
So what are the monitoring metrics that are involved in production ML systems? You have the usual microservice metrics: performance metrics and tracing. You have more specific machine learning metrics: statistical performance. You have things like outlier and drift detection; we're going to cover what that means in more detail. And you have explainability tools, which are often discussed in an offline, analytical setting, but in this case we're going to be discussing them in a production monitoring, real-time or semi-real-time context.
These are the core things that we're going to be delving into in this topic. So let's start with the first part, and I'm sure that many of the fellow programmers in this audience are going to be familiar with this: performance monitoring. The principle is being able to monitor your microservice on its underlying runtime performance. This is to identify potential bottlenecks, runtime red flags and malfunctions, or to identify something preemptively before it actually goes wrong.
And then, of course, being able to debug and diagnose unexpected performance. And in this case, it's in the context of our machine learning model, right? Our machine learning model that we deployed crashes, behaves incorrectly, has major throughput spikes, etc. So that's basically the kind of thing that we want to look at. And what that looks like in more practical terms is things like requests per second, latency per request, CPU and memory utilization, distributed tracing. So, to go back to our example, what we can do now is take that model artifact and, first of all, deploy it into our Kubernetes cluster, basically pointing to that model artifact and using Seldon to convert it into a microservice. Now we actually have a microservice that we can send requests to and receive predictions from, of exactly the same model that we deployed. We can see that our model is now deployed, and we can actually send data for it to process. Right, so we're going to send some data now; it's sending requests, and then it's receiving the predictions. Ultimately, we're going to start seeing some requests per second, and we're going to start seeing some insights on the latency.
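(A sketch of what sending those requests can look like against Seldon's v1 REST prediction endpoint; the ingress host, namespace and deployment name below are placeholders for whatever your cluster exposes.)

# Send a single prediction request to the deployed Iris model.
import requests

URL = "http://localhost:8003/seldon/seldon/iris-model/api/v1.0/predictions"  # placeholder ingress
payload = {"data": {"ndarray": [[5.1, 3.5, 1.4, 0.2]]}}

resp = requests.post(URL, json=payload)
print(resp.json())  # the predicted class (or class probabilities) for the three Iris classes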
Ultimately, we will also be able to see the performance in our cluster changing; we can see that CPU is now being utilized. And this is the usual stuff that you would see for a microservice, right? This is not something new. But ultimately, if the model crashes or returns a wrong prediction, we will be able to see this in the number of successful requests or the number of 400 or 500 responses, right? This is basically the number of errors that potentially appear in our system. So these are some of the core components. Now we can see that there are seven requests per second, right? So these are basically the usual things that you would see in your normal microservice. So let's actually pause that and have a deeper look. The pattern here is what we just said: take your model artifacts, deploy them as a microservice, and then extract the same kind of insights that you would for your usual microservices: metrics, logs, tracing, etc.
But now let's go one level deeper: statistical monitoring principles, right? What is statistical monitoring? This is basically monitoring specific to the statistical properties of the machine learning performance. So this is things like calculating metrics using the corrected labels, so that we can actually understand how the model is performing compared to how it was performing when we trained it, right? This means that if it performed really well during training, we would deploy it, it would see unseen data, and once we provide the corrected labels, we can then see the actual performance.
This is things like accuracy, precision and recall, and it can be used to benchmark models against each other in real time as A/B tests, and it can also be used for evaluation from a long-term perspective. We will see more advanced concepts in a bit. This is key, and this is one of the core insights that we've extended within Seldon so that it is treated as a first-class citizen, so that you can have these components with your models almost out of the box. And what this looks like in practice is things that you would often see as a data scientist: true positives, true negatives, false positives, false negatives, which can be converted into accuracy, precision, recall, specificity, and then even more specialized metrics like KL divergence, et cetera. But ultimately, even though we're looking at something very specific to machine learning, this just basically means metrics specific to the use case that you are interacting with, which in this case is machine learning, right? It's abstracting some of these core patterns into reusable components.
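(Just to make that arithmetic concrete, a small hypothetical helper:)

# Hypothetical helper rolling the raw counts into the statistical metrics discussed above.
def statistical_metrics(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }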
And this is important for us because we deal with thousands of models, right? We can't have every single deployed model wrapped in a Flask wrapper with an unstandardized interface and super-specialized metrics.
We can't expect all of our DevOps and SREs to be machine learning experts as well, right? And from that same perspective, let's now delve into architectural patterns. This is what we coined the statistical monitoring pattern, or the statistical feedback pattern. And basically, if you remember, we deployed our model. What does our model do right now? We send it inference data and it returns a prediction.
But what can we do beyond that? We can actually make it such that it doesn't only send the data, but it stores the inference data with the response, the prediction, right?
And now when we send the correct label, where we tell the model, hey, that ID that you sent previously was wrong, here's the correct label, or it was correct, here's the correct label. Then we can have another microservice, which listens to that feedback and is able to compare that old inference request
with that incoming feedback, and then be able to provide real time performance metrics of how the model is running. In this case, is it actually performing well or is it not? What does that look like in practice? Well, let's have a look. So in this case, instead of just sending a bunch of requests, we're going to be sending a bunch of requests and
storing the request IDs, and we're going to be using those request IDs to do what? To basically send the correct labels, right? So we finish sending these requests, and we're going to start seeing an actual spike in the predictions, you know, 4.2 per second now.
We are now going to be able to not send predictions, but send corrections. Send, hey, hey, model, here are the correct labels, and this is the respective IDs to where it should reside, right? So what I'm saying is these are the correct labels, and this is the ID of the relevant request. So we're going to be sending corrections.
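(A rough sketch of what one such correction can look like on the wire, against Seldon's v1 feedback endpoint; treat the URL and the exact payload fields as illustrative, since they depend on the metrics server you have configured.)

# Send a corrected label ("feedback") for an earlier prediction.
import requests

FEEDBACK_URL = "http://localhost:8003/seldon/seldon/iris-model/api/v1.0/feedback"  # placeholder
feedback = {
    # The original request this correction refers to.
    "request": {"data": {"ndarray": [[5.1, 3.5, 1.4, 0.2]]}},
    # The corrected label for that request, here class 0 of the three Iris classes.
    "truth": {"data": {"ndarray": [[0]]}},
}
requests.post(FEEDBACK_URL, json=feedback)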
And in this case, we don't want to just send the actual corrections, because that would be a bit boring given that the model already has very good performance. We want to send something that actually shows you something. So in this case we're going to send randomized labels as the "correct" labels, which means that they're not correct.
So the model is going to get a lot of things wrong, and what that's going to allow us to do is basically start seeing some divergence in the performance, right? And we want to see the divergence a bit faster, so we don't want to wait half a second every time it sends a request; we want to make it a little bit faster. And what we can see now in the performance of the model is that we're going to start seeing some deterioration in its performance.
And then going back to the use case, if you remember, what do we have deployed? We have deployed a model that predicts one of three classes, right? So here, not only we can see the total accuracy, precision, and recall, but we can see the actual breakdown on a per class basis.
And we can see that class two, class one, and class zero have different accuracy, precision, and recall, right? And why is this important? I mean, for flowers, maybe not as much. But for real humans, being able to identify your accuracy, precision, and recall for protected features like gender or ethnicity allows you
to identify whether there is potential inherent bias in the real time deployed model that is performing inference on real data, right?
So if you have models that are actually having an impact on humans' lives, you need to make sure that this is in accordance with the distribution of the data that you saw in your training when the model was being created. And this is actually something really cool, right? Because not only are you getting an overview of metrics and monitoring, but you're getting something that is not just specific to machine learning but specific to your use case, which makes it particularly useful not only for the machine learning practitioners, but also for the specific operational stakeholders that would be managing the process itself, right?
So here we can see, hey, we're starting to see some bad performance of the model, so somebody should do something about it, right? So that's where you can actually set alerts, and you can notify individuals to say, hey, this model is not performing well, maybe it needs some retraining, right? So that's the key thing.
Now let's pause that because, you know, we don't want the business finding out that our model is performing badly. And let's move to the next part: explainability monitoring. So what's explainability? Explainability is human-interpretable insights into the model's behavior. So your model predicted class 3; explainability is being able to understand and explain why it predicted class 3. And this is important in use cases where you are having a direct impact on users, which may be detrimental or of high risk for that individual, right?
Things like credit risk predictions, or, as we have seen in some potentially high-profile bad practices, sentencing prediction. And it's important primarily now because we are starting to see the adoption of complex black-box models like deep neural networks. So introducing interpretability techniques then allows you to leverage more of these advanced techniques. This means use-case-specific explainability capabilities and justifiable interpretations of model predictions.
And then, of course, identifying key metrics, such as trust scores or statistical performance thresholds, that can be used not just to explain from an analytical perspective, but also on a monitoring basis from a real-time perspective. And then enabling more complex machine learning techniques, as we mentioned.
So the terms that you tend to see in the machine learning explainability space are whether it is local, for a single prediction, or global, for the entire data set; whether it is black box, interacting with just the inputs and outputs, or white box, actually opening up the model and seeing what's inside; the type of task (classification, regression); the data type (tabular, images); et cetera.
And ultimately, for this we also require an architectural pattern. Why do we introduce patterns? The reason comes back to the same premise: if we have thousands of models with hundreds of explainers and hundreds of metric servers, we don't want to have to deal (or rather, our DevOps, SREs, IT managers and platform leads shouldn't have to deal) with hyper-specialized individual components that require a high amount of machine learning expertise just to monitor at a baseline level. And this introduces the ability to have infrastructural components that can be abstracted and scaled.
Within Seldon, we've abstracted this into cloud-native patterns, which are often referred to in the Kubernetes space as custom resource definitions. So these are basically abstractions for the Kubernetes world where you can deploy an explainer, right? You can deploy a model, you can deploy a metric server, et cetera.
So you only deal with these components and you deal with those perspectives. And this is important because, you know, we're talking about the monitoring of our models, but this is a microservice as well. And this has monitoring metrics as well. So from this perspective, just to cover in detail, when you send a request to the model, you get an inference response.
When you send a request to the explainer server, you don't just get an inference response. The explainer takes that input data, reverse-engineers the model by interacting with it, and then returns an explanation, right? And what that looks like in practice is that we can actually train an explainer, an AnchorTabular explainer that tells us our strongest predictive features. We can take an actual input, in this case basically just the first index, and explain it, right? So this explanation tells us that the most impactful features are the petal width, which I assume is this one, and the sepal width, which I assume is this one.
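(A minimal sketch of training such an anchor explainer with the open source Alibi library, reusing the model and training split from the earlier training sketch; the discretization percentiles here are illustrative.)

# Train an AnchorTabular explainer for the Iris model and explain one instance.
from alibi.explainers import AnchorTabular
from sklearn.datasets import load_iris

feature_names = load_iris().feature_names  # sepal/petal length and width

explainer = AnchorTabular(predictor=model.predict, feature_names=feature_names)
explainer.fit(X_train, disc_perc=(25, 50, 75))

explanation = explainer.explain(X_test[0])
# The anchor is the set of feature conditions (e.g. on petal width) that lock in the prediction.
print(explanation.anchor, explanation.precision, explanation.coverage)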
And if they are above these thresholds, it would actually converge to that prediction, right? So these are the types of explanations that allow you to go back to your use case and explain it. Similar to the model, we can export the explainer, right? And that exported explainer can be
deployed as an actual microservice component as part of that model that we have deployed, right? So we can deploy that specific microservice. We now have a deployed explainer. And similar to how we just did it over there, that we, you know, sent a request to the model and got a response, we can actually send a request to the explainer and then get a response, right?
We just got a response and we can print it and the response is the same, right? This is actually a RESTful request that we just sent to a microservice that actually we can see in here as metrics, right? We can see the explainer, although it will take a little bit to actually register that input prediction.
And ultimately, from that same perspective, you know, we can actually see that we could monitor the explainer. Look, our model is still, you know, performing worse and worse, even when we stop the actual predictions. So that shows you the power of not just the machine learning techniques that we're using, but also the power of the architectural patterns that we're introducing.
And for the remaining three to five minutes, I'm going to cover the last few components, which are the outlier and drift monitoring principles. This is basically being able to detect anomalies or to detect drift in the data. We'll see what that actually looks like in practice. Outlier detection is basically for when you have data that doesn't fit the distribution of the data that you've been seeing; drift is divergence that you're seeing, perhaps in certain windows of the processed inference data, that may flag some missed model performance in your deployment. This can vary in scope (input versus output), it can be supervised or unsupervised, and it can still be for classification, regression, etc.
From a pattern perspective, this is slightly different to what we've seen. We still have our deployed model that receives input data and returns an inference response. But the outlier detector can listen through this CloudEvents-based eventing infrastructure, which we're not going to cover in much detail. It can listen to the same inference data that goes through the model and do an assessment of whether the data is in distribution or not, with a set of algorithms that you can try in some of the examples that we link, using our open source tool Alibi; you're going to be able to actually test how all of this fits together. And this basically stores whether the data is an outlier or not. So then, when you look into the data of your input requests, you're able to know whether a request was a particular outlier, or you can set up more specific alerts that are relevant to that.
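(A small sketch of such an outlier detector using the Alibi Detect library, reusing X_train and X_test from the earlier training sketch; the isolation-forest detector and the threshold percentile chosen here are illustrative, and other algorithms are available.)

# Fit an outlier detector on the training inputs and score incoming requests.
from alibi_detect.od import IForest

od = IForest(n_estimators=100)
od.fit(X_train)
od.infer_threshold(X_train, threshold_perc=95)  # mark roughly the most extreme 5% as outliers

preds = od.predict(X_test)
print(preds["data"]["is_outlier"])  # 1 where an incoming request looks out of distribution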
And specific to this, we also have drift detection, and drift detection is slightly different to outlier detection. It still listens to the inference data, but instead of acting on each data point, it actually acts on a tumbling or sliding window of data. And it identifies whether there is drift within that specific window. If there is, again, it sends the metrics so that you can configure your relevant alerts and notify the respective individuals. Again, you have a broad number of examples.
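(And a similar sketch for drift, using Alibi Detect's Kolmogorov-Smirnov detector over a window of inference data; the reference set and the artificial shift below are illustrative.)

# Compare a window of recent production inputs against a reference (training) distribution.
import numpy as np
from alibi_detect.cd import KSDrift

cd = KSDrift(X_train, p_val=0.05)

# Pretend a recent window of inputs has shifted (e.g. larger petal measurements).
window = X_test + np.array([0.0, 0.0, 1.5, 1.0])
print(cd.predict(window)["data"]["is_drift"])  # 1 if the window has drifted from the reference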
You know, with our Seldon Core and Alibi tools we have a lot of examples and contributors. If you find an algorithm that is not implemented, please let us know, and we'll be able to have a look at it. There is an extra note that I'm not going to be delving into in much detail, but adversarial detectors are also a key component, which we have also adopted. This is basically to be able to identify potential adversarial attacks, and more specifically adversarial examples. These are modifications of input data with added statistical noise that end up predicting something that a malicious stakeholder may want, right? The usual example is the self-driving car, where you can have a stop sign with some added statistical noise that could cause issues. And, of course, there is an architectural pattern for the adversarial detectors that is also slightly different. We can try all of these
things, you know, in the open source repo. And it may be worth an extra note that it doesn't really stop there. As you know, we programmers love abstractions, and we love adding abstractions on top of our abstractions. Similar to these patterns, you can actually have ensembles on top of your architectures.
Similar to how I was saying that you can have an outlier detector acting upon a model, you can also have an outlier detector acting upon a metric server. So maybe you can actually detect things like drift in your accuracy, drift in your precision, or drift in your latency or your requests per second. So you can actually have much more complex components, and that's why it's important to introduce the management infrastructure around this, right? If we actually bring up the machine learning model right now, it's a tiny, tiny thing somewhere across these hundreds of microservices. And it's important because of those reasons.
So with that said, I've covered most of the key things. Actually, I covered all of the key things. We delved into the motivations for machine learning monitoring, the principles for efficient monitoring, some of the core patterns that you can adopt in your production infrastructure, and a hands-on example that you can try yourself, and actually four or five
different hands-on examples that we covered that you can try yourself as Jupyter notebooks. And again, so the slides are in this link, bit.ly slash realtime ML. You can find all the links as well there. And with that, thank you very much. And if there are any questions, I'm more than happy to take them. Otherwise, please feel free to send them over through Twitter or email.
So with that, I'll pause there. And thank you very much. And thank you very much to the organizers.