Azure Machine Learning: From Design to Integration
Formal Metadata
Title: Azure Machine Learning: From Design to Integration
Number of Parts: 133
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers: 10.5446/48782 (DOI)
Transcript: English(auto-generated)
00:08
Good afternoon and welcome. Azure Machine Learning is the topic of this session. My name is Peter Myers and I'm going to give you a 60-minute introduction to what machine learning is and why this might be relevant for you as a developer.
00:23
From design through to integration, let me briefly introduce myself. All the way from Melbourne, Australia, I'm a consultant, I'm an instructor, I'm a presenter. Bread and butter of what I do is data. Data storage, data analytics. I keep my head above water when it comes to being a developer and I keep my skills in .NET current,
00:42
but really what I'm focusing on here is what you can do with data to enhance an application experience. Is there anybody in the room this afternoon that is already working with machine learning or data mining? So, one. And that's perfect because we make no assumption in this session.
01:01
I'm going to work through what machine learning is. We'll introduce you to what Microsoft are delivering as a cloud-based machine learning platform and then we'll demonstrate from design through to software integration how predictive analytics may be embedded into an application. Finally, I'll wrap up with inspiration for you on what might be relevant to you and your data and your application requirements and what might work for you in a machine learning scenario.
01:25
So, let's begin right back in time. Historically, businesses have always had this need to know what could be. Whether it's time series, prediction, what's going to happen in the future, if it's to understand when our equipment may break down, let's be proactive rather than reactive.
01:45
But I'd ask you now, would you think it's a sensible thing to do to consult the services of a clairvoyant? This is an old clip art that I got years ago from PowerPoint. Probably not is the answer I'd expect from you. More recently, we see fantastic accuracy coming from an octopus
02:02
when it came to the prediction of the outcome of World Cup games. But I'd ask you whether that's an appropriate way for your business also to predict what could be. Well, they're interesting topics but probably not in scope for a conference such as this.
02:20
We're here to talk about a more scientific method when it comes to prediction. So, beginning with the description of what machine learning is, I went to the trusty source of Wikipedia and let me read this out. It's a subfield of computer science and statistics that deals with the construction and study of systems that can learn from data rather than follow only explicitly programmed instructions.
02:46
Let's explain through two examples. Well, I need to add two numbers together. And as developers, we recognize the need for a function that would take two inputs, number one and number two. Using a language like C#, we can easily solve this problem.
03:02
We're not here to talk about those types of problems. The more interesting question that we'd like to solve in this session through demonstration is the need to predict customer profitability. And we recognize that we require a function that would take as inputs demographic details about a customer and then we'd like as output some profit measure.
03:25
And so, as a developer, you may well be tempted to start this project by developing this method. And I ask you now, what would be the lines of code that you would write that would output a reasonably accurate prediction for a customer?
03:42
Well, if you attempt to do it this way, I'm sorry but you're on the wrong path. Really, the approach will be to take some good quality representative data that describe your customers and the profit that they've generated. And we can use algorithms that can detect relationships and patterns between these variables and they will learn what the matching output or in this case predicted profit could be.
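The idea the speaker describes here — letting an algorithm generalize from example customers rather than hand-writing the prediction logic — can be sketched in a few lines. This is a toy illustration only: the data rows are invented, and a simple 1-nearest-neighbour learner stands in for the Azure ML algorithms discussed later in the talk.

```python
# Toy sketch: learn "profit label" from examples instead of coding rules.
# Each training example: (age, yearly_income, children) -> profit label.
# All rows are invented for illustration.
training = [
    ((25, 25000, 0), "Low"),
    ((30, 50000, 1), "Medium"),
    ((45, 75000, 2), "High"),
    ((52, 100000, 2), "High"),
    ((19, 25000, 0), "Very Low"),
]

def predict(customer):
    """1-nearest-neighbour: return the label of the most similar example."""
    def distance(a, b):
        # Squared distance over crudely scaled features (scales are arbitrary).
        return sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, (10, 25000, 1)))
    nearest = min(training, key=lambda example: distance(example[0], customer))
    return nearest[1]

print(predict((48, 80000, 2)))  # nearest to the (45, 75000, 2) example -> High
```

The point is not the algorithm choice but the shape of the solution: no explicit if/else rules about customers, only examples and a notion of similarity.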
04:07
And so, that's really the focus of this session. How can we develop and deploy these services? And then how can we integrate them into an application experience? So, the machine learning flow, very, very briefly, like with any project,
04:21
you want to start by defining very clearly what your objectives are. Typically, with machine learning, what you're producing will have a very narrow and specific purpose. Having locked that in, you're then going to hunt around for some good quality data that is going to be available for you to hand off for training purposes.
04:41
And the collection of data is just the beginning. Significant amounts of time will be spent preparing that data to ensure that it is in the best possible format and cleanliness state ready for machine learning activities. You may well have heard of the garbage in, garbage out. If you're going to pass garbage into a machine learning project,
05:01
then don't expect the predictions to be much more than garbage themselves. Now, the real machine learning happens in the next two phases. We train models, note the plural there, not just one. We could spend literally weeks, if not months, training a number of models and evaluating them to determine in terms of accuracy, reliability and usefulness,
05:22
do they serve the purpose and do they best meet the objective that we have in this project? Having eventually arrived at the training of a good quality model, you can then publish this as a web service. And then you want to monitor this to ensure that it behaves and performs as expected.
05:42
Now, the next thing will be to hand this off to some software developer and say integrate this into an application. So that sets up the framework for this 60-minute session. Before we move on, I'd like to talk about the roles that are typically involved in a machine learning project. We have no data scientists in the room.
06:02
We had one person that admitted that they had some machine learning background. Typically, when we end up with one or two, I then ask the embarrassing question of how much money do you earn a year? And before they answer, I stop them, of course. But it's noteworthy that these individuals, simply because supply and demand,
06:22
there are very few of them and they're in increasing demand. They're typically highly educated individuals, masters or PhD level with computer science, mathematics or statistics. And these people are there to find value in data. So, machine learning certainly appeals to the data scientist.
06:44
But I'd also like to point out that as a data professional, because I'm no data scientist, but I've had 15 years working in data analytics and data warehousing, someone with my skill set is also going to find that Azure machine learning is approachable. Any background in data will be a good prerequisite
07:02
for a machine learning project. Of course, the software developer, of which I'd assume many of you are, would also be interested in perhaps the entire lifecycle or at least how to integrate a machine learning web service into their application. Now, traditionally, machine learning has a high price tag attached to it.
07:24
I've just mentioned that the human resources alone are expensive. And then we talk about the hardware and the software and it stacks up very quickly. To this end, we'll find that there's a barrier to entry: small and medium-sized businesses usually don't invest in machine learning because it's beyond their skill sets and their budgets.
07:43
Usually, the large organizations will have teams of machine learning people. So, it's expensive. Your data may be isolated. There could be many tools required to produce a single solution. There's not one single tool that would enable you to manage that entire lifecycle that I've just described.
08:01
And obviously, it can become quite complex. What this means is for those that don't invest in machine learning technologies, they're often left consulting clairvoyance, octopus, or simply guessing using rules of thumb or using trial and error, which means lost opportunities or expensive mistakes.
08:23
What are Microsoft doing in this space? Well, interestingly, Microsoft have had data mining available through SQL Server for many, many years. Has anybody worked with SQL Server's data mining capabilities? Noteworthy was Microsoft first introduced data mining in the 2000 release of SQL Server
08:41
and it was the very first relational database product that provided data mining out of the box. It wasn't until the 2005 release that it became much more robust and complete with some nine algorithms to work with different data mining scenarios. Now, this is an on-premises way of solving a predictive analytics problem.
09:01
But where Microsoft are heading now is cloud first. And so they've introduced an entirely new cloud-based service named Azure Machine Learning, enabling powerful cloud-based predictive analytics. Professionals, not just machine learning data scientists but data professionals and even software developers, will find that this is now approachable
09:22
and certainly affordable for producing, deploying, and maintaining predictive analytics services. Armed with nothing but a web browser: there is no need, and in fact no capability, to manage your own hardware or software. It's all cloud-based. What you will require is a supported web browser,
09:42
whether it's IE, Edge, Chrome, Safari, or Firefox, and, of course, internet connectivity to work with the service in the cloud. The data that you machine learn against will need to be in the cloud or accessible from the cloud. So that means for the on-premises scenarios, you're going to have to push it up, whether it's Azure Blob Store,
10:01
whether it's Azure SQL databases, Azure SQL Data Warehouse, or in big data scenarios, you can even issue hive queries against big data stores for machine learning purposes. How it all works, I'm about to demonstrate. The Azure portal is required to provision an Azure machine learning workspace.
10:22
So it might be your Azure operations team that would set this up. They would then invite individuals within your organization to participate within a workspace. Those individuals can launch ML Studio in the web browser, and they may use this environment to work with data and to define and publish experiments
10:41
to become web services. At this point, we can hand it off to a software developer that may then interact through two methods with the API service. So let's begin with demo number one, wearing the hat of the Azure operations team. I switch across to my Azure portal,
11:01
and I create a new workspace. So I'm going to call this NDC London. That's a unique name. There are only three data centers that support Azure machine learning. I will choose the closest, and then there's also the requirement
11:22
for a storage account because all of the data and metadata defined within your workspace will be stored in that blob store. This is going to take about 40 seconds. We'll come back when it's done.
11:40
With that set up and me having access to that workspace, I'm now going to demonstrate this flow from beginning to end. Starting with defining the objective, it's very, very important that you have a clear understanding of what you're there to achieve. So recall that the problem that I had was I needed to predict customer profitability,
12:02
More specifically: to deliver targeted display advertisements on the company e-commerce website in order to present relevant product suggestions, with the aim of encouraging more sales that will then, hopefully, increase the bottom line.
12:20
With that clear objective, then, we move to the next phase. We look for data, and we need sufficient volumes of current, clean, and complete data. This could come from anywhere, typically from internal sources. I'd be looking at my CRM system to get rich customer detail. I'd be looking at sales systems to understand what people have purchased,
12:41
offsetting the cost of those to determine the profit generated. We could look at external sources, as well. If I was looking at demand forecasting and pricing, I might look at weather, you know, by date and temperature. Let's join that with the sales to learn whether there's a relationship
13:01
between temperature and the sales volumes. The richer you can make your data, the more interesting and perhaps the more accurate the models are that you can generate. All right, so, the next demonstration will be to look at the data that I have collected.
13:21
And keeping this very simple because it's a short, 60-minute session, I have collected 10,000 customer details in a CSV format. I'm opening it up here in Excel. Let me just zoom that up so it's a little clearer. And then I'm going to select the entire range and make it into an Excel table.
13:43
Has anyone here installed Power Query? Free download from Microsoft. Provides extraordinary capability to acquire, transform, filter data. So I'm going to load that table into Power Query.
14:00
And a quick show here that we have details like the unique identifier, the first and last name, the age, marital status, gender. Look at the yearly income, which we see is discretized into buckets of 25,000. Probably because when they registered with your e-commerce website,
14:22
you had a drop-down list. And these were the drop-down values. We also have details about the number of children they have and how many children are still living at home, their education, occupation, home ownership, how far do they travel. And then when we arrive at the last column,
14:41
this is the integration of data coming from our sales history. And this would represent the last 12 months of sales offset by cost. Moving on then, are we happy with the quality of this data? Well, we can't answer that
15:01
until we thoroughly interrogate what we have available. So often significant amounts of effort will be required to prepare and optimize this data for the algorithms and the purpose that you assign to them. We would want to transform and cleanse, reduce or reformat, isolate or flag abnormal data.
15:21
Now, if we note that there is abnormal data and we flag it, what are we likely to do? Correct it if we can. Oftentimes you can't. Delete it. Okay. And that's a real shame because when you start removing data, you remove depth available for analysis.
15:42
Okay, so it will actually depend. I'd want to go back to the objective and understand what I'm designing here. If it was a fraud detection project, I'm looking for anomalies, we would keep them in. And we may even oversample them and multiply them in the data so they stand out.
16:02
And then the algorithms can understand what the circumstances are that produce those anomalies. In the case of customers, we might be a little surprised to learn that we have a lot of accountants from Afghanistan. Why would this be?
16:21
And you recognize that this probably isn't true. Come on, you software developers, what's happened here? First in the list. So, as developers, you think, let's just make things easy. Let's select the first item. And of course, the customer registering on your website is just looking to check out and buy whatever they've added to the shopping cart.
16:40
So, it's next, next, next. All right, so it's perhaps not the smartest thing. A very, very simple and small thing that makes sense to you perhaps as a UI developer may have significant ramifications for collecting and maintaining quality data. In which case, you may have to eliminate entire customers or perhaps even eliminate entire attributes,
17:01
remove nationality because we've just got too many customers from Afghanistan and we know that that's just not correct. All right, so take good care when developing applications today to ensure that you collect good quality, rich data and that you also maintain it. You might notice today the Amazons of this world, they'll often redirect you back to review your details
17:23
and this is actually a good practice. People change their income, they change their profession. So using data that was captured 15 years ago perhaps isn't the best story. If that's all you've got, that's all you've got, but do what you can to maintain good quality data.
17:42
We might also substitute missing values. We'll come back to this one in a minute in the demo. If we've got missing values, what would we do? Categorization of continuous numeric values like income into 0 to 25,000, 25,000 to 50,000, and so on.
18:02
So, in the demonstration, that last column that provides the profit generated to the cent isn't of so much interest to me. I'm not interested in the prediction to the cent. I'm more interested in a generalized classification, because in targeted advertising what you're likely to do
18:22
is have libraries of ads that match perhaps levels of profit: this is a high profit, medium, low, very low. So I'm more interested in a classification rather than what the profit would be to the cent and dollar. So, as part of my preparation, I'm going to add in a calculated column,
18:43
and this column will be called profit generated label. I know that might be a little small. There is no zoom available on this screen. Using M, which is the formula language for Power Query, I'm simply going to say that if the profit generated
19:01
is less than 100, then I'm going to classify you as a very low profit earner. Else, if that profit generated is less than 500, you get classified as low. The next level would be if you're less than 1,500, then you're medium and anything left over is going to be a high profit generation.
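That conditional column amounts to simple banding logic. Here is a minimal Python sketch of the same branching (the thresholds are the ones from the demo; the function name is mine, and in the real experiment this runs as an M formula inside Power Query):

```python
def profit_class(profit_generated):
    """Bucket a to-the-cent profit figure into a coarse class label,
    mirroring the conditional column added in Power Query."""
    if profit_generated < 100:
        return "Very Low"
    elif profit_generated < 500:
        return "Low"
    elif profit_generated < 1500:
        return "Medium"
    else:
        return "High"

print(profit_class(1200.37))  # Medium
```

The ordering of the tests matters: each `elif` only fires once the lower bands have been ruled out, which is exactly how the chained `if ... else if` reads in M.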
19:26
That adds in a new column, and if I do a simple transformation to group by this, we can see for the 10,000 customers that we've got this type of distribution:
19:42
what have we got, 1,300 are very low, and at the higher end, about 1,700 are producing a high level of profit per year. That is the preparation, very simplistic for demonstration purposes. I'm going to close and load this query result back to an Excel worksheet,
20:00
and then I'm going to do a file save as and save this as my prepared CSV data.
20:22
Now, that workspace that was provisioned earlier is available here. The Azure Ops team would have managed access to ensure that I have rights, and then I can go ahead and sign in to ML Studio here in the web browser. Within the workspace, as a collaborative environment,
20:42
all of my colleagues are working in the same workspace, and we can share data and experiments. So here in the studio, I'm simply going to upload a data set. Remember, the data that is required for machine learning must be in the cloud or accessible from the cloud, and I push that CSV up.
21:13
Now we're ready to do some machine learning. Well, almost. The next two phases go hand in hand. It's an iterative process of experimentation,
21:22
looking to produce the optimal model. This involves selection of a machine learning approach and an algorithm, defining the inputs, and defining what the output is. The inputs will be customer demographics. The output will be a prediction of a profit class label: very low, low, medium, or high.
21:43
We could also tune the algorithm so there are parameters that enable you to tweak what it's doing and ultimately at the end of this process, you're looking to select a model that is the most accurate, reliable, and useful for your objective.
22:04
So producing the experiment. That data set has now been uploaded. So now I'm going to go ahead and create a new experiment and I'll point out to you here that Microsoft are being very, very clever. They are building up a gallery of templates
22:20
which allow you to learn or to kick start your own project. For example, if I type in fraud, there's a whole series of experiments that come with their own documentation and data sets. So I would strongly suggest that if you've got a project, find a template here that matches that, spend a day reverse engineering it,
22:42
and then look to see what you can take on board that fits within the objectives of your own project. In this demonstration, I'm going to start with a blank experiment. And it may not seem obvious, but the name of the experiment is up here. Predict customer profit class.
23:07
All right, under save data sets, drag and drop, there is my prepared data. I'll also point out to you then that if I come to the input and output,
23:20
there's also a reader module. And by using the reader module, this is where you can access data from other sources. Uploading a CSV from the desktop is probably okay for maybe a million rows, but if you're dealing with much, much more data, you're likely to source it from these locations.
23:44
All right, so we can also do data preparation here in the experiment, so I will demonstrate that for you. It may well be that you don't have that opportunity to use on-premises tools to prepare your data. So here in a visualization, top left corner,
24:01
we see that we have 10,000 rows and 15 columns. Customer ID, first name, last name. In fact, I'll make a clear point here. I should not have uploaded first name and last name. Why not? Security and privacy, exactly.
24:21
So big concerns with the cloud today is we do not want addresses, names, social security numbers up there. Well, given that we know that first name and last name have probably no relationship to the profit generated, whether you're Mary or Fred, well, gender aside, it's unlikely that they're influential, all right?
24:41
So first thing is I should have eliminated them before uploading them to the cloud. I will remove them as part of the training process now. Column by column, you may select, you may analyze the statistics. There's even a histogram down here. The total children is one that I want to point out because we've got some missing values.
25:02
Out of the 10,000 customers, 10 have null. Bad validation routine going on in the UI. For that volume, I'm happy to assume it's zero. So I'm going to substitute those out. But the next one, which is the number of children at home, we'll see that there's far more missing.
25:21
In fact, almost 50%. And here's an example where you have poor quality data. You have no choice, because you can't repair it, but to eliminate it. The other thing to take note of, using the right scroll bar here,
25:43
scrolling within scrolling (the screen resolution is not optimal for the presentation), is this last column. I will have to drop this column as well. I should not pass this into the machine learning process because there's a very clear relationship between the last two columns. So I probably shouldn't have uploaded it either.
26:03
Let me demonstrate to you then how I can eliminate those columns and cleanse the missing values. So there are an enormous number of data transformation modules available to me. And here under manipulation, I'll find that I have the project columns. And this allows me to define
26:23
what the columns will be moving forward. So drag, drop, connect, configure is the common theme when working in this designer. And I'm going to begin with all columns and exclude by name. The customer ID, the first name, the last name, the number of children at home
26:40
because we had just too much missing data and the profit generated. The next thing to do will be to go ahead and clean data. Just by searching for a module that has the word missing, I find that I've got a clean missing data module.
27:03
Drag, drop, connect, configure. So begin with no columns and include by name only the column that is total children. And then the default cleaning mode is to use a substitution value, which by default, very nice for me, is set to zero.
27:27
So job is done. So far, all I've achieved is some extra preparation. Now I'm ready to do machine learning. We need to give consideration to how we're going to evaluate and therefore test the accuracy
27:41
of the models that we develop. And a very standard process will be to take your carefully prepared data and to split it in two. Today, using a ratio of 70-30, the 70% of data randomly selected will be used to train a model. And then the reserved 30% of random data will be used to test the accuracy.
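The idea behind that split can be sketched in a few lines of Python. This is only an illustration of the concept, not the Split Data module's actual implementation; it uses a fixed seed so the random partition is repeatable:

```python
import random

def split_data(rows, fraction=0.7, seed=42):
    """Randomly partition rows in two, like the Split Data module with
    a fraction of 0.7: the first part trains, the held-back part tests."""
    rng = random.Random(seed)   # fixed seed so the split is repeatable
    shuffled = rows[:]          # copy; don't mutate the caller's list
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]

train, test = split_data(list(range(10_000)))
print(len(train), len(test))  # 7000 3000
```

The shuffle is what makes both halves representative; taking the first 70% of unshuffled rows would bake any ordering in the source data into the split.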
28:02
We know the outcome, and now we predict for the 30%; the ratio of how often it gets it right is a good indicator of accuracy. So let me now introduce a splitter,
28:21
or a split data and drag, drop, connect the cleanse data using a fraction of 0.7. Now, if you double-click a module, it enables annotations sometimes.
28:42
There, 70%. And this is a way that you can document what your modules have been configured to do. Now, the machine learning part. Let me collapse these.
29:02
Under the machine learning group, under initialize model, there are literally dozens of algorithms: best-in-class algorithms used in industry today, developed in-house at Microsoft, and often used in Bing and Xbox. So you are dealing with the greatest and the best.
29:23
The problem for a beginner might be: how do I narrow it down? There are dozens of algorithms. Now, the task that I have is in fact classification. How do we classify a customer into one of those four different profit classes? But when I expand the classification group, I see that there are perhaps 14 or 15 algorithms.
29:43
Can anybody see a pattern here that would help me quickly eliminate many of them? I.e., is it multi-class or is it two-class? Multi-class. So very low, one. Low, two.
30:01
Medium, three. High, four. If it was, does this patient have a disease, yes or no? Then that's a two-class classification. Here, it's multi-class. And we only have a selection of four. And at this point, you might still say, well, I have no idea which one I should use. Well, this is experimentation.
30:21
It's part of your learning curve as well. Try them all, then test them, and determine which one is accurate based on the data that you have. So I'm just going to start with the first in the list, which happens to be the multi-class decision forest. Using this algorithm, I'm then going to train a model.
30:41
Drag, drop. The left-hand side wants to know which algorithm. The right-hand side wants to know where the data is coming from. And then I configure it and tell it what our target is. We're looking to predict the profit generated label.
31:01
Now, we all went to school and we studied subjects and you were assessed. You had tests and assignments and exams. And their purpose was to evaluate how well you had learned. And the same applies here to machine learning. We want to score them to determine how good they are
31:20
at what they've learned to do. And so here, I'm going to introduce in a score model. Having trained the model, we're then going to use it with that 30% of data. And this module will then be responsible for producing statistics that allow us to assess how good it is at what it's been asked to do.
31:43
Your experiment can consist of multiple, and often will, training and scoring modules. So I'm just going to do a copy and paste. And here on the right-hand side, I'm going to do a comparison by introducing
32:03
another module that uses a different algorithm. I'm going to go this time for the neural network. Now, I don't really know what they do.
32:22
You can think of it as a black box. But what you are concerned with is that it produces an accurate and reliable result. All right, so my understanding is it works in ways similar to what we understand the brain does. Side by side, in parallel,
32:41
two models will be trained with different algorithms. And let me point out that the selection of an algorithm also includes algorithm-specific parameters. The defaults are usually good defaults. But if you want to know more, for every module, when you select it, you just click on More Help. And there's the Azure documentation.
33:01
You can read through this, learn from this, and all of those parameters are described down here. And again, as part of your learning curve, well, if you want to know, what happens if I change this from 5 to 10? Let's retrain. Let's see, does it produce a more accurate result?
33:24
The last module, before I then send this off to the cloud for processing, will be to evaluate. Where are you? Evaluate. Evaluate model. Left-hand side with the decision forest, right-hand side with the neural network.
33:41
I'm now done. Very, very simple. Some acquisition of data, some preparation, some splitting, training of two models, some scoring of both, and now we'll evaluate side by side. I then click Run. Note in the top right corner it says draft; this is the status, and it also conveys
34:01
the duration of processing in the cloud to do this for you. If I had gigabytes and gigabytes of source data up in the cloud right now, it might be running across parallel servers, scaling out. This is the beauty of the cloud: the elasticity. You pay for what you need, when you need it. So while it's doing this for the next minute,
34:24
how much is this going to cost you? Well, Azure Machine Learning pricing. That is the wrong link.
34:40
Azure Machine Learning pricing. Where are you? Ah, here. First of all, for every individual that will have access to the workspace, it's six pounds in this country, six pounds per month. When it comes to the experimentation hour,
35:01
it's 61 pence per hour. All right, so switching back to the experiment, that's been running now for one and a half minutes. You do the math. Okay, very, very low cost. Okay, it's completed in about one and a half minutes. To complete this phase of the demonstration,
35:21
then I visualize the evaluation. And what this is giving me then is a side-by-side. Let me scroll down to show you. The top will be the left-hand side with the decision forest and the bottom will be the neural network results. To keep things very, very simple, this is the metric of interest.
35:40
The overall accuracy is 86% for the decision forest. The classification metrics down here show me that when it actually was high, it predicted high 78% of the time. That's extraordinarily good, by the way. This is a data set deliberately prepared for demonstration.
36:03
But let me just suggest that it got it right 28% of the time. Would that be reasonable? Would that be adding value to you in a predictive scenario? If I had a four-sided coin, if such a thing existed,
36:22
and I flipped it, then randomly we'd end up with 25% for each of them. So if you can achieve better than 25%, you're adding value. And a small margin of 3% could have a magnified impact on the bottom line of what you're doing as a business.
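You can convince yourself of that 25% baseline with a quick simulation (a Python sketch; the trial count and seed are arbitrary choices of mine):

```python
import random

# If predictions were no better than flipping a four-sided coin, the
# guess would match the true class about one time in four.
rng = random.Random(0)          # seeded for repeatability
classes = ["Very Low", "Low", "Medium", "High"]
trials = 100_000
hits = sum(rng.choice(classes) == rng.choice(classes) for _ in range(trials))
print(hits / trials)  # close to 0.25
```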
36:40
A small increase in providing more relevant suggestions to your customers usually will trickle down to mean better business results. All right, so 78% is absolutely fantastic, but I wouldn't be upset if it came in at 30%. It's better than a random guess. All right, so back on track.
37:01
We had 86% for the decision forest, and neural network came in at 88%. So very simplistically, because time doesn't permit, I'm going to make a conclusion that it is this model here that is best fit for purpose.
37:22
That's producing a machine learning experiment. The next stage will be to publish this, and that involves adding another experiment that we term a predictive experiment, because it's going to be published to become a web service. And what you'll see in a moment is that it modifies the design,
37:41
so it's no longer an experiment that is training models. It is an experiment that is scoring inputs. Remember the function that I wanted? Give me customer demographics, and then I want an output of a predictive profit class. That's what the predictive experiment will be configured to do. Now of interest here is that we will publish this
38:01
as a web service, and that you can also publish them to the gallery. Of course, Microsoft will review and approve them, but that gallery is a growing set of resources that the community are also contributing to. Let's take a look at how this works in demonstration. Note that I have selected the model
38:22
that I would like to build the predictive experiment around. And then down here on the command bar, I create a predictive web service. Pay close attention.
38:49
All right, it's done its thing. First of all, notice that we have a second tab here. You still have your original experiment because you may wish to continue evolving it. It's added a new experiment in here
39:02
for predictive purpose only. The dark shaded nodes represent your input and your output. So the input here would be better termed customer profile, and the output would be predicted profit class.
39:21
You still need to do some work here. And let me describe what happened. In fact, when it took those left-hand modules and sort of moved them in, I would have preferred that it actually faded them away because they're not part of this experiment. We had selected the neural network model. The others are just disappearing.
39:41
All right, so it looks as if they're being merged and consolidated into the model over here. They really aren't. What it really is, we have inputs coming in. You still have here the CSV. It's not actually going to retrieve data from your CSV file. It needs to understand the definition of columns.
40:02
So it uses it to define the metadata describing columns within the flow. You'll next see that the project columns is still here. This here represents a transformation. So the scrubbing and cleaning of missing data has been consolidated into a transform
40:21
that is available for reuse within the workspace. I'm not sure why Microsoft decided to do that, but they did. It then applies that transformation, and then here we have a trained model, and there we go. We now have a model available in the workspace, which is the neural network model,
40:40
and then it uses the scoring to score. So, in fact, I'm going to break this connection, and I'm going to connect the input direct to the score. Now, that would imply that we're not applying a transform to it. We're not saying that if the children is missing,
41:03
make it zero. We'll assume that we've got validation happening at the other side. I will come to the project columns, and I will modify it, because unlike the training experiment, we needed the profit-generated label. But in a predictive experiment, we're not going to pass it in.
41:22
We expect it to be passed out. So I now remove it. Once the scoring takes place, it passes it to the output, but I'm going to break it as well, and I'm going to place a project column. And the reason for this is,
41:41
there's only one column that I want to output, and you'll see here that the project columns is going to pass out by default. Nothing. It could pass out all of these columns, but we don't need these passed out. Notice the bottom two represent the predictions that's taken place, the scored label and the probability attached to it.
42:03
Here I'm going to say: just pass through the scored label, and that's what will be passed to the output. And that is now the completed predictive experiment. I'm going to go ahead and run this. Running it will simply validate that this is a workable design.
42:23
This will take a moment. This would be a great time if anyone has any questions.
42:47
Okay, so the question being asked is about a bulk process: perhaps we apply this against our database of customers to isolate those that we should send an expensive, glossy brochure to, because we're likely to get a high profit from them.
43:02
You could use this model for that. Okay, but what's a little different from the scenario that I have here is that it's a bulk process, and we'll come back to this. I'll answer the question shortly when I talk about the web service methods. All right, it is happy; it has run. I'm going to then deploy this as a web service.
43:21
And as of this moment, almost, we now have a web service on the internet. We are done. Let me just briefly finish with the manage topic.
43:41
So back to the Azure operations team. They might want to monitor this. They might want to know how often is it being used, how much is it costing us, and is it performing adequately given the resources that we've assigned to it. There's also, interestingly, the opportunity to monetize. By publishing your web service to the Azure marketplace,
44:02
this enables the public to discover it and also to get out their credit card and start to pay to use your service. And you can make an income from it. So think carefully about this: if you've got valuable data that nobody else has access to, you could produce some very cool models that deliver business value to others, such that they're prepared to pay for it.
44:22
So additional income streams can be generated through Azure machine learning. So in demo here, back to the Azure operations team, they might manage the web services. So here in the workspace now, there is a deployed web service with a single endpoint for scalability.
44:43
You could add more. Here on the charting, you can then take a look at the compute time and the number of predictions. Why is that important? Because in pricing, you're paying hourly for processing, and then per 1,000 predictions, you're paying 30 pence.
45:04
So this will give you an idea of what's happening. Now under the configure, and this is very cool, as you scale up, of course, it will start to multiply the rate, compute rate. But at the highest level supported today at 200, Microsoft are guaranteeing under service level agreement
45:23
approximately 5,000 predictions per second. All right, so it can be used in real-time scenarios and can scale as to what you enable it to do. Now we get to the interesting part for you guys as developers.
45:43
We've now got this web service, enabling us to integrate predictive capabilities into an application. There are in fact two methods with every web service that is deployed. There is the request-response service, which delivers low-latency, highly scalable responses.
46:01
So real-time integration. Back to the question from this gentleman: if you wanted to take all of your millions of customers and classify them into those four classes, you could use the batch execution service. As high-volume asynchronous scoring, it would do it asynchronously, drop the result into the Azure Blob store, and then you could come back,
46:21
integrate that perhaps into your CRM system, and then run reports, filter by high, and mail out some expensive brochures to those customers. Now, there are four requirements to integrate an application with the service. The first, of course, is that the app or the device
46:42
will need internet connectivity. It must use SSL, so it must be able to work with HTTPS. The OData endpoint address and API key, of course, will be required. So did you note that when I published the web service, it directed me to the dashboard for the web service?
47:05
This is where we see the two methods. In fact, for request response, you can click test, and that will show you the inputs required for our function. Great. This is the key. Anybody that has this key can authenticate with the service.
47:24
We're going to work in demo with the request response, and here is the endpoint. So with the endpoint and the key, your application has everything it needs to communicate, authenticate, and request across to the service.
47:40
The last thing that's required is that, because the request is a JSON document and the response is a JSON document, your application will need to parse JSON. So let's now see in demonstration how I can integrate the service that I've just deployed with a web app. Switching across to Visual Studio, I've created a very, very simple ASP.NET web application.
48:02
It's simply going to prompt on the left-hand side for details about the customer. Of course, in a real application, the profile would be understood within the state of the application. When I click submit, it's going to go ahead and interact with the service. Here is the method for submit.
48:21
So, given that we have filled in every control, I've created a class here to encapsulate the entire profile. So I create an instance of my customer profile, and here I am dragging in all the control values, so I've now got a single object representing the customer profile.
48:42
Here I'm at the point where I need to retrieve the profit class, and this is where I'm going to pick up with my development. Providing that I don't get an error (the text "error"), then I have some images over here. Very, very simplistically, eight images based on gender, male or female.
49:02
We have an image for high, medium, low, and very low. So very, very simple. So let's take a look at the documentation. What do we have? For the request response service, there's the endpoint, and the request has headers. We will need to pass in the key
49:24
as the authorization header. And pretty much we need to pass in a JSON document that looks like this. Remember when I named the input in the predictive experiment? There it is by name. There are the columns that it expects to be passed in, and here is where our values will be passed in.
49:42
You can comma separate, and you could be scoring multiple customers. In my demo here, it's a single customer that will be passed in. That's the request. The response, granted we have a status code of 200, is simply a JSON document that looks like this.
50:02
Remember the output that I named? Well, it has a value that consists of values, and ultimately this is the one value that I need to retrieve. The documentation continues to tell me the columns, their types, and also the allowed values
50:21
that a developer would expect from this service. And lastly, to make things really simple, there's C#, Python, and R code to kick-start your development. So I'm going to work with C#, and I'm going to copy this to the clipboard. Notably, this sample code is for a console application, so there's going to be some replumbing needed to integrate it into the ASP.NET project.
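For reference, the same call can be shaped in standard-library Python rather than C#. This is a sketch under assumptions: the input name "customer profile" matches what the predictive experiment named its input, the two column names are illustrative, and `endpoint` and `api_key` stand in for the values you'd copy from the web service dashboard:

```python
import json
import urllib.request

def build_request(profile, input_name="customer profile"):
    """Shape a profile dict into the JSON body the request-response
    service documents: column names plus one row of values."""
    columns = list(profile.keys())
    return {
        "Inputs": {input_name: {"ColumnNames": columns,
                                "Values": [[profile[c] for c in columns]]}},
        "GlobalParameters": {},
    }

def score(profile, endpoint, api_key):
    """POST the profile to the endpoint; endpoint and api_key are
    placeholders for the dashboard values, ideally read from config."""
    body = json.dumps(build_request(profile)).encode("utf-8")
    req = urllib.request.Request(endpoint, data=body, headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,  # the API key authenticates the call
    })
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Building (not sending) a request for an illustrative two-column profile:
payload = build_request({"Age": 45, "Occupation": "Management"})
print(payload["Inputs"]["customer profile"]["Values"])  # [[45, 'Management']]
```

Note that `Values` is a list of rows, which is why you could comma-separate and score multiple customers in one request, as mentioned above.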
50:40
so there's going to be some replumbing that will need to happen to integrate it into the ASP.NET project. So over here I'm going to add a class, and I'll just call it Azure Machine Learning. Remove and paste in that sample code.
51:07
From a namespace perspective, let's just copy the namespace from the project here. And then the first thing we'll notice is it cannot resolve some of the namespaces. In fact, it's telling us up here in comment
51:22
that you need to install a NuGet package.
51:43
So take a look at the references here. In fact, we'll retrieve three assemblies, including Newtonsoft.Json. Is it still thinking about it?
52:04
Attempting to resolve dependencies. It's usually quick.
52:36
That's remarkably slow. My demo cannot proceed without it.
52:45
Any ideas? Should I stop and start it again? I've never seen this problem. I'm not sure that I do have it here.
53:01
Do I? I don't.
53:24
Let me just check that I have internet connectivity. I think that could be the problem.
53:44
Oh, okay. Well, working with a cloud service, there's not a lot I can do. As a total backup, I do have a recording, and I can hit play. All right, so let's see if we can resolve this. So, technical guys in the back of the room, what do you suggest?
54:06
I have eight more minutes left in this session. And the grand finale awaits. The next thing we do, I mean, Wi-Fi is the other option, I guess.
54:29
Any questions while we're attempting to navigate this roadblock?
54:48
You could do that, but if you're going to work with a smaller data set, that's not going to help you to understand whether you have the right parameters and whether the scoring and evaluation that I do are actually right.
55:01
You might find, and I think we might be back online, that with a small data set it might actually produce a result, but with a bigger data set, it might produce an entirely different result. So it's not usually a great thing to do.
55:30
So you can. One technique is to work with a smaller data set, and if you do, then just be very, very careful that when you work with the larger production set you do some rigorous evaluation,
55:43
that it's not producing a different result. Okay. Do we have success? No. Internet, well, connectivity I have,
56:02
but it's not passing through. I'm not getting an internet connection. Well, I don't believe I am, although it looked like this page had reloaded. The pipeline has been stopped. You know, if all else fails,
56:21
let's just close Visual Studio and reopen. Tools. Ah, here we are again.
56:40
Attempting to resolve dependency. Ah, there we go. Beauty. All right, we're back on track, thank you. What it's added is Newtonsoft.Json, and then it's brought in these two as well. So for our request-response requirements, we notice now that it can resolve those namespaces. Let's discard the comments.
57:00
I've updated the namespace. The first definition of a class here is in fact to allow us to build up the request: when we take a look at the JSON document for the request, it wants the columns and our values defined.
57:21
So it's described a class for this. Because this was written for a console application, I'm going to first of all update the class name, but I'm going to remove the main method. Obviously it doesn't make sense in a web application. And instead of using this stub, I'm going to use a public static string,
57:44
predict customer profits. We will pass in a customer profile, and that's the method that I'm looking for. Remember when I used that example of, I need to predict customer profit, we had a function that allowed us to pass in a profile,
58:00
and it outputs in this case a string representing one of those four classifications. Here it is then building up that request. So here is where the column names have been described, and instead of passing in zero for the age, we would just pass in profile.age. And here is the mapping of our profile details into the request.
58:21
Rather than type all that out, I will just use a snippet, and voila, all of the profile details are being mapped to the request inputs. Next, it wants to know what the authentication key is.
58:43
Well, that's here. Typically this should not be copied and pasted into the code, it should be retrieved from a config file, right? Likewise with the endpoint, which has been pasted in for us automatically. And so now we get to the point where we send the request and receive a response.
59:04
This was written to be an asynchronous operation, but here on a web server I'm happy to make it synchronous. This object then contains the response back from the service. Let's remind ourselves about what that response looks like.
59:22
Now, it's all documented here; essentially it is a JSON document. That's assuming we have a successful status code, so let's deal with the fact that it may not be. I'm going to keep this very, very simple: I'm going to return simply "error".
59:41
And let's then focus on what happens when it is successful. So instead of returning it to a variable named result, let's call it a JSON document. And now I need to somehow extract this.
01:00:00
value here. So I like the technique of copying the sample and doing a paste special as JSON classes.
01:00:26
And so essentially that's what that JSON represents in a class structure. Instead of calling it a root object, it's really an RRS response, the request response services
01:00:43
response. And then I'd probably like to make these internal. Let's bring them into scope.
01:01:03
What did I forget? Did I forget something? OK, it'll still work for now. So I should be making this internal or lower.
01:01:21
It'll still work. This one here?
01:01:40
You want me? Yeah? Yeah, where am I going? Oh, so you want this to be public as well?
01:02:00
I think by default it is. But I know it'll still work, because I've done this demo enough times. So thank you. No, that's great, because sometimes I do miss things. But I'm pretty sure that one's going to be right. The JSON document. So all I need to do now here is create an RRS response. So that's my response equals.
01:02:22
Let's call JsonConvert.DeserializeObject to deserialize into that object, based on the JSON document that I have. And now we can go to that response. It gives me Results. The Results give me a predicted profit class.
01:02:43
It has a value, which has Values. I want the first row, first column. Pay attention: Results, predicted profit class, value, Values.
01:03:02
First row, first column is what I'm looking for. That's how I integrate it. I can now return to my submit logic. This is no longer a to-do. And I simply go to my class.
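The extraction the speaker walks through (Results, then the output, then value, then Values, then first row, first column) can be sketched in Python against a hypothetical response. The output name and column names below are made up, but the nesting follows the classic Azure ML RRS response shape described in the talk.

```python
import json

# Hypothetical RRS response, shaped like the documented sample for a
# classic Azure ML web service (output and column names are illustrative).
sample_response = """{
  "Results": {
    "output1": {
      "type": "table",
      "value": {
        "ColumnNames": ["Scored Labels", "Scored Probabilities"],
        "Values": [["High", "0.87"]]
      }
    }
  }
}"""

def predicted_class(response_text):
    """Walk Results -> output -> value -> Values -> first row -> first column."""
    doc = json.loads(response_text)
    return doc["Results"]["output1"]["value"]["Values"][0][0]

print(predicted_class(sample_response))  # -> High
```

In the demo the same walk is done in C# over the classes generated by Paste JSON as Classes; the point is only the path through the nested structure.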
01:03:22
And there's a static method, PredictProfitClass. Hold on a minute.
01:03:41
And I pass in my profile object. Done. It's going to return a string. Provided it's not "error", we're going to display that on the web page. And then we're going to pass the correct details to the image URL to retrieve the right image.
01:04:02
Put to the test: it compiles, it builds. Let's use, for example, a middle-aged, married man earning top income, two children both at home, highly educated with a graduate degree, in management,
01:04:23
owns a home, and travels perhaps 5 to 10 kilometers. What did our model learn? When I submit... fingers crossed. All right, a classification of high. And the advert that we're likely to show somebody who spends big is something
01:04:41
sexy like this red sports car. Now, would it differ if they were simply professionally occupied with a bachelor's degree, perhaps earning lower income? Well, let's see, what did it learn?
01:05:01
Medium. OK, now it's still sexy. It's red, but instead of four wheels, it has two. Let's take it down a notch. So what if they're clerical, perhaps no children, earning maybe $25,000 to $39,000?
01:05:20
And I'm just guessing at this stage. And they'll live close by. They're manually employed, partial high school, very low income, single, perhaps quite young.
01:05:40
What did it learn? Do you think it differs with gender? Well, of course, we have different ads, and in retail there are significant differences by gender. And they happen to be female. By the way, does that seem slow to you?
01:06:02
Three or four seconds to predict: could you do this in real time within your application? Let me just make it clear that I'm working on a free version, which means it's deliberately throttled. Until you move to a paid tier, you're not going to get that 5,000 requests per second at the highest tier.
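The throttling the speaker mentions suggests a generic client-side pattern worth sketching: back off and retry when a service signals throttling (conventionally HTTP 429). This is not an Azure-specific API, just a minimal sketch with a fake transport standing in for the real call.

```python
import time

def call_with_backoff(send, max_retries=4, base_delay=0.01):
    """Retry send() while it reports throttling (HTTP 429),
    doubling the wait between attempts."""
    for attempt in range(max_retries):
        status, body = send()
        if status != 429:                      # not throttled: done
            return status, body
        time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    return status, body                        # give up after max_retries

# Fake transport: throttled twice, then a successful response.
responses = iter([(429, ""), (429, ""), (200, "ok")])
status, body = call_with_backoff(lambda: next(responses))
```

In production you would also respect any Retry-After hint the service returns rather than relying purely on exponential delays.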
01:06:21
So when you see the delays, don't go, oh, hold on, this is too slow. Until you start paying for production rates, you get an environment that costs you nothing to develop in. So I'll point out here that you've got the ability, without even a credit card, to sign up for Azure Machine Learning. But you will not get the scalability or the responsiveness that typical production environments
01:06:43
will want. All right, that was the integration of prediction results into the ASP.NET web application. I'm pretty much on time, so I'm going to quickly wrap up with some inspiration for you: business scenarios. But just before I do, a message for IT professionals,
01:07:00
including software developers, is that often there's a great deal of fear around the topics of data mining and machine learning, simply because of a misunderstanding. If we actually take a look at this, there are two different disciplines within machine learning. There's the discipline of creating algorithms and working on extremely complex problems.
01:07:22
Facial recognition is one example. Data scientists are almost always going to be employed in those scenarios. But when it comes to the discipline of applied machine learning, as I've demonstrated, you can easily pick up the experience and the techniques required to experiment, to test accuracy,
01:07:43
and to then be assured with confidence that you're delivering business value. You don't need to know what the algorithm did. You can think of it as a black box, providing you're satisfied that it's producing a reasonably accurate and reliable result. So in this sense, IT professionals, whether data professionals or software developers,
01:08:01
will perhaps, for the first time, find that machine learning becomes an approachable technology, enabling you to integrate it into applications and take them to a new level. All right, what could you use it for, then? Targeted advertising is what I've just described. Churn analysis: understanding the characteristics
01:08:20
of customers that might leave you, simply because it's more expensive to acquire a new customer than it is to keep an existing one. This is why sometimes your telco will call you saying, hey, your contract with us is about to expire, but we'll give you a 5% discount. How did they know to call you? Machine learning algorithms detected that you were perhaps borderline.
01:08:42
Image detection and classification: there are good template samples of this. It sounds complex, but if you've got the right data that maps a digital image of handwriting to its actual digital form, then you can train machine learning to work that out.
01:09:01
Equipment monitoring is a huge area. There's an estimated $600 billion wasted on either maintenance that didn't need to happen, or maintenance that didn't happen, so production went offline. The Internet of Things and machine learning come together to let you know the right time, preemptively, to maintain equipment.
01:09:22
Huge savings to be made here. Recommendation engines (who buys what with what), forecasting, spam filtering, anomaly detection such as fraud detection, and so on. In summary, what Microsoft are offering as a cloud-based predictive platform is a portal with workspaces enabling individuals
01:09:41
to collaborate on data experimentation and publication of web services. We need nothing but a web browser. And think of this: I could be boarding a flight in Seattle flying to New York, and on that five-hour flight I could connect to the Wi-Fi and be doing some very serious machine learning
01:10:02
while flying through the clouds. That's machine learning in the cloud, all right? And that data could come from a variety of sources, whether it's big data stores, Azure SQL databases, or uploaded data. And then we can publish a web service and hand it over to the software developer
01:10:20
with the documentation and say: integrate this into an application, whether it's a mobile device, a dashboard, a web app. So analytics today are taking on a whole new face: faster time to solution, availability of very advanced, robust algorithms, elasticity, pay-as-you-go for what you need
01:10:41
in the cloud, and scale as you need it. In summary, I hope that you've learned that machine learning is a subfield of computer science and statistics that deals with the construction and study of systems that learn from your data. Key attributes of Azure Machine Learning: it's fully managed. In fact, there's no possibility for you
01:11:01
to install any software anywhere. It's all cloud-based. Using a web browser of your choice, you work in their development environment through drag, drop, connect, and configure, with best-in-class algorithms. R is built in; if anyone's working with R and has investments made there, you can copy and paste,
01:11:22
and therefore deploy R scripts as web services very simply and quickly. The big takeaway, I think, is that machine learning is now approachable to you as software developers. Wrapping up: there are heaps of resources online, and the FAQ is actually excellent.
01:11:42
The most common questions people have are built into a comprehensive FAQ page. The pricing page is a great place to look as well. Even without a credit card, you can sign up and use Azure Machine Learning for free. Provided your experiments don't have more than 100 modules, don't take more than an hour to process,
01:12:00
and are less than 10 gigabytes in data, then you can do this at no cost and start learning from the templates available in the gallery today. There's a blog, there's a past data science virtual chapter, a couple of videos that I've done with very much the same content that I've delivered to you today. There's a book, and probably more available as well.
01:12:22
Now, if you're not thrilled by any of the scientific methods that I've talked about and demonstrated: Paul the octopus has his own page on Wikipedia, and perhaps you could consult him for your predictive needs. All right, we've arrived at the end of the session. I thank you very much for your time, attendance and interest, and I hope this inspires you
01:12:41
to look at the predictive capabilities with Azure machine learning. Thank you very much. Thank you.