
Python Data Science with VS Code and Azure


Formal Metadata

Title: Python Data Science with VS Code and Azure
Title of Series: EuroPython 2021
Number of Parts: 115
Author: Claudia Regio
License: CC Attribution - NonCommercial - ShareAlike 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor, and the work or content is also shared in adapted form only under the conditions of this license.

Content Metadata

Abstract: Learn how native notebooks in VS Code can supercharge your data science workflow, and how to follow up with deployment of your machine learning models using the Azure Machine Learning service!
Duration: 43:26
Transcript (English, auto-generated)
This is the last talk. It's a recorded talk. So, the talk is Python Data Science with VS Code and Azure, and it's a talk by Claudia Regio. I'm not sure if I'm pronouncing
that right. Claudia works at Microsoft. She's a program manager at Microsoft, and she's focusing on Python and data science. Yeah, so, we're going to play the video. Claudia is going to be available for questions in the break room after. So, enjoy.
Hi, everyone. Happy EuroPython 2021. I'm Claudia, and I'm here with Sid. And we are both program managers on the Python Data Science and AI team in VS Code at Microsoft. I'm currently focusing on the Python notebooks experience within VS Code. Hey, everyone. I'm Sid, and I work on the Azure Machine Learning extension for VS Code. So, today, Claudia and I are really excited to show you demos of both
Jupyter Notebooks as well as the Azure Machine Learning extension. So, let's get started. So, let's go through a few slides really fast. Our Twitter handles are right here. So, if you want to contact us, these would be the ones to grab. So, first order of business, what do you need? You're going to need VS Code,
obviously. You will also need the Python extension, which comes with Jupyter and Pylance. And you'll also want Gather and Live Share to get the full experience about what we're talking about today. One thing I want to call out really fast on this slide is for anybody who has seen me talk about this in another video,
demo potentially, this slide used to say VS Code Insiders. That is not the case anymore. We have officially rolled out native notebooks as the default experience for everyone in VS Code Stable. There's no more opting in, no more needing to jump through hoops and download Insiders to try it out. This is the official new default experience, and we're really,
really excited to roll it out to you guys and we hope that you try it and let us know your thoughts and feedback. As always, reaching out on GitHub is a great place to find us, as well as Twitter as previously mentioned. So, go ahead and try out and let us know your thoughts or feedback on how we can improve. We're excited to hear from you all.
All right, why the changes? So, previously we had a webview implementation, and that comes with a couple of limitations, unfortunately. So, we went ahead and moved to natively supporting the .ipynb file type. Some of the benefits that come with that, for example, are the integration of your favorite VS Code extensions from the marketplace.
A good example of a really popular one is Bracket Pair Colorizer. That one could not light up within the old implementation of notebooks; however, it does work within native notebooks. So, whatever ecosystem of extensions you have will light up within the notebook. You can also expect improved notebook load times and, of course, a new and refreshed
modern design. So, now we can go ahead and get started and just go over some features. Coming to our Titanic notebook: for the purpose of the rest of the demo, running locally is going to be just fine. But if you ever need to leverage more powerful remote resources, you can come down to the global status bar and click Jupyter Server.
And this will allow you to connect to an existing Jupyter server as long as you have the URL, or, if you have the Azure Machine Learning extension installed, you can also connect to a compute instance. All right, so continuing on in our notebook, this first cell
is just a bunch of imports. We can go ahead and skip that one. Pretty typical. But the next thing we might do is, you know, want to take a look at our data frame that we've just imported, get familiar with the columns, the number in there, the ranges. And a lot of times what you can do in a notebook as you start to explore is you create
variables and, you know, you're renaming them a little bit or maybe you're just overriding their value and rerunning and rerunning. And it gets really easy to lose track of the state of your variables. So we've gone ahead and created the variable explorer, which will show you the active variables within this notebook. And that's really helpful. You see the name, the type,
you get to see the size and a preview of its value. So you don't have to clutter your notebook with a bunch of print statements to understand where you're at anymore. You just open this up and check where you're at. And as far as seeing tabular data, you can go ahead and access the data viewer through the variable explorer too.
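For readers following along outside VS Code, what the variable explorer and data viewer surface can also be checked in code. Here is a minimal sketch with a small hypothetical DataFrame; the column names are stand-ins, not the demo's actual Titanic data:

```python
import pandas as pd

# Hypothetical stand-in for the Titanic DataFrame used in the demo
df = pd.DataFrame({
    "Name": ["Braund", "Cumings", "Heikkinen"],
    "Age": [22.0, 38.0, 26.0],
    "Survived": [0, 1, 1],
})

# What the variable explorer shows: name, type, and size
print(type(df).__name__, df.shape)

# What the data viewer's filters help you spot: dirty or missing data
print(df.dtypes)
print(df.isna().sum())  # count of missing values per column
```

The point of the UI is that none of this boilerplate is needed; the explorer keeps this information visible and up to date as cells run.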
So any tabular data, go ahead and click on the icon to the left. And that's going to open the data viewer. And this is just going to give you an Excel-like view of your data. This makes it a lot easier to process. And we also have these filters at the top. So typically when you're getting started with your data, you want to make sure it's clean. And so you could write code to identify the issues and then write more code to fix the
issues. Or you can use these filter tabs at the top to help you identify those issues a bit quicker, so you're not writing code that turns out you don't actually need. Saves you a little bit of time at that stage. Coming back to our notebook, here we have pandas profiling. We have support for IPython widgets (ipywidgets). Obviously,
ipywidgets are very powerful. They make your outputs more interactive. So if you're somebody who shares out reports, potentially with people who are not as familiar with the Python language, changing outputs or tweaking them slightly based on other parameters can be difficult.
That's a lot more easily achieved with ipywidgets. So, of course, we have support for those. Coming down through the rest of this notebook, we have some feature engineering: one-hot encoding categorical variables, normalizing continuous ones.
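The feature-engineering steps just mentioned can be sketched in pandas. The columns below are hypothetical stand-ins, not the demo's exact code:

```python
import pandas as pd

# Hypothetical columns standing in for the demo's features
df = pd.DataFrame({
    "Embarked": ["S", "C", "S", "Q"],       # categorical
    "Fare": [7.25, 71.28, 8.05, 8.46],      # continuous
})

# One-hot encode the categorical column
df = pd.get_dummies(df, columns=["Embarked"])

# Min-max normalize the continuous column into [0, 1]
df["Fare"] = (df["Fare"] - df["Fare"].min()) / (df["Fare"].max() - df["Fare"].min())

print(df.columns.tolist())
```

After this, every feature is numeric and on a comparable scale, which is what the models in the next section expect.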
We're also going to make sure that we're not going to have any null values, et cetera, before we keep going. And then we can start with training our models. However, these two cells have actually ended up in the wrong spot. I want to make sure I move them and put them under the test/train split section, which is where they're more
appropriately categorized. If you want to move multiple cells at once, you can select multiple cells, because we support that now. To select multiple cells, go ahead and hover to the left of the cells you want to select. If you hold down Shift and click, that's going to be the traditional selection of contiguous objects. Otherwise, if you want,
say, every other cell, or separate cells that aren't necessarily contiguous, go ahead and hold down Ctrl and click instead. And that is going to select multiple cells for you. Then you can click and drag any cell in your selection, and that's going to move all of them. So I'm going to go ahead and just put these under the test/train split.
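The workflow in this part of the notebook, splitting the data and comparing a few classifier families, can be sketched with scikit-learn. This uses synthetic stand-in data rather than the actual Titanic features:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the Titanic features and survival labels
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Train a few model families and compare held-out accuracy
models = {
    "naive_bayes": GaussianNB(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(name, round(scores[name], 3))
```

The final cell of the demo notebook does essentially this comparison: pick the family with the best held-out score before investing further in it.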
All right, once we've done our train and test split, we can actually go ahead and start creating and initializing some models, training them, and checking their accuracy scores, et cetera, to predict the survivors based on the inputs in this Titanic notebook. So here we have a Naive Bayes model; a decision tree, with a nice visualization
of a very complicated tree; as well as a random forest; and eventually even a neural network. You can go ahead and also see the model accuracy, training validation, et cetera. So here,
the last cell in this notebook is basically comparing the accuracy between the models that were created here. And let's say we want to move forward with the most accurate one. It could be the decision tree, it could be, you know, the neural network; these appear to be pretty even. Let's go ahead and just move forward with the decision tree for now. What I could do is
scroll back up all the way to the decision tree, or we can navigate through this notebook with the table of contents, which is much, much faster. To access the table of contents, first make sure you are in the file explorer tab, and then come down to the bottom and select Outline. This is going to show you the outline based on the markdown headers that you've created.
So I can go ahead and just go back to my decision tree; this is the model I want to move forward with. By default, the outline is actually going to show you just markdown. However, if you're somebody who would also like to see your code, you can change that in your settings as well. So moving on, as I mentioned,
we want to move forward with the model that we have the most likely success with. So let's go ahead and actually Gather on this cell. Now, what does it mean to Gather on a cell? Gather is the exploratory extension that I mentioned earlier, and Gather will basically
analyze the dependencies of the stuff that you have previously run in your notebook, and it will pick up all the lines of code that are necessary to generate that cell or that cell's output. So here we have the decision tree classifier we talked about. So let's Gather on this cell.
And what that's going to do is generate a notebook that will basically just grab the lines of code that are necessary to make that one cell. So here you can see our imports are a lot shorter, and even our cells are a lot shorter, because it's only grabbing the really essential lines of code that are required to make that cell. Now, you can also customize this.
If you'd prefer that this weren't a notebook, you can go to your settings and actually have this export to a Python script instead. And we can go back to this notebook. One more feature I wanted to show you, another one that is also coming with us from the old implementation, is the ability
to export to a Python script. So when you're ready, maybe you didn't need to clean your notebook with Gather, maybe it's already clean, and you want to get it into a state that's ready for production, what you can actually do is go ahead and click Export, and we have options for a Python script, HTML, or a PDF file, for easy sharing purposes as well.
So I can go ahead and just export this to a Python file, and here we have our new Python file. I can actually go ahead and save this. Let's do Save As; we're going to call this titanic.py,
which seems like a reasonable name to me. All right, and then we have our titanic.py. So now we have our Python file, and we can go ahead and come back to our notebook.
Now one thing that I'm really, really excited to show you, one of the biggest benefits of making this change to the native implementation, is that we finally have support for notebooks in source control, with Git integration for notebooks. That being said, I'm actually going to go ahead and save this notebook real quick. I can close it for the
time being, and we can actually go ahead and open up the diff view of this notebook. For those of you who may or may not know this, basically, under the hood, a notebook is a JSON file,
and the segments of those JSON files are comprised of three components: you have the input, which is where you write your code; you have the output, which is what you see; and then there's also metadata, which is data about the cell. And so when you're using line-based diffing tools for notebooks, it's really, really hard to parse the changes that
you made in your notebook and be able to genuinely understand the progress that your notebook has gone through. So VS Code has created a rich diffing editor just for notebooks, which will allow you to see the changes in your notebook very, very clearly.
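As a concrete illustration of the three components just described, here is a minimal hand-written sketch of one code cell as it sits inside the raw .ipynb JSON (not a complete notebook file):

```python
import json

# Minimal sketch of a single code cell inside an .ipynb file
cell = {
    "cell_type": "code",
    "execution_count": 2,             # changes every run: metadata-style noise in diffs
    "metadata": {},                   # data about the cell
    "source": ["print('hello')\n"],   # the input you write
    "outputs": [                      # the output you see
        {"output_type": "stream", "name": "stdout", "text": ["hello\n"]}
    ],
}

# A notebook is just JSON like this, which is why a plain line-based
# diff of the file is so hard to read compared to a rich diff editor.
print(json.dumps(cell, indent=1)[:60])
```

Fields like `execution_count` change on every run even when the code doesn't, which is exactly the kind of difference the custom diff view lets you filter out.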
So what I went ahead and did is this: my first version of this notebook actually did not include the neural network; I went ahead and added that a little bit later. So here you can see the imports that I added after. You can also see that I actually deleted a line when I was doing some testing. It's really nice, because these boxed segments allow you to really,
really understand the differences in your notebook fairly well. And probably the best part about this is that sometimes you don't always want to see all of the differences that a notebook may, you know, bring up to you. For example, if you run a cell a couple of times, the execution
count will change, and that would be considered a metadata change, and Git would flag that. Or you may run an output a couple of times, and the output ID in the metadata changes, however your output may not necessarily change. So you can actually customize what kind of Git diffs you want surfaced in this diff view. To do that, you will just want to come
to the top right, hit the overflow menu, and select which differences you want to see. So basically, you're fully in control of what you're seeing, and when. Now that I've gone ahead and shown the custom diff view,
I have all my changes, and I'm actually going to go ahead and push all the changes that I have locally. And again, we can do this all through the VS Code UI now, because there is, you know, support for notebooks natively here. So I'm going to go ahead and click on the plus sign
on the changes row; that's going to stage all my changes. And I'm going to go ahead and provide a little commit message here; I'm going to call this "updated notebook and py script". Then we're going to go ahead and click this little check, which will commit for us,
and here in the source control pane I can click More Actions, then Push To, and select origin. This is just the origin of that repo, which is the repo that Sid is going to be using in a moment here. So now that I've gone ahead and, you know, pushed all
the local changes that I've made to the remote repository, Sid will show you how you can actually further accelerate your model development using Azure, through the Azure Machine Learning extension. So, thanks so much, Claudia. Now I'm excited to show you how you can use the
Azure Machine Learning service to accelerate your model development and training in VS Code. But before I start showing you all the cool stuff, I want to briefly talk about the Azure Machine Learning service for all of you who may be unfamiliar. So, Azure Machine Learning, or Azure ML as I may refer to it throughout this presentation, is a cloud-based environment that you can use to train, deploy, automate, manage, and track your
machine learning models. This cloud-based environment can be used for all kinds of machine learning: anything from classical ML to deep learning, and from supervised to unsupervised. Alongside the Azure Machine Learning service, what I'm excited to also go into today is the Azure Machine Learning extension for VS Code. This extension is a companion tool to the
service that allows you to tap into more powerful resources for your model training directly from within VS Code. You can use it to manage, list, use, create, and update all of your machine learning resources. So now, here I have VS Code open. As you can see, I can get to this extension by just navigating to the extension marketplace,
searching for "machine learning", and then clicking on the topmost item that you see here. Now, this is the extension page. I have it installed already; that's why you see this Azure tab created on the left-hand side. Now, if you have existing
Azure extensions in VS Code, you may already have this tab available to you. But what's important to know is that you can now use the Azure tab to open up the Azure Machine Learning pane, as you see here, and get a list of all of your Azure subscriptions. Now, if you're not signed into your Azure account, you'll get prompted to do so, because the extension will
not work without an Azure account. But once signed in, you have all of your subscriptions listed here. So the next thing is: what happens when you expand a subscription? When you expand a subscription, as I'm going to do now, you'll see a list of machine
learning workspaces available for you to use. You can think of a machine learning workspace as a top-level resource that organizes all of the underlying resources you'll use for your model building, training, and deployment. The concept of underlying resources becomes more apparent when you expand the workspace node, presenting a list of first-class resources and
concepts in Azure ML. For the sake of time, I won't be going through each of the available resources, but I will do my best to explain some of them along the way. So now let's talk about what I hope to achieve with this service. As I talk to you right now, I'm using my work laptop, which frankly is great for things like recording demo videos and writing documents,
but not so much for training complex machine learning models. I'm interested in using the Azure Machine Learning service to create a much more powerful workstation that I can use. So what do I mean when I say more powerful? Well, I'm referring to a machine that has GPUs, significantly more RAM and storage, and is highly
compliant and secure, something the IT administrators on my team are very strict about. So, with the Azure Machine Learning extension, I can list all of the available computes, or VMs, that are available to me. You'll see here that I'm interested in using a compute instance. This is the equivalent of a personal workstation or machine that I can use for my iterative model development.
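For reference, the same compute instance creation can also be scripted with the azureml-core Python SDK instead of the extension UI. This is a hedged sketch, not the demo's actual code: the workspace config file, the instance name, and the exact SKU string are assumptions, and running it requires an Azure subscription:

```python
from azureml.core import Workspace
from azureml.core.compute import ComputeInstance, ComputeTarget

# Assumes a config.json downloaded from your Azure ML workspace
ws = Workspace.from_config()

# STANDARD_NC6: 6 cores, 56 GB RAM, 1 GPU -- the SKU picked in the demo
config = ComputeInstance.provisioning_configuration(
    vm_size="STANDARD_NC6",
    ssh_public_access=False,  # matches the "no SSH" choice, for auditability
)

# Name is a placeholder; provisioning takes a few minutes
instance = ComputeTarget.create(ws, "europython-machine", config)
instance.wait_for_completion(show_output=True)
```

The extension's prompt flow is driving essentially this provisioning call under the hood, just without you having to write or run any code.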
I have compute clusters listed here as well, but I'll expand on those a little later in the presentation. Right-clicking on the compute instances node presents me with an option to create a new resource. The Azure Machine Learning extension presents a simple set of prompts for me to follow, eventually culminating in the creation of my resource. Now let's go through these prompts together. The first thing that I'm asked for is a compute
name. Here I'm just going to input "europython-machine" and hit Enter. The next thing I'm prompted for is the VM size. This is where I can search for machine specifications based on what I want to do. As mentioned earlier, I'm looking for something that has a GPU and more RAM, so I can simply search for "GPU" here, and I'm presented with a bunch of options.
The NC6 VM SKU seems like a great one for me to use, so I can go ahead and select that one. That's six cores and 56 gigs of RAM, more than what I have on my laptop right now, and it has a GPU that I can use. The next thing that I'm prompted for is whether
or not I want to make this machine SSH-enabled. Now, I know that making it SSH-enabled may mean that it's easier for me to connect to my machine, but I also know that my IT team won't be happy about me managing my machine access through key credentials; it's just not auditable. So let me go ahead and select No. And once I've selected this option, you'll see
that the machine learning extension will immediately proceed to creating the resource for me. It's going to take a little bit of time for this machine to get created, but luckily I have existing compute instances that we can use. So now you might be wondering: okay, I'm creating this machine through the Azure Machine Learning extension, awesome, I'm able to tap into a much
more powerful resource with ease, but what if I want to use VS Code with this resource? How can I do that? Well, the Azure Machine Learning extension makes it really easy for you to connect to this compute instance and get started working with it from within VS Code. You can do that by right-clicking on an instance, so here I have this
EuroPython 2021 demo instance, and then choosing the "Connect to compute instance" option. What this is going to do is open up a separate window, which is a remote VS Code window, and now a couple of things are happening: we're establishing a connection between your local machine and the compute instance, all
through WebSockets. Everything is going through the Azure Machine Learning control plane, everything is happening over WebSockets, so it's a highly secure connection. And we talked about auditability, right? The key thing here is that all of the access management is done through AAD (Azure Active Directory). So because I'm signed into my Azure account, I
can use the credentials that are associated with my account and successfully connect to the compute instance. Now, once I'm within this compute instance, I can navigate around accordingly.
So now what I'm going to show you is a couple of things that I can do once I'm connected to this compute instance from within VS Code. What does the remote connection mean? Well, it means that I have VS Code hooked up to the compute instance that I just created, and I can use anything that VS Code allows. So I can navigate to the extension marketplace and look at all of the
extensions that I have; these are automatically installed for me on the compute instance, and I can use any one of them. I can debug processes, I can debug Python files (both of these are debug configurations that I've already created), and then I can also use things like the remote terminal. So in this case, what I'm going
to start off by doing is cloning the GitHub repo that Claudia was previously working in. During her part of the presentation, Claudia showed that she was making changes to the Titanic notebook,
she committed those changes, and then she pushed those changes to the remote repository. Now what I can do is get the remote repository URL and simply do a git clone here, cloning that exact same data science repo, so that I can immediately get started working with the same notebooks that she was
previously in. It's just going to take a second to clone. And once it's cloned, I can navigate to that folder, as I was showing you earlier. So I go to the cloud files code folder, that's the data science repo, hit OK, my window gets automatically reloaded, and with this window reload we're again establishing the connection to
the compute instance. It's really fast on subsequent connections, because we've done all of this setup before and we're just reusing it. And once we've done that, we're working directly within this scoped folder. So you'll see here that I can open up that Titanic notebook file that Claudia was working on earlier.
When I open up the notebook file, it takes just a second, but you'll see that the cells get rendered correctly, and I can now run all of the cells that were there from before. My notebook is loaded; that looks awesome. I can also open up the Python script that Claudia had exported earlier. I'll be using that script a little bit later, but for now I'm going to work in this notebook.

Let me select a kernel now. I have a couple of different options here; you'll see this azureml_py37 kernel, and that's the one I want to use. Now I'm connected to this kernel and I can just run all of the cells. Each of these cells is running, and eventually the entire notebook will run the exact same way it did on Claudia's machine, but now on the compute instance I'm working with. This is really awesome: Claudia was previously working on that repo, she made a bunch of changes, she wrote this notebook, and she committed those changes. I was then motivated to use it, not on my local machine, but on the more powerful Azure Machine Learning resource available to me. I created it with ease, connected to it from within VS Code, and now, working in this remote window, I can just clone the repo and run all the cells as before. The cell running will take a little bit of time, as you're aware, but I can continue on with the rest of my machine learning development and deployment, all from within VS Code, using the power of Azure.

Now that I've shown you how to do iterative development using Azure Machine Learning compute instances from within VS Code, I want to get into another part of the Azure Machine Learning service that may be of interest.
Here I was working in a notebook, and Claudia was working in the same notebook, and she actually exported it to a Python script earlier. Now I can take that script she exported, which I just renamed to train.py, and run it on a compute cluster in Azure ML. What am I doing here? You can think of this as a fire-and-forget operation for your model development. You work in a notebook, you run that notebook against a more powerful machine, and you validate that the notebook behaves as you expect, meaning your model is being trained correctly. Now you want to do more complex or more intensive model training. Think of that as increasing the number of epochs to train your neural net with, or doing a very complex grid search that's going to take a really long time. You want to offload that operation to remote resources, as opposed to using your own machine, because you may want to context switch and work on something else, and you don't want the model training to be a blocking operation, even on a really powerful machine; you want to fully offload it. What you can do is submit what's called an experiment to Azure ML.
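To make that concrete, here is a rough sketch of what a submission script might look like with the Azure ML Python SDK (v1). This is not the exact code from the demo: the workspace config, the cluster name "cpu-cluster", the script name train.py, and the environment file name are all illustrative assumptions.

```python
# Hypothetical sketch of submitting an experiment with the Azure ML Python SDK v1.
# Assumes azureml-core is installed and a workspace config.json is present;
# all names (cluster, script, environment file) are illustrative.

def submit_experiment():
    from azureml.core import Workspace, Experiment, Environment, ScriptRunConfig

    ws = Workspace.from_config()  # reads config.json for subscription/workspace info

    # An "environment" is a conda/pip dependency spec that Azure ML materializes
    # as a Docker container on the compute cluster.
    env = Environment.from_conda_specification(
        name="titanic-env", file_path="environment.yml"
    )

    # Bundle the training script, the target compute cluster, and the
    # environment together into one run configuration.
    src = ScriptRunConfig(
        source_directory=".",
        script="train.py",
        compute_target=ws.compute_targets["cpu-cluster"],
        environment=env,
    )

    # Submitting creates a run under the named experiment; its logs can then
    # be streamed from VS Code, the SDK, or the Azure ML studio.
    run = Experiment(ws, "europython-2021").submit(src)
    run.wait_for_completion(show_output=True)  # stream image-build and training logs
    return run
```

Running a script along these lines from the remote terminal is all it takes; the SDK packages up the source directory and the cluster handles the rest.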
Azure ML has this concept of experiments, which are comprised of a couple of different things. First, the scripts you're trying to train with. Second, the compute you want to run on, which is the compute cluster I was talking about. Notice that it's a cluster, meaning more than one node: the compute instance is just a single dedicated node, but a cluster has multiple nodes I can make use of, which means I can even do things like distributed training. Lastly, there are data and environment. In this case the data is part of what we're going to put on the cluster ourselves, so I won't get into Azure ML's data concepts. But an environment I'll quickly speak to: you can think of an environment as a definition of your Python packages and libraries that gets materialized as a Docker container on your compute cluster.

I've already created an environment, so let me quickly show you the submission script. Here I'm referencing a compute target, the compute cluster I created before, and I'm creating an environment from an environment.yaml file. If I open up this YAML file, you'll see it's a conda specification file that has all of the dependencies I need in order to run this training script. Then I'll change the experiment name to "europython-2021". When I run this script, it's going to run an experiment under that experiment name: it packages my training script and my environment, materializes all of that on the cluster, and runs. What I want to show you is how you can use both the Azure Machine Learning extension and the Azure Machine Learning studio to track your run.

All right, let's run the script. I'm going to create a new terminal, and when I do, it connects to the azureml_py37 environment that I want to use. Then I can simply navigate to the azureml folder and run the submit experiment script with Python. This submits my experiment: I've used the Azure ML Python SDK to submit the run, and I've included a couple of things to stream here. You'll see there's this web view link; I'll get to it in a bit. But before that, if I go to the Azure Machine Learning tab and then to Experiments, the very top experiment you see is the one created as part of this run. When I click on it, you'll see an experiment run with a status dropdown, which also gives me logs and outputs to look at. If I click on Logs there's nothing yet, but hopefully the logs will come up soon; I think the experiment is just being queued against the compute cluster, and in a second there will be an image-build log for us to look at.

All right, the state of the run changed to Running, and if I look at the logs you'll see these Azure ML logs, which I can double-click on and stream directly within my VS Code console. This is really cool, because these logs are being updated in real time and I'm streaming them from within VS Code through the AML extension. Now, I mentioned a different way to interact with these runs, to look at things like metrics and logs, and that's from the Azure ML studio. You'll see here that if I look at the outputs and logs... oh, there's a failure. That's okay, we can look at that a little bit later. Looking at the outputs and logs, the information here is the same set of logs I was streaming in VS Code just a second earlier, so I have access to the same outputs and logs. And if I were logging metrics as part of my run, I could view them from here as well. This overview provides me with full details of my run, from things like the environment I was using to the compute target, as well as the properties of the run in both JSON and YAML form.

So, let's get back to VS Code. I just want to quickly recap everything we've talked about thus far. We started with Claudia doing her thing in notebooks and getting to the point where she wanted to transition the work to me. The first thing I was motivated to do was say: I don't want to run this notebook on my local machine, I want more powerful resources, so how can I quickly tap into those? Using the Azure Machine Learning extension, I can quickly create a compute instance (I don't even have to make it SSH-enabled in order to work against it) and establish a connection from my machine to that compute instance. After that, I can open up the notebook file, debug, interact with the remote terminal, use extensions, run git commands, everything I would otherwise do in VS Code, but now on this remote machine. When I'm ready for that fire-and-forget operation, because I want to context switch and do something else on my local machine while still training my model elsewhere, I can easily use the Azure Machine Learning service to do so by submitting an experiment to a compute cluster and specifying an environment for my code to run in.
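That environment is just a conda specification file. As a rough illustration (the actual file from the demo isn't shown, so the package list and versions below are guesses), an environment.yaml for a run like this might look something like:

```yaml
# Illustrative conda specification; package names, versions, and the
# azureml-defaults pin are assumptions, not the file from the demo.
name: titanic-env
channels:
  - conda-forge
dependencies:
  - python=3.7
  - pandas
  - scikit-learn
  - pip
  - pip:
      - azureml-defaults   # commonly needed for runs submitted to Azure ML compute
```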
Once that experiment is running, I can stream logs not only from within VS Code through the AML extension, but also from the studio UI, as I just showed you. And that was it for my part of the presentation on how you can use Azure to accelerate your machine learning development. If you'd like to join both Claudia and me for Q&A, please hop over to Matrix and join the Microsoft sponsored room; we'd be happy to take any questions you have. Thank you so much!

Okay, so that was the last talk today. As you saw in the video, you can go to the sponsor's Microsoft channel and ask questions there. Now we're going to have about a one-minute break, maybe shorter because this video ran longer than expected, and after that Mark is going to do the closing. Thank you everyone, see you later.