Crunching through big data with MBrace, Azure and F#
Formal Metadata

Title: Crunching through big data with MBrace, Azure and F#
Author: Mathias Brandewinder
Number of Parts: 163
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared, also in adapted form, only under the conditions of this license.
DOI: 10.5446/50192
Transcript: English (auto-generated)
00:03
So maybe let's get started. This talk is titled "Crunching through big data", so it's probably going to be more "big data-ish". I'm using three things: Azure, MBrace and F#, and also a bit of C#. A couple of words about me: my name is Mathias Brandewinder, and you can find me on Twitter.
00:22
I had been doing C# for maybe ten years when, five years ago, somebody told me you should learn a new language every year. I opened Visual Studio, there was a book saying F#, and I thought: what's this? Let's try it. I tried it, I loved it, and at this point I do mostly F# and a bit of C#. I'm actually not a software engineer by background:
00:41
I'm an economist, or an operations research guy, so my interest at this point is using F# mostly for machine learning and that type of topic. I'm really interested in quantitative models and everything at the intersection of mathematics, statistics and computing. I also have a bit of an accent. I'm saying this because, I know, nobody notices, it's very subtle,
01:03
but the reason I'm mentioning it is that sometimes I'm hard to understand, and I tend to get very excitable when I talk about things I care about. F# and machine learning happen to be things I care about, and at that point I speed up. So if at any point you can't understand anything I say, hold me back; that's your job. Otherwise I'm just going to keep going.
01:22
Good, so that was me. What do I want to talk about today? I want to introduce a library, or framework, called MBrace, and I want to introduce it through examples. My goal is to illustrate how the library might be useful, and where it might be useful.
01:41
I'm going to do it mostly through things I care about, but I have tried to make the examples general enough that hopefully you can relate to them even if machine learning is not your thing. Essentially I'm going to show you what I do with it, and hopefully that gives you ideas so you can do fun stuff with it, which might or might not be the same as mine. It won't be a detailed presentation on MBrace;
02:03
that part you will do on your own. It's more like examples. The plan is to first explain what MBrace is about, and then go through two examples. One of them is more on the machine learning side; I called it "big compute". It's not so much about using a lot of data, but about what you do when you have a lot of computation to run, which is quite common in machine learning.
02:23
The other one will be more on the big-ish, big-data-ish side. It's more of a data processing example, which should be closer to what most people do. A warning, too: everybody will tell you, when you start doing presentations, that you should minimize risk.
02:42
So of course you should not depend on the internet, you should not write live code, you should not depend on the cloud. I'm depending on all of these things working flawlessly, so terrible things could happen. In a way it's a win-win: if everything works, I have a great demo, and if things don't work, hopefully it will at least be entertaining. But I'm warning you, things might go very wrong.
03:04
So what is MBrace? MBrace is an open source library by a company called Nessos. They are based in Greece, and Eirik from Nessos is actually in the room, so he knows much more about the library than I do. It is similar in spirit to Spark, in that it is something dedicated to distributed computation. I'm going to read
03:23
what they say about it: MBrace is a simple programming model for scripting and programming against the cloud, from F# and C#, right from the scripting environment. So it's similar to Spark. You can find it at m-brace.net and also on Twitter.
03:40
The definition is maybe a bit abstract, so we'll see how it works in practice. The first example we're going to work with is a machine learning example. In general, machine learning is about writing a program which learns from data, so that it can perform a task automatically. The goal is to write the program such that when you give it more data,
04:01
it gets better at the task, without you having to change the code. That's the beauty of it. The example I'm going to show should be applicable to things which are not machine learning, but hey, that's what I do, so that's what I'm going to give you. This talk is also not specifically about F#, but F# happens to be what I use, so it will be all in F#. One of the reasons I like it for machine learning is that it provides a built-in scripting environment right in Visual Studio.
04:25
That's pretty nice, because it's a very convenient way to flesh out your code, try out ideas and iterate quickly. It's particularly interesting if you do data science or machine learning, because I can open my scripting environment, load my data once, and start working
04:42
against it without reloading it all the time. That is a huge time saver, because if I spend my time reloading data, it's like having a five-minute build time on every change. So I love the scripting environment for that reason. The problem is that there is a challenge: scripts are great, and with a tiny data set this works,
05:04
but if my data set starts to be terabytes, I'm probably not going to be able to load it on my machine. Or maybe I have a computation which is really large. This is a Surface; it's a great machine, but I just have two cores, which are not that awesome. I would love to have 200 cores and that kind of thing, and the scripting environment will not give me that out of the box. So very quickly
05:24
you hit a point where the machine is your bottleneck, and that's what MBrace helps with. The other opportunity to think about is that if you use F#, or any functional language, whenever you see a map you know that things are parallelizable.
05:43
What I'm going to try to show you is how you can use that pattern: recognize a place where you can parallelize, and then, using something like MBrace, scale your computation out like crazy. The machine learning example is simple: I start with what I would do on my local machine, plain old code in a script,
06:02
and then I'm going to "cloudify" it, meaning I'll show you how easy it is to move it to run in the cloud using MBrace, and how easy it is to exploit parallelism in your code. The problem I'm going to use is a classic: it comes from the Kaggle digit recognizer competition,
06:22
which uses a classic machine learning data set, the MNIST data set. This data set is essentially a list of handwritten digits: they asked people, here's a sheet of paper, write a one, write a two, write a seven, and the data set is 50,000 of these images. Your goal is to write a system which automatically recognizes the numbers.
06:45
That's the problem, and what I'm going to show here is a quick and dirty implementation of a simple model called the K nearest neighbors. The idea is this: you gave me a training sample of 50,000 images, and for each image
07:03
I have the label. If you now give me a new image to classify, I look at the 50,000 images and find the K images which are most similar to it, the least far away, the least different. K could be one, two, three; that's your choice.
07:21
What you predict for the image you're trying to classify is the label that comes back most often among those neighbors. For instance, with three nearest neighbors I get the three most similar images out of the 50,000; if two of them have the label seven and one is a one, I predict this is a seven. That's what the model does.
07:47
So what am I going to do here? The first thing I need, if I want to use MBrace and distribute things, is a cluster. I pre-deployed it, because I didn't completely trust things to work.
08:03
Actually, let me go back for a second. The easiest way to get started with MBrace is to use something called Brisk, which you can find at briskengine.com. It was built by the good guys from Elastacloud in London, and what it gives you, essentially for free, is the ability to spin up a cluster
08:20
in about five minutes. The way it looks is: if I go to Brisk, I already have a cluster running here. I started it three days ago — that was an accident, I forgot to kill it — but it's running. If I wanted to add a cluster, it's as simple as this: I say create a new cluster, I can choose whatever I want — Spark, Kafka and so on — and what I really want is MBrace.
08:40
Here I would give it a name and pick a size. This is hooked up to my Azure subscription: Brisk itself doesn't charge you anything, so if you use 200 machines you are charged only for the Azure usage. That's all you're paying for. So here I would create my cluster, and the thing which is really awesome is that you can choose how big you want it to be.
09:04
The smallest option spins up a four-core cluster, which costs you the monstrous sum of about 0.22 pounds an hour. If you're a big spender you can move to eight cores, which would be about 0.44, and if you're a really big spender you can go all the way up to 256 cores,
09:22
which costs something under 20 bucks an hour. This is something I really like, because usually I don't need 256 cores all the time, and I don't have space in my basement to run a farm. What I really want is a huge farm for two hours: run my computation, then scrap it. That's what I get with Brisk. So I created my cluster already,
09:44
and that's what I'm going to be using. So let's go to the code. Let's move to Visual Studio and look at what I prepared. I'm not going to comment much on it, because the local code is not that important,
10:01
but here I created the classifier I described. I have a data set which is my training sample; I didn't actually use 50,000 images, just a subset of them. These images are what I'm going to use to produce predictions, and I'm trying to predict a validation set, which is also just more images. Not super interesting.
10:21
For the model — and this is classic — I'm in the F# scripting environment, and the first thing I do is create a couple of types. An image is going to be a collection of ints, which are the pixels, and an example is going to be an image, the pixels, plus the label saying what the image is. Then I need a
10:42
notion of distance, because I need to find, out of the 50,000 images, which ones are the closest. My distance is simply: take the two images, and for each pixel compute the difference and sum up the differences. A big number tells me the images are very different, and a small number tells me they are very similar. So my distance is done;
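A minimal sketch of the two types and the distance function described here (the names are illustrative, not necessarily the exact code on screen):

```fsharp
// An image is just the raw pixel values; an example pairs the pixels with the known digit.
type Image = int []
type Example = { Pixels : Image; Label : int }

// Distance: per-pixel absolute difference, summed up.
// A small total means the two images are similar, a large one means they are very different.
let distance (img1 : Image) (img2 : Image) =
    Array.map2 (fun p1 p2 -> abs (p1 - p2)) img1 img2 |> Array.sum
```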
11:03
I'm now at about 12 lines of code. Next I write a predictor. That predictor is a function which takes three things: a sample, which is a collection of examples — my 50,000, or 5,000 in this case, images; K, which is how many neighbors I want to take, one, two, three or ten; and the image
11:22
you want to classify. What it does is: take all the images, compute the distance to each one, sort by distance, take the K closest, then count by label, find the label with the largest count, and return that label.
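A sketch of the k-nearest-neighbor predictor just described, reusing the hypothetical types above:

```fsharp
// Find the k training examples closest to the image and return the most frequent label.
let predict (sample : Example []) k (img : Image) =
    sample
    |> Array.sortBy (fun ex -> distance ex.Pixels img)
    |> Seq.take k
    |> Seq.countBy (fun ex -> ex.Label)
    |> Seq.maxBy snd
    |> fst
```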
11:42
That tells me the label of the image you're trying to classify. Cool. The other thing I need is just as simple: I have my data set here, and I have a tiny parser which takes all the lines of the file. It's a CSV, so I'm just breaking each line
12:00
on the commas and reading my data. Nothing super exciting. So I read my data set: training sample — and here we go, I read my training set, then my test set. Done.
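The parser is little more than a split on commas; roughly along these lines (a sketch — the file paths and the header-dropping convention are assumptions, not his exact script):

```fsharp
open System.IO

// Illustrative paths to the local CSV files
let trainingPath   = __SOURCE_DIRECTORY__ + "/training.csv"
let validationPath = __SOURCE_DIRECTORY__ + "/validation.csv"

// Each line is "label,pixel0,pixel1,...": split on the commas; the first value is the label.
let parseLine (line : string) =
    let values = line.Split ',' |> Array.map int
    { Label = values.[0]; Pixels = values.[1..] }

let training   = (File.ReadAllLines trainingPath).[1..]   |> Array.map parseLine
let validation = (File.ReadAllLines validationPath).[1..] |> Array.map parseLine
```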
12:21
Now I can simply train my model and check how good or how bad it is. That's very simple: I train a model, which means creating a predictor using the training set and one neighbor, then I take all the items in the test set and, for each of them, look at the prediction. If the model predicts
12:40
the same thing as the label of that image, I count it as a one, because that's correct; otherwise a zero. Taking the average gives me the percentage correct. So this should tell me immediately how good or how bad my model is. Let's try that.
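The evaluation amounts to a couple of lines (a sketch, reusing the hypothetical names above):

```fsharp
// 1-nearest-neighbor classifier built from the training set,
// scored as the proportion of validation images it labels correctly.
let model = predict training 1

let accuracy =
    validation
    |> Array.averageBy (fun ex -> if model ex.Pixels = ex.Label then 1.0 else 0.0)
```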
13:01
And my model is done; it took about five seconds to compute the evaluation. So with 50 lines of code I have a model which actually works pretty well: I got about 93.4 percent correctly classified, which is nice. The other thing I like is that, as I said, I can start changing my code
13:21
right there in the scripting environment. For instance, maybe the distance I'm using — the absolute value — isn't the best. How about computing the Euclidean distance instead? I can do that and start playing with alternative models to see how they do. So I'm just going to do it here,
13:41
and I don't have to reload the data, because the data is already in memory. Is that model any better? I run it again, and if the number goes up, the model is better. In this case I can see immediately that I got 94.4 percent, so that was a good idea. This is how my day typically looks with machine learning, and it's why I like working in the scripting environment:
14:02
it frees me; it's a very free-flowing way to work with data. The problem is that I'm using only 5,000 images here. That's a fairly small data set, and it already takes five seconds. If I use something much bigger, then every time I run this I'm not going to wait
14:20
five seconds or three, I'm going to wait potentially half an hour. That's not a good place to be, because it completely breaks my flow. This is where something like MBrace is helpful. So I'm going to kill what's happening in my session and go to a file called Cloudification, where I'm going to cloudify things,
14:40
because that's a technical term. The first thing I do is load MBrace and show you a bit of what it does. There is no magic here; I'm just loading a bunch of dependencies. So this is MBrace, and what it gives me is access to the cluster I spun up through Brisk.
15:01
One thing I can do is connect to my cluster from inside my scripting environment: I call something like Runtime.GetHandle, pass it my config, and I immediately get a handle to my cluster.
15:23
And here it is: I'm connected to the cluster. For instance, I can ask what's going on in my cluster. I can ask it to show me the workers, ShowWorkers: what do I have in this cluster? In this cluster I happen to have
15:41
four workers running, and I get a bunch of information about them: how many machines I have, how much CPU is used, how many cores are available, all that type of stuff. So I can start looking at what's going on in my cluster. The other thing I can look at is ShowProcesses:
16:01
I can see what work the cluster is doing, and what has run on it in the past. Here I'm seeing all the jobs that actually ran on this cluster, and there are a lot of them, because I did lots of things on this cluster.
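The connection code, roughly as dictated on stage (a sketch: the `credentials.fsx` helper is hypothetical, and namespace and member names follow the MBrace.Azure client of that period, so they may differ in other versions):

```fsharp
#load "credentials.fsx"   // hypothetical helper that defines `config`, the Azure connection settings from Brisk
open MBrace.Azure.Client

// Get a handle on the cluster and poke at it from the scripting session.
let cluster = Runtime.GetHandle(config)

cluster.ShowWorkers()     // machines, cores, CPU usage
cluster.ShowProcesses()   // jobs that ran, or are running, on the cluster
```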
16:21
So I'm going to clear all the old processes, so the list is clean, and run it again. Did I run it? Yeah, okay, good, I cleaned it up. Now this is what I get: all these jobs were run,
16:43
this is when each one started, this is how long it took, and so on. So I can see what's happening in my cluster. That's nice; I have my cluster, now what can I do with it? If I were to do hello world, because that's what everybody does, the local version would be something like: let hello name, with a sprintf
17:03
that says hello to the name. That's the way you would write a hello world in standard F#, locally. Now I can call hello "NDC", and I get my hello world. I'm assuming this is not spectacularly impressive to you guys,
17:23
but what I would really want to do now is run that computation on the cluster. The cloud hello looks something like this: cloudHello still takes a name, but the difference is that I'm going to use a computation expression — I'll talk about it in a second — and inside it I do something like return
17:44
sprintf, the same thing as before. I should probably have copy-pasted, because that is really the obvious way to code. So the difference, obviously, is that there is a cloud with curly brackets around the body.
18:06
Technically this is called a computation expression. It says: whatever happens between these brackets is special code, extended with a bit of F# syntax, which performs a few extra operations — specific to side effects — that use the cloud for you,
18:22
while hiding the annoying details. If I run this, it does not actually execute anything; it gives me a function which takes a name, which is a string, and gives me back a Cloud of string. You can think of it like an async expression: I have something which is ready to run in the cloud,
18:42
but I have to give it to somebody to run. And running it is as simple as this: let job = cloudHello "NDC", piped into cluster.Run.
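Roughly what ends up on screen, as dictated (a sketch: the exact format string and value names are guesses, and `cluster.Run` follows the MBrace client API of that era):

```fsharp
open MBrace.Core

// Local hello world
let hello name = sprintf "Hello, %s" name
hello "NDC"

// Cloud version: the body is wrapped in the cloud { } computation expression,
// so cloudHello returns a Cloud<string> - work that is ready to run, but not running yet.
let cloudHello name = cloud { return sprintf "Hello, %s" name }

// Hand it to the cluster and block until the string comes back.
let greeting = cloudHello "NDC" |> cluster.Run
```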
19:06
Just like with an async, where you hand it to the thread pool or somebody else to execute, here I'm saying: take this and send it up to the cloud. Actually, one thing I forgot: I'm going to attach a logger first, because it shows a bit better what's happening. So I do cluster,
19:23
attach a client logger, a console logger, so that we can see what's going on. Now when I send it to run, you can see in my session that it's computing the dependencies and loading things. It is essentially lifting the code I have in my script,
19:42
sending it over to the cluster, running it over there; it came back and gave me a string, "hello NDC". So that code, instead of running on my machine, I can now send pretty much transparently to the cloud, and it runs on that cluster over there. Good. Now, here I chose to run it immediately,
20:01
which creates a blocking call. The other thing I can do, which is nice, is this: instead of cluster.Run, I use CreateProcess. Instead of running, blocking, and waiting for the thing to come back before I can do anything else,
20:20
this creates a process in the cloud. Now I can keep working — nothing is blocked — and I can ask: is the job complete? This will probably say yes, because hello world is pretty quick, but normally, if I had a computation that takes maybe one hour, I could just send it off and ask:
20:43
is this done? And when it is done, I get the result back. In this case the way to do it is job.AwaitResult, which gives me back the result of my computation, "hello NDC". That is, in a nutshell, the first construct you get: the cloud computation expression, which lets you take code from your session and just send it to the cloud for execution. Cool.
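The non-blocking variant, again following what is dictated on stage (member names may differ slightly across MBrace versions):

```fsharp
// Start the computation on the cluster and get a handle back immediately.
let job = cloudHello "NDC" |> cluster.CreateProcess

job.Completed      // poll: is it done yet?
job.AwaitResult()  // fetch the value once it is (blocks only if it is not done yet)
```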
21:00
So I'm going to nuke this, because hello world is not usually helpful, and now I'm going to do some actual cloudification. I'm going to code the way everybody does it:
21:21
Ctrl-A, Ctrl-C, copy, paste, because that's a tried and tested way to code. I paste my local script here, and I'm going to see what it takes to actually run that code in the cloud. So let's look at this code, going back up a bit.
21:40
This part I have no reason to change: this is my domain, this is my domain, this is my domain. This is a function; I have no reason to change anything about the way I work. This is my predictor. Aha — but here I see I have data. If I start running code on the cluster while my CSV file is on my local machine, that's probably not going to be a very good idea.
22:03
What I probably want to do is send the data to the cluster, so that my computation is close to it. That's not very difficult. The first thing I'm going to do, instead of train, is something like
22:21
let cloudTrain, and I'm going to use one of the abstractions, the CloudFile. I'll let you guess what it is: essentially the same thing as the File you find in System.IO, but in the cloud. I type CloudFile dot and I get a bunch of things — create, delete and so on — and the one I'm interested in is Upload.
22:42
I can say where I want to upload it; that's the second overload. I'm going to upload it to my cluster, maybe in a subfolder called ndc2015, and call it training.csv,
23:00
and I want to give it my training path as the source. This is not very complicated: it is going to create a file, and because it is also a cloud operation I have to give it to somebody to run, so I'm going to run it, locally in this case.
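A sketch of the upload, assuming `trainingPath` points at the local CSV from the earlier script (the argument order and the exact run member depend on the MBrace version):

```fsharp
// Copy the local training CSV into the cluster's store, under a folder for this talk.
// Executed from the client side so it can read the local file; depending on the
// MBrace version this is cluster.RunLocally or cluster.Run.
let cloudTrain =
    CloudFile.Upload(trainingPath, "ndc2015/training.csv")
    |> cluster.RunLocally
```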
23:21
So now I'm simply taking my data and moving it over. And of course, I forgot something: if I didn't run my earlier code here, nothing is going to work — sorry about that. So I'm going to send everything I have in my script so far, and now I should be able to run my cloudTrain upload. And this is where we'll see how good or how bad... uh-huh,
23:43
what happened here? Oh — yeah, thank you. See, this is where it's useful to have the guy who wrote the library in the room:
24:03
not only do I have the compiler, I have a human compiler and debugger in the room right now. Good. So what this did is take my file and move it over; it took a couple of seconds, and now it's on the cluster and I can work with it. I can do the same thing for the validation data, with a bit more copy-paste, because, really, copy-paste:
24:23
I'll call it cloudTest, it takes the validation path instead of the training path, and it uploads validation.csv instead of training.csv. So far, I think it's pretty obvious that the code looks extremely similar to what I would do locally, which is good. And now I don't really need my local file anymore;
24:47
actually, I'm going to nuke this, because I'm not going to read anything locally. The bigger piece where I obviously need to change things is this guy, the one that computes how many correct
25:00
predictions I had. Instead of running it locally, I want to run it in the cloud. So, hey: if I want it to be in the cloud, I'm going to put it in the cloud. The way I do that is simply to wrap it in cloud, and close the bracket at the very end.
25:21
Now I have to change a couple of things, because I want to open my data not locally but in the cloud. So I write: let cloudTrain — how did I call this guy? yeah, that's fine — CloudFile dot, and I'm actually going to read all the lines. That's really fairly similar
25:42
to the local version. What I use here is the cloud file, cloudTrain, so I can pass cloudTrain.Path, and I read my file. That's done, and I do the same thing with the cloud test:
26:06
this one reads not cloudTrain but cloudTest. And now I can write: let train
26:24
equals... pretty much exactly the same code I had before: I drop the headers, it's an array so I can do Array.map, and I think this was parseLine. And the same for let test.
26:46
So far the code I'm writing is not exactly one-to-one, but pretty much exactly what I had on my local machine, except it's all going to execute in the cloud. So, parseLine again, and that's it. The only thing left is to train a model, and nothing changes there.
27:03
Is that right? Yeah, that's good. I train my predictor, and then I do exactly the same thing as before, with let result equal the evaluation. I need to indent it a bit, and because I'm in the cloud world
27:23
I return the result, and that is it. At this point I took my code, and it's pretty much the same thing I had before, but it's all going to run on the cluster. So I'm just going to run it. What I get is a Cloud of float, because it's something which is waiting to be executed.
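Put together, the cloudified evaluation looks roughly like this (a reconstruction from what is dictated, not the exact script: `cloudTrain` and `cloudTest` are the uploaded files, `parseLine` and `predict` are unchanged from the local version, and `CloudFile.ReadAllLines` follows the MBrace API of that era):

```fsharp
let evaluation : Cloud<float> =
    cloud {
        // Read both files straight out of the cluster's storage
        let! trainLines = CloudFile.ReadAllLines cloudTrain.Path
        let! testLines  = CloudFile.ReadAllLines cloudTest.Path

        let training   = trainLines.[1..] |> Array.map parseLine   // drop the header
        let validation = testLines.[1..]  |> Array.map parseLine

        // Same model, same evaluation as the local script - it just runs on the cluster.
        let model = predict training 1
        return
            validation
            |> Array.averageBy (fun ex -> if model ex.Pixels = ex.Label then 1.0 else 0.0)
    }
```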
27:44
Now I take that computation and pipe it into cluster.CreateProcess. I should probably call this one job, so that I can work with it. My job has started, it's going up to the cloud, and I can ask: is the job complete?
28:09
Not complete yet, so I can look at the jobs with ShowProcesses. And what I should see now... actually, it's completed. It was sent to the cloud, and I can see
28:23
when it started, all of that, and that it's completed, so now I can get the result: job.AwaitResult. This gives me 94.4, which is exactly what I got on my local machine. So hopefully the point I was trying to make here is that really all I had to do was wrap things in cloud,
28:44
do pretty much the same thing as before, read a file seamlessly, and move my data over pretty easily. The benefit I'm getting so far is pretty minimal, because what I had was something running pretty fast on my machine, and I'm now running it on one machine in the cloud.
29:01
I'm paying a price for the serialization, but I can already use a bigger machine. Now, the interesting piece is that I can start to parallelize this, and this is where it gets interesting, because the next question I'd ask is: I tried with k = 1; how about k = 2, k = 3, k = 4? I want to know, out of all the models I could run, which one is the best.
29:22
If I wanted to do that on my local machine, I would have to run one, wait for it to come back, then the next one, and so on; that would be very painful. With MBrace I can do exactly the same thing as before, except there is a construct called Cloud.Parallel: instead of spinning up one task, I can say, create all these tasks, run them on the cluster, and come back to me when you're done.
29:43
I'm not going to comment much on this piece of code, but what it does is the same thing: I read my data and train a model, but here I take k = 1 to 10, train a predictor for each k, and send them all to my cluster: let the cluster do the work and come back when it's done. So I'm going to run this guy.
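The shape of that code is roughly this (a reconstruction, not the exact script, reusing the names from the sketches above):

```fsharp
// One cloud task per value of k; Cloud.Parallel fans them out across the cluster
// and hands back all the (k, accuracy) pairs once the last one finishes.
let evaluateK k = cloud {
    let! trainLines = CloudFile.ReadAllLines cloudTrain.Path
    let! testLines  = CloudFile.ReadAllLines cloudTest.Path
    let training   = trainLines.[1..] |> Array.map parseLine
    let validation = testLines.[1..]  |> Array.map parseLine
    let model = predict training k
    let accuracy =
        validation |> Array.averageBy (fun ex -> if model ex.Pixels = ex.Label then 1.0 else 0.0)
    return k, accuracy }

let para =
    [ for k in 1 .. 10 -> evaluateK k ]
    |> Cloud.Parallel
    |> cluster.CreateProcess
```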
30:07
First, one fix: this was not the local train file, it was cloudTrain, and this was cloudTest.
30:22
So I'm going to run this now. What it gives me back is an int and a float for each case: for k = 1, this is the quality, and so on and so forth. First, let me send this to the cluster: let para equal
30:45
the parallelized computation, piped into CreateProcess. And now I'm really sending ten times more work,
31:00
and this is where I'm happy that it runs in the cloud, because I can just send it and wait for it to be done. So let's check again on the cluster: not GetProcesses — ShowProcesses. What's going on in my cluster?
31:22
I have a job; this one is a bit bigger, but it's running there. Let's ask: is my job finished? para.Completed. Normally at that point I would just let it run, go get a cup of coffee or something, and come back when the job is done.
31:42
It's completed, so now I can say: let results equal para, awaiting the results — asking the cluster, what are the results?
32:01
And because a raw list is a bit unexciting — this gives me, for each case, how good or how bad the model is — instead of just printing it I can pipe the results into a chart, a column chart maybe. Let's plot this.
32:21
It's not massively exciting, but what I managed to do is send work to the cluster, run not one but ten different models in a couple of seconds, get them back, and start inspecting how good or bad they are. If you have very, very good eyesight, this shows that for k equals three or four you get the best model. And I could run
32:41
it with 50 or 100 values; this frees me to run a pretty big computation without being bogged down on my local machine. That's the first benefit I get out of MBrace. Let me go back to my slides for a minute. What I hoped to show here is, first, that creating a cluster is not difficult —
33:02
I'll talk later about the starter kit. But I can very easily, from my scripting environment, send computation to the cluster when I need big compute power. If I have parallelism, as in this case where I had a map over k = 1, 2 and so on, I can just do Cloud.Parallel and it distributes the work magically for me,
33:21
which is awesome, because now I have 16 cores on the cluster, and if I wanted 256 I could have them. I can also easily send files and use them in a way which is extremely similar to the way you use a local file in .NET, and all it took was cloud { } and a cluster. Any questions so far?
33:42
Yes... yes. That's a great question, but I'll reserve the right to postpone the answer until after the session, because it would take us into too long a discussion. The short version is that it's a somewhat unusual
34:03
syntax you get when you use a computation expression, so answering it really means looking at how a computation expression works. Is that fair enough? So, that was the big-compute part. Now what I want to do is a bit of a data processing
34:21
example. The challenge, which is a general problem we all have, is: how would I handle a big data set? This is where the previous example had a bit of a problem: I had a local CSV on my machine. That was nice, but it was only possible because the file was small. If I have terabytes,
34:41
I'm certainly not going to do CloudFile.Upload from my local machine, because I won't be able to put the file on my local machine in the first place. And this is a fairly common task: I do it because of machine learning, but I'm sure you do log parsing and that kind of thing; all of us work with files. So my goal here is: if I can't put the data on my local machine,
35:01
can I still use all that power I have in the cluster, without the data ever touching my local machine? That's really what I'd like: the data can live wherever I want, I just reach it from the cluster and run my computations over there. The other thing I'm going to show is that, so far, I have only shown F# code;
35:22
MBrace actually lets you use C# as well. That's one of the beauties of F#: it is also a .NET language, so F# and C# can talk to each other. If you have code in C#, you can use it, that's just fine, and I'm going to show how to do that. And of course, because this is a big-data-ish thing,
35:43
any talk on the topic has to have a word count, because that's what everybody does — and I'm sorry about it. At the same time, when I told a friend, Phil, that I would do a word count, he said: no, no, no, don't do a word count, it's horrible. So I thought, okay, maybe I can keep the word count but not do the boring version;
36:02
I'm going to do a sexy word count. My secondary challenge was: can I actually do something neat with a word count? My goal will be to take all the data from the Guardian. It's not big, actually. The Guardian is a great newspaper, but the reason I used it is not because it's great; it's because they happen to have a public API, which was very convenient.
36:20
What I'm going to do is grab the 2014 Guardian headlines and create a visualization of everything which happened in 2014. That's hopefully going to be a somewhat sexy word count — and if not, I apologize. So let's get to the code.
36:44
Good. First I'm going to go to the file called "sexy word count", because that's apt, load some stuff, and get going quickly. The first thing I want is to grab the data from the Guardian.
37:01
Actually, let me show you this: the Guardian API is a good old classic API. It's actually really nice; it has an explorer, so you can run queries, and of course it returns JSON to you — that type of thing. Because I'm an F# person, my first thought is: I have data, maybe I can use a type provider to retrieve it.
37:22
What I'm going to do here is use a library called FSharp.Data, which is a collection of type providers. I just grabbed the URL to the API; what it does is search for the section called "world", using the test API key.
37:41
If I run this, I'll show you what you get. So this is the test API, and if I call it, I should get the latest headlines. It's not very sexy, but this is the JSON response
38:03
I'm getting from the Guardian: the ten latest headlines published on the Guardian. What a type provider allows me to do is use this as a sample URL. I can say: create a type, which is going to be Headlines, using the
38:20
JsonProvider with that sample URL. Yes, good. What it does is look at the URL, look at the response it gets back, and create a type for me. Right now my entire parsing is already done, so for instance I can do something like
38:41
Headlines.GetSample() to retrieve data and see what comes back. Let's do let sample equals... this is now calling the Guardian, and I can start doing things like sample dot, and it tells me I get a Response;
39:00
let's do Response dot, and it gives me the current page and so on, and I'm getting Results. So maybe I can take the results and do something like Seq.iter with a fun x, and, for instance, print the headlines.
39:21
Here x is also a fully typed object: I get something which has an Id, which has a WebTitle — that looks like a great thing to show. Let's do this, and this is what I got. The point here has really nothing to do with MBrace specifically: type providers are just massively awesome, in that in literally four lines I managed to hook up to an API, call it, and get fully typed access to the JSON coming back.
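The type-provider part, reconstructed from what is dictated (the URL is illustrative of the Guardian content API with its public test key, and the generated property names simply mirror the JSON response, so they may differ if the API's shape changes):

```fsharp
#r "FSharp.Data.dll"
open FSharp.Data

// Point the JSON type provider at a sample request;
// it infers a full set of types from the shape of the response.
let [<Literal>] SampleUrl =
    "http://content.guardianapis.com/search?section=world&api-key=test"

type Headlines = JsonProvider<SampleUrl>

// Call the API and walk the typed response.
let sample = Headlines.GetSample()
sample.Response.Results
|> Seq.iter (fun x -> printfn "%s" x.WebTitle)
```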
39:42
I can just start working with it: I get types, I get discoverability, so I'm a happy camper. Now that I know how to get the data, that's great, but what I really want is to move that data to the cloud and use it there. So let's do that:
40:04
I'll delete this, because it's not useful anymore. I quickly built a little DSL-ish thing, which I'm not even going to comment on, so that I could call that URL and say things like:
40:20
I want to run a query from this date to that date, and give me that many pages. So I get this nice little API which says get, from a date, to a date, and I can do things like calling the Guardian from, say, DateTime.Now
40:41
to DateTime.Now, with maybe 17 pages, because why not. This calls the API, and I get my response back. The short point is that I can easily write a DSL and start grabbing my data. Good. Now that I have this, what I really want is to move it to the cloud,
41:02
because remember, the goal was to get the data without it ever touching my local machine. I don't want to grab this into a CSV here; I want to do it up in the cloud. So I'm going to load MBrace, connect to my cluster, and now
41:22
I can start doing the same type of thing as before. I'm going to change my path to something like ndc2015, just to show you this is a fresh one. The way I do it is pretty much the way I did it before: if I don't want to touch
41:41
my local file, I can simply do this in the cloud. I create a sequence of dates, starting January 1st 2014 and incrementing day by day; for each day I run the query, grab about 50 headlines at a time, and save the data into a cloud file. So this is all going to happen outside of my machine.
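He does not walk through this code, but the shape is roughly the following (a sketch only: `queryGuardian` stands in for the little DSL shown a moment ago, and `CloudFile.WriteAllLines` for whatever write primitive the actual script uses — both are assumptions):

```fsharp
open System

// Pull 2014 day by day and persist the headlines into a cloud file,
// entirely on the cluster, without the data touching the local machine.
let buildHeadlines = cloud {
    let headlines =
        [ for offset in 0 .. 364 -> DateTime(2014, 1, 1).AddDays(float offset) ]
        |> List.collect (fun day -> queryGuardian day day 50)   // hypothetical DSL call
    return! CloudFile.WriteAllLines("ndc2015/guardian-2014.txt", headlines) }

let headlinesJob = buildHeadlines |> cluster.CreateProcess
```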
42:04
Here I'm doing it on something which is not big data — it's just one year of headlines — but you could do this with pretty much whatever data source you want, and that's nice, because I'm not depending on my local machine. I also have this horrible call in the middle, which you might have noticed. I'm going to run this first. The reason that call is there is that the Guardian API has a throttle,
42:24
so I can't make a crazy number of calls per second. But now I'm just going to run it, and it will go its merry way, call the Guardian, and create a file for me behind the scenes, while I go grab my cup of coffee — that type of thing.
42:41
So, is this running? This is where I probably should have started a process instead of a blocking call; that was a bit of a mistake. So I'm waiting for it to come back; normally it should take something like 20 seconds, because of the throttle on the Guardian.
43:05
Well, this was perfectly timed so that I could get a sip of water. And the process has been created, so now it should come back soon.
43:30
I do apologize for this; it was exactly the wrong moment to make a blocking call.
43:40
Here we go, okay. I don't think I timed it exactly, but I can see it from here: it took a bit, essentially about 20 seconds, and I can see the process. So now I've got my file: I grabbed one year of headlines from the Guardian. Great.
44:00
Now, if I want to work with it, I obviously don't want to download the file to my local machine. Another abstraction that comes with MBrace is what they call the cloud flow. The way to think about it is something like a sequence in F#, or maybe LINQ expressions, which let you work against sequences.
44:21
In this case I have headlines, and one thing I might want to know, for instance, is how many lines there are in the file I created. The way I could do that is simple: I can do CloudFlow, maybe from... sorry, OfCloudFileByLine, because I have a cloud file here.
44:41
I think I called this file... how did I call this file? Whoops, I didn't give it a name, so I'm actually going to name it: let file equal the cloud file I just created. Right? Yes.
45:09
Sorry, I'm not following you... ah, I can just pass the path here, is what you're saying? Yeah, I can do that; that would also be easier, because the path is what I got. So, hey,
45:21
let's do that. Now what I can do is create a flow from this: think of it as a sequence which looks at my file line by line. This is a CloudFlow, so I can start piping these things together, and I have operations which look pretty similar to what I'd expect on a sequence: I have averages, I have countBy, I have groupBy — all these familiar LINQ-style
45:43
constructs — so I can start working on my remote file, on my sequence, in a smart way. Here what I want is the length, so I'm going to run this and see how many headlines I pulled... and... uh-huh. Uh-huh. What is this?
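The line count he is going for looks roughly like this (a sketch; the function names follow MBrace.Flow of that era, and the file path is the illustrative one used above):

```fsharp
open MBrace.Flow

// Treat the cloud file as a flow of lines and count them - all of it runs on the cluster.
let headlineCount =
    CloudFlow.OfCloudFileByLine "ndc2015/guardian-2014.txt"
    |> CloudFlow.length
    |> cluster.Run
```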
46:04
So I'm going to cheat for a second, I have my cheat sheet right below, because I think my time is running a bit short. Actually, this should work. Oh yeah, okay. So the way I could do this, for instance, is: I could take the cloud file, with the file path I got,
46:21
take five lines, and pipe it to Run. Here I get back five lines, so I can start exploring files exactly the same way I would work with a sequence or with a LINQ expression locally, except that here, again, the data never came to my local machine. Everything happened on the cluster, so I can ask questions, let the cloud flow do the right thing, and come back to me with the answers.
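That peek at the first few lines might look something like this sketch; the `CloudFile.ReadLines` call is my reconstruction of what is being typed, not necessarily the exact code from the demo.

```fsharp
// Minimal sketch: peek at the file without downloading it. Assumes `cluster`
// and `path` as above; CloudFile.ReadLines is my reconstruction.
let firstFive =
    cloud {
        let! lines = CloudFile.ReadLines path   // lazily stream the file on the cluster
        return lines |> Seq.take 5 |> Seq.toArray
    }
    |> cluster.Run
```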
46:40
For instance, I could do things which are maybe not super interesting, but I could say: grab that file again, look at the line length, and give me the count of the headlines which are longer than 50 characters, because that's really something I care about. This is again going to run, do its thing, and tell
47:01
me, hey, you have about eight thousand eight hundred headlines with more than 50 characters. So that type of thing, so this is cool, and it gives you a pretty comfortable way to work, in a familiar way, against data, against sequences.
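That query is again just a couple of combinators; a minimal sketch, with the same `cluster` and `path` assumptions as before:

```fsharp
// Minimal sketch of the "headlines longer than 50 characters" query.
let longHeadlines =
    CloudFlow.OfCloudFileByLine path
    |> CloudFlow.filter (fun line -> line.Length > 50)   // keep only the long ones
    |> CloudFlow.length
    |> cluster.Run
```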
47:22
The one thing I haven't shown, which is also cool, is that I can do this: CloudFlow.withDegreeOfParallelism. To see that it's smart, it's not only doing things sequentially; I can give it a degree of parallelism. So what I'm saying here is: hey, make a sequence out of these headlines and process it in a smart way, parallelizing over 16 machines or 16 cores.
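A minimal sketch of that knob, with the same assumptions as above; the degree is whatever your cluster can give you:

```fsharp
// Minimal sketch: the same flow, but with an explicit degree of parallelism.
let parallelCount =
    CloudFlow.OfCloudFileByLine path
    |> CloudFlow.withDegreeOfParallelism 16   // spread the work over 16 workers/cores
    |> CloudFlow.filter (fun line -> line.Length > 50)
    |> CloudFlow.length
    |> cluster.Run
```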
47:41
So it's not only a sequence, it's a smart sequence which is going to distribute and parallelize when it can. That's what the cloud flow does. So now what I want is to use this to do word count, and one thing I did for this was: maybe I have some C# code which I want to use here, because I'm not going to rewrite everything in F#.
48:01
So what I did is I wrote a thoroughly unexciting piece of code in C#, which is this: I wrote an analyzer. That analyzer has a class called WordCount; a WordCount has a Word, which is a string, and a Count, which is an int, and that's a word count. And I have two methods, also written in C#:
48:22
one which, if you give it a string of text, counts the words (I'm using regular expressions, because everybody loves those) and returns all the distinct words in the string, why not, and another one which returns the counts for all the words it found. So nothing super exciting,
48:41
but it's essentially doing word count on a string, and it's good old C# code. And I can completely use this from an F#-first script. So I'm going to just load this here, this is my DLL, the text analyzer, and if I run it on this I can get the individual words. So I'm in F# here, but I can use my C# code. Why not?
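Loading and calling the C# assembly from the F# scripting environment looks roughly like the sketch below; the DLL name and the analyzer's member names are my assumptions based on the description, not the exact ones from the demo.

```fsharp
// Minimal sketch; the DLL name and the Analyzer member names are hypothetical.
#r "TextAnalyzer.dll"

let text = "developers developers developers"
let distinct = Analyzer.DistinctWords text   // e.g. [| "developers" |]
let counts   = Analyzer.WordCounts text      // e.g. a WordCount with Word = "developers", Count = 3
```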
49:03
So this works, and if you knew somebody who would say something like "developers developers developers", you would get a word count of three for "developers". So that's C# code consumed from the F# side. And now, yeah, I have my Guardian file, so this is a bit of gnarly code,
49:21
but essentially what I'm going to do here is take my file on the cloud and run it through the analyzer; you can see the code here. Actually, that's just on the first 10 headlines, so I can do that: I can do word count on the first 10 headlines, why not. And what I should get back is...
49:47
Yeah, so I got my distinct words. So I have about 10 minutes left, right? Okay. So what I can do with this is essentially start using my C# code, so let me see if I try to run it.
50:02
It's a bit of a longer expression, but this whole thing here is going to count, for instance, the individual words in the Guardian data. That was about 10,000 headlines; I'm sending it to the cluster, and it's going to go through it. You know what, I'm actually not going to run the code, because the last time I ran it, it ran in about 10 seconds, but given my luck and given the internet connection
50:24
I don't want to be blocked waiting for this to come back. But the higher-level point I want to make here is that with the cloud flow I could write, in a couple of lines, a word count which is going to run blazing fast without blocking my machine. The full word count would be here, and I could do the same thing to compute the word counts across the days.
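For reference, the whole distributed word count fits in a handful of lines; a minimal sketch, assuming `cluster`, `path`, and the hypothetical `Analyzer.DistinctWords` from the C# DLL above:

```fsharp
// Minimal sketch of the distributed word count; Analyzer.DistinctWords is the
// hypothetical C# helper referenced above.
let wordCounts =
    CloudFlow.OfCloudFileByLine path
    |> CloudFlow.collect (fun headline -> Analyzer.DistinctWords headline |> Seq.ofArray)
    |> CloudFlow.countBy id          // (word, count) pairs across all headlines
    |> CloudFlow.toArray
    |> cluster.Run
```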
50:45
So let me try to... actually, did it run? Yes. So with this I would get my word counts back, and
51:03
I did this already, so what I said I wanted to do was really visualize it. Did I run it or not? I'm trying to think whether I dare to run that code or not... I think I will actually not run it, just in case it blocks, so I'm going to shorten
51:21
that part here a bit. At a high level, if I started to run the word count, I would see something which is a bit silly. Remember, my goal is to visualize a year of headlines, the year 2014, day by day. If I look just at the raw word counts, what am I going to see? What do you think the most common word is? It's probably going to be "and", and you're probably going to get "the",
51:45
"on", all sorts of words which are not very interesting. So if I start to plot, day by day, something which shows me the most frequent words, I'm going to see something extremely uninteresting. So I did another level of processing, using a technique from text processing called TF-IDF. The idea of TF-IDF is that when you look at a word,
52:04
you want to look at two things: how frequent the word is on that day (if I see a word a lot, it's probably important), but also whether I see that word every day (in which case it's probably not that important). So TF-IDF essentially takes how frequent the word was that day and pretty much divides it by how frequent the word is in general.
52:21
So "and", for instance, will disappear, because it's in every single document; other words appear very rarely, but when they do appear they pop up. So that's what I did and that's what I ran. This processing was not very complicated, and I'm going to show you the result.
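The scoring itself boils down to a one-liner; a minimal sketch of the TF-IDF idea as described here, not the speaker's exact formula:

```fsharp
// Minimal sketch of TF-IDF as described above (not the exact code from the talk).
// termFreqInDay: how often the word appears in that day's headlines.
// daysContainingWord: on how many days of the year the word appears at all.
let tfIdf (termFreqInDay : float) (daysContainingWord : float) (totalDays : float) =
    termFreqInDay * log (totalDays / daysContainingWord)

// "and" shows up every day, so log(totalDays / daysContainingWord) is ~0 and it vanishes;
// a word that spikes on a single day gets a high score and pops out in the animation.
```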
52:40
This is probably the worst WPF code I've written in my life, so I'm not going to show it; I'm simply going to run it. This is what I ended up with: I took one year of headlines from the Guardian and created an animation of the words, day by day. So I'm going to take a little step back and wait for it to kick off.
53:01
And so this is 2014, using TF-IDF. You can see the calendar going day by day; we're already in March, and you should see "Farage" pop up a bit. When I wrote this code I ran it a couple of times, and there is something kind of mesmerizing
53:22
about watching this. The one which caught my attention, I think it's in March or April, is a headline about a giraffe in Copenhagen. I'd have to check exactly what happened with giraffes in Denmark...
53:41
Is that what happened, a giraffe was fed to... was it in Denmark or somewhere else? See, I think this is great, because the visualization showed me the giraffe in Denmark; it actually surfaced information which is highly specific. And of course it popped up: the most important information of 2014 popped up with this one.
54:05
That's really a good story. So yeah, that was the result of my visualization. I'm going to go back to this, and I really want to get to the conclusion now.
54:24
The first conclusion, if there's one thing to remember, is that MBrace gives you constructs that are simple and extremely powerful. What I managed to do, starting from standard F# code that reads all lines from a path and maps over them, is transform it with just two things: one of them is wrapping it in cloud { ... }, same code otherwise,
54:43
and the other one is: instead of a file I use a cloud file, and I can also use a cloud flow instead of a sequence. So the constructs map pretty closely to the way you would code naturally in your local environment, except that now I can run it against the cloud, using whatever power I want to use.
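Side by side, the mapping looks roughly like this; `analyze` and the two paths are hypothetical placeholders:

```fsharp
// Minimal sketch of the local-to-cloud mapping; analyze, localPath and
// cloudPath are hypothetical placeholders.
open System.IO
open MBrace.Core

// Local version: plain files and sequences.
let localResult =
    File.ReadAllLines localPath
    |> Array.map analyze

// Cloud version: wrap in cloud { }, swap File for CloudFile (or Seq for CloudFlow).
let cloudResult =
    cloud {
        let! lines = CloudFile.ReadAllLines cloudPath
        return lines |> Array.map analyze
    }
    |> cluster.Run
```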
55:01
So: a simple and powerful model. The other lesson here is that I had C# code, and I had no problem running my C# code, so it's not only about F#. Really, what you need in order to run this is to know a tiny bit about the F# scripting environment and basic things like the cloud { ... } construct, and once you have that you can run whatever code you want,
55:22
for that matter So the nice thing is that I can still work in the scripting environment because that's what I like But now is like instead of getting two cores I can get like two hundred fifty six fifty six cores and still work from a scripting Environment, I'm a happy camper about that The part which is a interesting as well as I can so I work in the scripting environment
55:40
you could also use this from a standard application if you wanted to, for instance to have a batch job run every day at midnight and churn through gigantic logs. You could just spin up a cluster in your application, run the job, and scrap the cluster when you're done. That would be perfect, because now you can get the power you need on demand. So those are all the things I like.
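A hedged sketch of that batch-job shape: it assumes you already have a `cluster` handle (cluster provisioning and connection APIs vary across MBrace.Azure versions) and a hypothetical log path; it is an illustration, not the speaker's code.

```fsharp
// Minimal sketch of the nightly batch-job idea; `cluster` is assumed to be an
// already-connected MBrace cluster and the log path is hypothetical.
open MBrace.Flow

[<EntryPoint>]
let main _ =
    // Count error lines in yesterday's gigantic log without downloading it locally.
    let errorCount =
        CloudFlow.OfCloudFileByLine "logs/yesterday.log"
        |> CloudFlow.filter (fun line -> line.Contains "ERROR")
        |> CloudFlow.length
        |> cluster.Run
    printfn "errors: %d" errorCount
    // If you provisioned the cluster on demand, tear it down here.
    0
```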
56:07
The one thing which is maybe lacking: MBrace is developing fast, the team is actually awesome, I maybe didn't say that it's entirely open source, and the team is great at responding to requests, but at the same time it's still a bit young. So compared to something like Spark you maybe have fewer high-level libraries; that's just for the sake of honesty, you won't have quite the same breadth of ecosystem.
56:26
That being said, knowing the speed of the F# community, I would not be surprised if in a couple of months you had some high-level libraries on top of MBrace as well. Shameless plug: I have a book coming out, you can ask me about it later. So this is me; you can reach out to me:
56:42
this is my email, and this is my Twitter handle. There is a quiz at the end of this presentation; you have three answers, green, orange or red, and the correct answer is green. So don't forget to fill in the evaluations, and that's what I've got. Thank you, and if you have questions, I'm here.
57:04
Questions? No questions? So... I have used it only on Azure. I believe you can run it on your local cluster as well; it's just that for me the practical thing was
57:21
that, well, MBrace actually existed before Brisk, and for me what was practical is that, I hate to admit it, don't tell anybody, but I was just too lazy to provision things myself. What I like is that I don't even have to bother about it: I can just say, hey, give me a cluster, and the cluster is there. But I believe it's perfectly possible to run it on your own machine and on your own network.
57:44
Well, thank you