Develop and deploy a Machine Learning pipeline in 30 minutes with Ploomber
Formal Metadata
Title: Develop and deploy a Machine Learning pipeline in 30 minutes with Ploomber
Series: EuroPython 2021 (talk 80 of 115)
License: CC Attribution - NonCommercial - ShareAlike 4.0 International: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers: 10.5446/58760 (DOI)
Transcript: English (auto-generated)
00:06
So, I guess we are going for one ad, or directly to the next speaker. Well, it's 4:15, 16:15. So, let's go to the next speaker. Eduardo, on the stage, please.
00:27
All right. Thank you. Thank you, Consuelo. Hello, Eduardo. Hi. So, you're the next speaker up. Where are you connecting from? From Mexico City. Nice. Nice. Nice place. Very nice. I was there many, many years ago.
00:46
So, you're going to talk about develop and deploy a machine learning pipeline in 30 minutes with... Ploomber. How do you pronounce your... Ah, okay. Perfect. So, you can start sharing your screen if you have slides anytime.
01:04
Okay. Let me try. Can you see my screen? Now, yes. Perfect. So, I'll disappear. You have 30 minutes and take it away.
01:21
Great. Thank you. Welcome, everyone. Thanks for being here at my presentation. My name is Eduardo and I'm going to be showing a demo of a project I've been working on, Ploomber. So, the talk is going to be develop and deploy a machine learning pipeline in 30 minutes with Ploomber. So, I'm going to be coding. I'm going to be coding as fast as I can,
01:42
trying to explain as many details as I can. But bear in mind that the objective of this presentation is not for you to become an expert in Ploomber, but rather to get a glimpse of what the experience looks like so you can consider it for your next project. So, before we start with the demo, I want to show a few things. Otherwise,
02:02
I'm going to forget this by the end of the presentation. So, just a few things. The project is open source, so you can check out the code on GitHub. Here is the link. If you like the project, please show your support with a star on GitHub. Please also join our community if you have any questions or just want to chat; the link is in the GitHub readme.
02:23
Or you can also reach out to me on Twitter. So, here's my handle. Okay. Let's start. The first thing that we are going to do is create a base project. So, I'm going to run the first command, which is ploomber scaffold. I'm going to be using conda for my dependencies; I could also use pip. And we are going to create an empty project.
02:46
So, that's the first step: we are going to create a base project. The last thing we need to get started is a name; I'm going to call this demo. Then we go to the demo folder, and I'm going to start explaining what this pipeline thing looks like. So, a pipeline is just a
03:01
bunch of tasks. Right? We get some data, we clean some data, we generate some features, we train a model. And we usually split this into many small steps so that we can modularize our pipeline. So, the central piece in Ploomber is this pipeline.yaml file where we declare our tasks. So, that's what I'm going to do now. I am going to create my first task.
03:22
I'm going to say source. That's where my source code is. I'm going to store this in a scripts folder, and I'm going to say get.py. So, this script is going to get some data, just the raw data that we need. And it is going to generate two outputs. So, I say product, and the first one is going to be a notebook. Why a notebook? That's because
03:42
Ploomber treats scripts as notebooks. So, we can develop them interactively, but then we can execute them from the command line and we can get an output notebook. The idea is that if our script generates any kind of charts or tables, once we execute the pipeline, we are going to be able to get all of these in a file that we can take a look
04:02
at. So, I'll do this. I'll say products/get.ipynb. I can also change the format, for example, I can say HTML, but I'll leave it as ipynb. This is also going to generate some data, so, products/get.csv. This is where I want to save my data. And that's it for our first
04:23
task. Now, let's continue with the next task. We're going to be using the Iris dataset, so I'm going to generate a feature from the sepal columns. I'll just call it the sepal feature. Same idea: source code and products. Let's continue with the next task, where I'm going to be using the petal columns. Same thing. And finally, we train a model. So,
04:50
I'm going to call this fit. I'm going to change this because this doesn't generate data; this is going to be a model. So, I'm going to change the name and say model.pickle. So, now we have a basic structure, the basic layout. Now, I'm going to ask Ploomber to
05:05
generate some basic files for me. I made a mistake. Yes, this should be, yes. Okay,
05:20
let's try again. Right. So, now we have the base files and I can generate a plot from this. So, we see that Ploomber is recognizing these files as our tasks. You see, we have four tasks. This doesn't have any structure yet. So, that's what we are going to be working on, and I'm going to show the integration with Jupyter. At this point, the pipeline.yaml looks roughly like the sketch below.
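This is a sketch reconstructed from the talk: the script and product file names follow what is mentioned on screen, but treat the exact paths as illustrative.

```yaml
# pipeline.yaml -- sketch of the four tasks declared so far
tasks:
  - source: scripts/get.py
    product:
      nb: products/get.ipynb      # executed copy of the script, saved as a notebook
      data: products/get.csv      # the raw data

  - source: scripts/sepal.py
    product:
      nb: products/sepal.ipynb
      data: products/sepal.csv    # feature built from the sepal columns

  - source: scripts/petal.py
    product:
      nb: products/petal.ipynb
      data: products/petal.csv    # feature built from the petal columns

  - source: scripts/fit.py
    product:
      nb: products/fit.ipynb
      model: products/model.pickle  # a trained model instead of a data file
```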
05:40
So, I'm going to open JupyterLab and we are going to start coding the logic for our pipeline. Okay, let's give it a few seconds. Okay. So, now let's go to the first task. So, I'm going to be getting some data. This is something important to mention: I have my pipeline.yaml here
06:03
and as you can see, get generates two outputs, right? So, Ploomber is auto-completing that for me and telling me: this is where you are supposed to save your output. So, I can simply run this cell and I have the information that I need. I'm going to import, this is where I'm going to be getting my data from, load_iris. So, it's imported, yes. The integration with Jupyter
06:32
is really nice because it allows me to do these kinds of things, like working interactively. It just makes things much easier than using a plain script. And remember that
06:44
this is a regular script. It just happens that Ploomber has a plugin that allows us to open them as notebooks. We rely on the fantastic Jupyter package and we just add a bunch of things on top of it to make this work. So, I think I need this frame. Yeah, this contains everything I need. So, I'll just save this with to_csv, and then I'm going to use a variable
07:06
that Ploomber adds for me. So, product['data'], and I don't want to save the index, so I'll say index=False. Okay, so that's it for our first task; a rough sketch of the finished script is below.
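A minimal sketch of what scripts/get.py contains at this point, assuming the percent-format cells Ploomber uses for scripts; Ploomber injects the real upstream and product values when it runs or opens the file:

```python
# %% tags=["parameters"]
# Ploomber replaces these with the values declared in pipeline.yaml
upstream = None
product = None

# %%
from sklearn.datasets import load_iris

# fetch the raw Iris data as a single data frame (features + target)
df = load_iris(as_frame=True)['frame']

# save it to the path declared as the 'data' product in pipeline.yaml
df.to_csv(product['data'], index=False)
```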
07:24
Now, let's continue with the next one, the sepal feature, and I'm going to show something interesting here. We are going to generate a feature, but we depend on the raw data to do so. So, what I'm going to do is use this special upstream variable and say I want to use get as a dependency. I save my file and reload, and you see that Ploomber is going to auto-complete things for me. So, I have my output where I'm
07:43
supposed to save my output, and where my input is. So, I continue working: let me import pandas, and I'm going to read my raw data, the data that I generated in the previous task. This is going to be upstream['get']. Okay, now I have my raw data. This is where I'm going to
08:08
generate one feature. So, it's going to be a really simple feature. Let's call this the sepal feature and say it's going to be equal to df... let's just take this one. I'm just doing a classic feature engineering step and multiplying it by the other column.
08:30
A really simple thing just for the sake of example. What's going on here? Oh, this one is extra. Okay, and now I got my new column and I'm only going to save this one
08:43
because I already have the rest of the columns. I'm just going to save this one; to_csv, same thing. So, I use the product variable that Ploomber auto-completes, because that's where I should save my output. Okay, so we finished the second task; the script ends up looking something like the sketch below.
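A sketch of the sepal feature script; the column names come from the Iris data frame, and the multiplication is just the illustrative feature from the talk:

```python
# %% tags=["parameters"]
upstream = ['get']   # declare the dependency; Ploomber injects the resolved paths
product = None

# %%
import pandas as pd

# read the raw data produced by the get task
df = pd.read_csv(upstream['get']['data'])

# a deliberately simple "feature": multiply one sepal column by the other
df['sepal-feature'] = df['sepal length (cm)'] * df['sepal width (cm)']

# save only the new column; the raw columns already live in the get product
df[['sepal-feature']].to_csv(product['data'], index=False)
```

The petal script is the same idea, using the petal columns instead.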
09:02
Now, let's move to the second, sorry, the third one, the petal feature. The code is going to be really similar, so just to save some time I'm just going to copy a few things here. Oh, first I have to declare my dependencies. So, let's reload. Okay, now let's add this new feature. So,
09:24
it's going to be really, really similar. I'll just copy this thing. I just have to change something here. All right, I skipped one important step which is loading my data,
09:45
my raw data. Here, yes. So, we load the raw data. We generate the feature and we save it. Let's just make a quick check. Everything looks good. Okay, so now we finished our
10:06
third step. Let's go to the final task which is fitting the model. Oops, actually. Oh, I kind of overlooked this important detail. So, these are .py scripts. In order to open them
10:21
as notebooks, I have to double click and then open as notebook. Now, this final step uses all previous tasks as inputs. So, I'm going to make a list: I'm going to use the sepal feature, the petal feature, and the get task. Okay, so these are my dependencies. Now, I am going to reload
10:41
this. You can see I get everything I need and let's work on our machine learning model. Let's load the raw data. And then we say upstream. The raw data is here.
11:04
So, we see our raw data. And now, let's load the features that we generated. Let's start with sepal. As you can see, this auto-completion and all these things allow us to really break things down. What usually happens is that people
11:23
write really long notebooks and it becomes a real mess. This way, we are breaking down this huge notebook into many small files that we can chain one with another, and this helps a lot with organization and maintainability, and we can also
11:41
collaborate with people because people may work on different files without any issues. Okay, so I have everything I need. I'm going to create one data frame with everything. So, let's call this df and this has everything that I've been working on, right? So, this is my training set. We have the raw data. We have the features that I generated and we have the target
12:03
variable. So, let's now train a model. Just a random forest. Okay. And just to show some charts
12:23
on evaluation charts, I'm going to create a confusion matrix. Okay. Now, we have our data. Let's split this into X and y. So, let's drop the target (axis='columns'), and then y is going to be
12:46
df.target. All right. So, we have X and y. Let's train our model. So, this is going to be the random forest. Now, let's call fit(X, y). I'm going to skip the cross-validation part
13:01
just to save some time and to be quick. But in real life, you should be doing cross-validation to evaluate your models. So, don't do this, please, in a real machine learning project. I'm going to generate predictions on my training set: predict, then the confusion matrix,
13:21
and then we need y and y_pred. Right. So, we have our evaluation. So, we finished the Jupyter notebook. Now, I've run things interactively, but I want to make sure that my pipeline runs from start to finish for reproducibility. So, what I'm going to do
13:42
is I'm going to ask Ploomber to run everything for me from start to finish. And you're going to see that it's going to run things in order. So, you can see here, it's getting the data. Then, it's going to generate the first feature, then the second feature, and finally, it's going to train a model. So, we are making sure, oh, I forgot something important.
14:05
Yes. I didn't save the model. I just trained a model, but I didn't save it. So, let's go back to JupyterLab and fix that. Give a few seconds. Okay. So, let's come back here.
14:23
And this doesn't take too long to run, so I'm just going to run everything. Okay. So, here's where we have to save our model. We see here that we declare a model as an output. So, we have to do that. That's why Ploomber was complaining,
14:41
because it's saying, well, you told me you were going to save something, and I don't see it. So, tell me what it is. What else? Oh, I need to import pickle. And now, I have my model, and I'm going to save this in product['model']. So, write_bytes, and then
15:13
pickle.dumps. All right. So, now, we can close this. The full fit script ends up looking roughly like the sketch below.
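Putting the whole thing together, the fit script looks roughly like this. As in the talk, it evaluates in-sample with a confusion matrix and skips cross-validation, which you should not do in a real project; the plotting helper and the task names are my assumptions:

```python
# %% tags=["parameters"]
upstream = ['get', 'sepal', 'petal']  # raw data plus the two feature tasks
product = None

# %%
from pathlib import Path
import pickle

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import ConfusionMatrixDisplay

# build one training frame: raw data + generated features (raw includes the target)
raw = pd.read_csv(upstream['get']['data'])
sepal = pd.read_csv(upstream['sepal']['data'])
petal = pd.read_csv(upstream['petal']['data'])
df = pd.concat([raw, sepal, petal], axis=1)

X = df.drop('target', axis='columns')
y = df['target']

# fit a random forest (no cross-validation here, demo only)
clf = RandomForestClassifier()
clf.fit(X, y)

# in-sample confusion matrix, just so the output notebook has a chart
y_pred = clf.predict(X)
ConfusionMatrixDisplay.from_predictions(y, y_pred)

# save the trained model to the path declared as the 'model' product
Path(product['model']).write_bytes(pickle.dumps(clf))
```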
15:26
This can help me show some nice features of Ploomber, because I can call this ploomber build command again. So, I already built most of my pipeline: I ran get and the two tasks that generate features. So, if I call this again, check out what's going to happen. It's only running fit,
15:44
because I already have the outputs for the other tasks, and I haven't changed anything, so it can skip tasks that haven't changed since the last run. So, for example, if I run this again, it's not going to do anything, because I haven't done anything. So, it helps you to iterate faster on your pipeline. Okay. So, we finished the training pipeline. Let's work on
16:04
the serving pipeline. I'm going to show the new plot. Now that we established the relationships between the tasks, I can see these new charts. So, before, we had a plot without any structure. Now, we are saying we are getting some data, we generate some features, and we join
16:20
everything to train a model. Now, I want to generate a serving pipeline, and the only difference between this training pipeline and my serving pipeline is what happens at the beginning and at the end. When we are training a model, we want to get historical data. We process it, and then we train a model. When we want to make predictions, we are going to get new data. So,
16:42
all the new data points that we want to make predictions on, we have to apply the same pre-processing to generate the same features, and we are going to load up a model and make predictions. So, as you can see, what happens here in the middle is the same thing. So, I'm going to use that fact and reuse this code, so I don't have to compute my feature
17:00
generating code twice. So, that's what I'm going to do now. What I'm going to do is create a new file that separates what's common to both pipelines. So, I'm going to call this features.yaml, and then I'm going to create another file where I'm going to be declaring my serving logic. So, let's go back to the training pipeline. I'm going to take out these two tasks
17:24
which generate the features. So, I'm going to put them here. This is going to be common to both pipelines. So, I put this here, and now, to fix my training pipeline, I'm going to import that file. So, I use import_tasks_from and point it at features.yaml. Okay. So, that's it; roughly, the split looks like the sketch below.
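Conceptually, the split looks something like this: the two feature tasks move into features.yaml, and the training pipeline pulls them in via import_tasks_from (file and task names are illustrative):

```yaml
# features.yaml -- tasks shared by the training and serving pipelines
- source: scripts/sepal.py
  product:
    nb: products/sepal.ipynb
    data: products/sepal.csv

- source: scripts/petal.py
  product:
    nb: products/petal.ipynb
    data: products/petal.csv

# pipeline.yaml -- training pipeline, now importing the shared tasks
meta:
  import_tasks_from: features.yaml

tasks:
  - source: scripts/get.py
    product:
      nb: products/get.ipynb
      data: products/get.csv

  - source: scripts/fit.py
    product:
      nb: products/fit.ipynb
      model: products/model.pickle
```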
17:45
Now, for our serving pipeline, I'm going to reuse our training pipeline as a base and just make a few changes. For our serving pipeline, instead of getting historical data, we need to get new data. So, I'm going to create a new script called get-new. I have to
18:06
be compatible with the rest of the code. What else do we have to do? I have to change this. So, this is not going to train a model; this is going to make predictions. I'm going to do predict, and this is going to be a data file with the predictions. So, I'll call this predict.
18:22
Okay. So, now I'm going to parameterize these two pipelines because when I run the training pipeline, I want to save the output in one folder, and when running the serving pipeline, I want to save the files in a different folder. So, I create a new file to parameterize my pipelines, and I'm going to add an out parameter pointing to a train folder. So, my training pipeline is going to save its output
18:45
here. And for our serving pipeline, I'm going to create that, and out is going to be a serve folder. Now, I need to parameterize my pipeline. What I'm going to do is I'm going
19:01
to change the path to the output files, and I'm going to include that parameter that I just created. Okay. So, I know this is really fast, I'm skipping lots of details, but I just want to give you some idea of what the experience looks like. So, I parameterized my training pipeline. Now, I have to do the same thing with my serving pipeline. Out. And finally,
19:26
this other file. Okay, out. I'm going to test this thing. Oh, I missed something. I have to include my model. So, when serving predictions, we have to load our model. So,
19:45
what I'm going to do is say my model is in this folder, in a file called model.pickle, and this has to be a parameter. So, params: model. Now, let's get our pickle file, the one that we generated when we ran our training pipeline. I'm just going to copy that. With the parameterization in place, the serving pipeline and the env files look roughly like the sketch below.
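The {{out}} placeholder is resolved from env.yaml for pipeline.yaml and from env.serve.yaml for pipeline.serve.yaml; that naming convention, the folder names, and the task names are assumptions on my part, and the products in features.yaml get the same {{out}} treatment. The name: get override keeps the new-data task compatible with the shared feature tasks:

```yaml
# env.yaml -- parameters for the training pipeline
out: products/train

# env.serve.yaml -- parameters for the serving pipeline
out: products/serve

# pipeline.serve.yaml -- serving pipeline
meta:
  import_tasks_from: features.yaml

tasks:
  - source: scripts/get-new.py
    name: get                     # keep the name so the feature tasks still find their upstream
    product:
      nb: "{{out}}/get.ipynb"
      data: "{{out}}/get.csv"

  - source: scripts/predict.py
    params:
      model: products/train/model.pickle  # the model produced by the training pipeline
    product:
      nb: "{{out}}/predict.ipynb"
      data: "{{out}}/predict.csv"
```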
20:03
I'm going to delete the rest of this because I want to show you how, now that we parameterized our pipelines, I can run ploomber build again. This is going to run the training pipeline, and you are going to see that it's going to save everything in the train folder because we parameterized the pipeline. So, it's running everything from
20:21
scratch again. It's training a new model. Now that it finished, I'm going to do the same for the serving pipeline. We want to test that we can actually serve, oh, actually, I'm skipping a really important step, which is coding the logic for the serving pipeline. So, I'm just going to do that. I have to tell Ploomber that it has to use the serving pipeline instead of the training one. So, I'm just going to do this.
20:49
Now, I'm going to generate the base files right now because I don't have anything, I don't have this file or this file. So, that's what I'm going to do now. I'm going to call ploomber scaffold and point it at my serving pipeline file. Okay. So, we got those two files. We
21:05
can see them here. And now, let's go to JupyterLab. So, we code the logic that gets new data and the one that loads the model and makes predictions. So, just for simplicity,
21:20
I'm going to be loading the same data. It's not going to be new data because this sample data set is limited. So, again, please don't do this in a real machine learning project. This is just to make an example of how this works. In a real project, we would be getting new observations that we want to make predictions on. So, I just copied the code from the other task because
21:44
this is going to be really similar. So, let's assume this load_iris call gives us new data and we want to make predictions on it. I have to change this because we don't want the target variable. So, you can see we only have the raw data and we want to make predictions on it. So, I'm going to save this. I didn't. Oh.
22:08
Why is this not auto-completing things for me? All right. Let me see what's going on. Oh, I see what happened. This shouldn't be get-new; this should be get. All right. Let's see if
22:35
my output. Oh, I don't have the output folder. I can actually use the command line to
22:44
ask Ploomber to run my new code, so it generates that folder. I am missing the serve folder, so it cannot save that file, because I only have train. But just to show how the command line works, I'm going to say ploomber task and then get. So, I want to run this task
23:04
and point it at my serving pipeline. So, this is going to run this file from the command line and it's going to create that folder for me. All right. So, we finished that. We can ignore this and continue working. So, we are going to reuse the previous code, these two files,
23:25
the ones that generate features, this one and this one. So, I can simply run my pipeline because we already declared that in our serving logic. So, I'm going to do ploomber build. This, of course, is going to break at the end because I don't have the script that makes a
23:43
prediction. I have to work on that now. So, I am just going to generate the features. Now, I have the features that I need and I can continue working on my final step. I can show that I have the serve folder. All right. So, this is generated by the serve pipeline.
24:04
Okay. So, now let's continue working on this. We need everything from the previous tasks. Now, let's reload this thing. And I am going to borrow some of the code from here, not from here.
24:28
From here, just to make things a little faster. I need this. I'm just going to make a few changes here. Actually, I don't think we need to change anything. So, we are going to generate
24:47
the features and then we are going to load the model and make predictions. So, you see, we have all the features. We don't have the target variable, because this is the task that loads the model and predicts. Now, let's load our model: from pathlib import Path, and import pickle.
25:04
So, we have the path to our model here. Let's load that. I think it's model. Yes, model: read_bytes. And we need pickle.loads. This is going to return the object.
25:26
Okay. So, we load our model. We are going to make predictions now. I think it's called predict. Yeah, preds. And now let's just create a data frame with this,
25:40
so that we can save this as a CSV file. Okay. So, let's assume these are the predictions that we want to generate. Now, we save this: to_csv, product['data'], index=False. Okay. The finished prediction script looks roughly like the sketch below.
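A sketch of the prediction script; the model variable is filled in from the params entry in pipeline.serve.yaml, and the task names are the same assumptions as before:

```python
# %% tags=["parameters"]
upstream = ['get', 'sepal', 'petal']  # new raw data plus the two shared feature tasks
product = None
model = None  # injected from 'params: model:' in pipeline.serve.yaml

# %%
from pathlib import Path
import pickle

import pandas as pd

# rebuild the same feature matrix used for training (no target column here)
raw = pd.read_csv(upstream['get']['data'])
sepal = pd.read_csv(upstream['sepal']['data'])
petal = pd.read_csv(upstream['petal']['data'])
df = pd.concat([raw, sepal, petal], axis=1)

# load the model trained by the training pipeline and predict
clf = pickle.loads(Path(model).read_bytes())
preds = pd.DataFrame({'prediction': clf.predict(df)})

# save the predictions as this task's data product
preds.to_csv(product['data'], index=False)
```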
26:02
So, we finished coding. Let's make sure that our serving pipeline actually runs from scratch before we deploy this to the cloud. Okay. It's working. Great. We finished working in JupyterLab so we can close this. Let me shut this down. And now, so, we finish
26:25
with the coding part. We don't need this anymore, the output; we just need the model. I'm just going to delete it. Now, we use the second command line tool. So, Ploomber helps you write pipelines locally, and if you want to run things in the cloud, you can use
26:43
the second command line tool that we're going to use now. So, what I'm going to do is create a new deployment environment. I'm going to use soopervisor add, I'm going to call this serve, and we are going to use the AWS Batch backend. We can also use Airflow or Kubernetes. The experience is pretty much the same. The only difference is
27:06
the configuration. It's pretty much the same. Oh, I'm missing two files that I need. My dependencies. So, what I'm going to do is I'm going to get those files. I need
27:20
these three, just configuration files that I need: my credentials for the S3 bucket and my dependencies. That's why the command didn't work. So, now, okay, now it worked. And we have this new file, which is where we are going to be setting the configuration for the execution in the cloud.
27:43
So, these are the AWS Batch settings. You can ignore the details here if you are not interested in AWS Batch. These settings change if you change the backend. So, I need to get a copy of my repository URL. So, I have that here. Great. So, those are my settings; roughly, the generated file looks like the sketch below.
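For reference, the soopervisor.yaml generated for an AWS Batch target looks roughly like this. I'm reconstructing the keys from memory, so treat every name and value here as an assumption and rely on the file soopervisor generates for you (the command used in the talk is along the lines of soopervisor add serve --backend aws-batch):

```yaml
# soopervisor.yaml -- sketch only; key names are assumptions, values are placeholders
serve:
  backend: aws-batch
  repository: your-ecr-repository-url   # the repository URL mentioned in the talk
  job_queue: your-job-queue
  region_name: your-region
  container_properties:
    memory: 16384
    vcpus: 8
```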
28:06
I finished configuring this thing. Now, there's one remaining piece here, because when we run things in the cloud, we are going to be running each task, so each of these scripts, in a different container
28:22
that are completely isolated. So, if we run one task that depends on a previous task, we need to transfer the data. What we use is an S3 bucket, and I have to configure a client. So, I need to add a new file to configure the client, and I say
28:42
clients.get. And I'm going to create that file now, clients.py. And now, I have to configure my S3 bucket. So, S3Client: the get function returns an S3Client with my bucket. Now, the folder, I'll say
29:04
hello-from-python. And my credentials are in credentials.json. Okay. So, I think we're done. The clients.py file ends up looking roughly like the sketch below.
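A minimal sketch, assuming ploomber.clients.S3Client takes a bucket name, a parent folder, and a path to a JSON credentials file (check the Ploomber docs for the exact signature); pipeline.serve.yaml then points its File client at this function, e.g. with a clients: {File: clients.get} entry:

```python
# clients.py -- returns the client Ploomber uses to copy task products to and from S3
from ploomber.clients import S3Client


def get():
    # bucket name is a placeholder; the folder and credentials file follow the talk.
    # The keyword argument names are an assumption -- double-check against the docs.
    return S3Client(bucket_name='some-bucket',
                    parent='hello-from-python',
                    json_credentials_path='credentials.json')
```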
29:20
Let's check if this configuration works. So, ploomber status, and check our serving pipeline. And if this configuration can successfully connect to S3, which it did, it means that we are ready to deploy. So, what I'm going to do now is soopervisor export
29:42
and the name of my environment, which is serve. Okay. Let's run this. So, it's loading my pipeline. And now, it's going to make sure that it actually works. It's creating the Docker image. It's pretty fast because I already generated a base image. So, it's only adding the new code, but it's not installing dependencies because it already has
30:05
those things, just to make this faster. Now, it verified that the pipeline works, that you can import it. Then it checked the configuration with the S3 bucket. It pushed the image and it submitted the jobs. So, that's it. We deployed our pipeline. Let's
30:29
ignore this. These are just the practice runs that I did last night to make sure that I was able to do this in 30 minutes. So, you can see the new tasks here. These are the ones that we just submitted to AWS. And that's it. We finished. We finished on time.
30:58
All right, Eduardo. Thank you so much. Great talk. Great tool. Great presentation.
31:05
Live coding, everything. I think people in the Matrix chat were also really impressed. Well, we can cut a little bit into the break right now. So, I will ask you
31:21
one question, one very quick question that people had in the chat: can Ploomber be used without Jupyter notebooks? Yes, yes, you can. You can use it without Jupyter.
31:40
I have a really strong preference for Jupyter because it allows me to do things interactively. But if you like a text editor, of course, you can use the tool that you prefer. Fantastic. Great. Thank you so much, Eduardo. Folks, let's thank all our speakers again
32:01
for this session. I will do the chat clapping myself.