
Statistical methods in global air pollution modelling part 2 - convolutional neural networks


Formal Metadata

Title
Statistical methods in global air pollution modelling part 2 - convolutional neural networks
Series Title
Number of Parts
27
Author
License
CC Attribution 3.0 Germany:
You may use, modify, and reproduce the work or its content in unchanged or modified form for any legal purpose, and distribute and make it publicly available, provided you credit the author/rights holder in the manner specified by them.
Identifiers
Publisher
Year of Publication
Language
Producer
Production Year: 2020
Production Place: Wicc, Wageningen International Congress Centre B.V.

Content Metadata

Subject Area
Genre
Abstract
High-resolution global air pollution mapping has significant social and academic impact, but is a tremendously challenging task, especially in terms of data assimilation and analytics. In this workshop, I will introduce the most recent developments in global air pollution modelling and the evolution of data sources (from social science, Earth observations, numerical models), with a focus on explaining various machine learning algorithms (e.g. ensemble trees, deep convolutional neural networks) and overfitting-control strategies (e.g. regularization, post-processing), and how they can contribute to global air pollution mapping.
Computer animation
Transcript: English (automatically generated)
For example, here I'm showing the primary road within a 25-meter buffer, the total length of it, and likewise for larger buffers. But the question then is, what buffer should be used? And also, with this type of method, we drop lots of other information from the transportation networks,
so like the corners and the curvatures, the traffic lights, and roundabouts, and that may all contribute to the air pollution process. So can we automatically extract features from the transportation networks? That is what deep learning-based methods
are good at. You just feed in the raw data, and it will automatically learn features and assemble features. So here's an example, a very concrete example: facial recognition. How do you identify if there is a face in the image?
A human would look at eyes, or mouth, or nose, but then the question is, how do you identify an eye or a mouth? It just gets more and more complex. But deep neural networks excel at this work. The amazing thing about them is that they learn
the features hierarchically. If you look at the output at each layer: at the lower levels, you see these edges and oriented lines. Then, going to higher levels, you start to see eyes, and noses, and shapes, and so on.
And at even higher levels, you start to see larger parts of the face, until the face emerges. So I'm going to, again, start from some basics. The basic element, or the basic unit, of deep learning
is a neuron, which is called a perceptron. In the first step, you take the weighted sum of the input and add a bias to it. This bias controls when the neuron is activated. Then this number is passed to a nonlinear function,
which is called the activation function. This gives the output. There are many types of activation functions, and here I'm showing the two that are most commonly used. The sigmoid takes one divided by one plus e
to the negative of the input, and it has this S shape. It's bounded between zero and one, so it's mostly used when we have a probability as output. And this one is the ReLU, which clips
the negative values to zero and maintains all the positive values. Then we can take different weights and biases, and that forms the different units, or neurons, at each layer, and the output of each layer can be the input of the next layer.
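As a minimal sketch (plain NumPy, not the workshop code; the function names here are mine), a perceptron with these two activation functions could look like:

```python
import numpy as np

def sigmoid(z):
    # 1 / (1 + e^(-z)): S-shaped, bounded between 0 and 1
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # clips negative values to zero, keeps all positive values
    return np.maximum(0.0, z)

def perceptron(x, w, b, activation):
    # weighted sum of the input plus a bias,
    # passed through a nonlinear activation function
    return activation(np.dot(w, x) + b)

x = np.array([1.0, 2.0])
w = np.array([0.5, -0.25])
out = perceptron(x, w, b=0.1, activation=relu)  # 0.5 - 0.5 + 0.1 = 0.1
```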
So we have a neural network, and the layers between the input and output layers are called hidden layers. This type of neural network is called a densely connected neural network, because every neuron in one layer is connected to every neuron in the next.
So the big question is again, how do we optimize the loss, the cost function? Here, the cost function is the mean over all observations of the loss of each observation. For regression problems, like what we had with the global mapping,
we take the mean squared error. For classification, the cost function can be more diverse, and one of them is cross-entropy. So then, how do we do it? This gradient descent is probably at the core
of the mathematics in neural networks. If you remember from your calculus class, you might remember that sometimes you can solve it explicitly. Like in simple linear regression, if your cost function is this simple, you can just calculate the derivative
and set it equal to zero. But for a deep neural network, the cost function can be very complex, and we may also have thousands or even millions of parameters to solve for. So you can't solve it explicitly, and the better strategy is gradient descent.
This is similar to gradient boosting, where we try to look around and find the direction in which the cost function descends the fastest. And this is done by calculating the gradient, so the derivatives.
Here it shows that if the slope is negative, then we move our parameter to the right, and if the slope is positive, we move our parameter to the left. And again, we multiply by a learning rate
that decides our step size, so how far, how quickly we want to descend. And we need to do this for every sample: we calculate the gradient for each of the samples and sum over all of them.
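To make the update rule concrete, here is a toy sketch of my own, assuming a one-parameter linear model with a mean-squared-error cost:

```python
import numpy as np

# Toy data generated from y = 2x, so the best weight is w = 2
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x

w = 0.0      # start somewhere
lr = 0.05    # learning rate: decides the step size
for _ in range(100):
    # gradient of the cost mean((w*x - y)^2) with respect to w,
    # computed over every sample and averaged
    grad = np.mean(2.0 * (w * x - y) * x)
    w -= lr * grad   # move against the slope

# w has now descended close to the minimum at 2
```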
That means for each step of gradient descent, we need to scan over all of the data, and the data set can be very big, like gigabytes or terabytes. So in practice, what we do is take a mini-batch of the samples
to calculate the gradient, and then the next time, another mini-batch to update the gradient. By doing it this way, we don't descend our gradient directly to the minimum, but take a more zigzagging path; but it's much faster.
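The mini-batch variant, in the same toy setup (again my own sketch, not the workshop code):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 3.0 * x + rng.normal(scale=0.1, size=1000)   # true weight is 3

w, lr, batch_size = 0.0, 0.1, 32
for epoch in range(5):
    idx = rng.permutation(len(x))                # reshuffle every epoch
    for start in range(0, len(x), batch_size):
        i = idx[start:start + batch_size]        # one mini-batch
        # gradient estimated from the mini-batch only: noisier,
        # zigzagging steps, but no full scan of the data per step
        grad = np.mean(2.0 * (w * x[i] - y[i]) * x[i])
        w -= lr * grad
```

Each step now touches only 32 samples instead of all 1000, which is what makes this practical for gigabyte-scale data.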
So do you have any problems or questions with gradient descent done this way? Is everything clear? So yeah, when I just started, I was worried about what if it gets stuck at a local minimum.
What we are looking for is the global minimum, where we descend to the lowest point, but as the cost function can be very complex, will we be stuck at a local minimum? Have you been bothered by this? Any questions? Isn't that why you kind of schedule the learning rate?
No, yes, but that's a very good point. So the suggestion was to schedule the learning rate, which means having a dynamic learning rate. For example, when I have more iterations,
then I go more slowly searching along the gradient. But if you do it this way, you can still get stuck in a local minimum. No other comments? But that's a very good point.
So the answer is actually that we are not going to. What I showed you just now was a one-dimensional case, where you definitely see the local minimum. In a two-dimensional case, it's also very easy to imagine this local minimum.
But when we have a very high-dimensional case, like a neural network, what we are actually going to have is a saddle point. It's very hard to imagine multiple dimensions, but you can think about each dimension independently.
If you think about each dimension, then your cost function is either going to be this convex shape, facing upward, or this concave shape, facing downward. So the probability that all of these dimensions
have this convex shape is actually very, very small. In a 10-dimensional case, for example, the probability is like 0.01 already. So in reality, what we are going to have is a combination of these concave and convex shapes, and that forms a saddle point.
That means we always have a direction to go down, and the minimum we find is the global minimum already. The saddle point, where the name comes from, is because it looks like a saddle on a horse. Okay, so regularization is, again,
the most important thing, because when you fit a neural network, it can really fit everything; you can drive the training error to zero. So how to control overfitting is the most important task. And again, ridge and lasso,
which we already introduced, are maybe easier to think about this time, because we have all these piecewise linear regressions. We actually try to control the overfitting by, again, penalizing large weights or large coefficients. And another is early stopping,
which is also similar to XGBoost: if the neural network doesn't improve as we expected, then we just stop iterating. And then dropout and batch normalization are specific to neural networks.
Dropout just means dropping out connections: some of the weights you learned, you just drop. And batch normalization means normalizing the input at each layer,
so that the input has mean zero and standard deviation one. This is because, with these neural networks passing numbers through functions, if you have a small change in an earlier layer, this change can be magnified in later layers,
and it's very hard for your model to optimize. With batch normalization, you can sometimes even use higher learning rates, because the model is much easier to optimize.
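As a rough NumPy sketch of what these two operations compute at training time (not the Keras layers themselves; the helper names are mine):

```python
import numpy as np

rng = np.random.default_rng(42)

def dropout(a, rate=0.5):
    # randomly drop connections: zero out a fraction of activations,
    # scaling the survivors so the expected value stays unchanged
    mask = rng.random(a.shape) >= rate
    return a * mask / (1.0 - rate)

def batch_norm(a, eps=1e-5):
    # normalize each feature over the mini-batch
    # to mean zero and standard deviation one
    return (a - a.mean(axis=0)) / np.sqrt(a.var(axis=0) + eps)

acts = rng.normal(loc=5.0, scale=3.0, size=(64, 8))  # fake layer activations
normed = batch_norm(acts)
dropped = dropout(acts)
# normed now has per-feature mean ~0 and standard deviation ~1
```

In Keras these correspond to the `Dropout` and `BatchNormalization` layers, which additionally learn a scale and shift for the normalized output.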
Now, how can we apply it to structured data, like an image, a spatio-temporal cube, or a multispectral image? The most rudimentary way is to convert it into a one-dimensional array and then feed it into our densely connected neural network.
It's not impossible to do it that way; it may still learn some features and give you some results, but it's definitely not the most efficient way, because all the spatial structure is destroyed. For an image, for example,
pixels that are close together and pixels that are far apart are treated the same way. Remember, we have everything densely connected, so we don't really distinguish between them. A better way is to use convolutional filters.
The filter is a matrix, a weight matrix, and it is applied to your input. This time, instead of taking all the pixels into the calculation,
like putting them into your densely connected layer, you just take the pixels that are covered by the filter, and then take a weighted sum of those pixels based on the weights of the filter.
Is that clear? Then this filter is going to slide all the way over the image, and we get an output layer. And when we have multiple channels, like an RGB image with three channels,
or a multispectral image where you have multiple channels, we can apply the same filter to each of the channels and then aggregate them together. And then again, we can add a bias to it. This layer is called a convolutional layer. Yeah, clear?
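A bare-bones NumPy sketch of this operation (stride 1, no padding; my own illustration, not the practical's code):

```python
import numpy as np

def conv2d(img, kernel, bias=0.0):
    # slide the filter over every position and take the weighted sum
    # of the pixels it covers; for multiple channels, apply the same
    # filter to each channel and aggregate, then add a bias
    kh, kw = kernel.shape
    oh = img.shape[0] - kh + 1
    ow = img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = img[i:i + kh, j:j + kw, :]        # pixels under the filter
            out[i, j] = np.sum(patch * kernel[..., None]) + bias
    return out

rgb = np.ones((5, 5, 3))                 # tiny 3-channel "image"
mean_filter = np.full((3, 3), 1.0 / 9)   # a smoothing filter
out = conv2d(rgb, mean_filter)
# out has shape (3, 3); each value is 3.0 (1.0 per channel, summed)
```

Using several different kernels and stacking their outputs gives the multiple output channels described next.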
We can also have different filters. The same idea as having different weights for each neuron: we can have different filters, and each gives a different channel of your output. So the number of channels of the output is the same as the number of filters you use.
Yes. And the application of convolutional layers is actually not new; it has been in image processing since 20, 30 years ago. What is new is that the filters are now learned automatically
to extract features. So I have this small R script here called Convolutional Illustrated. If you're not familiar with image processing, you can see there how different convolutional filters work: some of them sharpen the image,
some smooth the image, and some of them extract edges or features of the image. For convolutional neural networks, again, there are lots of architectures, and the one I'm going to introduce today, and also use in the practical, is called ResNet.
The full name is the Residual Network. Why do we want it? When we build a neural network, we want it to go deep, as deep as it can, because then it can better capture the hidden features.
The deeper, the more complex the features you might be able to extract. But the problem is that adding more layers does not always increase model performance, and sometimes they can even be harmful because of the gradient,
because then you have problems in descending the gradient. So what ResNet does is use skip layers: it adds a layer to a layer that is a few steps ahead.
So if F(x) is the output after a few layers, we add x to it. By doing it this way, if the layers in between are not useful, the weights that are learned are just going to be zero. That means we are just going to treat them as if they don't exist.
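The skip connection can be sketched in a few lines of NumPy (a hypothetical fully connected "block" of my own, just to show the idea of y = F(x) + x):

```python
import numpy as np

def layer(x, w, b):
    # one layer: linear transform + ReLU (batch norm omitted for brevity)
    return np.maximum(0.0, x @ w + b)

def residual_block(x, w1, b1, w2, b2):
    # two layers, then add the input back on: y = F(x) + x
    return layer(layer(x, w1, b1), w2, b2) + x

x = np.ones((1, 4))
w0 = np.zeros((4, 4))   # suppose the in-between layers learn zero weights
b0 = np.zeros(4)
y = residual_block(x, w0, b0, w0, b0)
# F(x) is then 0, so y equals x: the block behaves as if the
# extra layers don't exist, and the added depth does no harm
```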
With this kind of network, you can go very deep, because the layers we add to the network won't harm our neural network. Yes, it's quite a clever strategy. Maybe it sounds quite simple,
but when it was proposed, it really became the most popular network and generated lots of studies. You can see there are also variants of ResNet, actually trying to work out when to do the addition of the layers:
whether you do it before or after batch normalization, or before or after the activation, and so on. There are lots of studies on it. In the script, I implemented two versions, kind of the official versions, that they think performed best in their experiments.
So you can also compare the results of different versions of ResNet. In our study, we used ResNet to automatically extract features from the transportation network. And then we used a densely connected neural network
to model the other predictors, like population, climate, elevation, and so on. Then we concatenated the outputs of these two models together, and that goes through another dense layer
to do the prediction. This way, when you do the backpropagation, so backpropagation uses gradient descent to find the weights and biases, you can propagate through both paths. Yes, am I too fast?
I hope not. Does everyone understand? Okay, great. So now we can go into the details with the practical. So, Kaggle started with machine learning competitions,
and now it's becoming an open public data platform for machine learning practitioners. Basically, you can share your data sets and scripts together,
and other people can work on your data sets or try out your scripts. Here you can also see the competition data, so a lot of resources.
If you win a competition, you get not only money; it's really quite a noble thing.
But we are actually now trying to publish a competition there. It's not about air pollution, but about urban building extraction. So the idea is to map buildings of different morphologies,
and how to maybe automatically classify different buildings. It's not related to this one, but I'm just saying that we are trying
to put up a competition using satellite imagery, to extract buildings from it and classify different types of buildings based on their morphology, their shape. You can use all kinds of features.
Actually, oh, that's a very good point. Yeah, that's what we think may be the most promising one. But of course, you develop the method to win the competition. Yeah, okay, yeah.
If you're interested, we can talk more about the competition, but we are still trying to prepare something. Okay, so have you all opened the link I just shared?
You have, you've entered the notebook. From here you can see the data. You can also do it yourself later, of course, loading your own data. So this is a CSV file
of the ground station measurements. Oh, sorry. Well, where should I? There's some help here. It's not moving. Oh, maybe, no, I think I moved.
Sorry. Okay, great. Yeah, so you're all in this. Are you all here? Great.
So here, if you want to work on your own project, you can load data. But this is the data I used: the ground station measurements and other background variables in the CSV file.
And this is for making predictions. Oh, no, these are all the array files, the transportation networks. And here I separated the different road types:
the first is the highway, the second primary roads, the third secondary roads, the fourth tertiary roads, and the fifth local roads. Okay, and they are all stored as NumPy arrays. You can think of them as R arrays,
but they're just arrays. So the first step is to install all the packages. Since it runs on Docker in the back end, many packages are already pre-installed,
so we can just simply import them. The import here is like library in R, and as is just to give it a new name, so that you can call it by this new name.
Oh yeah, the question is, if I do it locally, do I need Docker? Well, you can use it, of course.
One thing about doing it in Docker is that it's easily reproducible. And for this, you know you can also use a Conda environment;
that's also a very efficient way to have everything together. Yeah. Oh, yeah. Okay, that's great to learn. So what Docker do you use? You use a TensorFlow Docker image.
It's just that installing TensorFlow on your computer can sometimes be very tricky, because of all kinds of dependencies, and you really have to pick the right drivers for your graphics card. It's a hassle, and that's why a lot of people just use Docker, because then it just works for you. Okay. Because of the GPU, right?
TensorFlow can fail at that sometimes. Yeah, Docker is what most people think is easiest. Okay. So the comment is that using Docker with TensorFlow is easier.
Okay, so we've already imported all the packages. I think Chris also introduced this morning that Keras is a higher-level API on top of TensorFlow, which works on all these matrices and arrays to do the deep learning work.
Keras really has lots of higher-level functions that simplify things. This import is just to make it easier when you're calling functions. Okay, here is just the model setting.
The batch size is the mini-batch size when we descend our gradient; you remember the mini-batch gradient descent. The epochs are the number of iterations we want to do when we train the neural network. And this n is about
how deep we want our neural network, the ResNet, to be. Later I'll show a figure I drew for it, so maybe it's easier for you to understand. And the version is because we have two versions of ResNet.
Data augmentation is one strategy when we are training the neural network: for example, rotating or enhancing the data, so that we have more training samples. It's one technique to boost the data
for better training. And this subtract-pixel-mean you don't need to pay attention to; it's just if you want to normalize the data. Okay, so here are the versions.
For the different versions of ResNet, the depth is calculated a bit differently. You can also just skip this for now. And here, now we load all the NumPy arrays. Have you all done that?
Well, actually, I can't find the data when I copy the notebook; I only see the data update. Do you understand that one?
You need to copy the data on the right side, copy the data from the right side. Yeah, copy the data. I think you can directly open the link, can't you, without copying?
The copy and edit should also work. Yes. You all registered, right? Unless that works. Yes.
So first copy and edit. Now, let's just edit my copy, because I did that already.
There is some information for you here. It says the notebook was produced from multiple data sources, but when I click on it, it says no data sources. So maybe the data is not shared? Is that possible?
No, it's not shared. I tried it myself, but you can see this copy, but... Maybe, so, I don't know, if you give me your email address,
I can share the project with you and then... Maybe that's it, but it is already public, so we should... Search for the data sets in Python. No, it should be with the project already, but maybe what we can try,
but this is a problem with everyone. Yeah. But then I have to make it private.
I forgot, how do I share? Because I always make it public. Oh yeah, but maybe you can try to give me your... Yeah.
Are there anyone? Did they see?
Okay.
Okay. Yeah, you can upload that.
I think probably the best way to share the data with us is to publish it as a dataset. I did publish it as a dataset. And you can start searching for it, because then we can search for it and we can add it to the project.
Do you need a shame? You also didn't receive anything from your email.
So if you just... I think you have to search for the dataset. They are already shared. So the global 5221, and the road 64, I think they're in the data.
Oh yeah. Okay, yeah, but maybe now you can search it again. I just made it public. The global 5221.
If it doesn't work, I just go on. Let's just watch you do it. Okay. Yeah, sorry about that. I couldn't say much.
Okay, so we are here. Here you can load the data, the NumPy arrays; you can just think of them as arrays. And I also made the data binary:
values larger than one I just convert to one, otherwise zero. And it's also very important to pay attention to which dimension is your channel
and which dimension is your number of samples, because TensorFlow needs to understand it. This has to match what TensorFlow expects. TensorFlow expects that
the last dimension is the channel, so you also have to have the channel as the last dimension. Yeah, you understand? That is what this moveaxis does: it reshapes my data to move the channel, the five channels by the way, by 64 by 64, the dimensions of an image,
and this is the number of samples, about 5,000. Here we can also have a look at the data. So just...
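For example (my guess at the stored layout, with a smaller sample count for illustration), moving a channels-first array to TensorFlow's channels-last convention might look like:

```python
import numpy as np

# Suppose the roads array is stored as (channels, height, width, samples):
# 5 road types, 64 x 64 pixels, 100 samples
a = np.zeros((5, 64, 64, 100))

# TensorFlow expects (samples, height, width, channels):
# samples first, channel as the last dimension
b = np.moveaxis(a, [0, 3], [3, 0])
# b.shape is now (100, 64, 64, 5)
```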
Great, so Xiaoyi, do you want to type in your email here? Maybe you can just come and type your email in there.
So I'm sorry for that. Does it send already or?
Oh, yes, you can. I'm going to...
Yeah, thank you. I sent you the same number of emails.
Yes, you're good to go. No, I guess it's going to be safe then. Ah, okay, I sent the email here.
Okay, thank you. Do you want to? I already click. I click the save once already. Maybe you can not.
I'm not stuck.
Oh, you have, okay. Okay, so. Okay, thank you. Oh, you only see one, right?
Okay, I'm going to make the other also open. Maybe you can see.
Okay, do you see the second data? Maybe you need to refresh.
I just made all the data set public. So you should see all of them now.
Is that okay for you? Yes, I wanted to have all of them public. I don't know why they're not. Oh, yeah, I think it's because of the open geo hub computation. Because we might have this computation. So I thought the data set might be slightly related.
So I made things close again. Okay, well that is...
Pardon? Yes. Yes.
Okay, so can you try to run it now? Okay, that's good.
Yes, it's a bit slow, I think.
So that's good, you can do it. And it takes a while to update.
What should I do for you?
Zoom. Yeah, I think if you...
Ah, here. There's some people. Ah, somebody wants to... Oh, okay. Yeah. Okay.
Okay, so maybe you can try, but with the scripts, I'm going to hide another person.
So how can you do this? Delete. Delete. Okay, thank you. Okay, let's go back. Here I'm just double-checking. I'm splitting the data set into test and training sets.
So these X_train, X_test, y_train, y_test are the splits of the data and the corresponding arrays.
And these X_train_RF and X_test_RF are only for splitting the data frame, for fitting the densely connected neural network to the background variables. And then here, I'm just checking that the dimensions are all right, everything's all right.
Okay, and this lr_schedule is what was just brought up: scheduling the learning rate. Basically, we want the learning rate to be relatively high when we've just started, and then, when we are approaching the minimum,
we want the learning rate to be lower, so we can search more carefully. Here, it's basically saying that if there are more than 180 iterations, then multiply the learning rate by this number, and so on, just letting the learning rate get lower and lower.
Okay, and this part is the densely connected neural network. Keras actually makes it really easy to do the modeling, and it's also very easy to understand.
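The learning-rate schedule just described can be sketched as a plain Python function (the exact thresholds and factors here are illustrative, following the common Keras ResNet example):

```python
def lr_schedule(epoch, lr0=1e-3):
    # start relatively high, then step the learning rate down
    # so the search becomes more careful near the minimum
    lr = lr0
    if epoch > 180:
        lr *= 0.5e-3
    elif epoch > 160:
        lr *= 1e-3
    elif epoch > 120:
        lr *= 1e-2
    elif epoch > 80:
        lr *= 1e-1
    return lr

# early on the rate is 1e-3; after 180 epochs it is down to 5e-7
```

In Keras, such a function is typically passed to the `LearningRateScheduler` callback during `model.fit`.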
In the beginning, you just initialize the sequential model, and then you can add convolutional layers or dense layers. So here's the densely connected neural network. Here it says
how you want to initialize your kernel, and you can usually just say to initialize it randomly. Then you can use different regularizations. And after that,
we can add the batch normalization layer and the ReLU layer, so the activation layer. You can just do this to construct your neural network. You are all clear with batch normalization and ReLU now?
Then you can also try to add more layers, so more dense, batch normalization, and activation layers. Here is the number of units, so the number of different combinations of weights and biases.
Okay, and then for a regression problem, usually in the last layer we use the linear activation. And because we have a single output, here it's just one.
Then we can run this densely connected neural network, and you can see it's fitting. The reason we have 2,076 here is because we feed the data in batches;
this is the number of batches, the 264. Here you can see the loss and mean squared error and so on, and also how they are on the validation data set.
You can see it's decreasing, and especially this loss on the training set is decreasing really rapidly. But the loss on the test data set
decreases more slowly, slower and slower in the end. But this here is not doing the modeling yet; it's just to test our neural network, this one here. And now we can look at the ResNet.
So ResNet. To help you understand it, I drew something yesterday. My handwriting is not so good,
but I think it can help. It has a few components. Because your neural network can be very complex, you want to structure it a bit. So the first thing is the ResNet layer,
and that consists of three layers: first we do a convolution, then we have a batch norm layer, and then the activation. And then we have ResNet blocks. In a ResNet block, the input goes through two ResNet layers.
This is version one of the ResNet: two ResNet layers. And then we take this x back: the output of these two ResNet layers is going to be added together with this x. And this is for the first block.
When you do your second block, this x is also going to go through a convolutional layer using one-by-one filters. This is to ensure that x and y have the same dimensions, because when you want to add two arrays together,
you need them to have the same dimensions. Then this goes through another activation. And then you do it three times: this is repeated three times, one after another. The first time we have 16 filters, then the second time 32, and the last time 64 filters.
So this is this n: n is the number of blocks per stack. Remember, in the beginning we set it to three. That means we will have 20 layers in total, because each block has two convolution layers and we have three stacks of blocks,
so it's n multiplied by six, plus two for the first and last layers. So we have altogether 20, and that is ResNet-20. That's where its name comes from.
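Counting the layers, the arithmetic just described is (my own sketch):

```python
def resnet_v1_depth(n):
    # three stacks, n blocks each, two convolution layers per block
    # gives 6n layers, plus the initial convolution and the final
    # dense layer: depth = 6n + 2
    return 6 * n + 2

# n = 3 gives the ResNet-20 used in the workshop script;
# n = 25 would give ResNet-152
```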
We also have ResNet-152, where the network is really deep. You can try different n to see, when your network is very deep, if you can increase the performance. Okay, so with this, hopefully the script is easier to understand.
Maybe I'll give you one minute to read the script: this ResNet layer, and then the ResNet V1 for the first version.
I have this figure here. And then just let me know if it's not so clear.
Is the script easy to understand, or do you have questions about certain parameters?
Oh, it's actually the sequence of doing batch normalization and convolutional layers.
Yeah. Well, that's what other people changed. These two networks are quite official, so many people have already tested them.
Yes, I tried two different versions. Here you're also going to try different versions. Remember, we have the model setting in the beginning: if you set the version to one, it runs V1,
and if you set it to two, it runs V2. And you all know what strides and padding are,
padding and strides? So the... I still don't actually have access to the files, so I'm just following along. I do have access, but it won't recognize them. The files are listed there, but the Python script can't find them.
Oh, it's because of a path problem. Yeah, I'm pretty sure the path is right.
Do you think we should try this? Yeah. Okay, I think maybe we should try this. Maybe what? No, I mean, we can change the language. We can change the type of language.
So that's nice. Well, this is, this is stupid.
I don't think I'm going to be able to solve this, so just keep going. Sorry, let me... This is too weird. I'm going to share with you the path,
and then you just copy and paste it, and see if it's because of your language settings.
If you share something with me. No, I'm sharing it with you. Well, I'm thinking about sending this to you. And you're, you know, about something. Yeah, and the talk is still there. It seems like you're sending it to me.
So I'm going to send it to you. Yeah, I'm going to send you to this one. This one, too. This one, too. Oh, here. Here? Yeah.
That one doesn't even come. OK. No. Make me go back.
Here's the other one.
Okay. So after that, you can just run the model. Here you see the model structure. What is very handy is this figure: you can see the model structure, what each layer does, and the dimensions. There are question marks here because at this step we haven't run the model yet, so it doesn't know the input size, how many samples we are putting in. But you can already see the structure of the model. Even at just 20 layers it's already quite long, but it's very clear what you're doing.
You see that after one convolutional layer you do a batch normalization, then the activation, and then it goes through a set of residual layers. It also goes through a one-by-one-filter convolutional layer, and the two are added together, and you repeat that. Then, at the end, there is the background information, which is passed into a densely connected neural network, very similar to an artificial neural network if you're familiar with that. You concatenate these two things together, it goes through another dense layer, and that gives the output. So you can have a look at this to see what we are doing with this neural network.
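The two-branch structure just described can be sketched roughly as follows. All input shapes and layer sizes here are hypothetical placeholders, not the notebook's values.

```python
# Hypothetical sketch of the two-branch structure: an image branch and a
# branch for the background predictors, concatenated before the output.
from tensorflow.keras.layers import (Concatenate, Conv2D, Dense, Flatten,
                                     Input)
from tensorflow.keras.models import Model

image_in = Input(shape=(32, 32, 3), name='image')    # image patch
covars_in = Input(shape=(10,), name='background')    # background predictors

x = Conv2D(16, 3, padding='same', activation='relu')(image_in)
x = Flatten()(x)                                     # image branch

y = Dense(16, activation='relu')(covars_in)          # ANN-like dense branch

z = Concatenate()([x, y])                            # put the two together
z = Dense(32, activation='relu')(z)                  # another dense layer
out = Dense(1)(z)                                    # regression output

model = Model(inputs=[image_in, covars_in], outputs=out)
```

Calling `model.summary()` on something like this produces exactly the kind of structure printout shown in the figure.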
And here is where we do the different data augmentations. You can see, for example, how to flip the image, zoom it, or rotate it. These operations just add more samples to the data, and the more samples we have, the easier it is to train the model. So image augmentation can usually help the prediction. And then you can train on your data and see the result.
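As a minimal illustration of what augmentation does (the notebook may use Keras' built-in augmentation instead), flipping and rotating a patch with NumPy already multiplies the number of samples:

```python
import numpy as np

def augment(patch):
    """Return the original patch plus flipped and rotated copies
    (a hypothetical helper, just to show the idea)."""
    return [patch,
            np.fliplr(patch),   # horizontal flip
            np.flipud(patch),   # vertical flip
            np.rot90(patch)]    # 90-degree rotation

patch = np.arange(16).reshape(4, 4)
samples = augment(patch)        # four training samples instead of one
```

Each transformed copy is a valid new training sample because the pollution label of the patch doesn't change when the image is flipped or rotated.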
So are you all running up to this stage? Yeah? Is it finished now?
So for those of you who don't know stride and padding: stride is the step size for moving the filter, and padding adds zeros to your data so that the output has the same dimension as the original data. You can do different kinds of padding, but the basic idea is to add zeros at the border. I've seen a very nice animation of it.
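The effect of stride and padding on the output size follows a simple formula, out = floor((n + 2p - k) / s) + 1, for an n-wide input, k-wide filter, stride s, and p zeros of padding per border. A small sketch:

```python
import math

def conv_output_size(n, k, stride=1, pad=0):
    """Width of the output when a k-wide filter moves over an n-wide input."""
    return math.floor((n + 2 * pad - k) / stride) + 1

# 'same' padding keeps the size: pad (k - 1) // 2 zeros at each border
assert conv_output_size(28, 3, stride=1, pad=1) == 28
# no padding ('valid') shrinks the output
assert conv_output_size(28, 3, stride=1, pad=0) == 26
# a stride of 2 roughly halves it
assert conv_output_size(28, 3, stride=2, pad=1) == 14
```

This is why the layers with `padding='same'` in the model summary keep the spatial dimensions, while strided layers shrink them.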
So, is the model running, or are you just waiting for the results? Do you have any questions? I was wondering why you're using mean absolute error. Why shouldn't we use it? Why not mean squared error instead of mean absolute error? Yeah, I don't see much difference between them, actually. Mean squared error is a bit more sensitive to outliers, because the errors are squared. I take the absolute values because I'm more interested in the general performance, but I don't see a very big difference between the two. I think here I'm using both, right? Here, with model.compile, you can specify what you want to use.
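A quick numeric illustration of the outlier sensitivity just mentioned: with one large miss, MSE blows up much faster than MAE. The values here are made-up toy numbers.

```python
import numpy as np

y_true = np.array([10., 10., 10., 10.])
y_small_errors = np.array([11., 9., 10., 10.])   # small misses everywhere
y_one_outlier = np.array([10., 10., 10., 30.])   # one large miss

def mae(a, b):
    return np.mean(np.abs(a - b))

def mse(a, b):
    return np.mean((a - b) ** 2)

# with only small errors the two losses agree ...
print(mae(y_true, y_small_errors), mse(y_true, y_small_errors))  # 0.5 0.5
# ... but the single outlier inflates MSE quadratically
print(mae(y_true, y_one_outlier), mse(y_true, y_one_outlier))    # 5.0 100.0
```

So training on MAE keeps a few extreme monitoring stations from dominating the loss, while MSE would pull the model harder toward fitting them.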
Oh, okay, cool. That's great. Did you check the file? No, I did check it, but it couldn't be found at all. Then I think I just did a full refresh. Oh yes, if you do Run All, you run everything. So here, I think I maybe went too fast.
Here, compile is where you specify which loss and which optimizer you want to use. For the optimizer there are many choices, so I recommend you learn more about them at home, because we can't cover them all. Basically, as we talked about before, the local minimum is not the real problem; the problem is when we have a plateau. When the gradient is really flat, it's very hard to descend. So Adam and many other optimizers try to give the descent more momentum, to let it go faster when it meets a plateau, so you can descend the gradient further. Okay, this summary gives you these printouts, so you can see your inputs and what you're doing at each step. And this one gives you this plot. Is everything clear?
So the question is whether it gives rankings for the predictor variables, and it doesn't. There are other ways to visualize what the separate convolutional layers actually notice in the data. It doesn't really show you which variables are important, but it shows you what kind of patterns it recognizes as relevant. Yes, so the comment is that even though we can't see the variable importance, we can look at the output of each of the layers. We can look at the weights and the prediction at each layer, so we can see what the neural network is doing.
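As a hedged sketch of that idea: in Keras you can build a sub-model that stops at a given layer and look at its activations. The architecture and layer name here are hypothetical, just to show the pattern.

```python
# Sketch: extract the activations of an intermediate layer with a sub-model.
import numpy as np
from tensorflow.keras.layers import Conv2D, Input
from tensorflow.keras.models import Model

inp = Input(shape=(8, 8, 1))
feat = Conv2D(4, 3, padding='same', activation='relu', name='conv_feat')(inp)
model = Model(inp, Conv2D(1, 1)(feat))

# sub-model that stops at the layer we want to inspect
extractor = Model(model.input, model.get_layer('conv_feat').output)
maps = extractor.predict(np.random.rand(1, 8, 8, 1), verbose=0)
# maps has one feature map per filter: shape (1, 8, 8, 4)
```

Plotting each of those feature maps shows what kind of spatial patterns the layer responds to, even though it never gives a per-variable importance ranking.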
What features have been learned, and so on. You have a question? Oh yeah, Adam is not exactly this momentum, but I think it's an evolution of that idea. It goes a step further than momentum by preventing the overshooting problem of momentum, because with momentum you can also just go too fast and miss the minimum. So Adam tries to improve on that. It has been a while since I carefully read it, so yes, exactly. Adam is nowadays probably the most popular one. Most people are using Nadam now. Nadam? Okay, thanks for your suggestion. So the suggestion is to use Nadam; I actually didn't know that. I'm learning a lot. Nadam, yeah.
I'll put a note here.
Okay, I think mine has finished running now. I don't know about you; have you all finished training? Still working on it? So it's actually very important to do cross validation here, something like ten-fold, because we don't have that many data points; we have about 5,000. Each time you run it, you're going to get somewhat different results, because you split the data differently, and that affects the outcome. So here is what I get in this iteration.
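A minimal sketch of such a k-fold split in plain NumPy (our actual pipeline does this in R). Note the seed, which is exactly what makes the splits, and hence the results, repeatable:

```python
import numpy as np

def kfold_indices(n_samples, k=10, seed=42):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation."""
    rng = np.random.default_rng(seed)   # fixed seed -> reproducible splits
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, test

splits = list(kfold_indices(5000, k=10))   # ten different train/test splits
```

Training the model once per fold and averaging the ten test errors gives a much more stable estimate than a single random split of 5,000 points.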
What you get will most likely be different, because I didn't set a seed. You can see that the training error is decreasing rapidly, but the testing error fluctuates a lot here. I think it would get more stable if I tried more epochs. This 50 epochs is actually really low, because I didn't want to spend that much time, but if you increase it, I think it will steady out. The y-axis is the mean absolute error and the x-axis is the iteration.
Here are the prediction values. Of course, it's better to put them on a map; this just shows some fluctuations. And here, on Kaggle, I can't really run it for a large area, but you can try it on your server or on a better cloud, because I have some memory issues here. This is the road map of the area where I wanted to show you the spatial prediction pattern, but on Kaggle I can't do it because I don't have the memory. Of course, if this weren't just a demonstration, I would write everything into functions; that's the best way of avoiding memory leakage. If you're interested, definitely try it on a server, or maybe your laptop can do better than Kaggle, since it has more memory. So, we are at almost one and a half hours.
After this hands-on, we have another one for the modelling process. That's again in R, in the modelling-process R script folder, where you can see how to do hyperparameter tuning, bootstrap cross validation, and mapping, and also the pre-processing of OpenStreetMap data.
So you can see how we implemented everything for our publication. I want to thank you very much for participating in this course. I want to say this is really the best time to learn machine learning, because we have so many materials. These are my favorite books, if you are a book reader; some of them are real classics, like these two for statistical learning, and this one especially for Gaussian processes, known as kriging in geostatistics. They also use machine learning with the Gaussian process. And these three deep learning books: Deep Learning with Python is really my favorite, and this introduction to deep learning is quite new, from 2020; I really like its introductory part. There are also lots of online courses you can learn from, at Stanford, MIT, and Coursera. So if you want to learn deep learning or machine learning, really join their communities.
Okay, so thank you very much. Now you can just keep working: after this one you can work on the modelling process to see some details of our implementation and make a map. Yes, and you can ask questions. Do you have a slide that summarizes? You mean a comparison? I only have a comparison between random forest, XGBoost, and boosting. They are all in hands-on three, so you can see how they compare to each other. This neural network is quite new; I think there's still lots of room for improvement, so I'm still making it better.
And for this neural network I probably went a bit fast, so if you have any specific questions about the parameters or the different functions and how they work, you can ask. I hope you all understand the process well. Thank you, that was nice. For me it didn't work. No? I don't know what was going on there. Did you get anything? Even if you do Run All?
Oh, did you turn on the GPU, the accelerator? Just try the GPU; sorry, I forgot to say that, because I have it on all the time. Wait a second. Yeah, I set it in the morning. Is it the GPU setting in the corner? Yeah, here, if you click on this button. Same thing, the last three dots here. I think it's mainly because they need your telephone number for you to use the GPU. They do? Yes, so if you haven't given it to them, you're probably not really using the GPU. For your account? Yeah; otherwise it would just be bitcoin miners using their GPUs. I didn't create my account using my Google account. I think in your settings they will ask you to.
But the GPU really lets you run much faster, and on a TPU even faster. Yeah, you should search for it; it makes things even faster, although for a TPU you need to configure a bit more. Oh yeah, there's quite a lot: 30 to 40 hours a week. And you really need to end the session: click here to stop it before you close the window, otherwise it's going to idle for an hour. So here you need to shut it down. You have 30 hours per week to run the GPU, so if you shut it down, you can save some time.
In this convolutional-filter folder, I also have an R markdown where you can try different convolutional filters. So here is how the convolutional filter works: this is the padding, and this is striding with a step of one, and then you aggregate the values into the new layer. And here you can see how different convolutional filters, which are just matrices applied to the image, detect edges, sharpen it, blur it, and so on.
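As a small sketch of that: a convolutional filter is just a small matrix slid over the image, and the classic hand-made kernels do edge detection, sharpening, or blurring. The helper below is a toy NumPy version, not the R markdown's code.

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid-mode 2-D convolution (cross-correlation, as CNNs actually do)."""
    k = kernel.shape[0]
    h, w = image.shape[0] - k + 1, image.shape[1] - k + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out

edge = np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]])  # edge detection
sharpen = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])   # sharpening
blur = np.full((3, 3), 1 / 9)                               # box blur

# a constant image has no edges, so the edge kernel returns all zeros
flat = np.ones((5, 5))
print(convolve2d(flat, edge))
```

The difference in a CNN is just that the kernel values are learned from the data instead of being designed by hand.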
And if you are interested... oh, and here, this calc-predictors script is just how we calculate our predictor variables, the road density and so on, and do the regridding. And this deep learning CNN has the Jupyter notebook that's on Kaggle, and here is the data we use. I should also share it, but it's difficult to download; here is the data that we use on Kaggle. And here are the installation instructions if you want to run it locally. Yeah, I think that's everything.
Well, this archive you probably won't need. Here is also how to process OpenStreetMap. If you're interested in extracting things from OpenStreetMap, I wrote a document about it. There are different ways you can do it: you can download everything, or you can query it interactively. There are both R and Python packages for it, and of course also SQL and the QGIS OSM plugin. This OSMnx is a Python package; I think you can give it a try. It's really handy: you can either install the package or run their Docker image, and it's very nicely documented. You can extract features and do lots of analysis on OpenStreetMap: you can make queries, calculate densities, and find the fastest route from one point to another. Yes.
Yeah, that is Python, and it's not only for roads. Yeah.
Yeah, I really like this one.
Oh, and for the...