
Understanding and Implementing Recurrent Neural Networks using Python


Formal Metadata

Title: Understanding and Implementing Recurrent Neural Networks using Python
Number of Parts: 132
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.

Content Metadata

Abstract
Recurrent Neural Networks (RNNs) have become famous over time due to their property of retaining internal memory. These neural nets are widely used in recognizing patterns in sequences of data, like numerical time series data, images, handwritten text, spoken words, genome sequences, and much more. Since these nets possess memory, there is a certain analogy that we can make to the human brain in order to learn how RNNs work. RNNs can be thought of as a network of neurons with feedback connections, unlike the feedforward connections which exist in other types of Artificial Neural Networks. The flow of the talk will be as follows:
- Self Introduction
- Introduction to Deep Learning
- Artificial Neural Networks (ANNs)
- Diving DEEP into Recurrent Neural Networks (RNNs)
- Comparing Feedforward Networks with Feedback Networks
- Quick walkthrough: Implementing RNNs using Python (Keras)
- Understanding Backpropagation Through Time (BPTT) and the Vanishing Gradient Problem
- Towards more sophisticated RNNs: Gated Recurrent Units (GRUs) / Long Short-Term Memory (LSTMs)
- End of talk
- Questions and Answers Session
Transcript: English (auto-generated)
All right, a very warm welcome and good morning, all of you. My name is Anmol Krishan Sadeva, and the topic for today is understanding and implementing recurrent neural networks. So before starting, let's first look at an example. Suppose you are reading a book and it has, say, five chapters. We have read chapter one and we are starting chapter two, but suddenly we forget everything that was in chapter one. So how will we be able to understand chapter two? It means short-term memory plays a very crucial role. On that note, I would like to start with recurrent neural networks.
So Martin has already introduced me, I'll be skipping this. Prerequisites: you should be aware of the Python language, and you should have decent knowledge of artificial neural networks and elementary linear algebra. So recurrent neural networks can be thought of as neural networks which persist information, and we can think of them as sequential processes basically influencing decisions. So in contrast to traditional neural networks, which don't persist information, recurrent neural networks have an advantage over those.
So we can think of recurrent neural networks as networks having loops to themselves. I'll be explaining the architecture now. And these are among the most complex supervised deep learning algorithms we have today. So consider a normal neural network, and it has many layers. X0, X1, X2 are basically the inputs that we are providing to the layers. A is a hidden layer. So the input X0 goes to A, and the information gets transferred from one hidden layer to the next, and simultaneously from that hidden layer to the one after it. And we can think of squashing this unrolled picture that you see on the right-hand side into the thing that is known as the vanilla RNN. That means it has a loop to itself. RNNs can have many architectures. The first one is the one-to-many architecture, in which you provide one input
and you can get many outputs. So you can think of it as, say, you provide an image and it produces a certain caption. So an image is mapped into a caption of, say, five or six words, that is, one image getting mapped into five or six words. That is the one-to-many transformation of an RNN. Likewise, we have many-to-one. Many-to-one can be thought of as, say, having video frames and generating text out of the video, so it can be mapped many-to-many or many-to-one. So we can get a sentence from a video. And similarly, we can have the many-to-many transformation also. So recurrent neural networks have the following applications: image captioning, subtitle generation, time series classification, language modeling, natural language processing. Even chatbot development is also based on RNNs nowadays.
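To make the idea of a loop concrete, here is a minimal sketch, not taken from the talk's slides, of a single vanilla RNN step in NumPy, where the same weight matrices W_x and W_h are reused at every time step:

    import numpy as np

    def rnn_step(x_t, h_prev, W_x, W_h, b):
        # One vanilla RNN step: the new hidden state depends on the current
        # input x_t and on the previous hidden state h_prev (the "loop").
        return np.tanh(W_x @ x_t + W_h @ h_prev + b)

    # Toy dimensions: 3 input features, 5 hidden units.
    rng = np.random.default_rng(0)
    W_x, W_h, b = rng.normal(size=(5, 3)), rng.normal(size=(5, 5)), np.zeros(5)

    h = np.zeros(5)                       # initial hidden state
    for x_t in rng.normal(size=(4, 3)):   # a sequence of four input vectors
        h = rnn_step(x_t, h, W_x, W_h, b)

The same rnn_step function is applied at every position in the sequence, which is exactly the unrolled picture described above.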
So there's a major problem in the vanilla RNN. That is, if the RNN has many layers, then adjusting the weights at each hidden layer is a problem. That means once the information from X0 is transferred to A, it is multiplied by some weight matrix, then transferred to the next layer, then to the next, and to the next. And simultaneously, the output gets generated at the top. Now we have something called the loss function, that is, the difference between the actual output and the predicted output. So if the loss function for, say, Ht plus one generates some error, say an error of 0.15%, it needs to be back-propagated throughout the network. And back-propagating through a network that is so large is difficult, because you can think of W getting multiplied at each layer. If W is between zero and one, say 0.2, and we multiply something by 0.2 multiple times, then it will tend to a very small value. So the gradient will be propagating through the whole chain, but it will take much time to train, so it is not a feasible solution. So the vanilla RNN suffers from the vanishing gradient problem. Likewise, we have the exploding gradient problem. So the exploding gradient problem is when
the value of W is greater than one. So if you repeatedly multiply a number by a factor greater than one, it will keep growing towards very large values.
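As a rough numerical illustration of my own, assuming a single scalar weight repeated over 50 time steps, you can see both effects:

    # Repeatedly multiplying by a weight-like factor w over 50 time steps.
    for w in (0.2, 1.5):
        value = 1.0
        for _ in range(50):
            value *= w
        print(w, value)
    # 0.2 shrinks to roughly 1e-35 (vanishing); 1.5 grows to roughly 6e8 (exploding).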
Now, the vanishing gradient problem can be thought of as W being less than one, and the exploding gradient problem as W being greater than one. To solve the vanishing and exploding gradient problems, we use certain techniques. For the exploding gradient problem, we have truncated backpropagation. In truncated backpropagation, we divide the whole set into certain batches, and we just backpropagate within those batches sequentially. A system of rewards and penalties is like reinforcement learning: we provide rewards if the backpropagation is doing well, else we provide penalties. And we have gradient clipping: if the gradient goes beyond some range, then we just clip that gradient and don't propagate it through the network. For the vanishing gradient problem, we have smart weight initialization, which is like guesswork, then we have echo state networks, and we have LSTMs. So I'll be talking today about LSTM, that is, long short-term memory. It is one of the most used variants of RNN, and the approach of LSTM is making the weight W equal to one.
So you can think of it like this: you don't want W less than one, you don't want W greater than one, so what else can we do? We can just make W equal to one. Now, coming to the architecture of the LSTM, C is the new introduction here. Xt is the current input, Ht minus one is the previous hidden state, and Ct can be thought of as the cell state. So we provide the input Xt and the previous hidden state and pass them through four gates: F, I, G, O. F is the forget gate, I is the input gate, G is the candidate gate (it doesn't really have a standard name), and O is the output gate. We apply element-wise multiplication and addition using this formula: Ct is basically F, the forget gate, times Ct minus one, the previous cell state, plus I times G. The first term is basically forgetting some part of the memory. So it is like you were reading something and you actually forgot some part of it, but you retained a certain part of it. So this is forgetting a major part and retaining some part, and that retained part is used for training the rest of the network. Then Ct is passed through the tanh function and multiplied by the output gate, and we get the output of the hidden layer.
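A minimal NumPy sketch of that cell update, using the standard LSTM equations rather than code from the talk's slides, looks like this:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, c_prev, W, U, b):
        # W, U and b are dicts holding the parameters of the four gates.
        f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])   # forget gate
        i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])   # input gate
        g = np.tanh(W['g'] @ x_t + U['g'] @ h_prev + b['g'])   # candidate values
        o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])   # output gate
        c_t = f * c_prev + i * g        # forget part of the old cell state, add new
        h_t = o * np.tanh(c_t)          # new hidden state
        return h_t, c_t

Note that the cell state c_t is only touched by element-wise multiplications and additions, which is the point made next about why gradients can flow through it easily.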
So comparing the RNN and the LSTM, we can say the LSTM has a more sophisticated architecture in the sense that it has a cell state, and using this cell state it is like a super highway: the blue line on the slide, the cell state, is not getting changed much along the way. So we are actually only applying the element-wise factor F, and we are not multiplying by the weight matrix. Multiplying by an element, a scalar, is much simpler than the matrix multiplication which is involved in the weight matrix multiplication used in the vanilla RNN. That is why the vanishing gradient problem goes away in the LSTM. Now we can start building the LSTM, and I'll be providing the code for that. So you can just follow along; you can implement this as it is now.
I'll be just shifting to the implementation part now. So the major task of implementing the RNN involves data pre-processing, then building the recurrent neural model, and then making the predictions and visualizations. Before implementing, we need certain libraries: the Keras library, then scikit-learn, and TensorFlow. So the first task is data pre-processing. We import the NumPy library, the Matplotlib library and the Pandas library. The NumPy library is for array manipulation, Matplotlib is for visualizing the results, and Pandas is for managing the datasets. Yeah, let me zoom. Okay, can you see now? Okay, so since the Keras library doesn't work directly with Pandas data frames, we need to convert them to NumPy arrays for Keras.
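In code, the imports for this step look roughly like this (the np, plt and pd aliases are the usual conventions; the exact script is the one the speaker later points to on GitHub):

    import numpy as np               # array manipulation
    import matplotlib.pyplot as plt  # visualizing the results
    import pandas as pd              # managing the datasets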
Now, before starting, let me introduce you to the problem. Today we will be predicting the stock price of Google for the first month of 2017. These are the records that you can see: 20 records, because each month has about 20 financial days, that is, weekdays. So this is the data we have to predict, and the training dataset is this one. We'll be predicting the open stock prices of Google for the first month of 2017, and we'll be training on the years 2012 to 2016, so five years of data. Now, the actual prices look like this: if I take these two columns and just plot them, this is the result that we should be predicting. So today our task is to mimic this behavior of the stock price in January 2017. Now let's get started. The training dataset, I'm calling it Google stock price train.csv, and I'm importing it as a dataset, a pandas data frame.
So I'm using pd.read_csv. Then, since this data frame contains many columns but we only require the open stock prices, that is column one, I use the iloc function for selecting. The first parameter in iloc is a colon, which means it will select all the rows, and the second parameter is the slice one to two. One points to the Open column and two basically points to the next column, but since the upper bound of a slice is excluded, it is just taking that one column, the Open column. And .values is for converting it to a NumPy array. So we are just converting the pandas data frame to a NumPy array. Now we need to scale this data
and we'll be using the scikit-learn preprocessing library, and from that we'll be using the MinMaxScaler class. So sc is an object of the MinMaxScaler class, and the feature_range parameter here tells it that you need to convert the values of the column into the range zero to one. So everything will be scaled between zero and one. And after that we just fit and transform. Fitting basically means learning the statistics of the data: it takes the minimum value and the maximum value from that column, and transform then puts each value through the scaling function. The scaling function can be standardization or plain min-max normalization. In standardization we subtract the mean from the actual value and then divide by the standard deviation, but in min-max normalization we subtract the min value from the actual value and divide by the difference between the max value and the min value. So it is a basic normalization thing that we are doing, and fit and transform does this.
After that we need to create the training data: the training list and the ground-truth list. So X_train is an empty list and y_train is an empty list, that is, the training set and the ground-truth set. We have 1258 records here in the stock price data, and we are taking the time step as 60. So what is a time step? Basically the time step is a very important concept in recurrent neural networks. It is how many observations you will take into consideration for predicting the next value, or you can say, if you are focusing on predicting the i-th value, what set of values you will take into consideration before that i-th value. So I'm starting from the 60th record, which means it will take the data of about three months, train on that, and predict the 61st value. It is just an assumption; you could take 20 days, 18 days, anything. But during my testing I took 60 and it produced good results, so I'm using 60 as the time step value. And 1258 is the number of records. So I append to the empty list the scaled training values from the (i minus 60)-th value up to the current value. So that is basically what the appending does: it makes an array in which one record contains the 60 previous values before the current value. So it is like an n by 60 matrix that is getting created. Now, y_train is basically the ground truth, and the ground truth will just have the current value. So it is from i to i plus one, and zero is basically the column, that is, the first column, because we have already cleaned our data and we just have one column for the open stock prices; zero points to that. Then X_train, y_train equals np.array(...), so it is converting them to NumPy arrays. We are generating the X_train NumPy array and the y_train NumPy array. Then we need to reshape the X_train array.
So reshaping means that, to work seamlessly with the NumPy arrays and the Keras RNN layers, we need to add an extra dimension to this existing NumPy array. So basically X_train.shape[0] is the number of rows, X_train.shape[1] is the number of columns, that is, the time steps, and 1 is the number of indicators, that is, the open stock price column. So we are just reshaping to build
a 3D array from a 2D array. Now the first step, the data pre-processing, has been done, and we are on to the next step of building the RNN. Does anyone have any doubt in this? Yeah. So, on the question of why we scale the data: basically we have values ranging from, say, 700, 800, 1,000 down to, say, 20 or 25 in the stock prices. That is not ideal; it will take a lot of time and it will not generate good results if the data is not scaled into some range, say zero to one. So I have provided the range zero to one to make it cleaner. So it is just cleaning and smoothing the data; that is the use of scaling. Yeah.
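In code, the windowing and reshaping just described come down to something like this minimal sketch (assuming training_set_scaled from the scaling step, 1258 training rows, and a time step of 60):

    import numpy as np

    X_train, y_train = [], []
    for i in range(60, 1258):
        X_train.append(training_set_scaled[i-60:i, 0])   # the previous 60 scaled values
        y_train.append(training_set_scaled[i, 0])        # the value to predict
    X_train, y_train = np.array(X_train), np.array(y_train)

    # Keras LSTM layers expect 3D input: (samples, time steps, features).
    # Here there is a single feature, the open stock price.
    X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))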
Right, so the question that you are asking is a very good question, and we use something called grid search for this. That is basically a part of hyperparameter tuning. So these values are basically hyperparameters; 60 is a hyperparameter too, and it affects how the predictions are being made. So what we can do, after this is done, is the following. This is a first version for you all, but I tried it with certain parameters. In hyperparameter tuning, the grid search utilities let you provide arrays of values. Say I have something called the time step, and I provide time step equal to 30, 60, 300, suppose. And I have another value, say the optimizer, and I take three optimizers, suppose, which are used for this type of regression (this is a regression problem, because we are dealing with continuous values). So we take, say, the RMSprop optimizer, which I'll be telling you more about, the Adam optimizer, and say the Nadam optimizer. It will then take the Cartesian product: it will take the time step 30, cross it with RMSprop and generate a result, then 30 with Adam, then 30 with Nadam. Then it will do the same with 60 and RMSprop, 60 and Adam, 60 and Nadam, and so on. So it will generate nine results, and from those nine results you can actually form a matrix which will show which combination is generating good accuracy and less loss. Using that grid search, you can judge which hyperparameter to tune to what value. So that is the idea of grid search. Thanks.
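The talk refers to the grid-search utilities that work together with Keras; as a minimal sketch of just the combination idea (a plain loop of my own, with 'nadam' being my reading of the third optimizer mentioned), it could look like this:

    from itertools import product

    time_steps = [30, 60, 300]
    optimizers = ['rmsprop', 'adam', 'nadam']

    # Grid search evaluates the Cartesian product of the candidate values:
    # 3 x 3 = 9 combinations, each one trained and scored separately.
    for time_step, optimizer in product(time_steps, optimizers):
        print(time_step, optimizer)   # in practice: rebuild, train and score a model here

scikit-learn's GridSearchCV automates exactly this kind of loop, together with cross-validation, and reports the scores per combination.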
No, I have not, because I don't have that much time for showing that, yeah. So now, building the RNN. We are importing four classes: Sequential, Dense, LSTM and Dropout. The Sequential class is basically for initializing the network as a sequence of layers. The Dense class is for the last layer of the neural network, that's the output layer. The LSTM class is for making the hidden layers of the neural network. And Dropout is for something called dropout regularization, which I'll be telling you more about further on.
So we are making a regressor, since it's not a classification problem; we are dealing with a continuous data set, so it's a regression problem. We make an object of the Sequential class called regressor, and we add the first layer to it with regressor.add; regressor.add adds layers to the neural network. The first layer is an LSTM layer, the hidden layer that we are adding. units equal to 50 means the number of neurons, you can say, that we are introducing to the network, so this layer will have 50 neurons. You can take any value; it is, again, a hyperparameter, and I took 50. Then return_sequences equal to true means that the full output sequence of this layer will be forwarded to the next layer. And input_shape equals (X_train.shape[1], 1): X_train.shape[1] points to the columns of the data set, the time steps, and 1 is the third dimension we reshaped for, the single indicator. We don't need to give the first dimension, the number of rows. So this adds the first layer. Now, adding layers like this may introduce overfitting, so to deal with the overfitting problem, that is, to avoid fitting the noise, we use dropout regularization. Dropout actually means dropping out a certain number of neurons from the layer. Out of 50, I have chosen 0.2, that's 20%, to be dropped out. So it means it will be considering only 40 neurons for the next layer: out of those 50 neurons, 10 neurons, or say 20% of them, will be dropped out at random. So this actually helps avoid the overfitting problem. Yeah.
So basically, it's a stacked LSTM. So I'm using four hidden layers in this. And for the four layers, I'm adding four dropouts. So, yeah. Yeah.
No, no, it happens during the epochs. An epoch is basically one pass in which the full data is propagated forward and backward to lessen the loss. So for each of the epochs, the dropout is done at random. Dropping out means just not considering 20% of those neurons, chosen randomly, in one epoch; in the next epoch it will again choose a random 20% and not consider those. So it's a random process. I've chosen 100 epochs, so the data is fully propagated forward and backward, and the training is done 100 times on this data. For each epoch it picks some random neurons, dropping out 10 of them at random.
So it actually avoids the overfitting problem in that way. And then, similarly, I'm adding three more layers, each with regressor.add(LSTM(...)). Now, you provided the input shape in the first layer, so you don't need to provide it again, because the LSTM layer already knows what its input is; from the second layer onwards it is taken automatically from the previous layer. So this is the second layer, then the third layer and the fourth layer. In the fourth layer we don't actually need to return the output sequence, because we pass it to the Dense layer and not to another LSTM layer. So we omit the return_sequences equal to true parameter. We could state return_sequences equal to false here, but the default value of return_sequences is false, so we just don't put the parameter. And Dense is the final output layer. units equal to one means it will have just one neuron, that's the resulting neuron, so Dense corresponds to the result of the neural network. Now the building of the RNN has been done, but the compilation phase is still left. Compilation actually means compiling with a certain loss function and choosing the right optimizer. As I told you earlier, RMSprop and Adam are the two optimizers that are commonly used with RNNs.
And Adam is the one that usually works not just for RNNs; it works with CNNs also. So it is a more widely applicable optimizer that you can use, and it gives good results. Again, it's a hyperparameter tuning thing; I have chosen Adam as the optimizer. And since this is regression, we will be using the mean squared error loss function. If it had been a classification problem, then we would have used the cross-entropy or binary cross-entropy loss function. Then we have compiled this, and we are ready to fit it. So, fitting the RNN to the training set:
we call the fit function with X_train and y_train, epochs equal to 100, and batch size 32. So it will divide the data set into batches of 32 elements and work on those batches. Again, this is part of hyperparameter tuning, and what I mean by hyperparameter tuning is tuning the parameters of the classes in such a way that they provide good results, good predictions.
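Putting the model-building, compilation and fitting steps together, a minimal sketch (layer sizes and hyperparameters as described in the talk; the keras import paths may differ slightly between versions) could look like this:

    from keras.models import Sequential
    from keras.layers import Dense, LSTM, Dropout

    regressor = Sequential()

    # First LSTM layer: 50 units, return the full sequence for the next LSTM layer.
    regressor.add(LSTM(units=50, return_sequences=True,
                       input_shape=(X_train.shape[1], 1)))
    regressor.add(Dropout(0.2))   # drop 20% of the units at random against overfitting

    # Second and third LSTM layers.
    regressor.add(LSTM(units=50, return_sequences=True))
    regressor.add(Dropout(0.2))
    regressor.add(LSTM(units=50, return_sequences=True))
    regressor.add(Dropout(0.2))

    # Fourth LSTM layer: return_sequences defaults to False before the Dense layer.
    regressor.add(LSTM(units=50))
    regressor.add(Dropout(0.2))

    # Output layer: one neuron, the predicted open price.
    regressor.add(Dense(units=1))

    # Compile with the Adam optimizer and mean squared error (a regression problem).
    regressor.compile(optimizer='adam', loss='mean_squared_error')

    # Train for 100 epochs with batches of 32 samples.
    regressor.fit(X_train, y_train, epochs=100, batch_size=32)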
Now, task three is the prediction task. We have done the data pre-processing and we have built the RNN; now the task is to predict the results.
So actually, the data set for testing is separate from the training data set. So we take the test data set and we merge it with the training data set to make a total data set, which will contain every value. So here we are just doing a read_csv again: we create a data frame of the test data set and then convert it to a NumPy array, as we did with the training prices. Then we concatenate the Open column of the training data set with the Open column of the test set. So it will be 1258 plus 20 records, that's 1278 records in total, because at testing time we need to consider the previous 60 records, and to consider those previous 60 records we need the complete data set. So I'm just using this complete data set, and axis equal to zero is for a vertical join. Then inputs equals the merged data set values: basically, it creates another NumPy array, based on the merged data set, which takes the 60 values before the current position. So counting from record number 1258, it will take into account the previous 60 values to predict the result of that record. Now we reshape it again, and since we have already fitted the scaler earlier, we just transform it. So we take the input NumPy array and we transform it
for making the predictions. And we use another list called X_test, and we append to this list, for each of the 20 test rows, the 60 previous values; then we form the NumPy array from this and reshape it again. Now the prediction can be done using the predict function of the regressor, the model that we built. So regressor.predict basically makes predictions on the X_test NumPy array that we just formed. And inverse_transform is the function that is used to invert the normalization that we did. So now we are just inverting the scaling: we scaled the values from zero to one, we normalized the results to between zero and one,
and we are just inverting the transform to actually convert them back into real values. So something like zero point something gets converted into 770, say, a stock price for one record, and then we plot these results. Since the time is short, I have already run this in PyCharm. It was 100 epochs, and you can see, I'll just show you, that with an increasing number of epochs the loss is decreasing. The loss must have started at some value around 0.0034, and now, at the end of 100 epochs, it's 0.001315. You can see that as I scroll up, towards the earlier epochs, the value of the loss function gets higher. It means that the training has been done correctly and we have achieved a certain level of accuracy, since the loss function has decreased from around 0.0030 or 0.0040 down to about 0.0013. And we can see that the plot that we have
is following the trend of ups and downs of the market, but it's not actually predicting the exact values, though it is following the trend. That is, if the value of the stock price is going up, the prediction follows that trend. The blue line is the predicted one and the red line is the actual one. So using these parameters I was able to generate this result; doing hyperparameter tuning with grid search will definitely improve these results, and then it would match the actual red line up to some error. So it is just following the trend of the red line. We can see on the x-axis at 2.5 it goes down, then it follows the upward trend, and then it is stable at the end. So this was the prediction that, using these hyperparameters, I was able to make.
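Recapping the prediction and plotting steps just walked through, a rough sketch (assuming the dataset_train, sc and regressor objects from the earlier sketches; the test CSV name and the 'Open' column label are assumptions) might be:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    dataset_test = pd.read_csv('Google_Stock_Price_Test.csv')   # assumed file name
    real_stock_price = dataset_test.iloc[:, 1:2].values         # the 20 real January opens

    # Join the train and test "Open" columns so every test day
    # has its previous 60 values available.
    dataset_total = pd.concat((dataset_train['Open'], dataset_test['Open']), axis=0)
    inputs = dataset_total.values[len(dataset_total) - len(dataset_test) - 60:]
    inputs = sc.transform(inputs.reshape(-1, 1))   # scaler already fitted, only transform

    # Build the 20 test windows of 60 previous values each.
    X_test = []
    for i in range(60, 80):
        X_test.append(inputs[i-60:i, 0])
    X_test = np.reshape(np.array(X_test), (len(X_test), 60, 1))

    # Predict and undo the 0-1 scaling to get prices back on the real scale.
    predicted_stock_price = sc.inverse_transform(regressor.predict(X_test))

    plt.plot(real_stock_price, color='red', label='Real Google stock price')
    plt.plot(predicted_stock_price, color='blue', label='Predicted Google stock price')
    plt.legend()
    plt.show()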
So the source code for this, I'll be updating my slides, but it's on GitHub.
It's at this address. I'll be just uploading my slides and it will get uploaded on the session page. And after that, so these are my acknowledgements.
Professor Martin Christian, then my supervisors, and Christopher Olah for creating good blogs. And yeah, if you have any questions, then you can ask.
Thanks. Thank you very much. We have time for a couple of questions. Thank you. I wanted to ask, how did you tune your architecture? Why four layers? Okay, so when we deal with RNNs,
we generally choose two, three or four layers, because it actually doesn't make sense to go beyond four layers; the results will not improve over choosing four layers. So, just for showing the implementation, I used four layers, and it was giving somewhat better results when compared with using three layers or two layers. But generally, in RNNs, we just don't go beyond four layers. It can be the case that if we are using RNNs with some other neural network, say a CNN for an image captioning task, then it will be another scenario. We'll be getting the vector generated by the CNN, and we will be feeding those CNN results into the first hidden layer, that's the X0 thing. Those results will then be forming further vectors, which will be propagated through the chain. So it will have more impact on the RNN training, and in that case we can lower it down from four layers to two layers, because it already has lots of information from the CNN. So it is basically based on that. So if we consider a basic example,
that's the famous roommate example in the field of RNNs. Suppose you have a roommate who cooks food for you, and he cooks, say, apple pie, chicken, or anything, X, Y, Z. On a sunny day he cooks, say, apple pie, and if it's another sunny day, then he just goes off on duty and doesn't cook anything. If it's a rainy day, he's at home and he cooks the next dish, so he cooks chicken for you. So the first input is the weather, sunny or rainy, and the second input is the dish that he made previously. These inputs, provided to a state, will predict the next output. It's like a vector product plus the addition of some other things that go on in complex neural networks. You can just make sense out of it that way, yeah.
Yeah, yeah, I'll be making it public, yeah. Okay, thank you very much again.
No, basically I applied the inverse_transform function. So yeah, the predictions come out inside that zero-to-one range, but then the scaling gets inverted. So the inverse_transform function takes the scaled value and maps it back, multiplying by the max-min difference and adding the minimum back, so it blows the value up from the zero-to-one scale to the scale of, say, hundreds. Yeah, and that is also why the prediction is not reproducing the actual result exactly; it has some variation from the actual values, because you are training on the smoothed, scaled values and using those to predict the actual results, so there will be some variation between them.