Radial basis functions
Formal Metadata
Title: Radial basis functions
Part Number: 8
Number of Parts: 10
License: CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/63182 (DOI)
Transcript (English, auto-generated)
00:00
Hello and welcome! Today we will look at Radial Basis Functions, RBF. RBF networks approximate functions by stretching and compressing Gaussian bells and then summing them spatially shifted. We will cover a description of their function and their learning process
00:22
and a comparison with multilayer perceptrons. According to Poggio and Girosi, radial basis function networks, RBF networks, are a paradigm of neural networks which was developed considerably
00:40
later than that of perceptrons. Like perceptrons, RBF networks are built in layers, but in this case they have exactly three layers, i.e., only one single hidden layer. Like perceptrons, the networks have a feed-forward structure and their layers are completely
01:03
linked. Here, the input layer again does not participate in information processing. RBF networks are, like MLPs, universal function approximators. Despite all things
01:21
in common, what is the difference between RBF networks and perceptrons? The difference lies in the information processing itself and in the computational rule within the neurons outside of the input layer. So, in a moment we will define a so far unknown type of
01:45
neurons. Components and structure of an RBF network. Initially, we want to discuss colloquially and then define some concepts concerning RBF networks.
02:03
Output neurons. In an RBF network, the output neurons only contain the identity as activation function and one weighted sum as propagation function. Thus, they do little more than
02:21
adding all input values and returning the sum. Hidden neurons are also called RBF neurons, just as the layer in which they are located is referred to as the RBF layer. As propagation function, each hidden neuron calculates a norm that represents the distance
02:46
between the input to the network and the so-called position of the neuron, its center. This is inserted into a radial activation function which calculates and outputs the activation of the neuron. Definition 1. Input neuron. An input neuron is an identity neuron. It
03:09
exactly forwards the information received. Thus, it represents the identity function. Definition 2. Center of an RBF neuron. The center of an RBF neuron h is the point
03:26
in the input space where the RBF neuron is located. In general, the closer the input vector is to the center vector of an RBF neuron, the higher is its activation.
03:43
Definition 3. RBF neuron. The so-called RBF neurons have a propagation function that determines the distance between the center of a neuron and the input vector. This distance represents the network input. Then the network input is sent through a radial basis function
04:07
which returns the activation or the output of the neuron. Definition 4. RBF output neuron. RBF output neurons use the weighted
04:21
sum as propagation function and the identity as activation function. Components and structure of an RBF network. Definition 5. RBF network. An RBF network has exactly three layers in the following order. The input layer
04:45
consisting of input neurons, the hidden layer also called RBF layer consisting of RBF neurons and the output layer consisting of RBF output neurons. Each layer is completely linked with
05:02
the following one. It is a feed-forward topology. The connections between input layer and RBF layer are unweighted, i.e., they only transmit the input.
05:21
The connections between RBF layer and output layer are weighted. The original definition of an RBF network only referred to an output neuron, but in analogy to the perceptrons it is apparent that such a definition can be generalized.
05:43
A bias neuron is not used in RBF networks. The set of input neurons shall be represented by I, the set of hidden neurons by H and the set of output neurons by O. The inner neurons are called radial basis neurons because from their
06:04
definition it follows directly that all input vectors with the same distance from the center of a neuron also produce the same output value. Figure 2. Information processing of an RBF network. Now, the question is, what can be realized
06:27
by such a network and what is its purpose? Let us go over the RBF network from top to bottom. An RBF network receives the input by means of the unweighted connections.
06:44
Then the input vector is sent through a norm so that the result is a scalar. This scalar, which by the way can only be positive due to the norm, is processed by a radial basis function, for example by a Gaussian bell (Figure 3).
07:05
The output values of the different neurons of the RBF layer or of the different Gaussian bells are added within the third layer. Basically, in relation to the full input space,
07:21
Gaussian bells are added here. Suppose that we have a second, a third and a fourth RBF neuron and therefore four differently located centers. Each of those neurons now measures another distance from the input to its own center.
07:43
and de facto provides different values, even if the Gaussian bell is the same. Since these values are finally simply accumulated in the output layer, one can easily see that any surface can be shaped by dragging, compressing and removing Gaussian bells and subsequently accumulating them.
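To make the processing chain just described concrete, here is a minimal sketch in Python/NumPy of the forward pass of an RBF network: distance to each center, Gaussian bell, weighted sum in the output neuron. The concrete centers, widths and weights below are purely illustrative and not taken from the lecture.

```python
import numpy as np

def rbf_forward(x, centers, sigmas, weights):
    """Minimal RBF forward pass: distance -> Gaussian bell -> weighted sum."""
    # Propagation function of the RBF layer: Euclidean distance to each center
    r = np.linalg.norm(centers - x, axis=1)
    # Radial activation function: Gaussian bell with its own width per neuron
    h = np.exp(-(r ** 2) / (2 * sigmas ** 2))
    # Output neuron: weighted sum as propagation, identity as activation
    return np.dot(weights, h)

# Illustrative 1-4-1 network: four bells on the real line
centers = np.array([[-2.0], [-0.5], [1.0], [2.5]])
sigmas  = np.array([1.0, 0.8, 1.0, 0.6])
weights = np.array([0.5, -1.2, 2.0, 0.7])
print(rbf_forward(np.array([0.3]), centers, sigmas, weights))
```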
08:07
Here, the parameters for the superposition of the Gaussian bells are in the weights of the connections between the RBF layer and the output layer. Furthermore, the network architecture offers the possibility to freely define or train
08:26
height and width of the Gaussian bells, due to which the network becomes even more versatile. Information processing in RBF neurons.
08:40
RBF neurons process information by using norms and radial basis functions. At first, let us take as an example a simple 1-4-1 RBF network. It is apparent that we will receive a one-dimensional output,
09:01
which can be represented as a function (Fig. 4). Additionally, the network includes the centers of the four inner neurons and therefore four Gaussian bells, which are finally added within the output neuron.
09:21
The network also possesses four values which influence the width of the Gaussian bells. In contrast, the height of the Gaussian bells is influenced by the subsequent weights, since the individual output values of the bells are multiplied by those weights.
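Written as a formula, the output of such a network is a weighted sum of Gaussian bells. The notation c_h for the centers, sigma_h for the widths and w_h for the weights to the output neuron is the usual one and is assumed here, since the slides are not reproduced in this transcript:

$$ y(x) = \sum_{h \in H} w_h \, \exp\!\left(-\frac{\lVert x - c_h \rVert^2}{2\sigma_h^2}\right) $$

The weights w_h scale the heights of the bells, the sigma_h set their widths.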
09:43
You can see a two-dimensional example in Fig. 5. Information processing in RBF neurons. Since we use a norm to calculate the distance between the input vector and the center of a neuron h, we have different choices.
10:03
Often, the Euclidean norm is chosen to calculate the distance. Remember, the input vector was referred to as x; here, the index i runs through the input neurons and thereby through the input vector components and
10:22
the neuron center components. As we can see, the Euclidean distance generates the squared differences of all vector components, adds them and extracts the root of the sum. In two-dimensional space, this corresponds to the Pythagorean theorem.
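Written out, with the notation assumed above, the distance computed by an RBF neuron h for input x is

$$ r_h = \lVert x - c_h \rVert_2 = \sqrt{\sum_{i \in I} \left(x_i - c_{h,i}\right)^2}. $$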
10:43
From the definition of a norm it directly follows that the distance can only be positive. Strictly speaking, we hence only use the positive part of the activation function. By the way, activation functions other than the Gaussian bell are possible.
11:02
Normally, functions that are monotonically decreasing over the interval from 0 to infinity are chosen. Now that we know the distance between the input vector and the center of the RBF neuron, this distance has to be passed through the activation function.
11:24
Here we use, as already mentioned, a Gaussian bell. It is obvious that both the center and the width can be seen as part of the activation function, and hence the activation functions of the individual neurons should not all be referred to as one and the same f_act.
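In other words, every neuron h carries its own activation function, parameterized (under the notation assumed above) by its center and width:

$$ f_{\mathrm{act},h}(r_h) = \exp\!\left(-\frac{r_h^2}{2\sigma_h^2}\right), \qquad r_h = \lVert x - c_h \rVert. $$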
11:45
Combinations of equation system and gradient strategies are useful for training. Analogous to the MLP, we perform a gradient descent to find the suitable weights by means of the already well-known delta rule. Here backpropagation is unnecessary,
12:06
since we only have to train one single weight layer, which requires less computing time. It is very popular to divide the training into two phases by analytically computing a set of weights and then refining it by training with the delta rule.
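A minimal sketch of this two-phase training, assuming fixed centers and widths: the first phase computes the output weights analytically by least squares (the lecture does not name a specific solver, so the pseudo-inverse used here is an assumption), the second phase refines them online with the delta rule. All names and parameter values are illustrative.

```python
import numpy as np

def gaussian_layer(X, centers, sigmas):
    # Activations of all RBF neurons for all training samples, shape (n_samples, n_hidden)
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-(dists ** 2) / (2 * sigmas ** 2))

def train_two_phase(X, t, centers, sigmas, eta=0.01, epochs=100):
    """Phase 1: analytic weights via least squares. Phase 2: online delta-rule refinement."""
    H = gaussian_layer(X, centers, sigmas)        # hidden-layer activations
    w, *_ = np.linalg.lstsq(H, t, rcond=None)     # solve H w ~ t analytically
    for _ in range(epochs):
        for h_p, t_p in zip(H, t):                # online: pattern by pattern
            y_p = h_p @ w                         # current network output
            w = w + eta * (t_p - y_p) * h_p       # delta rule on the single weight layer
    return w
```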
12:25
There is still the question whether to learn offline or online. Here, the answer is similar to the answer for the multilayer perceptron: initially, one often trains online (faster movement across the error surface). Then, after having approximated the solution,
12:46
the errors are once again accumulated and for a more precise approximation one trains offline in a short learning phase. However, similar to the MLPs,
13:00
you can be successful by using many methods. As already indicated, in an RBF network not only the weights between the hidden and the output layer can be optimized. It is not always trivial to determine centers and widths of RBF neurons. It is obvious that the approximation
13:25
accuracy of RBF networks can be increased by adapting the widths and positions of the Gaussian bells in the input space to the problem that needs to be approximated. There are several methods to deal with the centers and the widths of the Gaussian bells.
13:46
Fixed selection. The centers and widths can be selected in a fixed manner and regardless of the training samples; this is what we have assumed until now. Conditional fixed selection. Again, centers and widths are selected fixedly,
14:05
but we have previous knowledge about the functions to be approximated and comply with it. Adaptive to the learning process: this is definitely the most elegant variant, but certainly the most challenging one too. Fixed selection.
14:24
In any case, the goal is to cover the input space as evenly as possible. Here, widths of two-thirds of the distance between the centers can be selected, so that the Gaussian bells overlap by approximately one-third (Figure 6).
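A minimal sketch of this fixed selection, assuming a box-shaped input domain: the centers are placed on a regular grid and every width is set to two-thirds of the grid spacing. The domain boundaries and the number of centers per dimension are illustrative.

```python
import numpy as np
from itertools import product

def fixed_grid_centers(lows, highs, per_dim):
    """Cover a box-shaped input space evenly with RBF centers;
    sigma = 2/3 of the grid spacing, so neighbouring bells overlap."""
    axes = [np.linspace(lo, hi, per_dim) for lo, hi in zip(lows, highs)]
    centers = np.array(list(product(*axes)))        # per_dim ** n_dims centers
    spacing = (np.array(highs) - np.array(lows)) / (per_dim - 1)
    sigmas = np.full(len(centers), (2.0 / 3.0) * spacing.mean())
    return centers, sigmas

# Two-dimensional example: 10 x 10 = 100 centers
centers, sigmas = fixed_grid_centers([0.0, 0.0], [1.0, 1.0], per_dim=10)
```

Note that the number of centers grows as per_dim to the power of the input dimension, which is exactly the effort discussed next.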
14:44
The closer the bells are set, the more precise but also the more time-consuming the whole thing becomes. This may seem to be very annoying, but in the field of function approximation we cannot avoid even coverage. Here, it is useless if the function to be approximated is precisely
15:08
represented at some positions, but at other positions the return value is only zero. However, a high input dimension requires a great many RBF neurons, which increases
15:22
the computational effort exponentially with the dimension and is responsible for the fact that six- to ten-dimensional problems in RBF networks are already called high-dimensional. An MLP, for example, does not cause any problems here.
15:46
Conditional fixed selection. Suppose that our training samples are not evenly distributed across the input space. It then seems obvious to arrange the centers and sigma of the RBF neurons by means of the pattern distribution.
16:05
So, the training patterns can be analyzed by statistical techniques such as cluster analysis, and so it can be determined whether there are statistical factors according to which we should distribute the centers and sigmas, figure 7.
16:24
A more trivial alternative would be to set the centers on positions randomly selected from the set of patterns. So, this method would allow for every training pattern to be directly in the center of a neuron (Figure 8). This is not yet very elegant,
16:43
but a good solution when time is an issue. Generally, for this method, the widths are fixedly selected. If we have reason to believe that the set of training samples is clustered, we can use clustering methods to determine them.
17:02
There are different methods to determine clusters in an arbitrarily dimensional set of points. Neural clustering methods are, for example, the so-called ROLFs and self-organizing maps.
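A sketch of the conditional fixed selection, assuming the training patterns are available as a NumPy array: the centers are either drawn directly from randomly selected patterns or placed by a plain k-means clustering. k-means is used here only as a stand-in; the ROLFs and self-organizing maps mentioned in the lecture are not shown.

```python
import numpy as np

def centers_from_patterns(X, n_centers, seed=0):
    """Pick centers directly from randomly selected training patterns."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_centers, replace=False)
    return X[idx].copy()

def centers_by_kmeans(X, n_centers, iters=20, seed=0):
    """Plain k-means as a stand-in clustering method for placing the centers."""
    centers = centers_from_patterns(X, n_centers, seed)
    for _ in range(iters):
        # assign every training pattern to its nearest center
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        # move each center into the mean of the patterns assigned to it
        for k in range(n_centers):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return centers
```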
17:21
Growing RBF networks automatically adjust the neuron density. In growing RBF networks, the number of RBF neurons is not constant. A certain number of neurons as well as their centers and widths are previously selected,
17:43
by means of a clustering method, and then extended or reduced. In the following, only simple mechanisms are sketched. Neurons are added to places with large error values. After generating this initial configuration, the vector of the weights is analytically calculated.
18:08
Then all specific errors concerning the set of the training samples are calculated and the maximum specific error is sought. The extension of the network is simple:
18:22
we replace this maximum error with a new RBF neuron. Of course, we have to exercise care in doing this. If the sigmas are small, the neurons will only influence each other if the distance between them is short.
18:41
But if the sigmas are large, the already existing neurons are considerably influenced by the new neuron because of the overlapping of the Gaussian bells. So it is obvious that we will adjust the already existing RBF neurons when adding the new neuron.
19:04
To put it simply, this adjustment is made by moving the centers of the other neurons away from the new neuron and reducing their widths a bit. Then the current output vector of the network is compared to the teaching input and the weight
19:23
vector is improved by means of training. Subsequently, a new neuron can be inserted if necessary. This method is particularly suited for function approximations. Limiting the number of neurons. Here, it is mandatory to see that the network will not grow
19:45
ad infinitum, which can happen very fast. Thus, it is very useful to previously define a maximum number of neurons. Less important neurons are deleted. This leads to the question whether it is possible to continue learning
20:03
when this limit is reached. The answer is: this would not stop learning. We only have to look for the most unimportant neuron and delete it. A neuron is, for example, unimportant for the network if there is another neuron that has a similar function.
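A rough but runnable sketch of this grow-and-prune idea, under several assumptions: the weight vector is recomputed analytically in every step, the importance of a neuron is approximated by the magnitude of its weighted total activation, and the adjustment of neighbouring centers and widths described above is omitted. The width of a newly inserted neuron and all other parameter values are illustrative.

```python
import numpy as np

def grow_rbf(X, t, centers, sigmas, sigma_new=0.5, max_neurons=30, steps=20):
    """Insert neurons at the sample with the largest specific error;
    once max_neurons is reached, first delete the least important neuron."""
    for _ in range(steps):
        H = np.exp(-np.linalg.norm(X[:, None] - centers[None], axis=2) ** 2
                   / (2 * sigmas ** 2))
        w, *_ = np.linalg.lstsq(H, t, rcond=None)    # analytic weight vector
        errors = np.abs(H @ w - t)                   # specific error per training sample
        if len(centers) >= max_neurons:
            # importance proxy: weighted total activation; drop the smallest one
            keep = np.ones(len(centers), dtype=bool)
            keep[np.argmin(np.abs(w) * H.sum(axis=0))] = False
            centers, sigmas = centers[keep], sigmas[keep]
        # insert a new RBF neuron at the position of the maximum specific error
        centers = np.vstack([centers, X[np.argmax(errors)]])
        sigmas = np.append(sigmas, sigma_new)
    # recompute the weights for the final set of neurons
    H = np.exp(-np.linalg.norm(X[:, None] - centers[None], axis=2) ** 2 / (2 * sigmas ** 2))
    w, *_ = np.linalg.lstsq(H, t, rcond=None)
    return centers, sigmas, w
```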
20:23
It often occurs, for instance, that two Gaussian bells exactly overlap, and at such a position one single neuron with a higher Gaussian bell would be appropriate. With RBF networks and multilayer perceptrons we have already become acquainted with
20:46
and extensively discussed two network paradigms for similar problems. Therefore, we want to compare these two paradigms and look at their advantages and disadvantages. Comparing RBF networks and multilayer perceptrons
21:04
We will compare multilayer perceptrons and RBF networks with respect to different aspects. Input dimension. We must be careful with RBF networks in high-dimensional functional spaces, since the network could very quickly require huge memory storage and computational
21:25
effort. Here a multilayer perceptron would cause fewer problems because its number of neurons does not grow exponentially with the input dimension. Center selection. However, the selection of centers for RBF networks is, despite the introduced approaches, still a major problem.
21:49
Please use any previous knowledge you have when applying them. Such problems do not occur with the MLP. Output dimension. The advantage of RBF networks
22:03
is that the training is not much influenced when the output dimension of the network is high. For an MLP a learning procedure such as backpropagation thereby will be very time consuming.
22:20
Extrapolation. Advantage as well as disadvantage of RBF networks is the lack of extrapolation capability. An RBF network returns the result zero far away from the centers of the RBF layer. On the one hand it does not extrapolate; unlike the MLP, it cannot be used for
22:46
extrapolation, whereby we could never know if the extrapolated values of the MLP are reasonable; but experience shows that MLPs are suitable for that matter.
23:00
On the other hand, unlike the MLP, the network is capable of using this zero to tell us "I don't know", which could be an advantage. Lesion tolerance. For the output of an MLP it is not so important if a weight or a neuron is missing; it will only worsen a little in total. In
23:30
an RBF network, however, large parts of the output remain practically uninfluenced, but one part of the output is heavily affected because a Gaussian bell is directly missing.
23:47
Thus, we can choose between a strong local error for a lesion and a weak but global error. Spread. Here, the MLP is advantaged, since RBF networks are used considerably less often,
24:04
which is not always understood by professionals, at least as far as low-dimensional input spaces are concerned. Thank you for your attention, see you at the next lectures.