Back Propagation
Consider the following model:
 input     layer 1    layer 2   ...   layer l   ...   layer L

 v_1 ----> O --------> O ----------> O ----------> O ---> o_1
 v_2 ----> O --------> O ----------> O ----------> O ---> o_2
  .        .           .             .             .       .
  .        .           .             .             .       .
 v_N ----> O --------> O ----------> O ----------> O ---> o_M

 (connections between successive layers are complete;
  the crossing links are omitted for clarity)
where there are N inputs v_i, L layers, N_l nodes in each
layer l, M outputs o_i, and M target output values t_i.
Each node i in layer l has a linear activation function,
s_i^l = (A_i^l)' y^{l-1}, where A_i^l is a weight vector and
y^{l-1} is the vector of outputs from layer l-1 (for layer 1,
the inputs v), and a transfer function, y_i^l = f_i^l(s_i^l).
Note that the output of a node in a hidden layer is y rather than o.
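
To make the forward pass concrete, here is a minimal sketch in
Python. The sigmoid transfer function, the layer sizes, and the
random weights are illustrative assumptions; the model above leaves
f_l and the dimensions open.

    import numpy as np

    def sigmoid(s):
        # An assumed transfer function f_l(s); the model leaves f_l open.
        return 1.0 / (1.0 + np.exp(-s))

    def forward(v, weights):
        # weights[l] is a matrix whose row j holds the weight vector
        # A_j^l for node j in layer l; y^0 = v.
        activations, outputs = [], [v]
        y = v
        for A in weights:
            s = A @ y          # linear activation s_j^l = (A_j^l)' y^{l-1}
            y = sigmoid(s)     # transfer function y_j^l = f_l(s_j^l)
            activations.append(s)
            outputs.append(y)
        return activations, outputs

    # Example: N = 3 inputs, one hidden layer of 4 nodes, M = 2 outputs.
    rng = np.random.default_rng(0)
    weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
    activations, outputs = forward(rng.normal(size=3), weights)
    o = outputs[-1]            # network outputs o_1 .. o_M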
Now define the following:
- a_{ji}^l is the weight in node j in layer l
for the connection from node i in layer l-1 to node j in layer l.
- f_l(s) is the same for all nodes in layer l.
- The error at an output node i is
E_i^L = t_i - o_i.
- The least-mean-squares error criterion, E = (1/2) sum_i (E_i^L)^2,
provides that the partial derivative of E with respect to o_i is
-E_i^L; the minus sign is absorbed into the weight update below.
- The partial derivative of f_l(s) with respect to s
is f'_l(s).
- The delta for a node is
d_i^l = E_i^l f'_l(s_i^l).
- The delta rule specifies that the change in weight a_{ji}^l is
Δa_{ji}^l = d_j^l y_i^{l-1}, as implemented in the sketch after
this list.
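
Combining these definitions gives a sketch of one training update,
continuing the Python example above. The learning rate eta is an
assumed constant, and the hidden-layer deltas use the backpropagated
error sum that the derivation below builds toward; only the
output-layer delta follows directly from the definitions so far.

    def sigmoid_prime(s):
        # f'_l(s) for the sigmoid assumed above.
        y = sigmoid(s)
        return y * (1.0 - y)

    def backprop_update(v, t, weights, eta=0.1):
        # One pass of the delta rule; t holds the targets t_1 .. t_M
        # and eta is an assumed learning-rate constant.
        activations, outputs = forward(v, weights)
        E = t - outputs[-1]                     # E_i^L = t_i - o_i
        d = E * sigmoid_prime(activations[-1])  # d_i^L = E_i^L f'_L(s_i^L)
        for l in range(len(weights) - 1, -1, -1):
            grad = np.outer(d, outputs[l])      # d_j^l y_i^{l-1}
            if l > 0:
                # Backpropagated error for the layer below (anticipating
                # the derivation that follows): d_j^{l-1} =
                # f'_{l-1}(s_j^{l-1}) * sum_k d_k^l a_{kj}^l
                d = (weights[l].T @ d) * sigmoid_prime(activations[l - 1])
            weights[l] += eta * grad            # change in a_{ji}^l
        return weights

Repeated calls to backprop_update(v, t, weights) move the outputs o
toward the targets t by gradient descent on the least-mean-squares
criterion.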
For an output node,