Chapter 4: Artificial Neural Networks
Multilayer Networks
- Can represent highly nonlinear decision surfaces. See Figure 4.5.
- Desire a node that has a nonlinear output, yet is differentiable.
One solution is to use a sigmoid threshold unit where the output,
o = σ(net) = 1 / (1 + e^(-net))
and net = Σ_i w_i * x_i
- Note that σ'(y) = σ(y) * (1 - σ(y))
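The sigmoid unit and its derivative identity can be sketched directly (a minimal illustration; the example weights and inputs are made up, not from the text):

```python
import math

def sigmoid(net):
    """Sigmoid threshold unit: maps any real net input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))

def sigmoid_deriv(y):
    """Derivative written in terms of the output y = sigmoid(net):
    sigma'(net) = y * (1 - y)."""
    return y * (1.0 - y)

# net = sum of w_i * x_i over the unit's inputs (example values)
w = [0.5, -0.3, 0.8]
x = [1.0, 2.0, -1.0]
net = sum(wi * xi for wi, xi in zip(w, x))
o = sigmoid(net)
```

The identity σ'(y) = σ(y)(1 − σ(y)) is what makes backpropagation's error terms cheap to compute: the derivative falls out of the unit's output with one multiply.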
Backpropagation
- Uses gradient descent to minimize the squared error between
the network output values and the target values.
- Although backpropagation can settle into a local minimum of the
  error surface, in practice it produces very good results.
- Table 4.2 shows the algorithm.
- Notice that weights are updated incrementally.
- Thousands of iterations might be needed in practice!
- What are some reasonable termination criteria?
- Adding momentum. α is the momentum constant.
Δw_ji(n) = η * δ_j * x_ji + α * Δw_ji(n-1)
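The momentum update above can be sketched as a small function (a sketch only; η, α, the error term δ_j, and the input x_ji are assumed to come from a backpropagation pass, with example values chosen here):

```python
eta = 0.1    # learning rate (example value)
alpha = 0.9  # momentum constant (example value)

def weight_delta(delta_j, x_ji, prev_delta_w):
    """Delta_w_ji(n) = eta * delta_j * x_ji + alpha * Delta_w_ji(n-1).
    The alpha term carries a fraction of the previous update forward,
    which smooths the trajectory and can roll through small local dips."""
    return eta * delta_j * x_ji + alpha * prev_delta_w

# Two consecutive updates for one weight: the second step is larger
# because momentum adds a fraction of the first step.
d1 = weight_delta(1.0, 1.0, 0.0)  # first step, no history
d2 = weight_delta(1.0, 1.0, d1)   # second step, momentum kicks in
```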
Representational Power
- Boolean functions - need two layers (not including input layer)
- Continuous functions - need two layers
- Arbitrary functions - need three layers
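As a minimal illustration of the two-layer claim for Boolean functions, here is a hand-wired network computing XOR. Step-threshold units are used for clarity instead of sigmoids, and the weights are chosen by hand, not learned:

```python
def step(net):
    """Step threshold unit: fires (1) when net input is positive."""
    return 1 if net > 0 else 0

def xor_net(x1, x2):
    """Two-layer network (one hidden layer, one output layer) for XOR."""
    # Hidden layer: one unit computes OR, the other NAND.
    h1 = step(1.0 * x1 + 1.0 * x2 - 0.5)   # OR(x1, x2)
    h2 = step(-1.0 * x1 - 1.0 * x2 + 1.5)  # NAND(x1, x2)
    # Output layer: AND of the hidden units yields XOR.
    return step(1.0 * h1 + 1.0 * h2 - 1.5)
```

XOR is the classic function a single-layer perceptron cannot represent, so it shows why the hidden layer is needed.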
Hypothesis Space
- The space of all n-dimensional vectors of real-valued weights,
  where n is the number of weights in the network.
Bias
- Inductive bias: smooth interpolation between data points
- Search bias: ??
- Language bias: ??
Hidden Layer Representation
- Useful features can be automatically discovered!
- Take a look at Figure 4.7 for a very simple example.
Overfitting
- Take a look at Figure 4.9 to understand this problem.
- One solution is to use a validation set.
- 10-fold (or more generally k-fold) cross validation is
a very common validation technique.
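The k-fold splitting behind this technique can be sketched in a few lines (a sketch only; the function name and shape are mine, and real uses typically shuffle the examples first):

```python
def k_fold_splits(n_examples, k):
    """Partition example indices into k folds. Each fold serves once as
    the validation set while the remaining k-1 folds form the training
    set, so every example is validated on exactly once."""
    indices = list(range(n_examples))
    folds = [indices[i::k] for i in range(k)]  # round-robin assignment
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val

# Usage: iterate over the k train/validation splits
for train, val in k_fold_splits(10, 5):
    pass  # train network on `train`, measure error on `val`
```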