Chapter 8: Instance-Based Learning
- Training examples are stored, so the information contained in these examples is never lost
- Typically, this is lazy (as opposed to eager) learning
because processing is delayed until a new instance must be classified
- A lazy learner constructs a different approximation to the target
function for each distinct query instead of forming an explicit hypothesis
Terminology
- Regression: approximating a real-valued target function
- Residual: the error f̂(x) - f(x) in approximating the target function
- Kernel Function: the function of distance that is used to determine the weight of each training example, w_i = K(d(x_i, x_q))
K Nearest Neighbor
- Uses Euclidean distance as the distance between two points
- For k = 1, the hypothesis space corresponds to a Voronoi diagram over the training instances; see Figure 8.1
- The inductive bias is to provide a classification that is most similar
to the classification of nearby instances
- A drawback is the curse of dimensionality: the distance metric uses all attributes, so many irrelevant attributes can swamp the few relevant ones
Discrete-Valued Target Function
- Classification: f̂(x_q) = argmax_{v ∈ V} Σ_{i=1}^{k} δ(v, f(x_i)), where x_1 ... x_k are the k nearest neighbors and δ(a, b) = 1 if a = b, 0 otherwise
- Distance-weighted variation: f̂(x_q) = argmax_{v ∈ V} Σ_{i=1}^{k} w_i δ(v, f(x_i)), where w_i = d(x_q, x_i)^{-2} and x_q is the query instance; a sketch of both rules follows
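To make these two rules concrete, here is a minimal Python sketch; the function name knn_classify and the toy dataset are illustrative assumptions, not from the text:

    import numpy as np

    def knn_classify(X, y, x_q, k=3, weighted=False):
        """Classify x_q by a (optionally distance-weighted) k-nearest-neighbor vote."""
        d = np.linalg.norm(X - x_q, axis=1)         # Euclidean distance to every example
        nn = np.argsort(d)[:k]                      # indices of the k nearest neighbors
        votes = {}
        for i in nn:
            if weighted and d[i] == 0.0:
                return y[i]                         # query coincides with a training point
            w = d[i] ** -2 if weighted else 1.0     # w_i = d(x_q, x_i)^{-2}
            votes[y[i]] = votes.get(y[i], 0.0) + w  # accumulates w_i * delta(v, f(x_i))
        return max(votes, key=votes.get)            # argmax over candidate values v

    # hypothetical toy data: two well-separated 2-D classes
    X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
    y = np.array(["a", "a", "a", "b", "b", "b"])
    print(knn_classify(X, y, np.array([0.5, 0.5])))                 # "a"
    print(knn_classify(X, y, np.array([4.5, 5.0]), weighted=True))  # "b"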
Continuous-Valued Target Function
- Estimate: f̂(x_q) = (1/k) Σ_{i=1}^{k} f(x_i)
- Distance-weighted variation: f̂(x_q) = Σ_{i=1}^{k} w_i f(x_i) / Σ_{i=1}^{k} w_i, with w_i as above; a matching sketch follows
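A matching sketch for the continuous case; again the function name knn_regress and the data are hypothetical:

    import numpy as np

    def knn_regress(X, y, x_q, k=3, weighted=False):
        """Estimate f(x_q) as the mean (or weighted mean) of the k nearest f(x_i)."""
        d = np.linalg.norm(X - x_q, axis=1)
        nn = np.argsort(d)[:k]
        if not weighted:
            return y[nn].mean()                     # (1/k) * sum of f(x_i)
        if np.any(d[nn] == 0.0):
            return y[nn][d[nn] == 0.0].mean()       # exact match: use its value directly
        w = d[nn] ** -2                             # w_i = d(x_q, x_i)^{-2}
        return np.sum(w * y[nn]) / np.sum(w)        # sum w_i f(x_i) / sum w_i

    # hypothetical 1-D data sampled from f(x) = 2x
    X = np.array([[0.0], [1.0], [2.0], [3.0]])
    y = np.array([0.0, 2.0, 4.0, 6.0])
    print(knn_regress(X, y, np.array([1.4]), k=2))                 # 3.0 (mean of 2 and 4)
    print(knn_regress(X, y, np.array([1.4]), k=2, weighted=True))  # about 2.62, pulled toward the nearer neighbor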
Locally Weighted Regression
- A generalization of k nearest neighbor
- Local: the function is approximated based on data near the query point
- Weighted: the contribution of each training example is weighted
by its distance from the query point
- The estimate is a linear model: f̂(x) = w_0 + w_1 a_1(x) + ... + w_n a_n(x), where a_i(x) denotes the i-th attribute of instance x
Global Variant
- Error: E = (1/2) Σ_{x ∈ D} (f(x) - f̂(x))^2
- Gradient descent update: Δw_j = η Σ_{x ∈ D} (f(x) - f̂(x)) a_j(x)
One Possible Local Variant
- Error: E(x_q) = (1/2) Σ_{x ∈ k nearest neighbors of x_q} (f(x) - f̂(x))^2 K(d(x_q, x))
- Gradient descent update: Δw_j = η Σ_{x ∈ k nearest neighbors of x_q} (f(x) - f̂(x)) a_j(x) K(d(x_q, x)); see the sketch below
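Below is a sketch of this local variant: gradient descent on the kernel-weighted error over the k nearest neighbors. The Gaussian kernel, step size, epoch count, and function name lwr_estimate are illustrative assumptions:

    import numpy as np

    def lwr_estimate(X, y, x_q, k=5, sigma=1.0, eta=0.1, epochs=500):
        """Fit w on the k nearest neighbors by minimizing the K-weighted
        squared error, then evaluate f_hat at the query point x_q."""
        d = np.linalg.norm(X - x_q, axis=1)
        nn = np.argsort(d)[:k]
        A = np.hstack([np.ones((k, 1)), X[nn]])     # a_0 = 1 supplies the intercept w_0
        K = np.exp(-d[nn] ** 2 / (2 * sigma ** 2))  # kernel weights K(d(x_q, x))
        w = np.zeros(A.shape[1])
        for _ in range(epochs):
            err = y[nn] - A @ w                     # f(x) - f_hat(x) for each neighbor
            w += eta * (A.T @ (K * err))            # delta w_j = eta * sum (f - f_hat) a_j K
        return np.array([1.0, *x_q]) @ w            # local linear model evaluated at x_q

    # hypothetical noisy samples of f(x) = x^2, queried at x = 1.5
    rng = np.random.default_rng(0)
    X = np.linspace(0, 3, 20).reshape(-1, 1)
    y = X[:, 0] ** 2 + rng.normal(0, 0.1, 20)
    print(lwr_estimate(X, y, np.array([1.5])))      # roughly 2.25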
Radial Basis Functions
- RBF networks are the only example of an eager learner in today's material
- Provides a global approximation to the target function,
represented by a linear combination of many local kernel functions
- Each of the k kernel functions is a local approximation
to the target function
- The value for a given kernel function is non-negligible only when
input x falls into the region defined by the kernel's center and width
- f̂(x) = w_0 + Σ_{μ=1}^{k} w_μ K_μ(d(x_μ, x)), where each x_μ ∈ X
- Typically K_μ is the Gaussian function, K_μ(d(x_μ, x)) = e^{-d^2(x_μ, x) / (2σ_μ^2)}, where x_μ is the center and σ_μ^2 is the variance
- Take a look at Figure 8.2
- An advantage of RBF networks is that the kernel layer and the output layer can be set up separately. First, k can be chosen and each x_μ and σ_μ^2 given a value. Second, the weights w_μ can be trained by minimizing E = (1/2) Σ_{x ∈ D} (f(x) - f̂(x))^2.
- RBFs have been successfully used to interpret visual scenes
- RBF networks can approximate any function with arbitrarily small error, given a large enough k and provided that each kernel width σ_μ^2 can be specified separately (a two-stage training sketch follows this list)
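A minimal two-stage sketch, assuming Gaussian kernels, evenly spaced centers, and a shared width (one simple choice among those the text allows); stage two minimizes E by linear least squares, since f̂ is linear in the weights. The names design_matrix, rbf_fit, and rbf_predict are hypothetical:

    import numpy as np

    def design_matrix(X, centers, sigma):
        """Gaussian kernel activations K_mu(d(x_mu, x)), plus a column of 1s for w_0."""
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        return np.hstack([np.ones((len(X), 1)), np.exp(-d ** 2 / (2 * sigma ** 2))])

    def rbf_fit(X, y, centers, sigma=1.0):
        """Stage two: with centers and sigma fixed, choose w to minimize
        E = (1/2) * sum (f(x) - f_hat(x))^2, a linear least-squares problem."""
        w, *_ = np.linalg.lstsq(design_matrix(X, centers, sigma), y, rcond=None)
        return w

    def rbf_predict(Xq, centers, w, sigma=1.0):
        return design_matrix(Xq, centers, sigma) @ w  # f_hat = w_0 + sum w_mu K_mu

    # hypothetical target f(x) = sin(x); stage one: k = 5 evenly spaced centers
    X = np.linspace(0, 2 * np.pi, 40).reshape(-1, 1)
    y = np.sin(X[:, 0])
    centers = np.linspace(0, 2 * np.pi, 5).reshape(-1, 1)
    w = rbf_fit(X, y, centers)
    print(rbf_predict(np.array([[np.pi / 2]]), centers, w))  # close to 1.0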
Case Based Reasoning
- Instances are not represented as real-valued points in an n-dimensional space; CBR typically uses symbolic descriptions instead
- Multiple retrieved cases may be combined to form the solution; this is typically knowledge intensive and might involve search
- It is difficult to know how to identify similar cases (a toy illustration follows this list)
- Figure 8.3 shows one entry in the CADET system's case library
and a problem to solve
- One application of CBR is to perform legal reasoning
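Because identifying similar cases is the hard part, here is a deliberately simple toy illustration: cases are attribute-value dictionaries and similarity just counts shared pairs. This is a hypothetical scheme, not CADET's actual matching method:

    def similarity(case_a, case_b):
        """Hypothetical similarity: count attribute-value pairs the two symbolic
        descriptions share (a real CBR system uses richer, domain-specific matching)."""
        return sum(1 for attr, val in case_a.items() if case_b.get(attr) == val)

    def retrieve(library, query, n=2):
        """Return the n stored cases most similar to the query description."""
        return sorted(library, key=lambda c: similarity(c["problem"], query), reverse=True)[:n]

    # hypothetical case library of mechanical-design problems
    library = [
        {"problem": {"function": "regulate-flow", "medium": "water"}, "solution": "valve"},
        {"problem": {"function": "regulate-flow", "medium": "air"},   "solution": "damper"},
        {"problem": {"function": "store",         "medium": "water"}, "solution": "tank"},
    ]
    query = {"function": "regulate-flow", "medium": "water"}
    for case in retrieve(library, query):
        print(case["solution"])  # most similar stored solutions: valve, then damper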
Exercises