Chapter 2: Concept Learning and the General-to-Specific Ordering
Definition
- Concept Learning: Inferring a boolean-valued function from training
examples of its input and output.
Notation
- X: set of instances
- x: one instance
- c: target concept, c:X → {0, 1}
- <x, c(x)>: one training example; positive if c(x) = 1,
negative if c(x) = 0
- D: set of training instances
- H: set of possible hypotheses
- h: one hypothesis, h: X → { 0, 1 }; the goal is to find h
such that h(x) = c(x) for all x in X
Inductive Learning Hypothesis
Any hypothesis found to approximate the target function well over
a sufficiently large set of training examples will also approximate
the target function well over other unobserved examples.
Definition
Let h_j and h_k be boolean-valued functions defined over X.
h_j is more general than or equal to h_k (written h_j ≥_g h_k)
if and only if
  (∀ x ∈ X) [ (h_k(x) = 1) → (h_j(x) = 1) ]
This relation is a partial order since it is reflexive, antisymmetric,
and transitive.
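A minimal sketch of this test in Python, assuming the conjunctive
attribute-constraint representation used with Find-S below (a constraint is a
literal value, '?' for "any value", or None for "no value"; the tuple encoding
and function name are illustrative, not from the text):

```python
def more_general_or_equal(hj, hk):
    """Return True if hj >=_g hk for conjunctive hypotheses.

    A hypothesis is a tuple of attribute constraints: a literal value,
    '?' (any value is acceptable), or None (no value is acceptable).
    """
    for cj, ck in zip(hj, hk):
        if cj == '?':
            continue       # '?' in hj covers anything hk accepts here
        if ck is None:
            continue       # hk accepts nothing here, so hj trivially covers it
        if cj != ck:       # hj demands a value hk's positives need not have
            return False
    return True

# h1 ignores the second attribute, so it covers every instance h2 covers.
h1 = ('Sunny', '?')
h2 = ('Sunny', 'Warm')
print(more_general_or_equal(h1, h2))   # True
print(more_general_or_equal(h2, h1))   # False
```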
Find-S Algorithm
Outputs a description of the most specific hypothesis consistent
with the training examples.
- Initialize h to the most specific hypothesis in H
- For each positive training instance x
- For each attribute constraint a_i in h
- If the constraint a_i is NOT satisfied by x,
then replace a_i in h by the next more general
constraint that is satisfied by x.
- Output hypothesis h
This algorithm carries an inductive bias: the target concept is assumed
to be representable as a conjunction of attribute constraints.
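A sketch of Find-S in Python under the same conjunctive representation; the
(instance, label) data layout is an assumption, and the training set is the
chapter's EnjoySport data:

```python
def find_s(examples, n_attributes):
    """Find-S: return the most specific conjunctive hypothesis
    consistent with the positive examples.

    examples: iterable of (instance_tuple, is_positive) pairs.
    """
    h = [None] * n_attributes      # most specific hypothesis: rejects everything
    for x, positive in examples:
        if not positive:
            continue               # Find-S ignores negative examples
        for i in range(n_attributes):
            if h[i] is None:
                h[i] = x[i]        # first positive example fixes the values
            elif h[i] != '?' and h[i] != x[i]:
                h[i] = '?'         # minimal generalization of a failed constraint
    return tuple(h)

# EnjoySport data (Sky, AirTemp, Humidity, Wind, Water, Forecast).
data = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]
print(find_s(data, 6))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```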
Candidate Elimination Algorithm
Outputs a description of the set of all hypotheses consistent with
the training examples.
Definition
A hypothesis h is consistent with a set of training examples
D if and only if h(x) = c(x) for each example < x, c(x) > in D.
  Consistent(h, D) ≡ (∀ <x, c(x)> ∈ D) h(x) = c(x)
Definition
The version space, denoted VS_{H,D}, with respect to
hypothesis space H and training examples D, is the subset of hypotheses
from H consistent with the training examples in D.
  VS_{H,D} ≡ { h ∈ H | Consistent(h, D) }
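Rendered directly in Python (continuing the conjunctive representation from
the sketches above; the helper names are mine), the two definitions become:

```python
def classify(h, x):
    """A conjunctive hypothesis labels x positive iff every constraint holds."""
    return all(c == '?' or c == v for c, v in zip(h, x))

def consistent(h, D):
    """Consistent(h, D): h agrees with the target label on every example."""
    return all(classify(h, x) == label for x, label in D)

def version_space(H, D):
    """VS_{H,D} by brute force: feasible only when H is small and finite."""
    return [h for h in H if consistent(h, D)]
```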
Definition
The general boundary G, with respect to hypothesis space
H and training data D, is the set of maximally general members
of H consistent with D.
Definition
The specific boundary S, with respect to hypothesis space
H and training data D, is the set of maximally specific members
of H consistent with D.
Version Space Representation
Let X be an arbitrary set of instances and let H be a set of
boolean-valued hypotheses defined over X. Let c:X → {0,1}
be an arbitrary target concept defined over X, and let D be an
arbitrary set of training examples {<x, c(x)>}. For all
X, H, c and D such that S and G are well defined,
  VS_{H,D} = { h ∈ H | (∃ s ∈ S) (∃ g ∈ G) (g ≥_g h ≥_g s) }
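The theorem can be checked by brute force on a toy attribute space, reusing
classify, consistent, version_space, and more_general_or_equal from the
sketches above. Here S and G are computed directly as the minimal and maximal
members of the version space; the two-attribute domain is invented for
illustration:

```python
import itertools

# All conjunctive hypotheses over the given domains, plus one all-None
# hypothesis standing in for every hypothesis that rejects all instances.
def enumerate_H(domains):
    choices = [tuple(d) + ('?',) for d in domains]
    return list(itertools.product(*choices)) + [(None,) * len(domains)]

domains = [('Sunny', 'Rainy'), ('Warm', 'Cold')]
D = [(('Sunny', 'Warm'), True), (('Rainy', 'Cold'), False)]
H = enumerate_H(domains)
VS = version_space(H, D)

# Maximally specific / maximally general members of the version space.
S = [h for h in VS if not any(h != k and more_general_or_equal(h, k) for k in VS)]
G = [h for h in VS if not any(h != k and more_general_or_equal(k, h) for k in VS)]

# The hypotheses sandwiched between G and S are exactly the version space.
bounded = [h for h in H
           if any(more_general_or_equal(g, h) for g in G)
           and any(more_general_or_equal(h, s) for s in S)]
assert set(bounded) == set(VS)
print(S)  # [('Sunny', 'Warm')]
print(G)  # [('Sunny', '?'), ('?', 'Warm')]
```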
Algorithm
- Initialize G to the set of maximally general hypotheses in H
- Initialize S to the set of maximally specific hypotheses in H
- For each positive training example d,
- Remove from G any hypothesis inconsistent with d
- For each hypothesis s in S that is not consistent with d
- Remove s from S
- Add to S all minimal generalizations h of s such that h
is consistent with d, and some member of G is more general than h
- Remove from S any hypothesis that is more general than another
hypothesis in S
- For each negative training example d,
- Remove from S any hypothesis inconsistent with d
- For each hypothesis g in G that is not consistent with d
- Remove g from G
- Add to G all minimal specializations h of g such that h is
consistent with d, and some member of S is more specific than h
- Remove from G any hypothesis that is less general than another
hypothesis in G
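A compact sketch of the algorithm for conjunctive hypotheses, reusing
classify() and more_general_or_equal() from the sketches above. The domains
argument (each attribute's possible values) is an addition I make so that
minimal specializations can be enumerated:

```python
def minimal_generalization(s, x):
    """The unique minimal conjunctive generalization of s covering instance x."""
    return tuple(v if c is None else (c if c == v else '?')
                 for c, v in zip(s, x))

def minimal_specializations(g, x, domains):
    """All one-step specializations of g that exclude the negative instance x."""
    out = []
    for i, c in enumerate(g):
        if c == '?':
            out += [g[:i] + (v,) + g[i+1:] for v in domains[i] if v != x[i]]
    return out

def candidate_elimination(examples, domains):
    n = len(domains)
    S = {(None,) * n}      # most specific boundary
    G = {('?',) * n}       # most general boundary
    for x, positive in examples:
        if positive:
            G = {g for g in G if classify(g, x)}      # drop inconsistent g
            for s in [s for s in S if not classify(s, x)]:
                S.remove(s)
                h = minimal_generalization(s, x)
                if any(more_general_or_equal(g, h) for g in G):
                    S.add(h)
            S = {s for s in S                         # keep only minimal members
                 if not any(s != t and more_general_or_equal(s, t) for t in S)}
        else:
            S = {s for s in S if not classify(s, x)}  # drop inconsistent s
            for g in [g for g in G if classify(g, x)]:
                G.remove(g)
                for h in minimal_specializations(g, x, domains):
                    if any(more_general_or_equal(h, s) for s in S):
                        G.add(h)
            G = {g for g in G                         # keep only maximal members
                 if not any(g != h and more_general_or_equal(h, g) for h in G)}
    return S, G

# Same EnjoySport data as in the Find-S sketch.
enjoysport_domains = [('Sunny', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
                      ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change')]
S, G = candidate_elimination(data, enjoysport_domains)
print(S)  # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
print(G)  # {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')}
```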
Candidate Elimination Algorithm Issues
- Will it converge to the correct hypothesis? Yes, provided (1) the
training examples are error-free and (2) the target concept is
representable in H, i.e., as a conjunction of attribute constraints.
- If the learner can request a specific training example, which
one should it select?
- How can a partially learned concept be used?
Inductive Bias
- Definition: Consider a concept learning algorithm L for the
set of instances X. Let c be an arbitrary concept defined over X
and let D_c = {<x, c(x)>} be an arbitrary
set of training examples of c. Let L(x_i, D_c)
denote the classification assigned to the instance x_i by L
after training on the data D_c. The inductive bias
of L is any minimal set of assertions B such that for any target concept
c and corresponding training examples D_c
  (∀ x_i ∈ X) [ L(x_i, D_c) follows deductively from (B ∧ D_c ∧ x_i) ]
- Thus, one advantage of an inductive bias is that it gives the
learner a rational basis for classifying unseen instances.
- What is another advantage of bias?
- What is one disadvantage of bias?
- What is the inductive bias of the candidate elimination algorithm?
Answer: the assumption that the target concept c is contained in the
hypothesis space H, i.e., that c can be represented as a conjunction
of attribute constraints.
- What is meant by a weak bias versus a strong bias?
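To make the "rational basis" concrete: a classification for an unseen x_i
follows deductively from B ∧ D_c ∧ x_i exactly when every hypothesis in the
version space agrees on x_i, which is also one answer to how a partially
learned concept can be used. A sketch, reusing classify() from above (the
unanimous-vote framing is a gloss on the text):

```python
def classify_with_version_space(VS, x):
    """Classify x by unanimous vote of the version space.

    Returns True/False when all consistent hypotheses agree (the answer
    is then deductively entailed by the bias plus the data), or None
    when the version space is still ambiguous about x.
    """
    votes = {classify(h, x) for h in VS}
    return votes.pop() if len(votes) == 1 else None
```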
Sample Exercise
Work exercise 2.4 on page 48.