Chapter 2: Concept Learning and the General-to-Specific Ordering
Definition
- Concept Learning: Inferring a boolean-valued function from training
examples of its input and output.
Notation
- X: set of instances
- x: one instance
- c: target concept, c:X → {0, 1}
- <x, c(x)>: one training example; positive if c(x) = 1,
negative if c(x) = 0
- D: set of training instances
- H: set of possible hypotheses
- h: one hypothesis, h: X → { 0, 1 }; the goal is to find h
such that h(x) = c(x) for all x in X
Inductive Learning Hypothesis
Any hypothesis found to approximate the target function well over
a sufficiently large set of training examples will also approximate
the target function well over other unobserved examples.
Definition
Let h_j and h_k be boolean-valued functions defined over X.
h_j is more general than or equal to h_k (written h_j ≥_g h_k)
if and only if
  (∀ x ∈ X) [ (h_k(x) = 1) → (h_j(x) = 1) ]
This relation is a partial order since it is reflexive, antisymmetric,
and transitive.
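A minimal sketch of this test in Python, assuming the conjunctive
attribute-constraint representation used with Find-S below (a constraint is a
literal value, '?' for "any value", or None for "no value"; the tuple encoding
and function name are illustrative, not from the text):

```python
def more_general_or_equal(hj, hk):
    """Return True if hj >=_g hk for conjunctive hypotheses.

    A hypothesis is a tuple of attribute constraints: a literal value,
    '?' (any value is acceptable), or None (no value is acceptable).
    """
    for cj, ck in zip(hj, hk):
        if cj == '?':
            continue       # '?' in hj covers anything hk accepts here
        if ck is None:
            continue       # hk accepts nothing here, so hj trivially covers it
        if cj != ck:       # hj demands a value hk's positives need not have
            return False
    return True

# h1 ignores the second attribute, so it covers every instance h2 covers.
h1 = ('Sunny', '?')
h2 = ('Sunny', 'Warm')
print(more_general_or_equal(h1, h2))   # True
print(more_general_or_equal(h2, h1))   # False
```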
Find-S Algorithm
Outputs a description of the most specific hypothesis consistent
with the training examples.
- Initialize h to the most specific hypothesis in H
- For each positive training instance x
- For each attribute constraint a_i in h
- If the constraint a_i is NOT satisfied by x,
then replace a_i in h by the next more general
constraint that is satisfied by x.
- Output hypothesis h
This algorithm carries an inductive bias: the target concept is assumed
to be representable as a conjunction of attribute constraints.
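A sketch of Find-S in Python under the same conjunctive representation; the
(instance, label) data layout is an assumption, and the training set is the
chapter's EnjoySport data:

```python
def find_s(examples, n_attributes):
    """Find-S: return the most specific conjunctive hypothesis
    consistent with the positive examples.

    examples: iterable of (instance_tuple, is_positive) pairs.
    """
    h = [None] * n_attributes      # most specific hypothesis: rejects everything
    for x, positive in examples:
        if not positive:
            continue               # Find-S ignores negative examples
        for i in range(n_attributes):
            if h[i] is None:
                h[i] = x[i]        # first positive example fixes the values
            elif h[i] != '?' and h[i] != x[i]:
                h[i] = '?'         # minimal generalization of a failed constraint
    return tuple(h)

# EnjoySport data (Sky, AirTemp, Humidity, Wind, Water, Forecast).
data = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]
print(find_s(data, 6))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```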
Candidate Elimination Algorithm
Outputs a description of the set of all hypotheses consistent with
the training examples.
Definition
A hypothesis h is consistent with a set of training examples
D if and only if h(x) = c(x) for each example < x, c(x) > in D.
  Consistent(h, D) ≡ (∀ <x, c(x)> ∈ D) h(x) = c(x)
Definition
The version space, denoted VS_{H,D}, with respect to
hypothesis space H and training examples D, is the subset of hypotheses
from H consistent with the training examples in D.
  VS_{H,D} ≡ { h ∈ H | Consistent(h, D) }
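Rendered directly in Python (continuing the conjunctive representation from
the sketches above; the helper names are mine), the two definitions become:

```python
def classify(h, x):
    """A conjunctive hypothesis labels x positive iff every constraint holds."""
    return all(c == '?' or c == v for c, v in zip(h, x))

def consistent(h, D):
    """Consistent(h, D): h agrees with the target label on every example."""
    return all(classify(h, x) == label for x, label in D)

def version_space(H, D):
    """VS_{H,D} by brute force: feasible only when H is small and finite."""
    return [h for h in H if consistent(h, D)]
```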
Definition
The general boundary G, with respect to hypothesis space
H and training data D, is the set of maximally general members
of H consistent with D.
Definition
The specific boundary S, with respect to hypothesis space
H and training data D, is the set of maximally specific members
of H consistent with D.
Version Space Representation
Let X be an arbitrary set of instances and let H be a set of
boolean-valued hypotheses defined over X. Let c:X → {0,1}
be an arbitrary target concept defined over X, and let D be an
arbitrary set of training examples {<x, c(x)>}. For all
X, H, c and D such that S and G are well defined,
  VS_{H,D} = { h ∈ H | (∃ s ∈ S) (∃ g ∈ G) (g ≥_g h ≥_g s) }
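The theorem can be checked by brute force on a toy attribute space, reusing
classify, consistent, version_space, and more_general_or_equal from the
sketches above. Here S and G are computed directly as the minimal and maximal
members of the version space; the two-attribute domain is invented for
illustration:

```python
import itertools

# All conjunctive hypotheses over the given domains, plus one all-None
# hypothesis standing in for every hypothesis that rejects all instances.
def enumerate_H(domains):
    choices = [tuple(d) + ('?',) for d in domains]
    return list(itertools.product(*choices)) + [(None,) * len(domains)]

domains = [('Sunny', 'Rainy'), ('Warm', 'Cold')]
D = [(('Sunny', 'Warm'), True), (('Rainy', 'Cold'), False)]
H = enumerate_H(domains)
VS = version_space(H, D)

# Maximally specific / maximally general members of the version space.
S = [h for h in VS if not any(h != k and more_general_or_equal(h, k) for k in VS)]
G = [h for h in VS if not any(h != k and more_general_or_equal(k, h) for k in VS)]

# The hypotheses sandwiched between G and S are exactly the version space.
bounded = [h for h in H
           if any(more_general_or_equal(g, h) for g in G)
           and any(more_general_or_equal(h, s) for s in S)]
assert set(bounded) == set(VS)
print(S)  # [('Sunny', 'Warm')]
print(G)  # [('Sunny', '?'), ('?', 'Warm')]
```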
Algorithm
- Initialize G to the set of maximally general hypotheses in H
- Initialize S to the set of maximally specific hypotheses in H
- For each positive training example d,
- Remove from G any hypothesis inconsistent with d
- For each hypothesis s in S that is not consistent with d
- Remove s from S
- Add to S all minimal generalizations h of s such that h
is consistent with d, and some member of G is more general than h
- Remove from S any hypothesis that is more general than another
hypothesis in S
- For each negative training example d,
- Remove from S any hypothesis inconsistent with d
- For each hypothesis g in G that is not consistent with d
- Remove g from G
- Add to G all minimal specializations h of g such that h is
consistent with d, and some member of S is more specific than h
- Remove from G any hypothesis that is less general than another
hypothesis in G
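A compact sketch of the algorithm for conjunctive hypotheses, reusing
classify() and more_general_or_equal() from the sketches above. The domains
argument (each attribute's possible values) is an addition I make so that
minimal specializations can be enumerated:

```python
def minimal_generalization(s, x):
    """The unique minimal conjunctive generalization of s covering instance x."""
    return tuple(v if c is None else (c if c == v else '?')
                 for c, v in zip(s, x))

def minimal_specializations(g, x, domains):
    """All one-step specializations of g that exclude the negative instance x."""
    out = []
    for i, c in enumerate(g):
        if c == '?':
            out += [g[:i] + (v,) + g[i+1:] for v in domains[i] if v != x[i]]
    return out

def candidate_elimination(examples, domains):
    n = len(domains)
    S = {(None,) * n}      # most specific boundary
    G = {('?',) * n}       # most general boundary
    for x, positive in examples:
        if positive:
            G = {g for g in G if classify(g, x)}      # drop inconsistent g
            for s in [s for s in S if not classify(s, x)]:
                S.remove(s)
                h = minimal_generalization(s, x)
                if any(more_general_or_equal(g, h) for g in G):
                    S.add(h)
            S = {s for s in S                         # keep only minimal members
                 if not any(s != t and more_general_or_equal(s, t) for t in S)}
        else:
            S = {s for s in S if not classify(s, x)}  # drop inconsistent s
            for g in [g for g in G if classify(g, x)]:
                G.remove(g)
                for h in minimal_specializations(g, x, domains):
                    if any(more_general_or_equal(h, s) for s in S):
                        G.add(h)
            G = {g for g in G                         # keep only maximal members
                 if not any(g != h and more_general_or_equal(h, g) for h in G)}
    return S, G

# Same EnjoySport data as in the Find-S sketch.
enjoysport_domains = [('Sunny', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
                      ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change')]
S, G = candidate_elimination(data, enjoysport_domains)
print(S)  # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
print(G)  # {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')}
```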
Candidate Elimination Algorithm Issues
- Will it converge to the correct hypothesis? Yes, provided (1) the
training examples are error-free and (2) the target concept is
representable in H, i.e., as a conjunction of attribute constraints.
- If the learner can request a specific training example, which
one should it select?
- How can a partially learned concept be used?
Inductive Bias
- Definition: Consider a concept learning algorithm L for the
set of instances X. Let c be an arbitrary concept defined over X
and let D_c = {<x, c(x)>} be an arbitrary
set of training examples of c. Let L(x_i, D_c)
denote the classification assigned to the instance x_i by L
after training on the data D_c. The inductive bias
of L is any minimal set of assertions B such that for any target concept
c and corresponding training examples D_c
  (∀ x_i ∈ X) [ L(x_i, D_c) follows deductively from (B ∧ D_c ∧ x_i) ]
- Thus, one advantage of an inductive bias is that it gives the
learner a rational basis for classifying unseen instances.
- What is another advantage of bias?
- What is one disadvantage of bias?
- What is the inductive bias of the candidate elimination algorithm?
Answer: the assumption that the target concept c is contained in the
hypothesis space H, i.e., that c can be represented as a conjunction
of attribute constraints.
- What is meant by a weak bias versus a strong bias?
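To make the "rational basis" concrete: a classification for an unseen x_i
follows deductively from B ∧ D_c ∧ x_i exactly when every hypothesis in the
version space agrees on x_i, which is also one answer to how a partially
learned concept can be used. A sketch, reusing classify() from above (the
unanimous-vote framing is a gloss on the text):

```python
def classify_with_version_space(VS, x):
    """Classify x by unanimous vote of the version space.

    Returns True/False when all consistent hypotheses agree (the answer
    is then deductively entailed by the bias plus the data), or None
    when the version space is still ambiguous about x.
    """
    votes = {classify(h, x) for h in VS}
    return votes.pop() if len(votes) == 1 else None
```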
Sample Exercise
Work exercise 2.4 on page 48.