Chapter 10: Learning Sets of Rules
FOIL
- Learns sets of first-order rules
- The learned set of rules forms a disjunctive description of the target concept; function symbols are not allowed
- Uses hill-climbing (or a beam of size 1)
- Produces rules that predict when an example is a positive instance
Algorithm
- Identify the positive examples, call this POS
- Identify the negative examples, call this NEG
- Set the learned rules to be the empty set
- While the POS set is not empty
- Set the NewRule to be a rule with no preconditions
- Set NewRuleNeg to be NEG
- While the NewRuleNeg set is not empty
- Generate candidate literals to add to NewRule from the possible predicates
- Choose BestLiteral, the candidate literal that maximizes FoilGain(literal, NewRule)
- Add BestLiteral to the preconditions of NewRule
- Update NewRuleNeg to keep only the examples that still satisfy the preconditions of NewRule
- Add NewRule to the learned rules
- Delete from POS any examples covered by NewRule
- Return the learned rules
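The two nested loops above can be sketched in Python. This is a minimal propositional (attribute-value) simplification, not full first-order FOIL: each example yields one binding, so t = p1, and the sketch assumes some candidate literal always separates out the remaining negatives (otherwise the inner loop would not terminate).

```python
import math

def foil_gain(pos, neg, pos2, neg2):
    # FoilGain = t * (log2(p1/(p1+n1)) - log2(p0/(p0+n0))).
    # Propositional simplification: t = p1, since each covered
    # positive example corresponds to exactly one binding.
    p0, n0, p1, n1 = len(pos), len(neg), len(pos2), len(neg2)
    if p1 == 0:
        return float("-inf")  # a literal covering no positives is useless
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

def learn_rules(pos, neg, tests):
    # Sequential covering in the style of FOIL's outer loop.
    # `tests` maps a literal name to a boolean predicate over examples.
    rules = []
    pos = list(pos)
    while pos:                                  # until all positives covered
        rule = []                               # conjunction of literal names
        cur_pos, cur_neg = list(pos), list(neg)
        while cur_neg:                          # until no negatives covered
            best = max(tests, key=lambda t: foil_gain(
                cur_pos, cur_neg,
                [e for e in cur_pos if tests[t](e)],
                [e for e in cur_neg if tests[t](e)]))
            rule.append(best)
            cur_pos = [e for e in cur_pos if tests[best](e)]
            cur_neg = [e for e in cur_neg if tests[best](e)]
        rules.append(rule)
        # remove the positives this rule covers
        pos = [e for e in pos if not all(tests[t](e) for t in rule)]
    return rules
```

Examples can be any objects the test predicates understand, e.g. dicts of attribute values.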
Generating Candidate Specializations
If the current rule is L1, ..., Ln => P(x1, ..., xk), then FOIL considers the following
- Add Q(v1, ... vr) where Q is a predicate
and the vi are either new variables or variables already
present in the rule. At least one of the vi must already
exist as a variable in the rule.
- Add Equal(xj, xk) where these are variables
already present in the rule.
- The negation of either of the above forms.
Example
If the predicates are Father(a,b) and Female(c) and we are trying to
learn the concept Daughter(d,e), we start with the most general rule:
Daughter(d,e) with an empty set of preconditions, which classifies
everything as a daughter.
We then add the following preconditions and their negations:
- Equal(d,e)
- Female(d)
- Female(e)
- Father(d,e)
- Father(e,d)
- Father(d,f)
- Father(e,f)
- Father(f,d)
- Father(f,e)
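The enumeration above can be generated mechanically. A small sketch follows; note that, unlike the listing above, it also emits repeated-variable literals such as Father(d,d), and negations are left out for brevity.

```python
from itertools import product

def candidate_literals(rule_vars, predicates, new_var="f"):
    # Enumerate candidate literals Q(v1,...,vr) where each vi is an
    # existing rule variable or the one new variable, and at least one
    # vi already occurs in the rule (FOIL's specialization step).
    pool = list(rule_vars) + [new_var]
    cands = []
    for name, arity in predicates:
        for args in product(pool, repeat=arity):
            if any(a in rule_vars for a in args):
                cands.append(f"{name}({','.join(args)})")
    # Equal(xj, xk) over distinct variables already in the rule
    cands += [f"Equal({a},{b})"
              for a in rule_vars for b in rule_vars if a != b]
    return cands

cands = candidate_literals(["d", "e"], [("Father", 2), ("Female", 1)])
```

Literals whose arguments are all new variables, such as Female(f), are correctly excluded because they would not constrain the rule.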
FoilGain
FoilGain(L, R) ≡ t (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))
- L is the candidate literal
- R is the rule
- p0 and n0 are the numbers of positive and negative bindings of rule R
- p1 and n1 are the numbers of positive and negative bindings of the rule formed by adding literal L to R
- t is the number of positive bindings of rule R that are still covered after adding literal L to R
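A direct transcription of the formula, with a worked example using hypothetical counts (the numbers below are illustrative, not from the text):

```python
import math

def foil_gain(t, p0, n0, p1, n1):
    # FoilGain(L,R) = t * (log2(p1/(p1+n1)) - log2(p0/(p0+n0)))
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# Suppose rule R covers 10 positive and 10 negative bindings; adding L
# leaves 8 positive and 2 negative bindings, and all 8 surviving
# positive bindings were already covered by R, so t = 8.
gain = foil_gain(t=8, p0=10, n0=10, p1=8, n1=2)
```

The gain is positive whenever adding L raises the proportion of positive bindings, scaled by the number of positive bindings retained.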
Induction as Inverted Deduction
The task is to discover a hypothesis h, such that
(∀<xi, f(xi)> ∈ D)
(B ∧ h ∧ xi) ⊢ f(xi)
where B is the background knowledge.
There are some practical difficulties:
- Noisy data is not easily accommodated
- The hypothesis space search is intractable
- The complexity of the hypothesis space increases with B
Inverting Resolution
Propositional Resolution
- Given A ∨ B
- Given ¬ B ∨ C
- Conclude A ∨ C
- Table 10.5 provides a procedure:
C = (C1 - {L}) ∪ (C2 - {¬ L})
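The procedure is a one-liner over sets of literals. In this sketch literals are strings and "~" marks negation (a representation chosen here for illustration):

```python
def resolve(c1, c2, lit):
    # Propositional resolution: C = (C1 - {L}) ∪ (C2 - {¬L}),
    # where lit = L appears in c1 and its negation appears in c2.
    neg = lit[1:] if lit.startswith("~") else "~" + lit
    assert lit in c1 and neg in c2, "clauses must contain complementary literals"
    return (c1 - {lit}) | (c2 - {neg})

# A ∨ B resolved with ¬B ∨ C yields A ∨ C
conclusion = resolve({"A", "B"}, {"~B", "C"}, "B")
```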
Propositional Inverse Resolution
- Given resolvent A ∨ C
- Given initial clause A ∨ B
- Deduce other clause is ¬ B ∨ C
- Another possibility: ¬ B ∨ C ∨ A
- Table 10.6 provides a procedure:
C2 = (C - (C1 - {L})) ∪ {¬ L}
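The Table 10.6 procedure, transcribed with the same string-literal representation as above, returns the minimal C2; the other possibilities add back literals from C1 - {L}:

```python
def inverse_resolve(c, c1, lit):
    # Propositional inverse resolution: C2 = (C - (C1 - {L})) ∪ {¬L}.
    # c is the resolvent, c1 the known initial clause, lit = L the
    # literal of c1 that was resolved away.
    neg = lit[1:] if lit.startswith("~") else "~" + lit
    return (c - (c1 - {lit})) | {neg}

# Given resolvent A ∨ C and initial clause A ∨ B, recover ¬B ∨ C
c2 = inverse_resolve({"A", "C"}, {"A", "B"}, "B")
```

Resolving the recovered clause with the initial clause reproduces the resolvent, which is a quick sanity check on the inversion.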
First Order Resolution
- Given White(x) ∨ ¬ Swan(x)
- Given Swan(Fred)
- Conclude White(Fred)
- Table 10.7 provides a procedure:
C = (C1 - {L})θ ∪ (C2 - {¬ L})θ
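The first-order case adds a unifying substitution θ. The sketch below represents literals as (name, args...) tuples and takes θ as given, leaving unification itself out of scope:

```python
def substitute(literal, theta):
    # Apply substitution theta to a (name, arg1, arg2, ...) literal.
    name, *args = literal
    return (name, *[theta.get(a, a) for a in args])

def negate(literal):
    name, *args = literal
    return (name[1:], *args) if name.startswith("~") else ("~" + name, *args)

def fo_resolve(c1, lit1, c2, lit2, theta):
    # First-order resolution: C = (C1 - {L})θ ∪ (C2 - {¬L})θ,
    # where theta unifies lit1 with the negation of lit2.
    assert substitute(lit1, theta) == negate(substitute(lit2, theta))
    return ({substitute(l, theta) for l in c1 - {lit1}} |
            {substitute(l, theta) for l in c2 - {lit2}})

# White(x) ∨ ¬Swan(x) resolved with Swan(Fred) under θ = {x: Fred}
conclusion = fo_resolve({("White", "x"), ("~Swan", "x")}, ("~Swan", "x"),
                        {("Swan", "Fred")}, ("Swan", "Fred"), {"x": "Fred"})
```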
First Order Inverse Resolution
- Given GrandChild(Bob,Shannon)
- Given background information Father(Shannon, Tom)
- Deduce GrandChild(Bob, x) ∨ ¬Father(x, Tom)
- Equation 10.4 summarizes the procedure
Exercises
- 10.3
- 10.4
- 10.5 (one example is fine)
- 10.6 (one example is fine)