Chapter 12: Combining Inductive and Analytical Learning
- The goal is to find domain-independent algorithms that employ
explicitly domain-dependent knowledge
Inductive Learning Summary
- Goal: hypothesis fits data
- Justification: statistical inference
- Advantages: requires little prior knowledge
- Pitfalls: scarce data, incorrect bias
Analytical Learning Summary
- Goal: hypothesis fits domain theory
- Justification: deductive inference
- Advantages: learns from scarce data
- Pitfalls: imperfect domain theory
Desirable Properties of a Combined System
- Given no domain theory, it should learn at least as effectively
as purely inductive methods
- Given a perfect domain theory, it should learn at least as effectively
as analytical methods
- Given an imperfect domain theory and imperfect training data, it should
combine the two to outperform either purely inductive or purely
analytical methods
- It should accommodate an unknown level of error in the training data
- It should accommodate an unknown level of error in the domain theory
Learning Framework
- Given D: the training data, possibly containing errors
- Given B: the domain theory, possibly containing errors
- Given H: the hypothesis space
- Determine h: a hypothesis that best fits the training data and domain theory
- How can we determine the best fit? One idea is to find
argmin_{h ∈ H} [ k_D error_D(h) + k_B error_B(h) ]
where error_D(h) measures h's disagreement with the training data,
error_B(h) measures its disagreement with the domain theory, and the
constants k_D and k_B weight the relative importance of each
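- As a concrete illustration, here is a minimal sketch of that objective
in Python; the hypothesis representation, the error functions, and the
helper name best_hypothesis are illustrative assumptions, not part of
the chapter:

    # Score each candidate hypothesis by a weighted sum of its
    # disagreement with the training data D and the domain theory B
    def error_d(h, data):
        # fraction of training examples (x, label) that h misclassifies
        return sum(h(x) != label for x, label in data) / len(data)

    def error_b(h, theory, instances):
        # fraction of instances on which h disagrees with the domain theory
        return sum(h(x) != theory(x) for x in instances) / len(instances)

    def best_hypothesis(hypotheses, data, theory, instances,
                        k_d=1.0, k_b=1.0):
        # argmin over h in H of k_D * error_D(h) + k_B * error_B(h)
        return min(hypotheses,
                   key=lambda h: k_d * error_d(h, data)
                               + k_b * error_b(h, theory, instances))

- Choosing k_D and k_B is the crux: they encode how much to trust the
data relative to the theory, which is hard to set when the error level
of each is unknown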
Hypothesis Space Search Techniques
- Use prior knowledge to derive an initial hypothesis from which to
begin the search - KBANN
- Use prior knowledge to alter the objective of the hypothesis space
search - EBNN
- Use prior knowledge to alter the available search steps - FOCL
KBANN
- Knowledge-Based Artificial Neural Network
- KBANN's bias comes from the domain-specific theory used
to initialize the weights
- Input: D, a set of training examples
- Input: B, a domain theory consisting of nonrecursive,
propositional Horn clauses
- First, construct a neural network that classifies instances exactly
as the domain theory does (see below)
- Second, employ backpropagation to refine the network to fit D
(a minimal sketch of both phases appears below)
- KBANN typically generalizes more accurately than standard
backpropagation. However, B must be fairly accurate, and it is
limited to a particular syntactic form.
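- A minimal end-to-end sketch of the two phases on a single sigmoid
unit, assuming an illustrative clause, training set, and learning rate
(none of these specifics come from the chapter):

    import math

    def sigmoid(net):
        return 1.0 / (1.0 + math.exp(-net))

    # Phase 1: encode the clause  a :- b, c  as a sigmoid unit with
    # W = 4.0. Antecedent weights are W; the threshold weight is
    # -(n - 0.5)W = -6.0 for n = 2 non-negated antecedents
    # (see the next section)
    W = 4.0
    w = [-(2 - 0.5) * W, W, W]  # [w0, weight for b, weight for c]

    # Phase 2: refine the weights by gradient descent to fit training
    # data that contradicts the theory (here the true target is a = b)
    data = [([1, 1], 1), ([1, 0], 1), ([0, 1], 0), ([0, 0], 0)]  # ([b, c], a)
    eta = 0.5
    for _ in range(1000):
        for x, t in data:
            o = sigmoid(w[0] + w[1] * x[0] + w[2] * x[1])
            delta = (t - o) * o * (1 - o)  # squared-error gradient term
            w[0] += eta * delta
            w[1] += eta * delta * x[0]
            w[2] += eta * delta * x[1]

    # The refined unit now fires on b=1, c=0, as the data demands
    print(sigmoid(w[0] + w[1]))  # > 0.5

- After training, the theory-derived starting point has been revised
where the data contradicted it, which is the intended effect of
initializing the search from B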
Constructing the Initial KBANN Network
- For each instance attribute, create a network input
- For each Horn clause in B, create a network unit as follows
(a code sketch follows this list)
- Connect the antecedents of the Horn clause to the consequent,
creating new network nodes as needed
- For each non-negated antecedent of the clause, assign a weight
of W to the corresponding sigmoid unit input
- For each negated antecedent of the clause, assign a weight of
-W to the corresponding sigmoid unit input
- Set the threshold weight w0 for this unit to
-(n - 0.5)W, where n is the number of non-negated antecedents
of the clause
- Add additional connections among the network units, connecting
each network unit at depth i from the input layer to all network units
at depth i+1. Assign random near-zero weights to these
additional connections
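- A sketch of the per-clause construction, assuming a representation in
which antecedents are attribute names and a leading "~" marks negation;
build_unit and unit_output are hypothetical helper names:

    import math

    W = 4.0  # typical value for W (see Notes below)

    def build_unit(antecedents, w=W):
        # Return (weights, w0) implementing one Horn clause as a
        # sigmoid unit
        weights = {}
        n = 0  # count of non-negated antecedents
        for a in antecedents:
            if a.startswith("~"):
                weights[a[1:]] = -w  # negated antecedent: weight -W
            else:
                weights[a] = w       # non-negated antecedent: weight W
                n += 1
        # This threshold makes the net input positive exactly when
        # every antecedent of the clause is satisfied
        w0 = -(n - 0.5) * w
        return weights, w0

    def unit_output(weights, w0, values):
        # Inputs are 0/1; an output above 0.5 is treated as true
        net = w0 + sum(wt * values.get(name, 0)
                       for name, wt in weights.items())
        return 1.0 / (1.0 + math.exp(-net))

- The near-zero cross-connections added in the last step do not change
the initial classifications, but they give backpropagation weights to
adjust when the theory omits relevant dependencies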
Notes
- In KBANN, the inputs to the network are 0 or 1
- In KBANN, treat values above 0.5 as true and values below
0.5 as false
- A value of 4.0 for W is typical (checked numerically below)
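- A quick self-contained check of these conventions: with W = 4.0, a
unit built for the (illustrative) clause a :- b, ~c has weights +4.0
for b, -4.0 for c, and w0 = -(1 - 0.5) * 4.0 = -2.0, and it crosses
the 0.5 threshold exactly when the clause body holds:

    import math
    sigmoid = lambda net: 1.0 / (1.0 + math.exp(-net))
    print(sigmoid(-2.0 + 4.0 * 1 - 4.0 * 0))  # b=1, c=0: ~0.88 -> true
    print(sigmoid(-2.0 + 4.0 * 1 - 4.0 * 1))  # b=1, c=1: ~0.12 -> false
    print(sigmoid(-2.0 + 4.0 * 0 - 4.0 * 0))  # b=0, c=0: ~0.12 -> false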
KBANN Example
- Table 12.3 shows B and D
- Figure 12.2 shows the initial network
- Figure 12.3 shows the final network
Exercises