Chapter 12: Combining Inductive and Analytical Learning
- The goal is to find domain-independent algorithms that employ
explicitly domain-dependent knowledge
Inductive Learning Summary
- Goal: hypothesis fits data
- Justification: statistical inference
- Advantages: requires little prior knowledge
- Pitfalls: scarce data, incorrect bias
Analytical Learning Summary
- Goal: hypothesis fits domain theory
- Justification: deductive inference
- Advantages: learns from scarce data
- Pitfalls: imperfect domain theory
Desirable Properties of a Combined System
- Given no domain theory, it should learn at least as effectively
as purely inductive methods
- Given a perfect domain theory, it should learn at least as effectively
as analytical methods
- Given an imperfect domain theory and imperfect training data, it should
combine the two to outperform either purely inductive or purely
analytical methods
- It should accommodate an unknown level of error in the training data
- It should accommodate an unknown level of error in the domain theory
Learning Framework
- Given D: the training data, possibly containing errors
- Given B: the domain theory, possibly containing errors
- Given H: the hypothesis space
- Determine h: a hypothesis that best fits the training data and domain theory
- How can we determine the best fit? One idea is to find
argmin_{h ∈ H} [ k_D error_D(h) + k_B error_B(h) ]
where error_D(h) measures h's disagreement with the training data,
error_B(h) measures its disagreement with the domain theory, and the
constants k_D and k_B weight the relative importance of each
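- As a concrete illustration, here is a minimal sketch of that objective
in Python; the hypothesis representation, the error functions, and the
helper name best_hypothesis are illustrative assumptions, not part of
the chapter:

    # Score each candidate hypothesis by a weighted sum of its
    # disagreement with the training data D and the domain theory B
    def error_d(h, data):
        # fraction of training examples (x, label) that h misclassifies
        return sum(h(x) != label for x, label in data) / len(data)

    def error_b(h, theory, instances):
        # fraction of instances on which h disagrees with the domain theory
        return sum(h(x) != theory(x) for x in instances) / len(instances)

    def best_hypothesis(hypotheses, data, theory, instances,
                        k_d=1.0, k_b=1.0):
        # argmin over h in H of k_D * error_D(h) + k_B * error_B(h)
        return min(hypotheses,
                   key=lambda h: k_d * error_d(h, data)
                               + k_b * error_b(h, theory, instances))

- Choosing k_D and k_B is the crux: they encode how much to trust the
data relative to the theory, which is hard to set when the error level
of each is unknown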
Hypothesis Space Search Techniques
- Use prior knowledge to derive an initial hypothesis from which to
begin the search - KBANN
- Use prior knowledge to alter the objective of the hypothesis space
search - EBNN
- Use prior knowledge to alter the available search steps - FOCL
KBANN
- Knowledge-Based Artificial Neural Network
- KBANN's bias comes from the domain-specific theory used
to initialize the weights
- Input: D, a set of training examples
- Input: B, a domain theory consisting of nonrecursive,
propositional Horn clauses
- First, construct a neural network that classifies instances exactly
as the domain theory does (see below)
- Second, employ backpropagation to refine the network to fit D
(a minimal sketch of both phases appears below)
- KBANN typically generalizes more accurately than standard
backpropagation. However, B must be fairly accurate, and it is
limited to a particular syntactic form.
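- A minimal end-to-end sketch of the two phases on a single sigmoid
unit, assuming an illustrative clause, training set, and learning rate
(none of these specifics come from the chapter):

    import math

    def sigmoid(net):
        return 1.0 / (1.0 + math.exp(-net))

    # Phase 1: encode the clause  a :- b, c  as a sigmoid unit with
    # W = 4.0. Antecedent weights are W; the threshold weight is
    # -(n - 0.5)W = -6.0 for n = 2 non-negated antecedents
    # (see the next section)
    W = 4.0
    w = [-(2 - 0.5) * W, W, W]  # [w0, weight for b, weight for c]

    # Phase 2: refine the weights by gradient descent to fit training
    # data that contradicts the theory (here the true target is a = b)
    data = [([1, 1], 1), ([1, 0], 1), ([0, 1], 0), ([0, 0], 0)]  # ([b, c], a)
    eta = 0.5
    for _ in range(1000):
        for x, t in data:
            o = sigmoid(w[0] + w[1] * x[0] + w[2] * x[1])
            delta = (t - o) * o * (1 - o)  # squared-error gradient term
            w[0] += eta * delta
            w[1] += eta * delta * x[0]
            w[2] += eta * delta * x[1]

    # The refined unit now fires on b=1, c=0, as the data demands
    print(sigmoid(w[0] + w[1]))  # > 0.5

- After training, the theory-derived starting point has been revised
where the data contradicted it, which is the intended effect of
initializing the search from B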
Constructing the Initial KBANN Network
- For each instance attribute, create a network input
- For each Horn clause in B, create a network unit as follows
(a code sketch follows this list)
- Connect the antecedents of the Horn clause to the consequent,
creating new network nodes as needed
- For each non-negated antecedent of the clause, assign a weight
of W to the corresponding sigmoid unit input
- For each negated antecedent of the clause, assign a weight of
-W to the corresponding sigmoid unit input
- Set the threshold weight w0 for this unit to
-(n - 0.5)W, where n is the number of non-negated antecedents
of the clause
- Add additional connections among the network units, connecting
each network unit at depth i from the input layer to all network units
at depth i+1. Assign random near-zero weights to these
additional connections
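- A sketch of the per-clause construction, assuming a representation in
which antecedents are attribute names and a leading "~" marks negation;
build_unit and unit_output are hypothetical helper names:

    import math

    W = 4.0  # typical value for W (see Notes below)

    def build_unit(antecedents, w=W):
        # Return (weights, w0) implementing one Horn clause as a
        # sigmoid unit
        weights = {}
        n = 0  # count of non-negated antecedents
        for a in antecedents:
            if a.startswith("~"):
                weights[a[1:]] = -w  # negated antecedent: weight -W
            else:
                weights[a] = w       # non-negated antecedent: weight W
                n += 1
        # This threshold makes the net input positive exactly when
        # every antecedent of the clause is satisfied
        w0 = -(n - 0.5) * w
        return weights, w0

    def unit_output(weights, w0, values):
        # Inputs are 0/1; an output above 0.5 is treated as true
        net = w0 + sum(wt * values.get(name, 0)
                       for name, wt in weights.items())
        return 1.0 / (1.0 + math.exp(-net))

- The near-zero cross-connections added in the last step do not change
the initial classifications, but they give backpropagation weights to
adjust when the theory omits relevant dependencies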
Notes
- In KBANN, the inputs to the network are 0 or 1
- In KBANN, treat values above 0.5 as true and values below
0.5 as false
- A value of 4.0 for W is typical (checked numerically below)
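- A quick self-contained check of these conventions: with W = 4.0, a
unit built for the (illustrative) clause a :- b, ~c has weights +4.0
for b, -4.0 for c, and w0 = -(1 - 0.5) * 4.0 = -2.0, and it crosses
the 0.5 threshold exactly when the clause body holds:

    import math
    sigmoid = lambda net: 1.0 / (1.0 + math.exp(-net))
    print(sigmoid(-2.0 + 4.0 * 1 - 4.0 * 0))  # b=1, c=0: ~0.88 -> true
    print(sigmoid(-2.0 + 4.0 * 1 - 4.0 * 1))  # b=1, c=1: ~0.12 -> false
    print(sigmoid(-2.0 + 4.0 * 0 - 4.0 * 0))  # b=0, c=0: ~0.12 -> false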
KBANN Example
- Table 12.3 shows B and D
- Figure 12.2 shows the initial network
- Figure 12.3 shows the final network
Exercises