Chapter 10: Learning Sets of Rules
- Decision trees can be converted into sets of rules
- Genetic algorithms can be used to learn sets of rules
- The focus of this chapter is learning rules one at a time
- This chapter will show how to learn both propositional rules
and first order logic rules
Sequential Covering Algorithms
Typical Greedy Algorithm - Table 10.1
- Set the learned rules to the empty set
- Apply the learn one rule algorithm to return a new rule
- While the performance of the new rule on the training examples
exceeds some threshold
- Add the new rule to the learned rules
- Remove from the training examples the examples that are
correctly classified by the new rule
- Apply the learn one rule algorithm to return another new rule
- Sort the learned rules according to some performance metric
over the remaining training examples
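The covering loop above can be sketched in a few lines. This is a hypothetical illustration, not code from the text: examples are assumed to be dicts of attribute values plus a "label" key (1 = positive), rules are lists of (attribute, value) tests, and learn_one_rule is a deliberately simple greedy stand-in for the full algorithm of Table 10.2. The final sorting step is omitted for brevity.

```python
# Sketch of the sequential covering loop (Table 10.1). All names here
# are illustrative assumptions; `learn_one_rule` is a greedy stand-in.

def matches(rule, example):
    """A rule covers an example if every (attribute, value) test agrees."""
    return all(example.get(attr) == val for attr, val in rule)

def performance(rule, examples):
    """Fraction of covered examples that are positive (0.0 if none covered)."""
    covered = [ex for ex in examples if matches(rule, ex)]
    return sum(ex["label"] for ex in covered) / len(covered) if covered else 0.0

def learn_one_rule(examples):
    """Stand-in for Table 10.2: pick the single best attribute-value test."""
    candidates = {(a, v) for ex in examples
                  for a, v in ex.items() if a != "label"}
    return max(([c] for c in candidates),
               key=lambda rule: performance(rule, examples))

def sequential_covering(examples, threshold=0.9):
    learned_rules = []
    rule = learn_one_rule(examples)
    while performance(rule, examples) > threshold:
        learned_rules.append(rule)
        # Remove the training examples the new rule covers.
        examples = [ex for ex in examples if not matches(rule, ex)]
        if not examples:
            break
        rule = learn_one_rule(examples)
    return learned_rules
```

Because covered examples are removed after each iteration, later rules are learned only from the examples earlier rules fail to cover, so the learned set forms a disjunctive cover of the positives.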
Typical Learn One Rule Algorithm - Table 10.2 - CN2
- Initialize the best hypothesis to be the most general hypothesis
- Initialize the candidate hypotheses set with the above hypothesis
- While the candidate hypothesis set is not empty
- Generate the next more specific hypotheses
- The set of all constraints is the set of attribute-value
pairs that occur in the training examples
- Make a new set of candidate hypotheses by specializing each
current candidate with each attribute-value pair
- Remove from the new candidate hypotheses any duplicates,
inconsistent hypotheses, or hypotheses that are not maximally specific
- Update the best single hypothesis, if a new best has been found
- Update the candidate hypotheses to contain just the k
best new hypotheses
- Return a rule of the form IF best hypothesis THEN prediction
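The beam search above can be sketched as follows; this is a hypothetical illustration assuming the same dict-of-attributes example format, with hypotheses represented as frozensets of (attribute, value) constraints and accuracy over covered examples as the performance measure. Forming the final prediction ("IF ... THEN ...") is left out; in CN2 it would be the most frequent label among the examples the best hypothesis covers.

```python
# Sketch of general-to-specific beam search for Learn-One-Rule
# (Table 10.2). Hypotheses are frozensets of (attribute, value)
# constraints; k is the beam width. All names are illustrative.

def covers(hyp, example):
    return all(example.get(a) == v for a, v in hyp)

def accuracy(hyp, examples):
    covered = [ex for ex in examples if covers(hyp, ex)]
    return sum(ex["label"] for ex in covered) / len(covered) if covered else 0.0

def learn_one_rule_beam(examples, k=2):
    all_constraints = {(a, v) for ex in examples
                       for a, v in ex.items() if a != "label"}
    best = frozenset()            # most general hypothesis: no constraints
    candidates = [best]
    while candidates:
        # Generate the next, more specific hypotheses.
        new_hyps = {h | {c} for h in candidates for c in all_constraints
                    if c not in h}
        # Drop inconsistent hypotheses (two values for one attribute);
        # the frozenset representation already removes duplicates.
        new_hyps = {h for h in new_hyps if len({a for a, _ in h}) == len(h)}
        # Update the best hypothesis if a strictly better one is found.
        for h in new_hyps:
            if accuracy(h, examples) > accuracy(best, examples):
                best = h
        # Keep only the k best new hypotheses as the next beam.
        candidates = sorted(new_hyps, key=lambda h: accuracy(h, examples),
                            reverse=True)[:k]
    return best
```

The search terminates because each step adds one constraint, and a hypothesis that already constrains every attribute cannot be specialized consistently.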
Variations
- Learn the rules using a general to specific beam search
vs. learn the rules using a specific to general beam search
- Learn one rule at a time vs. learn rules using simultaneous
covering such as in ID3
- Produce rules via a generate-then-test strategy vs. produce
rules in an example-driven fashion
- Do not post-prune rules vs. post-prune rules
- Use entropy as the performance measure vs. use some other measure,
such as relative frequency: the number of correctly classified
examples divided by the total number of examples the rule covers
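The two measures in the last variation can be made concrete; this is a hypothetical sketch over the labels of the examples a rule covers (the function names are not from the text):

```python
# Two candidate performance measures for Learn-One-Rule, computed
# from the 0/1 labels of the examples a rule covers. Illustrative only.

import math

def rule_accuracy(covered_labels):
    """Correctly classified / total covered, predicting the majority label."""
    n = len(covered_labels)
    n_pos = sum(covered_labels)
    return max(n_pos, n - n_pos) / n

def neg_entropy(covered_labels):
    """Negative entropy of the covered labels; higher means a purer rule."""
    n = len(covered_labels)
    n_pos = sum(covered_labels)
    return sum(p * math.log2(p) for p in (n_pos / n, (n - n_pos) / n) if p > 0)
```

Both measures reach their optimum on a rule whose covered examples all share one label, but entropy distinguishes degrees of impurity more smoothly than raw accuracy does.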
Learning First Order Rules
- First order rules are much more expressive than propositional rules
- Learning first order rules is often called inductive logic
programming (ILP)
- For example, IF Father(x,y) AND Female(y) THEN Daughter(y,x)
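Applying such a rule to ground facts amounts to finding variable bindings that satisfy the body; a small hypothetical sketch (the predicate representation and constant names are illustrative, not from the text):

```python
# Sketch: apply IF Father(x, y) AND Female(y) THEN Daughter(y, x)
# to ground facts by enumerating bindings for x and y.

def derive_daughters(father_facts, female_facts):
    """father_facts: set of (x, y) pairs meaning x is the father of y;
    female_facts: set of individuals known to be female.
    Returns the derivable Daughter(y, x) facts."""
    return {(y, x) for (x, y) in father_facts if y in female_facts}
```

Because the rule contains variables, one rule covers every (father, daughter) pair at once; a propositional rule could only refer to fixed attribute values.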
Terminology
- Constants - capitalized
- Predicate symbols - capitalized
- Variables - lowercase
- Function symbols - lowercase
- Horn clause: a disjunctive clause (∨) with at most one positive literal;
for example, A ∧ B ∧ C ∧ D ==> E is equivalent to the clause
¬A ∨ ¬B ∨ ¬C ∨ ¬D ∨ E, where E is the single positive literal
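The equivalence between the implication form and the clause form can be checked exhaustively by truth table; a small hypothetical check (helper names are not from the text):

```python
# Verify that (A ∧ B ∧ C ∧ D) ==> E equals the disjunction
# ¬A ∨ ¬B ∨ ¬C ∨ ¬D ∨ E, a clause with one positive literal (E).

from itertools import product

def implication(a, b, c, d, e):
    return (not (a and b and c and d)) or e

def clause(a, b, c, d, e):
    return (not a) or (not b) or (not c) or (not d) or e

assert all(implication(*v) == clause(*v)
           for v in product([False, True], repeat=5))
```

Both forms are false only when A, B, C, D are all true and E is false, which is why Horn clauses can be read directly as IF-THEN rules.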