Chapter 3: Decision Tree Learning
- Completely expressive hypothesis space
- Representation: a decision tree, which is equivalent to
a disjunction of conjunctions (each root-to-leaf path is a
conjunction of attribute tests; the tree is the disjunction
of those paths)
- Bias: small trees
- Example implementations: ID3, C4.5
Decision Trees
- Interior Node: an attribute to test
- Branch: one possible value of the attribute being tested
- Leaf Node: a classification
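As a concrete sketch of this structure, here is one possible encoding
in Python, using the familiar PlayTennis tree as an example (the
nested-dict representation and the classify helper are illustrative
assumptions, not part of the chapter):

    # Illustrative nested-dict encoding: interior nodes name an attribute,
    # branches are keyed by attribute values, and leaves are classifications.
    tree = {
        "attribute": "Outlook",
        "branches": {
            "Sunny": {"attribute": "Humidity",
                      "branches": {"High": "No", "Normal": "Yes"}},
            "Overcast": "Yes",
            "Rain": {"attribute": "Wind",
                     "branches": {"Weak": "Yes", "Strong": "No"}},
        },
    }

    def classify(tree, instance):
        # Walk from the root, following the branch that matches the
        # instance's value for each tested attribute, until a leaf.
        while isinstance(tree, dict):
            tree = tree["branches"][instance[tree["attribute"]]]
        return tree

    print(classify(tree, {"Outlook": "Sunny", "Humidity": "Normal"}))  # Yes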
Appropriate Problems
- When instances are represented by attribute-value pairs
- When the target function has discrete classifications
- When the training data might contain errors
- When the training or testing data might contain missing
attribute values
Learning Algorithm
See Table 3.1 for the basic learning algorithm. The key design
issue is how to select which attribute to test at each node; the
next two sections define the selection criterion, and a sketch
tying the pieces together follows the Hypothesis Space list.
Entropy
- Let S be the set of training instances
- Entropy(S) = Σi - pi log2 pi, where pi is the proportion of
instances in S belonging to class i
- Define 0 log 0 to be 0.
- A higher entropy value indicates more impurity (a more mixed
class distribution).
- An entropy value of 0 indicates no impurity: all instances
belong to a single class.
- See Figure 3.1 for a graph of the entropy function.
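As a concrete illustration, a minimal Python sketch of the entropy
computation (the function name and label format are illustrative):

    from collections import Counter
    from math import log2

    def entropy(labels):
        # Entropy of a collection, given the class label of each instance.
        # Counter never yields zero counts, so the 0 log2 0 = 0 convention
        # needs no special case here.
        n = len(labels)
        return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

    print(entropy(["yes", "yes", "yes"]))       # 0.0 (pure set)
    print(entropy(["yes", "yes", "no", "no"]))  # 1.0 (50/50 boolean split)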
Information Gain
- Let A be an attribute
- Information Gain(S,A) = Entropy(S) -
Σ (v ∈ Values(A)) (|Sv| / |S|) Entropy(Sv),
where Sv is the subset of S for which attribute A has value v
- The goal is to select the attribute that maximizes the
information gain over all of the candidate attributes.
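A matching sketch of the gain computation, reusing the entropy
function from the sketch above; representing each example as a dict
mapping attribute names to values is an assumption for illustration:

    def information_gain(examples, labels, attribute):
        # Expected reduction in entropy from partitioning the examples
        # on the given attribute (uses entropy() from the sketch above).
        n = len(labels)
        partitions = {}  # attribute value -> labels of matching examples
        for example, label in zip(examples, labels):
            partitions.setdefault(example[attribute], []).append(label)
        remainder = sum(len(sv) / n * entropy(sv)
                        for sv in partitions.values())
        return entropy(labels) - remainder

    examples = [{"Wind": "Weak"}, {"Wind": "Strong"}, {"Wind": "Weak"}]
    labels = ["yes", "no", "yes"]
    print(information_gain(examples, labels, "Wind"))  # ≈ 0.918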
Hypothesis Space
- The complete space of finite discrete-valued functions (any such
function can be represented by some decision tree)
- Maintains only one tree at any given time
- There is no backtracking
- All training examples are used to make each decision (the search
is not incremental like a version space)
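Tying the pieces together, a minimal sketch of this greedy search,
reusing entropy() and information_gain() from the sketches above and
producing trees in the nested-dict format shown earlier (id3 here is
an illustrative name, not a faithful reproduction of Table 3.1):

    def id3(examples, labels, attributes):
        # Base cases: a pure node, or no attributes left to test.
        if len(set(labels)) == 1:
            return labels[0]
        if not attributes:
            return max(set(labels), key=labels.count)  # majority class
        # Greedy step: pick the single attribute with the highest
        # information gain. Only one tree is ever maintained, and the
        # choice is never revisited (no backtracking).
        best = max(attributes,
                   key=lambda a: information_gain(examples, labels, a))
        branches = {}
        for value in {ex[best] for ex in examples}:
            rows = [(ex, y) for ex, y in zip(examples, labels)
                    if ex[best] == value]
            branches[value] = id3([ex for ex, _ in rows],
                                  [y for _, y in rows],
                                  [a for a in attributes if a != best])
        return {"attribute": best, "branches": branches}

Because every split is chosen statistically from all of the examples
reaching that node, the search tolerates errors in the training data,
as noted under Appropriate Problems.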