Chapter 3: Decision Tree Learning
- Completely expressive hypothesis space
- Representation: a decision tree, which is equivalent to
a disjunction of conjunctions (each root-to-leaf path is a
conjunction of attribute tests; the tree is the disjunction
of those paths)
- Bias: small trees
- Example implementations: ID3, C4.5
Decision Trees
- Interior Node: an attribute to test
- Branch: one possible value of the attribute being tested
- Leaf Node: a classification
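As a concrete sketch of this structure, here is one possible encoding
in Python, using the familiar PlayTennis tree as an example (the
nested-dict representation and the classify helper are illustrative
assumptions, not part of the chapter):

    # Illustrative nested-dict encoding: interior nodes name an attribute,
    # branches are keyed by attribute values, and leaves are classifications.
    tree = {
        "attribute": "Outlook",
        "branches": {
            "Sunny": {"attribute": "Humidity",
                      "branches": {"High": "No", "Normal": "Yes"}},
            "Overcast": "Yes",
            "Rain": {"attribute": "Wind",
                     "branches": {"Weak": "Yes", "Strong": "No"}},
        },
    }

    def classify(tree, instance):
        # Walk from the root, following the branch that matches the
        # instance's value for each tested attribute, until a leaf.
        while isinstance(tree, dict):
            tree = tree["branches"][instance[tree["attribute"]]]
        return tree

    print(classify(tree, {"Outlook": "Sunny", "Humidity": "Normal"}))  # Yes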
Appropriate Problems
- When instances are represented by attribute-value pairs
- When the target function has discrete classifications
- When the training data might contain errors
- When the training or testing data might contain missing
attribute values
Learning Algorithm
See Table 3.1 for the basic learning algorithm. The key design
issue is how to select which attribute to test at each node; the
next two sections define the selection criterion, and a sketch
tying the pieces together follows the Hypothesis Space list.
Entropy
- Let S be the set of training instances
- Entropy(S) = Σi - pi log2 pi, where pi is the proportion of
instances in S belonging to class i
- Define 0 log 0 to be 0.
- A higher entropy value indicates more impurity (a more mixed
class distribution).
- An entropy value of 0 indicates no impurity: all instances
belong to a single class.
- See Figure 3.1 for a graph of the entropy function.
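As a concrete illustration, a minimal Python sketch of the entropy
computation (the function name and label format are illustrative):

    from collections import Counter
    from math import log2

    def entropy(labels):
        # Entropy of a collection, given the class label of each instance.
        # Counter never yields zero counts, so the 0 log2 0 = 0 convention
        # needs no special case here.
        n = len(labels)
        return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

    print(entropy(["yes", "yes", "yes"]))       # 0.0 (pure set)
    print(entropy(["yes", "yes", "no", "no"]))  # 1.0 (50/50 boolean split)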
Information Gain
- Let A be an attribute
- Information Gain(S,A) = Entropy(S) -
Σ (v ∈ Values(A)) (|Sv| / |S|) Entropy(Sv),
where Sv is the subset of S for which attribute A has value v
- The goal is to select the attribute that maximizes the
information gain over all of the candidate attributes.
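A matching sketch of the gain computation, reusing the entropy
function from the sketch above; representing each example as a dict
mapping attribute names to values is an assumption for illustration:

    def information_gain(examples, labels, attribute):
        # Expected reduction in entropy from partitioning the examples
        # on the given attribute (uses entropy() from the sketch above).
        n = len(labels)
        partitions = {}  # attribute value -> labels of matching examples
        for example, label in zip(examples, labels):
            partitions.setdefault(example[attribute], []).append(label)
        remainder = sum(len(sv) / n * entropy(sv)
                        for sv in partitions.values())
        return entropy(labels) - remainder

    examples = [{"Wind": "Weak"}, {"Wind": "Strong"}, {"Wind": "Weak"}]
    labels = ["yes", "no", "yes"]
    print(information_gain(examples, labels, "Wind"))  # ≈ 0.918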
Hypothesis Space
- The complete space of finite discrete-valued functions (any such
function can be represented by some decision tree)
- Maintains only one tree at any given time
- There is no backtracking
- All training examples are used to make each decision (the search
is not incremental like a version space)
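Tying the pieces together, a minimal sketch of this greedy search,
reusing entropy() and information_gain() from the sketches above and
producing trees in the nested-dict format shown earlier (id3 here is
an illustrative name, not a faithful reproduction of Table 3.1):

    def id3(examples, labels, attributes):
        # Base cases: a pure node, or no attributes left to test.
        if len(set(labels)) == 1:
            return labels[0]
        if not attributes:
            return max(set(labels), key=labels.count)  # majority class
        # Greedy step: pick the single attribute with the highest
        # information gain. Only one tree is ever maintained, and the
        # choice is never revisited (no backtracking).
        best = max(attributes,
                   key=lambda a: information_gain(examples, labels, a))
        branches = {}
        for value in {ex[best] for ex in examples}:
            rows = [(ex, y) for ex, y in zip(examples, labels)
                    if ex[best] == value]
            branches[value] = id3([ex for ex, _ in rows],
                                  [y for _, y in rows],
                                  [a for a in attributes if a != best])
        return {"attribute": best, "branches": branches}

Because every split is chosen statistically from all of the examples
reaching that node, the search tolerates errors in the training data,
as noted under Appropriate Problems.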