Chapter 5: Evaluating Hypotheses
Major Issues
- How can we determine the accuracy of a learned hypothesis?
- How can we compare the accuracy of two learned hypotheses?
- How can we compare the accuracy of two learning algorithms?
Notation
- X: the space of instances
- D: the probability distribution of encountering
instances from X
- f: the target function
- H: the hypothesis space
- h: a particular hypothesis in H
- (x, f(x)): a training instance
- S: all training instances
Two Questions
- Given h constructed from n examples drawn randomly
from D, what is the best estimate of the accuracy of h
over future instances drawn from D?
- What is the probable error in this accuracy estimate?
Definitions
- Sample Error:
errorS(h) = (1/n) * Σ x ∈ S δ(f(x), h(x))
where δ(f(x), h(x)) is 1 if f(x) and h(x)
predict differently and 0 otherwise.
- True Error:
errorD(h) = Prx ∈ D [ f(x) ≠ h(x) ]
Key Question: How good of an estimate is
errorS(h) for errorD(h)?
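The sample-error definition above can be sketched directly. This is a minimal illustration, not code from the chapter; f, h, and S below are hypothetical stand-ins for the target function, a learned hypothesis, and the training sample.

```python
def sample_error(h, f, S):
    """errorS(h) = (1/n) * sum over x in S of delta(f(x), h(x))."""
    n = len(S)
    # delta is 1 when h and f disagree on x, 0 otherwise
    return sum(1 for x in S if h(x) != f(x)) / n

# Toy example: the target is "x is even"; the hypothesis also demands
# "x is not a multiple of 4", so it errs exactly on multiples of 4.
f = lambda x: x % 2 == 0
h = lambda x: x % 2 == 0 and x % 4 != 0
S = list(range(8))            # h disagrees with f on x = 0 and x = 4
print(sample_error(h, f, S))  # 2 errors out of 8 -> 0.25
```

The true error errorD(h) cannot be computed this way, since it requires the (unknown) distribution D; the chapter's question is how well this sample quantity estimates it.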
Discrete Valued Hypotheses
IF
- S contains n examples drawn independently over D
- n ≥ 30 (or n*p*(1 - p) ≥ 5)
- h commits r errors on S
THEN
- the most probable value of errorD(h) is
errorS(h) which is r/n
- with 95% confidence, errorD(h) lies in
errorS(h) ± 1.96 * sqrt[ errorS(h) *
(1 - errorS(h)) / n ]
Table 5.1 shows values for various confidence intervals.
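The 95% interval above is easy to compute. A minimal sketch, using r = 12 errors on n = 40 examples as a toy input:

```python
import math

def confidence_interval_95(r, n):
    """95% interval: errorS(h) +/- 1.96 * sqrt(errorS(h)*(1-errorS(h))/n)."""
    e = r / n                                  # errorS(h) = r/n
    half_width = 1.96 * math.sqrt(e * (1 - e) / n)
    return e - half_width, e + half_width

lo, hi = confidence_interval_95(12, 40)
print(f"errorS(h) = {12/40:.2f}, 95% interval = [{lo:.3f}, {hi:.3f}]")
```

Note the interval width shrinks like 1/sqrt(n): quadrupling the sample size halves the uncertainty.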
Table 5.2 shows basic definitions and facts from statistics.
Binomial Distribution
- n: number of training instances
- r: number of errors
- p: the probability of an error
- P(r) = C(n,r) * p^r * (1 - p)^(n - r)
- E[X] = n * p
- Var(X) = n * p * (1 - p)
- σX = sqrt (n * p * (1 - p))
- If (n * p * (1 - p)) ≥ 5, the binomial distribution
is closely approximated by the normal distribution with
the same mean and variance
- See Table 5.3 for a summary of the binomial distribution
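These formulas can be checked numerically. A small sketch (not from the chapter) computing P(r), the mean, the standard deviation, and the normal-approximation condition:

```python
import math

def binomial_pmf(r, n, p):
    """P(r) = C(n, r) * p^r * (1 - p)^(n - r)."""
    return math.comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 40, 0.3                       # toy values: 40 instances, error rate 0.3
mean = n * p                         # E[X] = n * p
sd = math.sqrt(n * p * (1 - p))      # sigma_X = sqrt(n * p * (1 - p))
print(mean, round(sd, 3))            # 12.0  2.898

# Normal approximation is considered safe when n * p * (1 - p) >= 5
print(n * p * (1 - p) >= 5)          # True (8.4 >= 5)
```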
Estimators, Bias, and Variance
- errorS(h) = r/n
- errorD(h) = p
- errorS(h) is an estimator for errorD(h)
- The estimation bias is E[errorS(h)] - errorD(h)
- If the estimation bias is 0, the estimator is unbiased
- For a binomial distribution, the estimator is unbiased!
- In general, σerrorS(h) =
σr / n =
sqrt ( p * (1 - p) / n ) ≈
sqrt ( errorS(h) * (1 - errorS(h)) / n )
(the last step substitutes errorS(h) for the unknown p)
- A practical example is worked on page 138
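The standard deviation of the estimator can be sketched as below; the inputs r = 12, n = 40 are a toy illustration, not a claim about the book's page-138 example.

```python
import math

def std_error(r, n):
    """Approximate sigma of errorS(h): sqrt(e*(1-e)/n) with e = r/n
    substituted for the unknown true error p."""
    e = r / n
    return math.sqrt(e * (1 - e) / n)

print(round(std_error(12, 40), 3))   # ~0.072
```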
Confidence Interval
- Definition: an N% confidence interval for some parameter p is
an interval that is expected with probability N% to contain p
- See Figure 5.1 for a picture
- Confidence intervals are relatively easy to find if we
use the normal distribution as an approximation to the
binomial distribution
- If a random variable Y obeys a normal distribution, the
measured value y of Y will fall into the following interval
N% of the time: μ ± zN * σ
where zN values are given in Table 5.1
- Confidence intervals can have two-sided or one-sided bounds
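Putting the pieces together, a general N% two-sided interval follows the same pattern as the 95% case. The zN values below are the standard two-sided normal quantiles of the kind tabulated in Table 5.1 (an assumption; consult the table itself for the authoritative values):

```python
import math

# Two-sided zN values for common confidence levels (standard normal quantiles)
Z = {50: 0.67, 68: 1.00, 80: 1.28, 90: 1.64, 95: 1.96, 98: 2.33, 99: 2.58}

def confidence_interval(r, n, level):
    """N% interval: errorS(h) +/- zN * sqrt(errorS(h)*(1-errorS(h))/n)."""
    e = r / n
    half = Z[level] * math.sqrt(e * (1 - e) / n)
    return e - half, e + half

# Toy input: r = 12 errors on n = 40 examples, at three confidence levels
for level in (90, 95, 99):
    lo, hi = confidence_interval(12, 40, level)
    print(f"{level}%: [{lo:.3f}, {hi:.3f}]")
```

As expected, higher confidence demands a wider interval.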
Normal Distribution
- See Table 5.4
- E[X] = μ
- Var(X) = σ²
- σX = σ
- The Central Limit Theorem states that the sum of a large number
of independent, identically distributed random variables follows
a distribution that is approximately normal