Chapter 5: Evaluating Hypotheses
General Approach For Deriving Confidence Intervals
- Identify the underlying population parameter p to be estimated,
  for example, errorD(h).
- Define the estimator Y (e.g. errorS(h)). It is desirable to
  choose a minimum-variance, unbiased estimator.
- Determine the probability distribution DY that
governs the estimator Y, including its mean and variance.
- Determine the N% confidence interval by finding thresholds L and U
such that N% of the mass in the probability distribution
DY falls between L and U.
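The steps above can be sketched for the common case of estimating errorD(h) from a sample error. This is a minimal illustration, not from the text: the function name is mine, it uses Python's `statistics.NormalDist` for the zN quantile, and it assumes the normal approximation to the binomial (reasonable when n is large and errorS(h) is not too close to 0 or 1).

```python
from statistics import NormalDist

def error_confidence_interval(error_s, n, confidence=0.95):
    """Two-sided N% confidence interval for errorD(h), using the
    normal approximation to the binomial distribution of errorS(h)."""
    # zN threshold such that N% of the normal mass lies within +/- zN.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # e.g. ~1.96 for 95%
    # Standard deviation of the estimator errorS(h).
    sigma = (error_s * (1 - error_s) / n) ** 0.5
    return error_s - z * sigma, error_s + z * sigma

lo, hi = error_confidence_interval(0.30, 100, confidence=0.95)
# interval is roughly (0.21, 0.39)
```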
Central Limit Theorem
Consider a set of independent, identically distributed random
variables Y1 ... Yn governed by an arbitrary
probability distribution with mean μ and finite variance
σ². Define the sample mean Ȳn = (1/n) * Σ Yi. As n approaches
infinity, the distribution governing (Ȳn - μ) / (σ / sqrt(n))
approaches a normal distribution with mean zero and standard
deviation 1.
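The theorem can be checked empirically with a short simulation, using only the standard library. This sketch (my own, not from the text) draws sample means from a deliberately non-normal distribution, the exponential with rate 1 (mean 1, standard deviation 1), standardizes them as above, and confirms they behave like a standard normal.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the simulation is reproducible

mu, sigma, n = 1.0, 1.0, 200   # exponential(1) has mean 1, std dev 1
standardized = []
for _ in range(2000):
    # Sample mean of n i.i.d. exponential draws.
    y_bar = mean(random.expovariate(1.0) for _ in range(n))
    # Standardize exactly as in the theorem statement.
    standardized.append((y_bar - mu) / (sigma / n ** 0.5))

# Empirically close to mean 0 and standard deviation 1,
# even though the underlying distribution is far from normal.
m, s = mean(standardized), stdev(standardized)
```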
Difference in Error of Two Hypotheses
- Consider hypothesis h1 tested on sample S1
consisting of n1 examples.
- Consider hypothesis h2 tested on sample S2
consisting of n2 examples.
- d = errorD(h1) - errorD(h2)
- estimator đ = errorS1(h1) - errorS2(h2)
- By the Central Limit Theorem, đ follows an approximately normal
  distribution for large n1 and n2, and therefore ...
- σđ² =
  [errorS1(h1) * (1 - errorS1(h1))] / n1 +
  [errorS2(h2) * (1 - errorS2(h2))] / n2
- N% confidence interval: đ ± zN * sqrt(σđ²)
- If S = S1 = S2, the method still yields an unbiased estimate.
  Because both hypotheses are tested on the same sample, their errors
  are correlated, and this paired comparison typically produces a
  tighter confidence interval.
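The interval above is easy to compute directly. This is a sketch under the same normal approximation; the function name is mine, and zN again comes from `statistics.NormalDist`.

```python
from statistics import NormalDist

def error_difference_interval(e1, n1, e2, n2, confidence=0.95):
    """Approximate N% two-sided confidence interval for
    d = errorD(h1) - errorD(h2), given sample errors e1, e2
    measured on independent samples of size n1, n2."""
    d_hat = e1 - e2
    # Variance of the estimator đ: sum of the two binomial variances.
    var = e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * var ** 0.5
    return d_hat - half_width, d_hat + half_width

lo, hi = error_difference_interval(0.30, 100, 0.20, 100)
# roughly (-0.02, 0.22): the interval contains 0, so at 95% two-sided
# confidence we cannot conclude that h1 is truly worse than h2
```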
Hypothesis Testing
- Given n1 = 100 and errorS1(h1) = .30
- Given n2 = 100 and errorS2(h2) = .20
- Then đ = .10
- What is Pr(errorD(h1) > errorD(h2))?
- Calculate σđ ≈ .061
- Therefore zN ≈ (0.1 / .061) ≈ 1.64
- From table 5.1, zN ≈ 1.64 corresponds to a 90% two-sided
  confidence interval, equivalently a 95% one-sided interval, so we
  can be approximately 95% confident that
  errorD(h1) > errorD(h2)
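The example's arithmetic can be reproduced in a few lines. This is my own re-computation, not from the text; the one-sided confidence is the normal CDF evaluated at z, via `statistics.NormalDist`.

```python
from statistics import NormalDist

# Values from the worked example above.
e1, n1 = 0.30, 100
e2, n2 = 0.20, 100

d_hat = e1 - e2                                              # .10
sigma_d = (e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2) ** 0.5   # ~.061
z = d_hat / sigma_d                                          # ~1.64

# One-sided confidence that errorD(h1) > errorD(h2): the normal
# mass below z.
confidence = NormalDist().cdf(z)                             # ~0.95
```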
Comparing Learning Algorithms
- LA: learning algorithm A
- LB: learning algorithm B
- Want to estimate E[errorD(LA(S)) - errorD(LB(S))]
- S0: training set
- T0: test set
- We can estimate this difference for a single partition using
  errorT0(LA(S0)) - errorT0(LB(S0))
- Table 5.5 shows how to extend this concept to a technique
called k-fold cross validation.
- Equation 5.17 shows how to calculate the N% confidence interval
  for k-fold cross validation; it relies on equation 5.18 and
  table 5.6 to obtain certain constant values.
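The k-fold interval can be sketched as follows. This is my own illustration of the equation 5.17 / 5.18 pattern, not the text's code: given the per-fold differences δi, it computes their mean δ̄ and the standard deviation of that mean, then forms δ̄ ± t · s. The t-value t(N, k-1) is left as a parameter because it must be looked up in a t-distribution table (table 5.6 in the text); Python's standard library does not provide it.

```python
from statistics import mean

def cv_confidence_interval(deltas, t_value):
    """Approximate N% confidence interval for the true difference in
    error between two learning algorithms, from k-fold cross validation.

    deltas  : per-fold differences errorTi(LA(Si)) - errorTi(LB(Si))
    t_value : constant t(N, k-1) from a t-distribution table
              (table 5.6) for the desired confidence level N.
    """
    k = len(deltas)
    d_bar = mean(deltas)
    # Estimated standard deviation of the mean difference
    # (equation 5.18 style): sqrt( 1/(k(k-1)) * sum (δi - δ̄)² ).
    s = (sum((d - d_bar) ** 2 for d in deltas) / (k * (k - 1))) ** 0.5
    return d_bar - t_value * s, d_bar + t_value * s

# Hypothetical per-fold differences for k = 5; 2.776 is the 95%
# two-sided t-value for k - 1 = 4 degrees of freedom.
lo, hi = cv_confidence_interval([0.05, 0.02, -0.01, 0.03, 0.01], 2.776)
```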
Practice Exercises