Chapter 5: Evaluating Hypotheses
General Approach For Deriving Confidence Intervals
- Identify the underlying population parameter p to be estimated,
  for example, errorD(h).
- Define the estimator Y (e.g. errorS(h)). It is desirable to
  choose a minimum-variance, unbiased estimator.
- Determine the probability distribution DY that
governs the estimator Y, including its mean and variance.
- Determine the N% confidence interval by finding thresholds L and U
such that N% of the mass in the probability distribution
DY falls between L and U.
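The steps above can be sketched for the common case of estimating errorD(h) from a sample error. This is a minimal illustration, not from the text: the function name is mine, it uses Python's `statistics.NormalDist` for the zN quantile, and it assumes the normal approximation to the binomial (reasonable when n is large and errorS(h) is not too close to 0 or 1).

```python
from statistics import NormalDist

def error_confidence_interval(error_s, n, confidence=0.95):
    """Two-sided N% confidence interval for errorD(h), using the
    normal approximation to the binomial distribution of errorS(h)."""
    # zN threshold such that N% of the normal mass lies within +/- zN.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # e.g. ~1.96 for 95%
    # Standard deviation of the estimator errorS(h).
    sigma = (error_s * (1 - error_s) / n) ** 0.5
    return error_s - z * sigma, error_s + z * sigma

lo, hi = error_confidence_interval(0.30, 100, confidence=0.95)
# interval is roughly (0.21, 0.39)
```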
Central Limit Theorem
Consider a set of independent, identically distributed random
variables Y1 ... Yn governed by an arbitrary
probability distribution with mean μ and finite variance
σ². Define the sample mean Ȳn = (1/n) * Σ Yi. As n approaches
infinity, the distribution governing (Ȳn - μ) / (σ / sqrt(n))
approaches a normal distribution with mean zero and standard
deviation 1.
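The theorem can be checked empirically with a short simulation, using only the standard library. This sketch (my own, not from the text) draws sample means from a deliberately non-normal distribution, the exponential with rate 1 (mean 1, standard deviation 1), standardizes them as above, and confirms they behave like a standard normal.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the simulation is reproducible

mu, sigma, n = 1.0, 1.0, 200   # exponential(1) has mean 1, std dev 1
standardized = []
for _ in range(2000):
    # Sample mean of n i.i.d. exponential draws.
    y_bar = mean(random.expovariate(1.0) for _ in range(n))
    # Standardize exactly as in the theorem statement.
    standardized.append((y_bar - mu) / (sigma / n ** 0.5))

# Empirically close to mean 0 and standard deviation 1,
# even though the underlying distribution is far from normal.
m, s = mean(standardized), stdev(standardized)
```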
Difference in Error of Two Hypotheses
- Consider hypothesis h1 tested on sample S1
consisting of n1 examples.
- Consider hypothesis h2 tested on sample S2
consisting of n2 examples.
- d = errorD(h1) - errorD(h2)
- estimator đ = errorS1(h1) - errorS2(h2)
- By the Central Limit Theorem, đ follows an approximately normal
  distribution for large n1 and n2, and therefore ...
- σđ² =
  [errorS1(h1) * (1 - errorS1(h1))] / n1 +
  [errorS2(h2) * (1 - errorS2(h2))] / n2
- N% confidence interval: đ ± zN * sqrt(σđ²)
- If S = S1 = S2, the method still yields an unbiased estimate.
  Because both hypotheses are tested on the same sample, their errors
  are correlated, and this paired comparison typically produces a
  tighter confidence interval.
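The interval above is easy to compute directly. This is a sketch under the same normal approximation; the function name is mine, and zN again comes from `statistics.NormalDist`.

```python
from statistics import NormalDist

def error_difference_interval(e1, n1, e2, n2, confidence=0.95):
    """Approximate N% two-sided confidence interval for
    d = errorD(h1) - errorD(h2), given sample errors e1, e2
    measured on independent samples of size n1, n2."""
    d_hat = e1 - e2
    # Variance of the estimator đ: sum of the two binomial variances.
    var = e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * var ** 0.5
    return d_hat - half_width, d_hat + half_width

lo, hi = error_difference_interval(0.30, 100, 0.20, 100)
# roughly (-0.02, 0.22): the interval contains 0, so at 95% two-sided
# confidence we cannot conclude that h1 is truly worse than h2
```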
Hypothesis Testing
- Given n1 = 100 and errorS1(h1) = .30
- Given n2 = 100 and errorS2(h2) = .20
- Then đ = .10
- What is Pr(errorD(h1) > errorD(h2))?
- Calculate σđ ≈ .061
- Therefore zN ≈ (0.1 / .061) ≈ 1.64
- From table 5.1, zN ≈ 1.64 corresponds to a 90% two-sided
  confidence interval, equivalently a 95% one-sided interval, so we
  can be approximately 95% confident that
  errorD(h1) > errorD(h2)
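The example's arithmetic can be reproduced in a few lines. This is my own re-computation, not from the text; the one-sided confidence is the normal CDF evaluated at z, via `statistics.NormalDist`.

```python
from statistics import NormalDist

# Values from the worked example above.
e1, n1 = 0.30, 100
e2, n2 = 0.20, 100

d_hat = e1 - e2                                              # .10
sigma_d = (e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2) ** 0.5   # ~.061
z = d_hat / sigma_d                                          # ~1.64

# One-sided confidence that errorD(h1) > errorD(h2): the normal
# mass below z.
confidence = NormalDist().cdf(z)                             # ~0.95
```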
Comparing Learning Algorithms
- LA: learning algorithm A
- LB: learning algorithm B
- Want to estimate E[errorD(LA(S)) - errorD(LB(S))]
- S0: training set
- T0: test set
- We can estimate this difference for a single partition using
  errorT0(LA(S0)) - errorT0(LB(S0))
- Table 5.5 shows how to extend this concept to a technique
called k-fold cross validation.
- Equation 5.17 shows how to calculate the N% confidence interval
  for k-fold cross validation; it relies on equation 5.18 and
  table 5.6 to obtain certain constant values.
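The k-fold interval can be sketched as follows. This is my own illustration of the equation 5.17 / 5.18 pattern, not the text's code: given the per-fold differences δi, it computes their mean δ̄ and the standard deviation of that mean, then forms δ̄ ± t · s. The t-value t(N, k-1) is left as a parameter because it must be looked up in a t-distribution table (table 5.6 in the text); Python's standard library does not provide it.

```python
from statistics import mean

def cv_confidence_interval(deltas, t_value):
    """Approximate N% confidence interval for the true difference in
    error between two learning algorithms, from k-fold cross validation.

    deltas  : per-fold differences errorTi(LA(Si)) - errorTi(LB(Si))
    t_value : constant t(N, k-1) from a t-distribution table
              (table 5.6) for the desired confidence level N.
    """
    k = len(deltas)
    d_bar = mean(deltas)
    # Estimated standard deviation of the mean difference
    # (equation 5.18 style): sqrt( 1/(k(k-1)) * sum (δi - δ̄)² ).
    s = (sum((d - d_bar) ** 2 for d in deltas) / (k * (k - 1))) ** 0.5
    return d_bar - t_value * s, d_bar + t_value * s

# Hypothetical per-fold differences for k = 5; 2.776 is the 95%
# two-sided t-value for k - 1 = 4 degrees of freedom.
lo, hi = cv_confidence_interval([0.05, 0.02, -0.01, 0.03, 0.01], 2.776)
```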
Practice Exercises