# Chapter 13.3 - Confidence Intervals

In [None]:
from datascience import *
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plots

In [None]:
# Place the csv file in the same directory as this notebook
ski_resorts = Table().read_table("ski_resorts.csv")
ski_resorts.show(5)

Let's use the **bootstrap percentile method** from Section 13.2 on the data in ski_resorts.csv
to construct a *95% confidence interval* for the mean **Total Snowfall** of all ski resorts in North America,
not just the ones in the original data set. The bootstrap relies on the resorts in the file being a random sample from that larger population. In this scenario, we don't know what the true average is!
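The cells below carry out the method with the datascience library. The same procedure can also be sketched in plain NumPy: resample the data with replacement (same size as the original), compute the statistic on each resample, and keep the middle 95% of the results. The helper `bootstrap_percentile_ci` is hypothetical (it is not part of datascience or NumPy), and the sample values are made up for illustration, not taken from ski_resorts.csv. It uses `np.percentile`, which interpolates between values, so its endpoints can differ slightly from those returned by datascience's `percentile`.

```python
import numpy as np

def bootstrap_percentile_ci(values, statistic=np.mean, level=95, repetitions=5000, seed=None):
    """Hypothetical helper: approximate level% confidence interval via the bootstrap percentile method."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    n = len(values)
    estimates = np.empty(repetitions)
    for i in range(repetitions):
        # Resample n values with replacement, the same size as the original sample
        resample = rng.choice(values, size=n, replace=True)
        estimates[i] = statistic(resample)
    # Keep the middle level% of the estimates, e.g. the 2.5th and 97.5th percentiles for 95%
    tail = (100 - level) / 2
    return np.percentile(estimates, tail), np.percentile(estimates, 100 - tail)

# Made-up snowfall totals in inches (NOT from ski_resorts.csv)
demo_sample = np.array([150, 210, 95, 300, 180, 240, 120, 400, 260, 175])
demo_left, demo_right = bootstrap_percentile_ci(demo_sample, level=95, seed=0)
print(demo_left, demo_right)
```

Passing `level=90` would instead keep the 5th and 95th percentiles, which is the adjustment a 90% interval needs.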

In [None]:
ski_resorts.hist("Total Snowfall")

In [None]:
np.average(ski_resorts.column("Total Snowfall"))

In [None]:
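def one_bootstrap_mean():
    # Resample the rows with replacement; by default the resample has as many rows as the original table
    resample = ski_resorts.sample()
    # Return the mean Total Snowfall of this bootstrap sample
    return np.average(resample.column('Total Snowfall'))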

In [None]:
# Generate means from 5000 bootstrap samples
num_repetitions = 5000
bootstrap_means = make_array()
for _ in np.arange(num_repetitions):
    bootstrap_means = np.append(bootstrap_means, one_bootstrap_mean())

In [None]:
# Obtain endpoints of the 95% confidence interval
left = percentile(2.5, bootstrap_means)
right = percentile(97.5, bootstrap_means)
make_array(left, right)

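The two values in the array are the left and right endpoints of an approximate 95% confidence interval for the mean Total Snowfall of all North American resorts.
Here is a histogram of the bootstrap means, with the interval drawn as a yellow bar along the horizontal axis: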

In [None]:
resampled_means = Table().with_column('Bootstrap Sample Mean', bootstrap_means)
resampled_means.hist(bins=20, unit="Inches")
plots.plot([left, right], [0, 0], color='yellow', lw=8);

Note: the empirical histogram of the bootstrap means is roughly symmetric and bell-shaped, even though the histogram of the
sampled Total Snowfall values was not. This is explained by the **Central Limit Theorem**, which we will study later
in the semester.
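One way to see this effect numerically is to compare skewness (a measure of asymmetry that is 0 for a symmetric distribution) before and after bootstrapping. The sketch below is an illustration under stated assumptions: the right-skewed sample is generated from an exponential distribution as a stand-in for a skewed column like Total Snowfall, and the `skewness` helper is hypothetical, written here as the average cubed value in standard units.

```python
import numpy as np

def skewness(x):
    """Hypothetical helper: average cubed standard unit (0 for a symmetric distribution)."""
    x = np.asarray(x)
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 3)

rng = np.random.default_rng(13)
# Simulated right-skewed sample (NOT real snowfall data)
skewed_sample = rng.exponential(scale=200, size=300)

# Means of 5000 bootstrap samples drawn from that one sample
boot_means = np.array([
    rng.choice(skewed_sample, size=len(skewed_sample), replace=True).mean()
    for _ in range(5000)
])

print("skewness of sample:         ", round(skewness(skewed_sample), 2))
print("skewness of bootstrap means:", round(skewness(boot_means), 2))
```

The sample itself is strongly skewed, while the bootstrap means come out much closer to symmetric, matching what the histogram above suggests.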

**Active Learning**: Remove all resorts in the table that have an **Average Base Depth** of 0.
Then, using the bootstrap, display a histogram that highlights a 90% confidence interval for the
percentage of North American resorts (among those with a nonzero base depth) that
have an **Average Base Depth** of at least 12 inches.
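A useful hint for Steps Two through Four: a percentage is 100 times the mean of a Boolean array, because True counts as 1 and False as 0, so it can be bootstrapped just like the mean above. The sketch below illustrates only that idea in plain NumPy; the depth values are made up (not from ski_resorts.csv), and the exercise itself should still be done with the table and the datascience functions used earlier.

```python
import numpy as np

# Made-up base depths in inches (NOT from ski_resorts.csv), only for illustration
demo_depths = np.array([0, 8, 15, 30, 12, 0, 45, 10, 22, 60])

demo_nonzero = demo_depths[demo_depths != 0]   # drop the zero entries
demo_meets = demo_nonzero >= 12                # Boolean array: True where depth is at least 12
demo_percent = 100 * np.mean(demo_meets)       # True counts as 1 and False as 0
print(demo_percent)                            # prints 75.0

# Bootstrap the percentage by resampling the Booleans with replacement
rng = np.random.default_rng(0)
demo_boot_percents = np.array([
    100 * np.mean(rng.choice(demo_meets, size=len(demo_meets), replace=True))
    for _ in range(5000)
])
# A 90% interval leaves 5% of the bootstrap percentages in each tail
demo_ci_90 = np.percentile(demo_boot_percents, [5, 95])
print(demo_ci_90)
```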

*Step One* - Eliminate the ski resorts that have an Average Base Depth of 0.

In [None]:
# Place answer here.

*Step Two* - Define a function that returns True if its argument is at least 12, and False otherwise.

In [None]:
# Place answer here.

*Step Three* - Apply the function to the Average Base Depth column, adding a column to the table that indicates whether each resort meets the criterion.

In [None]:
# Place answer here.

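*Step Four* - Bootstrap the percentage of resorts that meet the criterion, following the same process as above. For a 90% interval, use the 5th and 95th percentiles of the bootstrap percentages.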

In [None]:
# Place answer here.