{ "cells": [ { "cell_type": "markdown", "id": "2e8efb04-a94b-4e03-9278-5cf2cdc0cdd5", "metadata": {}, "source": [ "# Chapter 13.3 - Confidence Intervals" ] }, { "cell_type": "code", "execution_count": null, "id": "d5516087-9b25-409f-94d5-f11dd0c167c0", "metadata": {}, "outputs": [], "source": [ "from datascience import *\n", "%matplotlib inline\n", "import numpy as np\n", "import matplotlib.pyplot as plots" ] }, { "cell_type": "code", "execution_count": null, "id": "324be9b7-0e03-4db1-9928-17de96c0a99a", "metadata": {}, "outputs": [], "source": [ "# The CSV file must be in the same directory as this notebook\n", "ski_resorts = Table.read_table(\"ski_resorts.csv\")\n", "ski_resorts.show(5)" ] }, { "cell_type": "markdown", "id": "ffd94506-0937-44b2-9e16-20a9e70fe5d1", "metadata": {}, "source": [ "Let's use the **bootstrap percentile method** from section 13.2 on the data in ski_resorts.csv\n", "to construct a *95% confidence interval* for the mean **Total Snowfall** of all ski resorts in North America,\n", "not just the ones in the original data set. In this scenario, we don't know what the true average is!" 
] }, { "cell_type": "code", "execution_count": null, "id": "3501b588-5066-4da4-bcd7-1e5e9b385082", "metadata": {}, "outputs": [], "source": [ "ski_resorts.hist(\"Total Snowfall\")" ] }, { "cell_type": "code", "execution_count": null, "id": "b5ab39e5-3457-4944-a8db-ae60ba6dc37e", "metadata": {}, "outputs": [], "source": [ "np.average(ski_resorts.column(\"Total Snowfall\"))" ] }, { "cell_type": "code", "execution_count": null, "id": "aa763892-3f7d-4f8f-a7c6-b6fae3d3649e", "metadata": {}, "outputs": [], "source": [ "def one_bootstrap_mean():\n", "    # Resample the rows with replacement, keeping the original sample size\n", "    resample = ski_resorts.sample()\n", "    return np.average(resample.column('Total Snowfall'))" ] }, { "cell_type": "code", "execution_count": null, "id": "f86e03ca-2383-43d5-98b0-51baa7bb6f54", "metadata": {}, "outputs": [], "source": [ "# Generate means from 5000 bootstrap samples\n", "num_repetitions = 5000\n", "bootstrap_means = make_array()\n", "for _ in np.arange(num_repetitions):\n", "    bootstrap_means = np.append(bootstrap_means, one_bootstrap_mean())" ] }, { "cell_type": "code", "execution_count": null, "id": "888a2377-8cbe-4a94-abc6-2847213ab7fe", "metadata": {}, "outputs": [], "source": [ "# Obtain endpoints of the 95% confidence interval\n", "left = percentile(2.5, bootstrap_means)\n", "right = percentile(97.5, bootstrap_means)\n", "make_array(left, right)" ] }, { "cell_type": "markdown", "id": "33f0bc3a-0246-4999-90e1-65bca3da8025", "metadata": {}, "source": [ "The two values in the array are the endpoints of an approximate 95% confidence interval for the mean **Total Snowfall**.\n", "The histogram below shows the bootstrap means, with the interval marked in yellow on the horizontal axis:" ] }, { "cell_type": "code", "execution_count": null, "id": "123ffa2a-1708-4d49-9934-506dc56ea556", "metadata": {}, "outputs": [], "source": [ "resampled_means = Table().with_column('Bootstrap Sample Mean', bootstrap_means)\n", "resampled_means.hist(bins=20, unit=\"Inches\")\n", "plots.plot([left, right], [0, 0], color='yellow', lw=8);" ] }, { "cell_type": "markdown", "id": "3b3bc48d-14d3-458c-91ce-68ff1a42b6ef", "metadata": {}, 
"source": [ "Note: the empirical histogram of the resampled means has a roughly symmetric bell shape, even though the histogram of the\n", "**Total Snowfall** values in the sample did not. This can be explained by the **Central Limit Theorem**, which we will visit later\n", "in the semester." ] }, { "cell_type": "markdown", "id": "5b1c6377-9057-42f2-a136-ef0435f8fd14", "metadata": {}, "source": [ "**Active Learning**: Remove all rows of the table that have an **Average Base Depth** of 0.\n", "Then display a histogram that highlights a 90% confidence interval for the\n", "percentage of the remaining North American resorts that\n", "have an **Average Base Depth** of at least 12 inches." ] }, { "cell_type": "markdown", "id": "651fc90d-999e-4348-a347-80a360c9113b", "metadata": {}, "source": [ "*Step One* - Remove the ski resorts that have an Average Base Depth of 0." ] }, { "cell_type": "code", "execution_count": null, "id": "5abf80f1-9965-48e4-8f39-e911d875d13b", "metadata": {}, "outputs": [], "source": [ "# Place answer here." ] }, { "cell_type": "markdown", "id": "d4b38279-d265-4a76-99e9-96a8fd7f9f3b", "metadata": {}, "source": [ "*Step Two* - Define a function that returns True if the value of its parameter is at least 12." ] }, { "cell_type": "code", "execution_count": null, "id": "c88dcb98-0dc9-4502-817a-d12c992bb27a", "metadata": {}, "outputs": [], "source": [ "# Place answer here." ] }, { "cell_type": "markdown", "id": "5454c0f3-49b6-4e22-8fe1-1fe180d85326", "metadata": {}, "source": [ "*Step Three* - Apply the function to the **Average Base Depth** column, adding a column that indicates whether each resort meets the criterion." ] }, { "cell_type": "code", "execution_count": null, "id": "ba43ba9d-b765-4f56-af4e-64612659d128", "metadata": {}, "outputs": [], "source": [ "# Place answer here." ] }, { "cell_type": "markdown", "id": "d32a9d96-bf2f-4c36-b4bb-58930c61a1f2", "metadata": {}, "source": [ "*Step Four* - Follow a process similar to the one above: bootstrap the percentage of resorts that meet the criterion, use the 5th and 95th percentiles as the endpoints of the 90% confidence interval, and draw the histogram." 
] }, { "cell_type": "code", "execution_count": null, "id": "1e471bf3-0681-4a68-ad23-9686d586cb64", "metadata": {}, "outputs": [], "source": [ "# Place answer here." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.1" } }, "nbformat": 4, "nbformat_minor": 5 }