{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "c9a4b8a7-d0b3-4098-a9b8-e9464c454bf8",
   "metadata": {},
   "source": [
    "# Chapter 13.4 - Using Confidence Intervals"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9caa463e-e059-4aba-b3c7-fcdea71619cf",
   "metadata": {},
   "source": [
    "## Review of Chapter 13.3: Confidence Intervals"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d5516087-9b25-409f-94d5-f11dd0c167c0",
   "metadata": {},
   "outputs": [],
   "source": [
    "from datascience import *\n",
    "%matplotlib inline\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plots"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "324be9b7-0e03-4db1-9928-17de96c0a99a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Place the csv file in the same directory as this notebook\n",
    "ski_resorts = Table.read_table(\"ski_resorts.csv\")\n",
    "ski_resorts.show(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3501b588-5066-4da4-bcd7-1e5e9b385082",
   "metadata": {},
   "outputs": [],
   "source": [
    "ski_resorts.hist(\"Total Snowfall\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aa763892-3f7d-4f8f-a7c6-b6fae3d3649e",
   "metadata": {},
   "outputs": [],
   "source": [
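    "# Treat the 875 rows in the CSV file as a random sample from a much larger population.\n",
    "# Table.sample() with no arguments draws num_rows rows with replacement (one bootstrap resample).\n",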
    "def one_bootstrap_mean():\n",
    "    resample = ski_resorts.sample()\n",
    "    return np.average(resample.column('Total Snowfall'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f86e03ca-2383-43d5-98b0-51baa7bb6f54",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate many means from bootstrap samples\n",
    "def many_bootstrap_means(how_many):\n",
    "    bootstrap_means = make_array()\n",
    "    for _ in np.arange(how_many):\n",
    "        bootstrap_means = np.append(bootstrap_means, one_bootstrap_mean())\n",
    "    return bootstrap_means"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "888a2377-8cbe-4a94-abc6-2847213ab7fe",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Obtain endpoints of the 95% confidence interval\n",
    "bootstrap_means = many_bootstrap_means(1000)\n",
    "left = percentile(2.5, bootstrap_means)\n",
    "right = percentile(97.5, bootstrap_means)\n",
    "make_array(left, right)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "33f0bc3a-0246-4999-90e1-65bca3da8025",
   "metadata": {},
   "source": [
    "The two values in the array are the endpoints of an approximate 95% confidence interval for the population mean of Total Snowfall.\n",
    "The histogram below shows the bootstrap means, with the interval marked in yellow:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "123ffa2a-1708-4d49-9934-506dc56ea556",
   "metadata": {},
   "outputs": [],
   "source": [
    "resampled_means = Table().with_column('Bootstrap Sample Mean', bootstrap_means)\n",
    "resampled_means.hist(bins=20, unit=\"Inches\")\n",
    "plots.plot([left, right], [0, 0], color='yellow', lw=8);"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "08f8bc23-cb82-4602-a13f-fd059be7592d",
   "metadata": {},
   "source": [
    "## An Incorrect Use of a Confidence Interval"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "32ffeda8-da28-47a4-8378-57d160ea40cb",
   "metadata": {},
   "source": [
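    "A common mistake is to read a confidence interval as a statement about individual values.\n",
    "For example, it is incorrect to conclude that 95% of the ski resorts have a total snowfall\n",
    "in the interval [left, right] found above. The interval estimates the population *mean*, and bootstrap means vary\n",
    "far less than the snowfall totals of individual resorts do. The cell below counts how many resorts actually fall in the interval."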
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2be934c8-8d61-4463-82d0-321225cb99a4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Count the resorts whose Total Snowfall lies inside the 95% interval for the mean\n",
    "low_bound = left\n",
    "high_bound = right\n",
    "reduced_ski_resorts = ski_resorts.where(\"Total Snowfall\", are.above_or_equal_to(low_bound))\n",
    "reduced_ski_resorts = reduced_ski_resorts.where(\"Total Snowfall\", are.below_or_equal_to(high_bound))\n",
    "print(\"The percentage of ski resorts in this interval = {:.2f}%.\".format(reduced_ski_resorts.num_rows / ski_resorts.num_rows * 100))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d909ab75-dd69-430c-a92c-ebec64c636c8",
   "metadata": {},
   "source": [
    "## A Correct Use of a Confidence Interval"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1248fadb-822e-48a9-88c7-52d063fc2973",
   "metadata": {},
   "source": [
    "We can, however, use a confidence interval to test a hypothesis about the population mean.\n",
    "- **Null Hypothesis** - The average total snowfall in the population is 100.\n",
    "- **Alternative Hypothesis** - The average total snowfall in the population is not 100.\n",
    "\n",
    "If the hypothesized value of 100 lies outside the 95% confidence interval, we reject the null hypothesis at the 5% level. Here it does, so the null hypothesis is rejected."
   ]
  },
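  {
   "cell_type": "markdown",
   "id": "sketch-null-100-note",
   "metadata": {},
   "source": [
    "The decision rule for this test can be checked directly. The cell below is a minimal sketch, not part of the original analysis.\n",
    "It assumes `left` and `right` still hold the 95% endpoints for Total Snowfall, so run it before the later cells that reuse those names.\n",
    "The name `null_mean` is introduced here only for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-null-100-check",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: null_mean is an illustrative name; 100 is the value stated in the null hypothesis\n",
    "null_mean = 100\n",
    "if left <= null_mean <= right:\n",
    "    print(\"{} is inside [{:.2f}, {:.2f}]: fail to reject the null hypothesis.\".format(null_mean, left, right))\n",
    "else:\n",
    "    print(\"{} is outside [{:.2f}, {:.2f}]: reject the null hypothesis at the 5% level.\".format(null_mean, left, right))"
   ]
  },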
  {
   "cell_type": "markdown",
   "id": "135c7e57-22cb-4bba-8e60-9c44005ff510",
   "metadata": {},
   "source": [
    "## Another Correct Use of a Confidence Interval"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "def5efce-5143-4565-92d1-d41ed3c69407",
   "metadata": {},
   "source": [
    "Here is another example. Let the **Null Hypothesis** be that, on average, the Average Summit Depth\n",
    "is no more than 10 inches greater than the Average Base Depth. (Note: these two numbers are *paired*, since each resort reports both, so we analyze the per-resort difference.)\n",
    "To test this hypothesis at the 1% level, we bootstrap the mean difference and build a 99% confidence interval."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "849762ed-04ec-43bd-be52-510a395d249d",
   "metadata": {},
   "outputs": [],
   "source": [
    "depth_table = ski_resorts.select(\"Average Base Depth\", \"Average Summit Depth\")\n",
    "depth_table = depth_table.with_column(\"Difference\", \n",
    "    depth_table.column(\"Average Summit Depth\") - depth_table.column(\"Average Base Depth\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3cdecd1a-e345-46e3-8632-cec92ff758d7",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"The average difference is {:.2f} inches.\".format(np.average(depth_table.column(\"Difference\"))))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "201187e5-1fc0-400d-8d48-9e6c50c199f7",
   "metadata": {},
   "outputs": [],
   "source": [
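    "# Redefine one_bootstrap_mean so that it resamples the paired differences instead of Total Snowfall\n",
    "def one_bootstrap_mean():\n",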
    "    resample = depth_table.sample()\n",
    "    return np.average(resample.column('Difference'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6ab44d34-568d-4c5f-8898-b2b8b3f2d1e5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate many bootstrap means (same loop as before, now calling the redefined one_bootstrap_mean)\n",
    "def many_bootstrap_means(num_repetitions):\n",
    "    bstrap_means = make_array()\n",
    "    for _ in np.arange(num_repetitions):\n",
    "        bstrap_means = np.append(bstrap_means, one_bootstrap_mean())\n",
    "    return bstrap_means"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d9172891-7170-420d-9caf-dbc0c311da91",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get the endpoints of the 99% confidence interval (this overwrites left and right from the snowfall example)\n",
    "bstrap_means = many_bootstrap_means(1000)\n",
    "left = percentile(0.5, bstrap_means)\n",
    "right = percentile(99.5, bstrap_means)\n",
    "make_array(left, right)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c24626f-cb37-493e-971c-0a4d10c1f67c",
   "metadata": {},
   "outputs": [],
   "source": [
    "resampled_means = Table().with_columns(\n",
    "    'Bootstrap Sample Mean', bstrap_means\n",
    ")\n",
    "resampled_means.hist()\n",
    "plots.plot([left, right], [0, 0], color='yellow', lw=8);"
   ]
  },
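  {
   "cell_type": "markdown",
   "id": "sketch-null-10-note",
   "metadata": {},
   "source": [
    "To connect the interval back to the null hypothesis, the cell below is a minimal sketch of the decision step.\n",
    "It assumes `left` holds the lower endpoint of the 99% interval computed above. The name `null_bound` is introduced here only for illustration.\n",
    "Because the null hypothesis says the average difference is 10 inches or less, we reject it only when the whole interval lies above 10."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-null-10-check",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: null_bound is an illustrative name; 10 inches comes from the null hypothesis above\n",
    "null_bound = 10\n",
    "if left > null_bound:\n",
    "    print(\"The whole 99% interval lies above {} inches: reject the null hypothesis.\".format(null_bound))\n",
    "else:\n",
    "    print(\"The 99% interval includes values of {} inches or less: fail to reject the null hypothesis.\".format(null_bound))"
   ]
  },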
  {
   "cell_type": "markdown",
   "id": "508c31ee-648d-4286-82b8-0a5b6c3c82b9",
   "metadata": {},
   "source": [
    "Notes:\n",
    "- The higher the confidence level, the wider the interval becomes.\n",
    "- The interval does more than tell us whether to reject the null hypothesis. It also estimates how big the average difference is. That’s a more useful result than just saying, “It’s not 10 inches or less.”"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}