{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Homework 8 - Chapter 13"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Due Date: Monday, April 7th no later than 11:59 p.m.\n",
    "- Partner Information: You may complete this assignment individually or with exactly one classmate.\n",
    "- Submission Instructions (working alone): Upload your solution, entitled **YourFirstName-YourLastName-Homework8.ipynb** to the \n",
    "BrightSpace Homework 8 Dropbox.\n",
    "- Submission Instructions (working with one classmate): Upload your solution, entitled \n",
    "**YourFirstName-YourLastName-PartnerFirstName-PartnerLastName-Homework8.ipynb** to the BrightSpace Homework 8 Dropbox. Note: If you \n",
    "work with a partner, only one person needs to submit a solution. If you both submit a solution, the submission that will be graded is the one \n",
    "from the partner whose last name comes alphabetically first.\n",
    "- Deadline Reminder: Once the submission deadline passes, BrightSpace will no longer accept your submission and you will no longer be able to earn credit. \n",
    "Thus, if you are not able to fully complete the assignment, submit whatever you have before the deadline so that partial credit can be earned."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Starting Code"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from datascience import *\n",
    "import matplotlib.pyplot as plots\n",
    "import numpy as np\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Download the file [education2023.csv]()\n",
    "into the same directory as this Jupyter notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Place the csv file in the same directory as your solution\n",
    "education = Table().read_table(\"education2023.csv\")\n",
    "education"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 1 - 1 Point"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Government work is a key field for data scientists, as accurate data is essential for effective policy-making and informed analysis of national, state, or local conditions. One of the largest collectors of data in America is the U.S Census Bureau, an organization that strives to provide detailed information about the population of the country.\n",
    "\n",
    "You have been selected to analyze the percent of the population with a bachelor's degree across U.S. counties. The Census Bureau expects the median percentage to be 25%. What are the null ($H_0$) and alternative ($H_a$) hypotheses of this study?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Null Hypothesis ($H_0$):**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Alternative Hypothesis ($H_a$):**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 2 - 2 Points"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The Census Bureau initially wants to explore the distribution of county-level bachelor’s degree completion rates. Create a histogram using the **Bachelors Percent** column from the dataset. The histogram should have bins ranging from 0 to 100, with intervals of 5 percentage points. To highlight the spread of the data, plot two red vertical dashed lines at the 25th percentile and 75th percentile.  (Hint: Use the [MatPlotLib](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.axvline.html) documentation as a reference)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Place answer here."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 3a - 1 Point"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The Census Bureau prefers not to use the entire dataset when analyzing problems in order to save time and computational resources. Create a table called **education_sample** by sampling the original dataset without replacement 500 times. Then, display this table's **Bachelor Percent** column as a histogram, using bins ranging from 0 to 100 in intervals of 5."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Place answer here."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 3b - 1 Point"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What principle from Chapter 13 explains why this distribution is similar to the population distribution?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Answer** - "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 4a - 3 Points"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To estimate the median percentage of county populations with a bachelor’s degree, perform a bootstrap simulation using **education_sample**. First, create a function that samples 500 rows (the sample size) from education_sample with replacement, then returns the median of the sampled **Bachelors Percent** values. Write a second function that calls the first function 5,000 times, storing each median in an array and then returning the array. Utilize these two functions to display a histogram of the median values, as well as a 99% confidence interval represented by a yellow line along the bottom of the histogram."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Place answer here."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 4b - 1 Point"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If a 90% confidence interval was used instead of a 99% confidence interval, would the range covered grow smaller or larger? Explain your answer."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Answer** - "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 5 - 1 Point"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Would the null hypothesis ($H_0$) from question 1 be accepted or rejected if a 1% level of significance was used? Explain your answer."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Answer** - "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}