{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Practicum 2 - Friday, March 27th, 2026"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Due: Friday, March 27th, no later than 10:50 a.m.\n",
    "- Submission Instructions: Upload your solution, named **YourFirstName-YourLastName-Practicum2.ipynb**, to the\n",
    "Canvas Practicum 2 Dropbox."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Starting Code"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use only the following libraries.  Do not import anything else."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from datascience import *\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plots\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data File"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For this practicum, we are going to explore a comic book data set.  Download the comic_books.csv file and\n",
    "place it in the same directory as your Jupyter notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Place the csv file in the same directory as this notebook\n",
    "comics = Table.read_table(\"comic_books.csv\")\n",
    "comics.show(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 1 - 30 points"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Part A - 10 points.  Calculate the probability that a randomly selected comic\n",
    "book has a *Rating* of 8.5 or higher.  Make sure that your calculation is general and\n",
    "works for any comic book CSV file with the same format.\n",
    "Print the answer in the following format (where each d is a digit):\n",
    "\n",
    "*Probability = dd.dd%*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Place answer here."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Part B - 10 points.  Calculate and print the probability that a randomly selected\n",
    "comic book whose *Language* is *Japanese* has a *Rating* of 8.5 or higher in the\n",
    "following format (where each d is a digit):\n",
    "\n",
    "*Probability = dd.dd%*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Place answer here."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Part C - 10 points.  Calculate and print the probability that a randomly selected\n",
    "comic book either has an *Age Rating* of *Mature* or has won an award.\n",
    "Note: A comic book has won an award unless its *Awards* value is *nan*.\n",
    "Print the answer in the following format (where each d is a digit):\n",
    "\n",
    "*Probability = dd.dd%*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Place answer here."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 2 - 20 points"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Suppose a comic book collection consists of 1,000 comic books.\n",
    "Broken down by *Age Rating*, the collection includes:\n",
    "- 80 *All Ages* comic books\n",
    "- 340 *Mature* comic books\n",
    "- 50 *Mature 17+* comic books\n",
    "- 400 *Teen+* comic books\n",
    "- 130 *Young Adult* comic books\n",
    "\n",
    "Calculate the total variation distance between the *Age Rating* distribution of this collection and\n",
    "the actual distribution of *Age Rating* in the data set.  Print\n",
    "the answer in the following format (where each d is a digit):\n",
    "\n",
    "*TVD = 0.dddd*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Place answer here."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 3 - 35 points"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following code constructs a systematic sample from the original dataset by taking every 10th row, starting at row index 5."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "comics_sample = Table.read_table(\"comic_books.csv\").take(np.arange(5, comics.num_rows, 10))\n",
    "comics_sample.show(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Part A - 5 points.  Calculate the 50th **percentile** of the *Rating* column in the *comics_sample* table.\n",
    "Display the answer in the following format (where each d is a digit):\n",
    "\n",
    "*50th percentile Rating = d.dd*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Place answer here."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Part B - 5 points.  Calculate the **average** of the *Rating* column in the *comics_sample*\n",
    "table.  Display the answer in the following format (where each d is a digit):\n",
    "\n",
    "*Average Rating = d.dd*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Place answer here."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Part C - 5 points.  For an arbitrary dataset, what is the relationship between the 50th **percentile**\n",
    "and the **average** of the values in a numeric column?  For example, is one always higher than the other,\n",
    "or are they always the same?  Explain your answer."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Relationship with explanation -**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Part D - 10 points.  Complete the function below.  When called, the function should\n",
    "construct one bootstrap sample from *some_sample* and\n",
    "return the **average** *Rating* of that bootstrap sample."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def one_bootstrap_average(some_sample):\n",
    "    # Missing lines of code go here.\n",
    "    return bootstrapped_average"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# If one_bootstrap_average is implemented correctly, the bootstrapped average will be displayed\n",
    "one_bootstrap_average(comics_sample)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Part E - 10 points.  Complete the function below.  The function should\n",
    "return an array that contains *how_many* bootstrapped averages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def many_bootstrap_averages(how_many, some_sample):\n",
    "    # Missing lines of code go here.\n",
    "    return bootstrap_averages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# If many_bootstrap_averages is implemented correctly, the first 10 bootstrapped averages will be displayed\n",
    "averages = many_bootstrap_averages(1000, comics_sample)\n",
    "averages[:10]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 4 - 15 points"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pretend that the code in the cell below is equivalent to using the\n",
    "bootstrapping method to generate 1,000 average ages\n",
    "from a sample of college students."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "rng = np.random.default_rng(seed=42)\n",
    "bootstrapped_ages = 18 + 6*rng.random(size=1000)\n",
    "bootstrapped_ages[:10]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Write code that reproduces the picture below as closely as possible.\n",
    "- The red dot has size 50 and represents the average bootstrapped age.\n",
    "- The yellow line spans the middle 50% of the bootstrapped ages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Place answer here."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![Question 4 Desired Output](https://www.cs.montana.edu/paxton/classes/spring-2026/intro-ds/practicums/practicum-2/q4.png)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.14.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}