{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Practicum 2 - Friday, March 27th, 2026" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Due: Friday, March 27th, no later than 10:50 a.m.\n", "- Submission Instructions: Upload your solution, named **YourFirstName-YourLastName-Practicum2.ipynb**, to the\n", "Canvas Practicum 2 Dropbox." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Starting Code" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use only the following libraries. Do not import anything else." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from datascience import *\n", "import numpy as np\n", "import matplotlib.pyplot as plots\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data File" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this practicum, we are going to explore a comic book data set. Download the `comic_books.csv` file and\n", "place it in the same directory as your Jupyter notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# comic_books.csv must be in the same directory as this notebook\n", "comics = Table.read_table(\"comic_books.csv\")\n", "comics.show(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 1 - 30 points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Part A - 10 points. Calculate the probability that a randomly selected comic\n", "book has a *Rating* of 8.5 or higher. Make sure that your calculation is general and\n", "works for any comic book CSV file with the same format.\n", "Print the answer in the following format (where each d is a digit):\n", "\n", "*Probability = dd.dd%*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Place answer here."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Part B - 10 points. Calculate the probability that a randomly selected comic book\n", "whose *Language* is *Japanese* has a *Rating* of 8.5 or higher. Print the answer in the\n", "following format (where each d is a digit):\n", "\n", "*Probability = dd.dd%*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Place answer here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Part C - 10 points. Calculate the probability that a randomly selected comic book\n", "either has an *Age Rating* of *Mature* or won an award.\n", "Note: A comic book won an award unless its *Awards* value is *nan*.\n", "Print the answer in the following format (where each d is a digit):\n", "\n", "*Probability = dd.dd%*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Place answer here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 2 - 20 points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Suppose a comic book collection consists of 1,000 comic books.\n", "Broken down by *Age Rating*, the collection includes:\n", "- 80 *All Ages* comic books\n", "- 340 *Mature* comic books\n", "- 50 *Mature 17+* comic books\n", "- 400 *Teen+* comic books\n", "- 130 *Young Adult* comic books\n", "\n", "Calculate the total variation distance between the *Age Rating* distribution of this collection and\n", "the *Age Rating* distribution of the data set. Print\n", "the answer in the following format (where each d is a digit):\n", "\n", "*TVD = 0.dddd*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Place answer here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 3 - 35 points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following code constructs a systematic sample from the original data set by taking\n", "every 10th row, starting with the row at index 5."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "comics_sample = Table.read_table(\"comic_books.csv\").take(np.arange(5, comics.num_rows, 10))\n", "comics_sample.show(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Part A - 5 points. Calculate the 50th **percentile** of the *Rating* column in the *comics_sample* table.\n", "Display the answer in the following format (where each d is a digit):\n", "\n", "*50th percentile Rating = d.dd*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Place answer here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Part B - 5 points. Calculate the **average** of the *Rating* column in the *comics_sample*\n", "table. Display the answer in the following format (where each d is a digit):\n", "\n", "*Average Rating = d.dd*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Place answer here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Part C - 5 points. For an arbitrary data set, what is the relationship between the 50th **percentile**\n", "and the **average** of the values in a numeric column? For example, is one always higher than the other,\n", "or are they always the same? Explain your answer." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Relationship and explanation:**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Part D - 10 points. Complete the function below. When called, the function should\n", "construct one bootstrap sample from *some_sample* and\n", "return the **average** *Rating* of that bootstrap sample."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def one_bootstrap_average(some_sample):\n", "    # Missing lines of code go here.\n", "    return bootstrapped_average" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# If one_bootstrap_average is implemented correctly, the bootstrapped average will be displayed\n", "one_bootstrap_average(comics_sample)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Part E - 10 points. Complete the function below. The function should\n", "return an array containing *how_many* bootstrapped **average** *Rating* values computed from *some_sample*." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def many_bootstrap_averages(how_many, some_sample):\n", "    # Missing lines of code go here.\n", "    return bootstrap_averages" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# If many_bootstrap_averages is implemented correctly, the first 10 bootstrapped averages will be displayed\n", "averages = many_bootstrap_averages(1000, comics_sample)\n", "averages[:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 4 - 15 points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Assume that the code in the cell below is equivalent to using the\n", "bootstrap method to generate 1,000 bootstrapped average ages\n", "from a sample of college students."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Stand-in for 1,000 bootstrapped average ages: uniform values in [18, 24)\n", "rng = np.random.default_rng(seed=42)\n", "bootstrapped_ages = 18 + 6 * rng.random(size=1000)\n", "bootstrapped_ages[:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Write code that reproduces the figure below as closely as possible.\n", "- The red dot has size 50 and marks the average of the bootstrapped ages.\n", "- The yellow line spans the middle 50% of the bootstrapped ages (from the 25th to the 75th percentile)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Place answer here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Question 4 Desired Output](https://www.cs.montana.edu/paxton/classes/spring-2026/intro-ds/practicums/practicum-2/q4.png)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.14.2" } }, "nbformat": 4, "nbformat_minor": 4 }