{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Practicum 2 - March 28, 2025" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Due Date: Friday, March 28th, no later than 10:50 a.m.\n", "- Submission Instructions: Upload your solution, entitled **YourFirstName-YourLastName-Practicum2.ipynb**, to the\n", "BrightSpace Practicum 2 Dropbox.\n", "- Note: For all questions, determine the answer using Python constructs (as opposed to eyeballing the CSV file)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Starting Code" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from datascience import *\n", "%matplotlib inline\n", "import numpy as np\n", "import matplotlib.pyplot as plots" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data File" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "St. Patrick's Day, a day of revelry and celebration, took place earlier this month.\n", "For this practicum, you are going to explore a New York City file that contains\n", "information about noise complaints that came from a club, bar or restaurant.\n", "Download **noise_complaints.csv** and place it in the same directory as your Jupyter notebook." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Place the csv file in the same directory as this notebook\n", "noise = Table.read_table(\"noise_complaints.csv\")\n", "noise.show(5)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# For this practicum, the only two columns of interest are the borough where\n", "# the bar is located and the number of complaints that were received from that bar.\n", "noise = noise.select(\"Borough\", \"num_calls\").relabel(\"num_calls\", \"Complaints\")\n", "noise.show(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 1 - 10 points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Display the total number of calls that were made to complain about noise." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Place answer here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 2 - 20 points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Display a table that shows the probability (as a percentage) that a noise complaint call comes from each\n", "of the five individual boroughs." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Place answer here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 3 - 10 points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the table that you created for Question 2, calculate and display the probability\n", "that a noise complaint call comes from either Manhattan or Brooklyn.\n", "Your output format should look something like this\n", "(although the exact probability might be different!):\n", "*The probability that a call is made from Manhattan or Brooklyn is 56.78%*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Place answer here." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 4 - 20 points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ignore the information above. Assume that the true probability distribution of 1,000 noise\n", "complaint calls coming from Bronx/Brooklyn/Manhattan/Queens/Staten Island is [.10, .30, .40, .15, .05].\n", "Furthermore, assume that the probability distribution of 1,000 noise complaint calls coming from five\n", "unknown places is [.11, .3, .38, .13, .08]. Use Python to calculate and print the total variation\n", "distance between these two distributions." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "actual_distribution = make_array(.1, .3, .4, .15, .05)\n", "sample_distribution = make_array(.11, .3, .38, .13, .08)\n", "# Place answer here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 5 - 30 points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the style of Chapter 11, obtain 5,000 total variation distances, where each distance is produced\n", "by drawing a sample of size 1,000 from the true distribution from Question 4.\n", "Plot the distribution of the total variation distances in a histogram that uses the default bins.\n", "Place a red dot on the histogram at (tvd-you-calculated-for-question-4, 0) using plots.scatter." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Place answer here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 6 - 10 points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Calculate and display the p-value for testing whether the distribution of calls from the five unknown places in Question 4\n", "came from the Bronx/Brooklyn/Manhattan/Queens/Staten Island distribution of Question 4. 
The format\n", "of your output should be similar to this: *The p-value is 8.43%*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Place answer here." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.1" } }, "nbformat": 4, "nbformat_minor": 4 }