{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Homework 7 - Chapter 12" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Due Date: Tuesday, March 25th no later than 11:59 p.m.\n", "- Partner Information: **You must complete this assignment individually**.\n", "- Submission Instructions: Upload your solution, entitled **YourFirstName-YourLastName-Homework7.ipynb** to the \n", "BrightSpace Homework 7 Dropbox.\n", "- Deadline Reminder: Once the submission deadline passes, BrightSpace will no longer accept your submission and you will no longer be able to earn credit. \n", "Thus, if you are not able to fully complete the assignment, submit whatever you have before the deadline so that partial credit can be earned." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Starting Code" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from datascience import *\n", "%matplotlib inline\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Download the file **mariners.csv** and place it\n", "into the same directory as this Jupyter notebook.\n", "The file contains information about the Seattle Mariners 2024 baseball season." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
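<table>\n",
 "<thead><tr><th>Game Number</th><th>Attendance</th></tr></thead>\n",
 "<tbody>\n",
 "<tr><td>1</td><td>45337</td></tr>\n",
 "<tr><td>2</td><td>30013</td></tr>\n",
 "<tr><td>3</td><td>32149</td></tr>\n",
 "<tr><td>4</td><td>29331</td></tr>\n",
 "</tbody>\n",
 "</table>\n",
 "<p>... (158 rows omitted)</p>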
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Place the csv file in the same directory as your solution\n", "season_2024 = Table.read_table(\"mariners.csv\")\n", "season_2024 = season_2024.select(\"Gm\", \"Attendance\").relabel(\"Gm\", \"Game Number\")\n", "season_2024.show(4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 1 - 2 Points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Seattle Mariners, a Major League Baseball team, want to analyze their attendance numbers (at both home and away games) to improve ticket marketing and increase crowd sizes. They are particularly interested in the difference in average attendance before and after the **All-Star Break**. The All-Star Break took place between games 98 and 99.\n", "\n", "Because of Seattle's cold, rainy spring climate and the perception that early-season games are \"meaningless\", the organization expects games before the All-Star Break to have lower attendance. Using this information, state the null and alternative hypotheses for this study." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Null Hypothesis ($H_0$):**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Alternative Hypothesis ($H_a$):**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 2 - 2 Points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To begin your analysis, add a column titled **Before ASB** to the table. The column should contain boolean values indicating whether each game occurred before or after the All-Star Break: *True* if the game occurred before the All-Star Break and *False* otherwise. \n", "\n", "Using the modified table, display two overlaid histograms that compare attendance before and after the All-Star Break. 
One histogram should represent attendance for games before the break, while the other should represent attendance for games after the break. As many as 60,000 fans can attend a Seattle Mariners game, so the bins should cover attendance up to 60,000. Use a bin width of 2,500." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Place answer here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 3 - 1 Point" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Calculate and print the average (mean) attendance for games that took place before and after the All-Star Break. Then, calculate and print the difference between these two averages by subtracting the average attendance after the break from the average attendance before the break. This difference is known as the **test statistic**." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Place answer here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 4a - 2 Points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Perform a permutation test by shuffling the **Before ASB** column while keeping the attendance values fixed. For each permutation, calculate the difference in mean attendance between the two groups (Before ASB - After ASB) and store the result in an array. Repeat this process 10,000 times. Display only the first 4 items of the resulting array. Although your values will differ, the output might look something like this:\n", "\n", "*array([ -113.29623724, 421.70153061, 1983.32780612, 1047.30771684])*" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Place answer here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 4b - 1 Point" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the array created in Question 4a, create a histogram displaying the results. 
The histogram should display all values that are greater than or equal to the **test statistic** in a different color. Hint: Take a look at the documentation for histograms." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# Place answer here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 5 - 1 Point" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Calculate and display the p-value." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# Place answer here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Is the p-value statistically significant, highly statistically significant, or neither? Explain." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Answer** - " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Based on the p-value, does the evidence favor the null hypothesis or the alternative hypothesis? Explain." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Answer** - " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 6 - 1 Point" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Can we conclude that there is *causality* between whether a game is played before or after the All-Star Break and the attendance at that game? Explain." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Answer** - " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Can we conclude that there is an *association* between whether a game is played before \n", "or after the All-Star Break and the attendance at that game? Explain." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Answer** - " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.1" } }, "nbformat": 4, "nbformat_minor": 4 }