{ "cells": [ { "cell_type": "markdown", "id": "9f0dc122-4b53-4268-84e2-24adaba2502c", "metadata": {}, "source": [ "# Chapter 12 - Comparing Two Samples" ] }, { "cell_type": "markdown", "id": "46cb2f0d-d22e-489f-8785-387dec40ac45", "metadata": {}, "source": [ "**A/B Testing** - Deciding whether two numerical samples come from the same underlying distribution" ] }, { "cell_type": "code", "execution_count": null, "id": "e015d95b-51d0-4b16-a048-e241b3e47acf", "metadata": {}, "outputs": [], "source": [ "from datascience import *\n", "%matplotlib inline\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": null, "id": "0e81ff54-34e7-47c3-8c82-c5538df9a2b8", "metadata": {}, "outputs": [], "source": [ "salaries = Table.read_table('nba_salaries.csv')\n", "salaries = salaries.relabeled(\"'15-'16 SALARY\", 'SALARY')\n", "salaries = salaries.select('TEAM', 'SALARY')\n", "salaries" ] }, { "cell_type": "markdown", "id": "45d5c797-3cb4-4039-8469-3f48e2ea9fd7", "metadata": {}, "source": [ "Question: Is playing for the Golden State Warriors associated with a higher salary?" 
] }, { "cell_type": "code", "execution_count": null, "id": "26549525-7d81-4aba-9169-878c355e4c96", "metadata": {}, "outputs": [], "source": [ "salaries = salaries.with_column('GSW MEMBER?', salaries.column('TEAM') == \"Golden State Warriors\")\n", "salaries" ] }, { "cell_type": "code", "execution_count": null, "id": "078dbac9-420a-4d21-b760-2ffc864b1a0c", "metadata": {}, "outputs": [], "source": [ "salaries = salaries.drop('TEAM')\n", "salaries" ] }, { "cell_type": "code", "execution_count": null, "id": "de974c8d-13d7-4661-bc13-eb1c4f16727a", "metadata": {}, "outputs": [], "source": [ "salaries.group('GSW MEMBER?')" ] }, { "cell_type": "code", "execution_count": null, "id": "050c7a75-e8ad-4e83-acc3-c77ef79939d5", "metadata": {}, "outputs": [], "source": [ "salaries.hist('SALARY', group = 'GSW MEMBER?')" ] }, { "cell_type": "markdown", "id": "f67b58b9-f73d-45e1-8aa7-07b3b9d6fbda", "metadata": {}, "source": [ "- **Null hypothesis**: In the population, the distribution of salaries is the same for GSW players as for players on other teams. \n", "Any difference in the sample is due to chance.\n", "- **Alternative hypothesis**: In the population, GSW salaries are higher, on average, than the salaries of players on other teams." ] }, { "cell_type": "code", "execution_count": null, "id": "06504939-f695-481c-b630-78b82bb86385", "metadata": { "jupyter": { "source_hidden": true } }, "outputs": [], "source": [ "means_table = salaries.group('GSW MEMBER?', np.average)\n", "means_table" ] }, { "cell_type": "code", "execution_count": null, "id": "e3e2ad80-2c1e-4cc1-831c-c8487548c057", "metadata": {}, "outputs": [], "source": [ "# How much more does a Golden State Warrior earn on average?\n", "# The grouped rows are ordered False, True, so item(1) is the GSW mean and item(0) is the mean for everyone else.\n", "observed_difference = means_table.column(\"SALARY average\").item(1) - means_table.column(\"SALARY average\").item(0)\n", "observed_difference" ] }, { "cell_type": "markdown", "id": "3825b512-a21d-49b2-a9ba-bb57421083cd", "metadata": {}, "source": [ "The statistic of interest is the difference between the average GSW salary and the average salary of all other players. 
To simulate this statistic under the null hypothesis,\n", "we can randomly permute the GSW MEMBER? labels, which breaks any association between team membership and salary." ] }, { "cell_type": "code", "execution_count": null, "id": "e1172975-2cc2-425a-ac57-48749538d5e3", "metadata": {}, "outputs": [], "source": [ "salaries" ] }, { "cell_type": "code", "execution_count": null, "id": "6d011bee-eaad-4f2b-93f8-9b89fbf1cd52", "metadata": {}, "outputs": [], "source": [ "# Sampling every row without replacement permutes the labels.\n", "shuffled_labels = salaries.sample(with_replacement = False).column('GSW MEMBER?')\n", "original_and_shuffled = salaries.with_column('Shuffled Label', shuffled_labels)\n", "original_and_shuffled" ] }, { "cell_type": "code", "execution_count": null, "id": "91e07236-cf29-4de9-8a27-180af4081e28", "metadata": {}, "outputs": [], "source": [ "shuffled_only = original_and_shuffled.select('SALARY', 'Shuffled Label')\n", "shuffled_group_means = shuffled_only.group('Shuffled Label', np.average)\n", "shuffled_group_means" ] }, { "cell_type": "code", "execution_count": null, "id": "fc073d4c-1fa9-4d48-9a05-09201533cce2", "metadata": {}, "outputs": [], "source": [ "def one_simulated_difference_of_means():\n", "    # Shuffle the labels, then recompute the difference of group means (True minus False).\n", "    shuffled_labels = salaries.sample(with_replacement = False).column('GSW MEMBER?')\n", "    original_and_shuffled = salaries.with_column('Shuffled Label', shuffled_labels)\n", "    shuffled_only = original_and_shuffled.select('SALARY', 'Shuffled Label')\n", "    shuffled_group_means = shuffled_only.group('Shuffled Label', np.average)\n", "    return shuffled_group_means.column(\"SALARY average\").item(1) - shuffled_group_means.column(\"SALARY average\").item(0)" ] }, { "cell_type": "code", "execution_count": null, "id": "d27ca811-f147-4c5a-b02a-342a2bbc3ac1", "metadata": {}, "outputs": [], "source": [ "one_simulated_difference_of_means()" ] }, { "cell_type": "code", "execution_count": null, "id": "9f5b3e43-e91a-45b6-9ad0-6f36e48dbd00", "metadata": {}, "outputs": [], "source": [ "def many_simulated_difference_of_means(how_many):\n", "    # Collect one simulated difference per repetition.\n", "    differences = make_array()\n", "    for _ in np.arange(how_many):\n", "        new_difference = one_simulated_difference_of_means()\n", "        differences = np.append(differences, new_difference)\n", "    return differences" ] }, { "cell_type": "code", "execution_count": null, "id": "0e0cd01c-7172-400f-bb12-ddabf14c831e", "metadata": {}, "outputs": [], "source": [ "x = many_simulated_difference_of_means(10)\n", "x" ] }, { "cell_type": "code", "execution_count": null, "id": "8aef6057-fe42-4ee6-9768-e326dc7b9372", "metadata": {}, "outputs": [], "source": [ "repetitions = 250\n", "differences = many_simulated_difference_of_means(repetitions)\n", "Table().with_column('Difference Between Group Means', differences).hist(bins = np.arange(-5.5, 5.6))\n", "print('Observed Difference:', observed_difference)" ] }, { "cell_type": "code", "execution_count": null, "id": "4edad11b-11e8-4386-863b-8535a14dc71e", "metadata": {}, "outputs": [], "source": [ "# One-sided empirical p-value: the proportion of simulated differences at least as large as the observed one.\n", "empirical_p = np.count_nonzero(differences >= observed_difference) / repetitions\n", "empirical_p" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.1" } }, "nbformat": 4, "nbformat_minor": 5 }