{ "cells": [ { "cell_type": "markdown", "id": "d58d3fc8-e0cf-43ba-9484-7bf0ebfc0e4e", "metadata": {}, "source": [ "# Chapter 13 - Estimation" ] }, { "cell_type": "markdown", "id": "192c9857-023d-43be-b95e-d272d159d05d", "metadata": {}, "source": [ "A statistic based on a random sample can be a reasonable estimate of an unknown parameter in the population.\n", "However, a data scientist must always ask **How different could this estimate have been, if the sample had come out differently?**" ] }, { "cell_type": "markdown", "id": "52e0582d-2560-41c3-bd39-1f141ccc1ed8", "metadata": {}, "source": [ "## 13.1 - Percentiles" ] }, { "cell_type": "code", "execution_count": null, "id": "d39d0bdb-c8e2-4ada-a4c5-96da68793687", "metadata": {}, "outputs": [], "source": [ "from datascience import *" ] }, { "cell_type": "code", "execution_count": null, "id": "ef7b6204-36c8-4a54-9815-9d7bfad613b3", "metadata": {}, "outputs": [], "source": [ "# MSU football scores from regular season games\n", "msu_football_scores = Table().with_columns(\n", " \"Opponent\", make_array(\"New Mexico\", \"Utah Tech\", \"Maine\", \"Mercyhurst\", \"Idaho State\", \"Northern Colorado\", \"Idaho\",\n", " \"Portland State\", \"Eastern Washington\", \"Sacramento State\", \"UC Davis\", \"Montana\"),\n", " \"Score\", make_array(35, 31, 41, 52, 37, 55, 38, 44, 42, 49, 30, 34)\n", ")\n", "msu_football_scores.show()" ] }, { "cell_type": "markdown", "id": "9d383c01-a3b3-4959-ad45-913ea0122aa7", "metadata": {}, "source": [ "**pth percentile** - the smallest value in the collection that is at least as large as p% of all the values" ] }, { "cell_type": "code", "execution_count": null, "id": "d10be96a-6403-4029-83bc-4e1185ff6d2a", "metadata": {}, "outputs": [], "source": [ "scores = msu_football_scores.column(\"Score\")\n", "percentile(50, scores)" ] }, { "cell_type": "code", "execution_count": null, "id": "e6a8cf88-ff5b-43c2-80ff-c61e5e3e4f43", "metadata": {}, "outputs": [], "source": [ "percentile(42, scores)" ] }, { 
"cell_type": "markdown", "id": "7d54a4a6-2449-466b-9a7e-ed33089d23ab", "metadata": {}, "source": [ "An equivalent process for finding the pth percentile of a collection with n elements:\n", "- Sort the collection in increasing order\n", "- Calculate k = (p/100) * n\n", "- If k is not an integer, round it up to the next integer\n", "- The kth item in the sorted collection (counting from 1) is the answer" ] }, { "cell_type": "code", "execution_count": null, "id": "a6936a74-25d1-4817-8fa1-f8e3069a7894", "metadata": {}, "outputs": [], "source": [ "import math\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": null, "id": "a722bca1-f1ac-40e7-928f-beb17eb08f3c", "metadata": {}, "outputs": [], "source": [ "# np.sort returns a sorted copy; scores.sort() sorts in place, which can also reorder the table's Score column\n", "sorted_scores = np.sort(scores)\n", "k = (42/100) * len(sorted_scores)\n", "k = math.ceil(k)  # round up when k is not an integer\n", "k" ] }, { "cell_type": "code", "execution_count": null, "id": "c5449fdf-dee9-4c5a-a65b-06a7996a8e83", "metadata": {}, "outputs": [], "source": [ "# k counts from 1, but array indices start at 0\n", "sorted_scores.item(k-1)" ] }, { "cell_type": "markdown", "id": "d87b8a6e-d848-4380-a5a2-6cd2433c6007", "metadata": {}, "source": [ "Terminology\n", "- Median - 50th percentile\n", "- First Quartile - 25th percentile\n", "- Second Quartile - 50th percentile (the median)\n", "- Third Quartile - 75th percentile\n", "- Middle 50% - the values between the First Quartile and the Third Quartile" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.1" } }, "nbformat": 4, "nbformat_minor": 5 }