{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "d58d3fc8-e0cf-43ba-9484-7bf0ebfc0e4e",
   "metadata": {},
   "source": [
    "# Chapter 13 - Estimation"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "192c9857-023d-43be-b95e-d272d159d05d",
   "metadata": {},
   "source": [
    "A statistic based on a random sample can be a reasonable estimate of an unknown parameter in the population.\n",
    "However, a data scientist must always ask: **How different could this estimate have been if the sample had come out differently?**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "52e0582d-2560-41c3-bd39-1f141ccc1ed8",
   "metadata": {},
   "source": [
    "## 13.1 - Percentiles"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d39d0bdb-c8e2-4ada-a4c5-96da68793687",
   "metadata": {},
   "outputs": [],
   "source": [
    "from datascience import *"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ef7b6204-36c8-4a54-9815-9d7bfad613b3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# MSU football scores from regular season games\n",
    "msu_football_scores = Table().with_columns(\n",
    "    \"Opponent\", make_array(\"New Mexico\", \"Utah Tech\", \"Maine\", \"Mercyhurst\", \"Idaho State\", \"Northern Colorado\", \"Idaho\",\n",
    "                          \"Portland State\", \"Eastern Washington\", \"Sacramento State\", \"UC Davis\", \"Montana\"),\n",
    "    \"Score\", make_array(35, 31, 41, 52, 37, 55, 38, 44, 42, 49, 30, 34)\n",
    ")\n",
    "msu_football_scores.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9d383c01-a3b3-4959-ad45-913ea0122aa7",
   "metadata": {},
   "source": [
    "**pth percentile** - the smallest value in the collection that is at least as large as p% of all the values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d10be96a-6403-4029-83bc-4e1185ff6d2a",
   "metadata": {},
   "outputs": [],
   "source": [
    "scores = msu_football_scores.column(\"Score\")\n",
    "percentile(50, scores)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e6a8cf88-ff5b-43c2-80ff-c61e5e3e4f43",
   "metadata": {},
   "outputs": [],
   "source": [
    "percentile(42, scores)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7d54a4a6-2449-466b-9a7e-ed33089d23ab",
   "metadata": {},
   "source": [
    "An equivalent process for finding the pth percentile of a collection with n elements:\n",
    "- Sort the collection in increasing order\n",
    "- Calculate k = (p/100) * n\n",
    "- If k is not an integer, round it up to the next integer\n",
    "- The kth item of the sorted collection (counting from 1) is the answer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a6936a74-25d1-4817-8fa1-f8e3069a7894",
   "metadata": {},
   "outputs": [],
   "source": [
    "import math"
   ]
  },
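  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a722bca1-f1ac-40e7-928f-beb17eb08f3c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Take the scores from a sorted copy of the table; an in-place scores.sort() may also\n",
    "# reorder the Score column stored in the table and mismatch scores with opponents\n",
    "scores = msu_football_scores.sort(\"Score\").column(\"Score\")\n",
    "k = (42/100) * len(scores)  # position before rounding\n",
    "k = math.ceil(k)            # round up to the next integer\n",
    "k"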
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5449fdf-dee9-4c5a-a65b-06a7996a8e83",
   "metadata": {},
   "outputs": [],
   "source": [
    "scores.item(k-1)"
   ]
  },
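  {
   "cell_type": "markdown",
   "id": "percentile-by-hand-note",
   "metadata": {},
   "source": [
    "The steps above can be wrapped in a small function. This is a minimal sketch, not the `datascience` implementation: `percentile_by_hand` is a hypothetical name, and treating p = 0 as the smallest value is an assumption. The last line compares its result with `percentile` for the scores above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "percentile-by-hand-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "\n",
    "# Hypothetical helper (not part of datascience): pth percentile by sorting and counting\n",
    "def percentile_by_hand(p, values):\n",
    "    ordered = sorted(values)                 # sort a copy in increasing order\n",
    "    k = math.ceil((p / 100) * len(ordered))  # k = (p/100) * n, rounded up\n",
    "    k = max(k, 1)                            # assumption: p = 0 gives the smallest value\n",
    "    return ordered[k - 1]                    # kth item, counting from 1\n",
    "\n",
    "# Compare with the library function for the 42nd percentile\n",
    "percentile_by_hand(42, scores) == percentile(42, scores)"
   ]
  },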
  {
   "cell_type": "markdown",
   "id": "d87b8a6e-d848-4380-a5a2-6cd2433c6007",
   "metadata": {},
   "source": [
    "Terminology:\n",
    "- Median - 50th percentile\n",
    "- First Quartile - 25th percentile\n",
    "- Second Quartile - 50th percentile (the same value as the Median)\n",
    "- Third Quartile - 75th percentile\n",
    "- Middle 50% - the values between the First Quartile and the Third Quartile"
   ]
  }
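  ,
  {
   "cell_type": "markdown",
   "id": "quartiles-example-note",
   "metadata": {},
   "source": [
    "The quartiles of the scores can be read off with the same `percentile` function used earlier. This is a short sketch; the variable names are illustrative and not part of the original notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "quartiles-example-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "first_quartile = percentile(25, scores)\n",
    "median = percentile(50, scores)\n",
    "third_quartile = percentile(75, scores)\n",
    "# The middle 50% of the scores lies between the first and third quartiles\n",
    "make_array(first_quartile, median, third_quartile)"
   ]
  }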
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}