{ "cells": [ { "cell_type": "markdown", "id": "09d7b732", "metadata": {}, "source": [ "# Homework 5" ] }, { "cell_type": "markdown", "id": "e89524e6-114e-46a7-95a4-d1b1283ae4bc", "metadata": {}, "source": [ "- Due Date: Tuesday, March 4th -- no later than 11:59 p.m.\n", "- Partner Information: You may complete this assignment individually or with exactly one classmate.\n", "- Submission Instructions (working alone): Upload your solution, titled **YourFirstName-YourLastName-Assignment5.ipynb**, to the \n", "BrightSpace Assignment 5 Dropbox.\n", "- Submission Instructions (working with one classmate): Upload your solution, titled \n", "**YourFirstName-YourLastName-PartnerFirstName-PartnerLastName-Assignment5.ipynb**, to the BrightSpace Assignment 5 Dropbox. Note: If you \n", "work with a partner, only one person needs to submit a solution. If you both submit a solution, only the submission \n", "from the partner whose last name comes first alphabetically will be graded.\n", "- Deadline Reminder: Once the submission deadline passes, BrightSpace will no longer accept your submission and you will no longer be able to earn credit. \n", "If you cannot fully complete the assignment, submit what you have before the deadline so that you can earn partial credit." 
] }, { "cell_type": "markdown", "id": "96e67964", "metadata": {}, "source": [ "# Starter Code" ] }, { "cell_type": "code", "execution_count": null, "id": "eb7a773d-1e75-42e5-9560-350dbe7821f0", "metadata": {}, "outputs": [], "source": [ "from datascience import *\n", "import numpy as np\n", "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": null, "id": "e717598b", "metadata": {}, "outputs": [], "source": [ "# Place the CSV file in the same directory as your solution\n", "file_path = \"top_spotify_songs_usa.csv\"\n", "spotify_data = Table.read_table(file_path)\n", "spotify_data.show(5)" ] }, { "cell_type": "markdown", "id": "cc764a9b-18a1-4326-980f-c279e799f151", "metadata": {}, "source": [ "## Question 1 (5 points)\n", "\n", "Analyze the distribution of artists who have the most songs in this dataset. Specifically, in descending order, display the top 10 artists \n", "with the highest number of songs and visualize this distribution using a bar chart. Place any part of the solution that can be reused by\n", "Question 2 into functions so that you can avoid duplicating code.\n" ] }, { "cell_type": "markdown", "id": "5c0b2701-2aee-40c9-8013-b4f0d5d6a579", "metadata": {}, "source": [ "## Question 2 (4 points)\n", "\n", "Randomly sample 100, 1000, and 10000 unique songs from the dataset. 
For each sample size, \n", "in descending order, display the top 10 artists with the highest number of songs using a bar chart.\n", "Reuse relevant functions from Question 1 to avoid duplicating code that you have already written.\n", "\n", "**Starter Code:**" ] }, { "cell_type": "code", "execution_count": null, "id": "c6f481ac-e895-466c-b844-9785ec6e76be", "metadata": {}, "outputs": [], "source": [ "for sample_size in [100, 1000, 10000]:\n", "    plot_sampled_distribution(spotify_data, sample_size, 10)" ] }, { "cell_type": "markdown", "id": "82124a9b-e7aa-411e-b484-d21e1b1fb3f4", "metadata": {}, "source": [ "## Question 3 (1 point)\n", "\n", "From the sampled distributions in Question 2, identify at least one artist whose ranking changed significantly across the different \n", "sample sizes. Explain why this might have happened.\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.1" } }, "nbformat": 4, "nbformat_minor": 5 }