{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Homework 9 - Chapter 14" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Due Date: Monday, April 14th no later than 11:59 p.m.\n", "- Partner Information: You may complete this assignment individually or with exactly one classmate.\n", "- Submission Instructions (working alone): Upload your solution, entitled **YourFirstName-YourLastName-Homework9.ipynb** to the \n", "BrightSpace Homework 9 Dropbox.\n", "- Submission Instructions (working with one classmate): Upload your solution, entitled \n", "**YourFirstName-YourLastName-PartnerFirstName-PartnerLastName-Homework9.ipynb** to the BrightSpace Homework 9 Dropbox. Note: If you \n", "work with a partner, only one person needs to submit a solution. If you both submit a solution, the submission that will be graded is the one \n", "from the partner whose last name comes alphabetically first.\n", "- Deadline Reminder: Once the submission deadline passes, BrightSpace will no longer accept your submission and you will no longer be able to earn credit. \n", "Thus, if you are not able to fully complete the assignment, submit whatever you have before the deadline so that partial credit can be earned." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Starting Code" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from datascience import *\n", "import matplotlib.pyplot as plots\n", "import numpy as np\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Download the file [cybersecurity.csv]()\n", "into the same directory as this Jupyter notebook." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Place the csv file in the same directory as your solution\n", "cyber_attacks = Table.read_table(\"cybersecurity.csv\")\n", "cyber_attacks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 1a - 2 Points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Data science plays an increasingly important role in cybersecurity. In today’s digital world, cyber attacks are becoming more frequent, more sophisticated, and more costly, so cybersecurity professionals increasingly rely on data science tools and techniques to detect, predict, and prevent attacks before they can do damage.\n", "\n", "An important step in preventing attacks is studying how past attacks were carried out so that they cannot be repeated. In our study, we are only interested in attacks that occurred in the United States and came from an unknown source. However, the *cyber_attacks* table must be cleaned before this filtering can begin. Create a new table called **us_attacks** that only contains attacks that happened in the United States. Display the number of rows in **us_attacks**. **NOTE** - The table uses many different values to represent the United States! Any row that contains any of the following values in the 'Country' column should be included in the new table: **USA, US, United States, America, United States of America.**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Place answer here." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 1b - 2 Points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The *Attack Source* column of the **us_attacks** table contains valuable information, but its inconsistent formatting, such as variable capitalization, makes it difficult to categorize or analyze effectively. To make it easier to work with, replace the strings in the *Attack Source* column with the same strings converted entirely to lowercase. Then, create a new table called **unknown_attacks** that only includes rows whose *Attack Source* is 'unknown'. Display the first 5 rows of the **unknown_attacks** table." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Place answer here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 2 - 2 Points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the *unknown_attacks* table, plot a histogram of the **Financial Loss (in Million $)** column, with bins ranging from 0 to 100 in increments of 5. Then, plot the mean and median of that column as vertical dashed lines." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Place answer here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 3 - 2 Points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The *standard deviation* of a dataset is an important measure of variability, also known as spread. Calculate the proportion of rows in the *unknown_attacks* table that fall within 1 standard deviation of the mean, and print the result." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Place answer here." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 4 - 2 Points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a new table by sampling 1,000 rows with replacement from the *unknown_attacks* table. Compare the mean of this table's **Financial Loss (in Million $)** column with the mean of the same column in *unknown_attacks*, and explain why they are similar." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Place answer here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Explanation -**" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.1" } }, "nbformat": 4, "nbformat_minor": 4 }