Samuel Congdon, Thomas Sailer
Python Assignment 4
8 June 2017
CHICAGO CRIME - matplotlib
Purpose
The purpose of this lab is to give students experience reading in data from a .csv file, then using this data to create a variety of plots using matplotlib.
Provided Starting Materials
You will need the following files to complete this assignment. The file chicago_data.csv contains the information you will use for building the charts. The file chicago_map.png contains a map of Chicago onto which you should overlay your heat map. The original dataset can be found here. Note that the original dataset is over 1.4 GB, while the provided one has already been pared down to the relevant information and a manageable number of entries.
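Each chart needs only a few columns, so one flexible approach is to read just the columns you need, once. The following is a minimal sketch using Python's standard csv module, which handles quoted fields that contain commas. The function name load_columns is hypothetical, and every column name other than 'Location Description' and 'Arrest' (for example 'Year', 'District', 'Latitude', 'Longitude') is an assumption about the provided file, so check its header row first.

```python
import csv

import numpy as np


def load_columns(source, columns):
    """Read only the named columns of a CSV file into NumPy string arrays.

    `source` may be a file path or any iterable of CSV lines (such as an
    open file), which keeps the function easy to test and reuse. Values stay
    as strings; convert them afterwards (e.g. with .astype(int)).
    """
    if isinstance(source, str):
        # newline="" is how the csv module recommends opening files.
        with open(source, newline="") as handle:
            return load_columns(handle, columns)

    reader = csv.DictReader(source)
    # Fail early with a clear message if a requested column is absent.
    missing = [name for name in columns if name not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not found in CSV header: {missing}")

    # One list per requested column, filled row by row.
    values = {name: [] for name in columns}
    for row in reader:
        for name in columns:
            values[name].append(row[name])
    return {name: np.array(column) for name, column in values.items()}
```

For example, load_columns("chicago_data.csv", ["Year", "Arrest"]) would return a dict keyed by column name, and data["Year"].astype(int) would then give the years as integers.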
Assignment
The crime rate in Chicago has been steadily increasing in the last few years. To combat this, the Chief of Police wants to convince the city to hire additional police officers. You have been tasked with creating a series of charts intended to convince the mayor's office of this need. You have been given the city's homicide records for the past 16 years, and the Chief has requested several graphs built from this data. Your assignment is to create these charts.
Totals Graph
The first chart you've been tasked to create is a bar chart displaying the total number of homicides recorded in the city of Chicago each year since 2001. You will first need to read in the data from chicago_data.csv, then count the number of homicides that occurred each year. Hint: numpy has several built-in functions that can do this. Once you're finished, your chart should look something like this.
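One of the built-in numpy functions that fits here is np.unique with return_counts=True, which sorts the years and counts how often each appears. The sketch below assumes you already have an array of year values (for instance from a 'Year' column, which is an assumed column name); the helper names are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt


def yearly_totals(years):
    """Return (sorted unique years, number of homicides in each year)."""
    # np.unique sorts the values; return_counts=True also reports how many
    # times each value occurs, which is exactly one count per year.
    return np.unique(np.asarray(years).astype(int), return_counts=True)


def plot_yearly_totals(ax, years, counts):
    """Draw the yearly totals as a labelled bar chart on the given Axes."""
    ax.bar(years, counts, color="firebrick")
    # One tick per year, labelled explicitly with the year as text.
    ax.set_xticks(years)
    ax.set_xticklabels([str(year) for year in years], rotation=45)
    ax.set_xlabel("Year")
    ax.set_ylabel("Number of homicides")
    ax.set_title("Total Homicides in Chicago per Year")
```

Note that a year with no recorded homicides would simply be absent from the np.unique output rather than appearing as a zero-height bar.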
Differences Graph
The next request is a chart showing how the total number of homicides has changed from year to year. This should be a line graph (with dots) displaying the difference between each year's total number of homicides and the previous year's total. This chart should appear similar to this.
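An efficient way to compute the differences is np.diff, which subtracts each element from the one after it in a single vectorized call. This minimal sketch assumes the sorted years and counts from the totals chart, with no missing years in between; the function names are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt


def yearly_differences(years, counts):
    """Return (years, change from the previous year) for every year but the first."""
    years = np.asarray(years)
    # np.diff gives counts[i + 1] - counts[i]; the first year has no previous
    # year to compare against, so it is dropped from the x values.
    return years[1:], np.diff(np.asarray(counts))


def plot_yearly_differences(ax, years, diffs):
    """Draw the year-over-year change as a line with dot markers."""
    ax.plot(years, diffs, marker="o", color="navy")
    # A zero line makes increases and decreases easy to tell apart.
    ax.axhline(0, color="gray", linewidth=0.8)
    ax.set_xticks(years)
    ax.set_xticklabels([str(year) for year in years], rotation=45)
    ax.set_xlabel("Year")
    ax.set_ylabel("Change in homicides from previous year")
    ax.set_title("Year-over-Year Change in Chicago Homicides")
```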
Locations Chart
In order to convince the mayor that additional police officers would be able to prevent murders, the Chief needs to show that murders are taking place in areas where police can intervene. To do this, he wants you to create a pie chart depicting the six most common locations for homicide, with all remaining locations grouped together as OTHER. This data is in the 'Location Description' column of chicago_data.csv. This chart should look like this.
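A minimal sketch of the grouping step uses collections.Counter, whose most_common(n) method returns the n most frequent values with their counts; everything else is summed into one OTHER slice. Folding any existing 'OTHER' location description into that slice is an assumption about what "correct grouping" should mean, made so the chart never shows two OTHER wedges. The function names are hypothetical.

```python
from collections import Counter

import matplotlib.pyplot as plt


def top_locations(locations, n=6, other_label="OTHER"):
    """Return (labels, sizes) for the n most common locations plus one OTHER slice."""
    counts = Counter(locations)
    # If the data already uses an "OTHER" description, fold it into the
    # remainder so the chart ends up with a single OTHER slice.
    other = counts.pop(other_label, 0)
    top = counts.most_common(n)
    # Everything outside the top n is added to the OTHER slice.
    other += sum(counts.values()) - sum(size for _, size in top)
    labels = [name for name, _ in top]
    sizes = [size for _, size in top]
    if other:
        labels.append(other_label)
        sizes.append(other)
    return labels, sizes


def plot_locations(ax, labels, sizes):
    """Draw a labelled pie chart with a percentage on each wedge."""
    ax.pie(sizes, labels=labels, autopct="%1.1f%%", startangle=90)
    ax.axis("equal")  # keep the pie circular
    ax.set_title("Most Common Homicide Locations")
```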
Districts Chart
Once the mayor has agreed to increase the size of the police force, the Chief must still decide which districts will receive the extra men and women, and provide his reasoning. The districts with the highest murder rates and lowest arrest rates are those most in need of additional officers. The best way to present this data is a scatter-area plot. The x-axis should identify the district being represented, while the y-axis shows the arrest rate for that district. The arrest data is contained in the 'Arrest' column of the .csv file: TRUE means an arrest was made, and FALSE means no arrest was made. The arrest rate is the percentage of homicides that resulted in an arrest. Lastly, the size of each point should represent the total number of homicides within that district: the larger the point, the more homicides occurred. Once finished, the plot should look like this.
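The per-district numbers can be computed without an explicit loop: np.unique with return_inverse=True assigns each row a group index, and np.bincount with weights sums the arrest flags within each group. This is a sketch under two assumptions: the district column (its name is not given above) holds numeric text such as "7" or "007" in every row, and the marker scaling constant is arbitrary. The function names are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt


def district_stats(districts, arrests):
    """Return (district ids, arrest rate in percent, homicide total) per district.

    Assumes every district value is numeric text such as "7" or "007".
    """
    district_ids = np.asarray(districts).astype(float).astype(int)
    # The Arrest column holds TRUE/FALSE text; compare case-insensitively.
    arrested = np.char.upper(np.asarray(arrests, dtype=str)) == "TRUE"
    ids, group, totals = np.unique(district_ids, return_inverse=True, return_counts=True)
    # Sum the arrest flags within each district group.
    arrest_counts = np.bincount(group, weights=arrested.astype(float))
    return ids, 100.0 * arrest_counts / totals, totals


def plot_districts(ax, ids, rates, totals, max_marker_area=2000.0):
    """Scatter-area plot: x = district, y = arrest rate, area = homicide total."""
    # Scale marker areas so the district with the most homicides gets max_marker_area.
    areas = max_marker_area * totals / totals.max()
    ax.scatter(ids, rates, s=areas, alpha=0.5, color="darkorange", edgecolors="black")
    ax.set_xticks(ids)
    ax.set_xlabel("Police district")
    ax.set_ylabel("Arrest rate (%)")
    ax.set_title("Homicide Arrest Rate by District (point area = homicides)")
```

Because scatter's s argument is a marker area, scaling it linearly with the homicide total keeps the visual size proportional to the count.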
Heat Map
Lastly, the Chief would like to show visually which areas of the city are most plagued by homicide. You need to create a heat-map overlay on the provided map of Chicago. The heat map should show the frequency of homicides in each area by overlaying brighter colors on areas where more homicides were committed. Once completed, the chart should look something like this. Hint: try using hexbin within pyplot.
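The key to positioning is giving the map image and the hexbin the same coordinate box: imshow's extent stretches the PNG over (left, right, bottom, top), and hexbin's extent fixes the bin limits to match. This is a minimal sketch, assuming longitude/latitude arrays for each homicide (for example from assumed 'Longitude' and 'Latitude' columns) and a map_extent that you would have to measure for chicago_map.png yourself, since the assignment does not provide it. The function name and default grid size are hypothetical choices.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_heatmap(ax, lons, lats, map_image, map_extent, gridsize=40):
    """Overlay a hexagonal-bin heat map of homicide locations on a map image.

    map_extent is (lon_min, lon_max, lat_min, lat_max), the bounds the map
    image covers. These bounds are an assumption you must measure yourself.
    """
    lons = np.asarray(lons, dtype=float)
    lats = np.asarray(lats, dtype=float)
    # Drop points with missing (NaN) coordinates before binning.
    known = np.isfinite(lons) & np.isfinite(lats)

    # Stretch the map image over the same coordinate box as the data.
    ax.imshow(map_image, extent=map_extent, aspect="auto", zorder=0)
    # mincnt=1 leaves empty hexagons undrawn so the map shows through;
    # the "hot" colormap draws busier cells in brighter colors.
    bins = ax.hexbin(lons[known], lats[known], gridsize=gridsize, extent=map_extent,
                     mincnt=1, cmap="hot", alpha=0.6, zorder=1)
    ax.set_xlim(map_extent[0], map_extent[1])
    ax.set_ylim(map_extent[2], map_extent[3])
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    ax.set_title("Homicide Density in Chicago")
    ax.figure.colorbar(bins, ax=ax, label="Homicides per cell")
    return bins
```

For example, plot_heatmap(ax, lons, lats, plt.imread("chicago_map.png"), map_extent) would draw the overlay. If a few cells dominate the color scale, passing bins="log" to hexbin is one way to make lower-frequency areas visible.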
Once these graphs are completed, you should arrange them into 2 or 3 figures, with related graphs displayed together. If you think of any other interesting charts that could be created from the data, please include them in your submission; the Chief may reward your innovation.
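One possible grouping is sketched below: the two year-based charts stacked with a shared x-axis, the location and district charts side by side, and the heat map on its own so the map stays legible. This particular grouping, the figure sizes, and the function name build_figures are assumptions, not requirements of the assignment.

```python
import matplotlib.pyplot as plt


def build_figures():
    """Create three figures whose Axes group related charts together."""
    # Figure 1: yearly totals above year-over-year change, sharing the year axis.
    trends_fig, (totals_ax, diff_ax) = plt.subplots(2, 1, sharex=True, figsize=(9, 9))
    trends_fig.suptitle("Chicago Homicides by Year")
    # Figure 2: where homicides happen and how each district compares.
    places_fig, (pie_ax, district_ax) = plt.subplots(1, 2, figsize=(15, 6))
    places_fig.suptitle("Homicide Locations and District Arrest Rates")
    # Figure 3: the heat map gets its own figure so the map stays legible.
    map_fig, map_ax = plt.subplots(figsize=(8, 10))
    axes = {"totals": totals_ax, "differences": diff_ax, "locations": pie_ax,
            "districts": district_ax, "heatmap": map_ax}
    return [trends_fig, places_fig, map_fig], axes
```

Each chart can then be drawn into its Axes from the returned dict, followed by fig.tight_layout() on each figure and a single plt.show() to display them all.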
Grading Rubric - Total 100 Points
- Efficient and flexible reading and parsing of data: 10 pts
- Bar plot of total yearly homicides: 5 pts
  - Correct data extraction: 2 pts
  - Labels and ticks: 3 pts
- Line chart of total yearly homicide differences: 10 pts
  - Labels, ticks, and line: 7 pts
  - Efficient calculation of difference: 3 pts
- Pie chart of most common homicide locations: 10 pts
  - Correct extraction of data: 2 pts
  - Correct grouping of OTHER: 5 pts
  - Correct labeling of chart: 3 pts
- Scatter plot of homicides and arrests by district: 20 pts
  - Appropriate axis labels and ticks: 8 pts
  - Data is correctly extracted and displayed: 12 pts
- Heat map of homicide frequencies by location: 25 pts
  - Extracting data: 7 pts
  - Positioning data on the map: 9 pts
  - Stylizing the heat map to be representative: 9 pts
- Graph layout is coherent and representative: 10 pts
- Proper commenting and variable naming: 10 pts