21 Days to Data: Crime in NYC (Python / Tableau)
- Brandon Hopkins
- Sep 1, 2022
Updated: Jan 20, 2023
Introduction
The purpose of this project was to analyze New York City crime data to give the “Commissioner” a clearer picture of crime across the five boroughs and to suggest ways to reduce it. This project was part of the 21 Days to Data Challenge, a 21-day course on a variety of data concepts and tools. I learned a lot by working with a real-world dataset and got a chance to practice skills in Python, data cleaning, and data visualization.
The Data
The data used for this project was a subset of NYC complaint data from Kaggle, with the original (much larger) dataset coming from NYC Open Data.
After downloading the data from Kaggle, I loaded the .csv into OpenRefine, an open-source data cleaning tool. In OpenRefine I fixed errors in the data, including misspelled or inconsistent offense descriptions, incorrect date formatting, and incorrect age values. With the data wrangled, it was time for analysis!
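The cleaning for this project was done in OpenRefine, not in code. For readers who would rather script these steps, here is a minimal pandas sketch of the same kinds of fixes. The column names (`OFNS_DESC`, `CMPLNT_FR_DT`, `VIC_AGE_GROUP`), the date format, and the list of valid age groups are assumptions about the dataset, and the sample rows are made up for illustration.

```python
import pandas as pd

# Assumed set of valid age-group labels; anything else is treated as bad data.
VALID_AGE_GROUPS = {"<18", "18-24", "25-44", "45-64", "65+"}


def clean_complaints(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the complaint data (column names are assumed)."""
    out = df.copy()

    # Make offense descriptions consistent: trim whitespace, use one case.
    out["OFNS_DESC"] = out["OFNS_DESC"].str.strip().str.upper()

    # Parse dates; values that do not match the assumed format become NaT.
    out["CMPLNT_FR_DT"] = pd.to_datetime(
        out["CMPLNT_FR_DT"], format="%m/%d/%Y", errors="coerce"
    )

    # Replace impossible age groups with an explicit "UNKNOWN" label.
    is_valid = out["VIC_AGE_GROUP"].isin(VALID_AGE_GROUPS)
    out["VIC_AGE_GROUP"] = out["VIC_AGE_GROUP"].where(is_valid, "UNKNOWN")
    return out


# Made-up sample rows, for illustration only.
sample = pd.DataFrame({
    "OFNS_DESC": [" petit larceny", "PETIT LARCENY ", "Felony Assault"],
    "CMPLNT_FR_DT": ["01/15/2019", "2019-13-40", "03/02/2019"],
    "VIC_AGE_GROUP": ["25-44", "-972", "<18"],
})
cleaned = clean_complaints(sample)
print(cleaned)
```

This sketch only normalizes case and whitespace. Genuine misspellings would still need a manual mapping or a clustering step like the one OpenRefine provides.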
Analysis
I did my analysis in Python and Tableau. To create the plots in Python, I used a notebook in Google Colab – this was my first-ever experience with Python and was a lot of fun! I imported the pandas, seaborn, and matplotlib libraries and wrote the code below. The output was two visuals: a bar chart displaying the total number of crimes by borough, and a scatter plot mapping the category of crime by location. Check out the code and outputs below!
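The notebook code is not included in the text of this post, so here is a minimal sketch of how two plots like these could be built. It is not the original notebook. The column names (`BORO_NM`, `LAW_CAT_CD`, `Latitude`, `Longitude`) are assumptions about the dataset, the sample rows are made up, and the sketch uses only pandas and matplotlib to keep dependencies small.

```python
import pandas as pd
import matplotlib.pyplot as plt


def plot_crime(df: pd.DataFrame):
    """Draw a borough bar chart and a location scatter plot side by side."""
    fig, (ax_bar, ax_map) = plt.subplots(1, 2, figsize=(12, 5))

    # Bar chart: number of complaints per borough, largest first.
    counts = df["BORO_NM"].value_counts()
    ax_bar.bar(list(counts.index), counts.values)
    ax_bar.set_xlabel("Borough")
    ax_bar.set_ylabel("Number of complaints")
    ax_bar.set_title("Total crimes by borough")

    # Scatter plot: one point per complaint, one color per crime category.
    for category, group in df.groupby("LAW_CAT_CD"):
        ax_map.scatter(group["Longitude"], group["Latitude"], s=10, label=category)
    ax_map.set_xlabel("Longitude")
    ax_map.set_ylabel("Latitude")
    ax_map.set_title("Crime category by location")
    ax_map.legend(title="Category")

    fig.tight_layout()
    return fig, counts


# Made-up sample rows, for illustration only.
sample = pd.DataFrame({
    "BORO_NM": ["BROOKLYN", "BROOKLYN", "BROOKLYN", "MANHATTAN", "MANHATTAN", "QUEENS"],
    "LAW_CAT_CD": ["FELONY", "MISDEMEANOR", "VIOLATION", "FELONY", "MISDEMEANOR", "FELONY"],
    "Latitude": [40.65, 40.68, 40.70, 40.78, 40.75, 40.73],
    "Longitude": [-73.95, -73.94, -73.99, -73.97, -73.98, -73.80],
})

# In a notebook the figure renders at the end of the cell;
# in a plain script, call plt.show() afterwards.
fig, borough_counts = plot_crime(sample)
```

With seaborn installed, `seaborn.countplot` and `seaborn.scatterplot` (using its `hue` parameter) would produce similar charts with less code.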





In addition to my work in Python, I also created a dashboard in Tableau (see below); you can also access the interactive dashboard here. The top section breaks down crime by the most frequent offenses and by borough, while the bottom section gives statistics for both victims and suspects.

The data shows that the most crime occurs in Brooklyn and Manhattan, and that crime in these boroughs spikes between 11 AM and 12 PM. It is also clear that an overwhelming majority of suspects are male and aged 25-44, and that significantly more suspects are recorded as Black than as any other race. With this knowledge, I would suggest to the Commissioner that patrols be increased in Brooklyn and Manhattan during the high-crime hours, since a police presence may reduce the number of incidents.
Another option I would suggest exploring is a long-term outreach program for young men, especially young Black men, as this group accounts for the largest share of suspects in the data. If the NYPD can identify what is driving these numbers (mental health, socioeconomic status, education, etc.), then perhaps programs could be put in place to provide support and, hopefully, reduce crime numbers over the long term.
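A peak-hour finding like the one above can also be checked in pandas. The sketch below counts complaints per borough and hour, then picks the busiest hour for each borough. The column names (`BORO_NM`, `CMPLNT_FR_TM`) and the `HH:MM:SS` time format are assumptions, and the sample rows are made up rather than taken from the dataset.

```python
import pandas as pd


def peak_hour_by_borough(df: pd.DataFrame) -> pd.Series:
    """Return the hour of day (0-23) with the most complaints in each borough."""
    # Extract the hour; times that do not match the assumed format become NaN.
    hours = pd.to_datetime(
        df["CMPLNT_FR_TM"], format="%H:%M:%S", errors="coerce"
    ).dt.hour

    # Count complaints for every (borough, hour) pair.
    counts = (
        df.assign(hour=hours)
        .dropna(subset=["hour"])
        .groupby(["BORO_NM", "hour"])
        .size()
        .reset_index(name="complaints")
    )

    # Keep the row with the highest count within each borough.
    busiest = counts.loc[counts.groupby("BORO_NM")["complaints"].idxmax()]
    return busiest.set_index("BORO_NM")["hour"].astype(int)


# Made-up sample rows, for illustration only.
sample = pd.DataFrame({
    "BORO_NM": ["BROOKLYN", "BROOKLYN", "BROOKLYN",
                "MANHATTAN", "MANHATTAN", "MANHATTAN"],
    "CMPLNT_FR_TM": ["11:05:00", "11:40:00", "23:10:00",
                     "15:15:00", "15:50:00", "08:00:00"],
})
peak = peak_hour_by_borough(sample)
print(peak)
```

If two hours tie within a borough, `idxmax` keeps the first one it finds, so a real analysis may want to look at the full hourly counts rather than a single peak.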
Conclusion
I learned a great deal and had a lot of fun working on this project! Prior to this, I had never done any work with Python, so this was a great first step, and it has inspired me to keep learning this powerful tool.
I hope to improve on this project in the future as I continue to learn. For the Python analysis in particular, I would love to format the visuals better. Specifically, I would like to make the scatter plot interactive by adding a zoom function and to use a map of New York City as the plot background, which would make it more valuable.
I’d love to hear any feedback, suggestions, or questions you might have, so feel free to reach out anytime! You can find me on LinkedIn or learn more about me and see other projects I’ve worked on here. Thank you!