Crimes in Somerville (Time Series)
January 9, 2018
I recently spotted a time series heat map. So I started on a quest to make my own. Luckily, I found some local data to start with: crimes in Somerville, MA. This data set includes selected crimes in Somerville from 2005-2017.
Handy Packages
library(tidyverse)
library(lubridate)
library(ggthemes)
library(plotly)
Getting the Data
somerville = read.csv("Police_-_Selected_Criminal_Incidents.csv")
Cleaning the Data
- Remove 2018 data since the year has just started.
- This part tackles converting the date column into a date data type. Then, I extracted specific parts of the date: the hour, the day of the week, the month, the year, and the day of the month.
somerville$dtreported = mdy_hms(somerville$dtreported)
somerville$wday = wday(somerville$dtreported , label=TRUE)
somerville$month = month(somerville$dtreported, label=TRUE) #pulls out the month.
somerville$hour = hour(somerville$dtreported) #pulls out the hour of the day.
somerville$dayomo = mday(somerville$dtreported) #pulls out the day of the month.
somerville$year = year(somerville$dtreported) #pulls out the year.
somerville = somerville %>% filter(year !=2018) #removes 2018 data.
Creating the Graph
- I used dplyr to to get a count by hour and day of the week.
- I felt fancy so I used ggplotly to make them interactive.
total = somerville %>% count(hour, wday) %>% #Getting count by hour and day of the week.
ggplot(aes(wday, hour, fill=n)) + geom_tile() + #create basic heatmap
scale_fill_gradient2(guide="legend") + theme_tufte() + #legend structure & tufte theme
theme(text=element_text(size=14), axis.text = element_text(size=14)) +
labs(x="Day of the Week", y="Hour", title="Crimes in Somerville (2005-2017)") + #labels
guides(fill=guide_legend(title="Crimes")) +
scale_y_continuous(breaks=seq(0, 23, 3)) #changing y axis
ggplotly(total)
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
It looks like most crimes happen around 9 in the morning on Tuesdays!
But what if we look at the breakdown by type of crime. The data set provides an offense variable which has 4 types of crimes:
- Burglary/breaking and entering
- Motor vehicle theft
- Robbery
- Theft from motor vehicle
BURGLARY/BREAKING AND ENTERING
burg = somerville %>% filter(offense=="BURGLARY/BREAKING AND ENTERING") %>%
count(hour, wday) %>% #Getting count by hour and day of the week.
ggplot(aes(wday, hour, fill=n)) + geom_tile() + #create basic heatmap
scale_fill_gradient2(guide="legend") + theme_tufte() + #legend structure & tufte theme
theme(text=element_text(size=14), axis.text = element_text(size=14)) +
labs(x="Day of the Week", y="Hour", title="Burglary/Breaking and Entering") + #labels
guides(fill=guide_legend(title="Crimes")) +
scale_y_continuous(breaks=seq(0, 23, 3)) #changing y axis
ggplotly(burg)
It seems that most burglaries happen on Wednesday evenings.
MOTOR VEHICLE THEFT
carTheft = somerville %>% filter(offense=="MOTOR VEHICLE THEFT") %>%
count(hour, wday) %>% #Getting count by hour and day of the week.
ggplot(aes(wday, hour, fill=n)) + geom_tile() + #create basic heatmap
scale_fill_gradient2(guide="legend") + theme_tufte() + #legend structure & tufte theme
theme(text=element_text(size=14), axis.text = element_text(size=14)) +
labs(x="Day of the Week", y="Hour", title="Motor Vehicle Theft") + #labels
guides(fill=guide_legend(title="Crimes")) +
scale_y_continuous(breaks=seq(0, 23, 3)) #changing y axis
ggplotly(carTheft)
Car thefts seem to happen on Monday afternoons and around noon time on Thursdays.
ROBBERY
rob = somerville %>% filter(offense=="ROBBERY") %>%
count(hour, wday) %>% #Getting count by hour and day of the week.
ggplot(aes(wday, hour, fill=n)) + geom_tile() + #create basic heatmap
scale_fill_gradient2(guide="legend") + theme_tufte() + #legend structure & tufte theme
theme(text=element_text(size=14), axis.text = element_text(size=14)) +
labs(x="Day of the Week", y="Hour", title="Robbery") + #labels
guides(fill=guide_legend(title="Crimes")) +
scale_y_continuous(breaks=seq(0, 23, 3)) #changing y axis
ggplotly(rob)
This makes sense! Most robberies happen Saturday late at night/early in the morning.
THEFT FROM MOTOR VEHICLE
theft = somerville %>% filter(offense=="THEFT FROM MOTOR VEHICLE") %>%
count(hour, wday) %>% #Getting count by hour and day of the week.
ggplot(aes(wday, hour, fill=n)) + geom_tile() + #create basic heatmap
scale_fill_gradient2(guide="legend") + theme_tufte() + #legend structure & tufte theme
theme(text=element_text(size=14), axis.text = element_text(size=14)) +
labs(x="Day of the Week", y="Hour", title="Theft from Motor Vehicle") + #labels
guides(fill=guide_legend(title="Crimes")) +
scale_y_continuous(breaks=seq(0, 23, 3)) #changing y axis
#Making it interactive
ggplotly(theft)
It looks like most thefts from motor vehicles happen in the morning on weekdays.
There you have it a quick time series visualiation of crime in Somerville.