Income Along the Boston T [Orange Line]

December 12, 2017

As a native New Yorker, I was recently intrigued by a visualization of household income in NY highlight inequality ( New Yorker ). Having lived in Boston for going on 9 years, I’ve started to call this place home. Could I make a similar representation of income in Boston? That was my challenge!

PART I: The Orange Line

To start I used these packages:
1. acs: A simple way to pull census data.
2. httr: Let’s you work with URLs.
3. tidyverse: THE R package.
4. plotly: Interactive visualization in R.


I pulled together the names and locations (longitude/latitude) of the Orange Line stations. I needed to change the longitude and latitude to numerics for later use.

mbta = read.csv("mbta_orange.csv")
mbta$Longitude = as.numeric(mbta$Longitude)
mbta$Latitude = as.numeric(mbta$Latitude)

Next up, creating some empty columns to fill in down below.

mbta$tract = NaN
mbta$county = NaN
mbta$income = NaN

Census Tracts

Since I wanted to know the median household income by census tract, I needed to convert the T station coordinates to census tracts. The httr package allowed me to use the census geocoder website. The end of the code gets a little confusing due to the way the geocoding website returns the data. Note: Census tracts cover about 1,200 and 8,000 people

for(i in 1:nrow(mbta)) {
  x= as.character(mbta$Longitude[i])
  y = as.character(mbta$Latitude[i])
  url = sprintf("", x, y)
  r= GET(url)
  g = content(r, "parsed")
  mbta$tract[i]= as.character(g$result$geographies$`Census Tracts`[[1]][17]$TRACT[[1]])
  mbta$county[i] = as.character(g$result$geographies$Counties[[1]]$NAME[[1]])

Median Household Income

With the census tracts for each Orange line T stop in hand, I used the acs package to get the income data. I created a function to do just that, then used the mapply function to repeat it for each station.

tractIncome = function(tract,county) {
  my.tract=geo.make(state="MA", county=county, tract=tract)
  median.income=acs.fetch(geo=my.tract, endyear=2015, ="B19013")

mbta$income = mapply(tractIncome, mbta$tract, mbta$county)


After a few trials, I realized I needed to change the factor levels of the station names to display them in the order of the stops. With that sorted out, it was on to using plotly to create the interactive graph. Note that the plotly function has an underscore. This took me a while to figure out.

So there you have it, median household income along Boston’s Orange line.

mbta$Station_Name = as.character(mbta$Station_Name)
mbta$Station_Name = factor(mbta$Station_Name, levels=unique(mbta$Station_Name))

plot_ly(mbta, x=~Station_Name, hoverinfo="none") %>%
  add_trace(y=~Income, type="scatter", mode="line", line = list(color="orange", width=6)) %>%
  add_trace(y=~Income, type="scatter", mode="markers", marker=list(color="black", size=8), hoverinfo= 'text',
        text = ~paste('</br><b>Station Name:</b>', Station_Name,
                      '</br> Income:', Income,
                      '</br> Tract:', tract)) %>%
  layout(title="Median Household Income Along the Orange Line", showlegend=FALSE, xaxis=list(title="", tickangle=-90), yaxis=list(title="", range=c(0, 200000)), margin=list(b=160))

Sadly, plotly does not support hover on mobile. So, if you want to see more info try hovering over the markers on your laptop.

In Part II, we will take a look at the The Red Line.