Where are the students who are English Learners in MA?

April 30, 2018

Inspired by my wife’s work educating students who are English Learners, this post maps where English Learners are enrolled across Massachusetts.

The Massachusetts Department of Education publishes a large amount of education data from across the state. It provides yearly percentages of enrolled students by characteristic, including English Learners. English Learners are defined as students age 3-21 who were not born in the US or whose native language is not English. They face educational disparities and have higher dropout rates than their English-speaking counterparts.

With that introduction in mind, let's get to it.

The packages:

library(tidyverse) # a classic
library(viridis) # for color palettes 
library(sf) # to read and display the shapefiles that create the map
library(leaflet) # to make the interactive map

The data:


The map file comes from Mass GIS. The education data comes from the Massachusetts Department of Education. The district_sf variable contains the shapefile for the map. The elma variable contains the data on English Learners in each district.

district_sf = st_read("schooldistricts/SCHOOLDISTRICTS_POLY.shp")
## Reading layer `SCHOOLDISTRICTS_POLY' from data source `C:\Users\jrodr\Documents\RBlog\transMed\content\post\schooldistricts\SCHOOLDISTRICTS_POLY.shp' using driver `ESRI Shapefile'
## Simple feature collection with 403 features and 8 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 36118.83 ymin: 777606.4 xmax: 330837 ymax: 959743
## epsg (SRID):    NA
## proj4string:    +proj=lcc +lat_1=41.71666666666667 +lat_2=42.68333333333333 +lat_0=41 +lon_0=-71.5 +x_0=200000 +y_0=750000 +datum=NAD83 +units=m +no_defs
elma = read.csv("selectedpopulations.xls.csv", stringsAsFactors = FALSE, header = TRUE)
elma = elma[-1,] #removes an empty row

The wrangling:

Step 1

A few variable types needed to be changed to character so the two data sets could be joined. The paste0 call creates a new variable, ORGCODE_E, that prepends a zero to the district code, which helps when merging the data sets later on.

district_sf$ORG_CODE = as.character(district_sf$ORG_CODE) 
elma$ORGCODE = as.character(elma$ORGCODE)

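district_sf$DISTRICT = as.character(district_sf$DISTRICT)
# the "ï.." prefix is most likely a byte-order-mark artifact from read.csv; this is the district name column
elma$ï..DISTRICT = as.character(elma$ï..DISTRICT)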

elma$ORGCODE_E = paste0("0", elma$ORGCODE)
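To illustrate why the extra zero-padded column helps, here is a minimal sketch with made-up codes (the values below are hypothetical, not taken from the DOE file or the shapefile). It assumes some codes lost a leading zero when they were read as numbers, so they only match the shapefile's text codes once the zero is pasted back on.

```r
# Hypothetical district codes, stored as text in the shapefile
shape_codes = c("0123", "0456", "1789")

# The same codes after being read as integers: leading zeros are gone
csv_codes = as.character(c(123L, 456L, 1789L))

# Matching on the raw code only finds the code that never had a leading zero
direct_match = csv_codes %in% shape_codes

# Matching on the zero-padded code finds the other two
padded_match = paste0("0", csv_codes) %in% shape_codes

# Using both keys together covers every code in this toy example
matched = direct_match | padded_match
```

This is why Step 2 joins on both ORGCODE and ORGCODE_E: each key only picks up part of the districts.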

Step 2

This was by far the trickiest step! I used a series of left joins to attach the data from the MA DOE to the district-level shapefile. The challenge was that each join key only matched some districts, so after each left join I had a column with NAs in some rows. To combine the columns and fill in the NAs, I added a mutate step that uses the coalesce function. My understanding is that coalesce works row by row: it takes the first column's value and, where that is NA, falls back to the first non-NA value from the other columns listed.

district_sf = district_sf %>% left_join(elma, by=c("ORG_CODE" = "ORGCODE")) %>%
  left_join(elma, by=c("ORG_CODE" = "ORGCODE_E")) %>%
  left_join(elma, by = c("DISTRICT"="ï..DISTRICT")) %>%
  mutate(XEL1 = coalesce(X.1.x, X.1.y, X.1))

district_sf$XEL1 = as.numeric(district_sf$XEL1)
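Here is a small sketch of what coalesce does, using a toy table rather than the real data. The column names and values are made up for illustration; each column stands in for the result of one of the three joins, and only one of them has a value for any given district.

```r
library(dplyr)

# Toy stand-in for the joined data: each join filled in different rows
toy = tibble(
  district = c("A", "B", "C", "D"),
  from_join_1 = c("12.5", NA, NA, NA),
  from_join_2 = c(NA, "3.1", NA, NA),
  from_join_3 = c(NA, NA, "40.2", NA)
)

# coalesce() takes, for each row, the first non-NA value across the columns
toy = toy %>%
  mutate(el_pct = as.numeric(coalesce(from_join_1, from_join_2, from_join_3)))

toy$el_pct
```

District D has no value in any of the three columns, so it stays NA after coalescing. In the real data, those would be districts that none of the three join keys matched.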

Step 3

I removed the charter districts since they covered large areas and were different from the rest of the districts listed.

district_sf = district_sf %>% filter(MADISTTYPE != "Charter - Commonwealth" & MADISTTYPE != "Charter - Horace Mann")
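One detail worth knowing about this kind of filter: comparing with != returns NA for rows where the district type is missing, and filter drops those rows too. The sketch below uses a toy table (the district names and the "Local School" type are hypothetical) to compare the != version with an alternative written with %in%, which keeps rows with a missing type.

```r
library(dplyr)

# Toy table: the last district has no type recorded
toy = tibble(
  DISTRICT = c("Town A", "Charter B", "Charter C", "Town D"),
  MADISTTYPE = c("Local School", "Charter - Commonwealth",
                 "Charter - Horace Mann", NA)
)

charter_types = c("Charter - Commonwealth", "Charter - Horace Mann")

# Same style as the filter above: the NA row is dropped along with the charters
kept_neq = toy %>%
  filter(MADISTTYPE != "Charter - Commonwealth" & MADISTTYPE != "Charter - Horace Mann")

# Alternative with %in%: only the charter rows are dropped, the NA row is kept
kept_in = toy %>%
  filter(!(MADISTTYPE %in% charter_types))
```

Which behavior is right depends on whether the shapefile has any districts with a missing type; I have not checked that here.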

The map:

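The map shows the percentage of EL students in each district, with districts colored by quantile: colorQuantile with n = 10 splits the districts into ten roughly equal-sized groups (deciles), so the colors reflect a district's rank rather than its raw percentage.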

pal = colorQuantile(palette = "viridis", domain = as.numeric(district_sf$XEL1), n = 10)

mapel = district_sf %>%
  st_transform(crs = '+proj=longlat +datum=WGS84') %>% 
  leaflet(width = "100%") %>%
  addProviderTiles(provider = "CartoDB.Positron") %>%
  addPolygons(popup = ~ paste0(DISTRICT, ":", XEL1, "%"),
              stroke = FALSE,
              smoothFactor = 0,
              weight = 2,
              fillOpacity = 0.7, 
              color = ~ pal(XEL1)) %>% 
  addLegend(pal = pal, 
            values = ~ XEL1,
            title = "% English Learners in MA (Percentiles)",
            opacity = 1)
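To make the quantile coloring more concrete, here is a rough base R sketch of the binning idea, using made-up percentages instead of the real district values. It is only an approximation of what colorQuantile does internally; leaflet's exact handling of interval edges may differ.

```r
# Made-up EL percentages for ten districts, plus one district with no data
el_pct = c(0, 0.5, 1, 2, 3, 5, 8, 12, 20, 35, NA)

# Eleven break points define ten bins with about the same number of districts
breaks = quantile(el_pct, probs = seq(0, 1, length.out = 11), na.rm = TRUE)

# Assign each district to a bin (1 = lowest decile, 10 = highest)
decile = cut(el_pct, breaks = breaks, include.lowest = TRUE, labels = FALSE)

decile
```

With ten distinct values, each district lands in its own decile and the district with no data stays NA. Note how the jump from 20 to 35 moves a district up by only one color step, the same as the jump from 0 to 0.5: the palette shows rank, not magnitude.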