Getting Census data from multiple states using tidycensus and purrr

Oct 7 2018

If you work with US Census data, you probably already know about the excellent tidycensus package. It provides simple and intuitive access to data from the American Community Survey and Decennial Census through the Census API.

There are some great vignettes that walk through the basic usage of tidycensus, but in this post I’ll demonstrate how to download Census data (and geometry!) for different geographies in multiple states. The inspiration for this post was this issue posted on the tidycensus GitHub repository.

If you want to download tract (or county) data for multiple states, you can call get_acs() as you would for a single state, but specify a vector of state abbreviations or FIPS codes to the state parameter. tidycensus takes care of making multiple calls to the Census API and combines the results for you.

library(tidyverse)
library(tidycensus)
library(sf)
library(tmap) # we'll use this later to make a map

my_states <- c("NH", "ME", "VT")

my_vars <- c(
  total_pop = "B01003_001",
  median_income = "B19013_001"
  )

multi_state_tract <- get_acs(
  geography = "tract",
  variables = my_vars,
  state = my_states,
  year = 2016,
  survey = "acs5",
  geometry = FALSE
  ) %>% 
  print()
## # A tibble: 1,674 x 5
##    GEOID     NAME                                 variable    estimate   moe
##    <chr>     <chr>                                <chr>          <dbl> <dbl>
##  1 33001965… Census Tract 9651, Belknap County, … total_pop       3280    19
##  2 33001965… Census Tract 9651, Belknap County, … median_inc…    73967  7959
##  3 33001965… Census Tract 9652, Belknap County, … total_pop       3196   228
##  4 33001965… Census Tract 9652, Belknap County, … median_inc…    52596 17474
##  5 33001965… Census Tract 9653, Belknap County, … total_pop       3158   225
##  6 33001965… Census Tract 9653, Belknap County, … median_inc…    69199  9189
##  7 33001965… Census Tract 9654, Belknap County, … total_pop       2976    27
##  8 33001965… Census Tract 9654, Belknap County, … median_inc…    72721  7287
##  9 33001965… Census Tract 9655.98, Belknap Count… total_pop       3567    18
## 10 33001965… Census Tract 9655.98, Belknap Count… median_inc…    51619  8567
## # … with 1,664 more rows

Similarly, if you want to download county subdivision data for all counties in a single state, you can just set the geography parameter to "county subdivision".

single_state_subdiv <- get_acs(
  geography = "county subdivision",
  variables = my_vars,
  state = "VT",
  year = 2016,
  survey = "acs5",
  geometry = FALSE
  ) %>% 
  print()
## # A tibble: 510 x 5
##    GEOID      NAME                               variable     estimate   moe
##    <chr>      <chr>                              <chr>           <dbl> <dbl>
##  1 5000100325 Addison town, Addison County, Ver… total_pop        1447   146
##  2 5000100325 Addison town, Addison County, Ver… median_inco…    77946  5322
##  3 5000108575 Bridport town, Addison County, Ve… total_pop        1226   183
##  4 5000108575 Bridport town, Addison County, Ve… median_inco…    63516  8194
##  5 5000109025 Bristol town, Addison County, Ver… total_pop        3907    16
##  6 5000109025 Bristol town, Addison County, Ver… median_inco…    53662  9153
##  7 5000116000 Cornwall town, Addison County, Ve… total_pop        1078   113
##  8 5000116000 Cornwall town, Addison County, Ve… median_inco…    76989  9870
##  9 5000126300 Ferrisburgh town, Addison County,… total_pop        2750    44
## 10 5000126300 Ferrisburgh town, Addison County,… median_inco…    65956  6950
## # … with 500 more rows

But, if you try to get county subdivision data for more than one state in a single get_acs() call, you get an error.

get_acs(
  geography = "county subdivision",
  variables = my_vars,
  state = my_states,
  year = 2016,
  survey = "acs5",
  geometry = FALSE
  )
## Getting data from the 2012-2016 5-year ACS
## Error: Your API call has errors.  The API message returned is error: CSV not allowed for ``state'' in geography heirarchy.

The solution to this problem is to call get_acs() for each state you need. One of the easiest way to setup a loop like this is by using the map family of functions from the purrr package. In this case, we’ll use map_dfr() to combine the results of calling get_acs() for each state into one data frame.

multi_state_subdiv <- map_dfr(
  my_states,
    ~ get_acs(
        geography = "county subdivision",
        variables = my_vars,
        state = .,
        year = 2016,
        survey = "acs5",
        geometry = FALSE
        )
    ) %>% 
  print()
## # A tibble: 2,096 x 5
##    GEOID     NAME                                 variable    estimate   moe
##    <chr>     <chr>                                <chr>          <dbl> <dbl>
##  1 33001010… Alton town, Belknap County, New Ham… total_pop       5285    16
##  2 33001010… Alton town, Belknap County, New Ham… median_inc…    76676 11670
##  3 33001032… Barnstead town, Belknap County, New… total_pop       4631    18
##  4 33001032… Barnstead town, Belknap County, New… median_inc…    70037  6457
##  5 33001047… Belmont town, Belknap County, New H… total_pop       7281    19
##  6 33001047… Belmont town, Belknap County, New H… median_inc…    60938  7012
##  7 33001106… Center Harbor town, Belknap County,… total_pop       1011   187
##  8 33001106… Center Harbor town, Belknap County,… median_inc…    70625 20060
##  9 33001287… Gilford town, Belknap County, New H… total_pop       7103    36
## 10 33001287… Gilford town, Belknap County, New H… median_inc…    63125 10372
## # … with 2,086 more rows

This approach works well for county subdivisions, but currently will fail if you try to get block group data. It will also fail when requesting county subdivisions from the Decennial Census (get_decennial()) instead of the American Community Survey. In both of these cases, we must specify the counties we want as well as the states to the Census API.

Conveniently, tidycensus provides a built-in table of counties and states along with their FIPS codes.

head(tidycensus::fips_codes)
##   state state_code state_name county_code         county
## 1    AL         01    Alabama         001 Autauga County
## 2    AL         01    Alabama         003 Baldwin County
## 3    AL         01    Alabama         005 Barbour County
## 4    AL         01    Alabama         007    Bibb County
## 5    AL         01    Alabama         009  Blount County
## 6    AL         01    Alabama         011 Bullock County

We can filter the fips_codes data frame to include only counties in our selected states, and then use map2_dfr() to loop over each of the state/county FIPS code combinations in this table. This will make a separate call to get_acs() and the Census API for each county in our three states.

my_counties <- fips_codes %>%
  filter(state %in% my_states)

multi_state_bg <- map2_dfr(
  my_counties$state_code, my_counties$county_code,
    ~ get_acs(
        geography = "block group",
        variables = my_vars,
        state = .x,
        county = .y,
        year = 2016,
        survey = "acs5",
        geometry = FALSE
        )
    ) %>% 
  print()
## # A tibble: 5,060 x 5
##    GEOID      NAME                                 variable   estimate   moe
##    <chr>      <chr>                                <chr>         <dbl> <dbl>
##  1 230010101… Block Group 1, Census Tract 101, An… total_pop       759   224
##  2 230010101… Block Group 1, Census Tract 101, An… median_in…    16553  2966
##  3 230010101… Block Group 2, Census Tract 101, An… total_pop       957   185
##  4 230010101… Block Group 2, Census Tract 101, An… median_in…    20048  5005
##  5 230010102… Block Group 1, Census Tract 102, An… total_pop       917   308
##  6 230010102… Block Group 1, Census Tract 102, An… median_in…    34271 32407
##  7 230010102… Block Group 2, Census Tract 102, An… total_pop      1242   433
##  8 230010102… Block Group 2, Census Tract 102, An… median_in…    38981 15574
##  9 230010102… Block Group 3, Census Tract 102, An… total_pop      2525   467
## 10 230010102… Block Group 3, Census Tract 102, An… median_in…    64278 20639
## # … with 5,050 more rows

Lastly, one of the great features of tidycensus is that it links with the tigris package to get simple feature geometries along with Census estimates. This makes it quite easy to make a choropleth map with Census data. So we’ll repeat our previous call and specify geometry = TRUE in get_acs().

But to do this for multiple states, we have to modify our previous approach because it is currently not possible to use bind_rows() with sf objects (map2_dfr() uses bind_rows() to combine the results of the loop). One way around this is to use map2() and then combine the list of data frames with the rbind.sf method.

multi_state_bg_geo_list <- map2(
  my_counties$state_code, my_counties$county_code,
    ~ get_acs(
        geography = "block group",
        variables = my_vars,
        state = .x,
        county = .y,
        year = 2016,
        survey = "acs5",
        geometry = TRUE,
        output = "wide"  # get data in wide format for easier mapping
        )
    )

multi_state_bg_geo <- reduce(multi_state_bg_geo_list, rbind) %>% 
  print()
## Simple feature collection with 2530 features and 6 fields
## geometry type:  GEOMETRY
## dimension:      XY
## bbox:           xmin: -73.43774 ymin: 42.69699 xmax: -66.9499 ymax: 47.45969
## epsg (SRID):    4269
## proj4string:    +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs
## First 10 features:
##           GEOID                                                        NAME
## 1  230010101001 Block Group 1, Census Tract 101, Androscoggin County, Maine
## 2  230010101002 Block Group 2, Census Tract 101, Androscoggin County, Maine
## 3  230010102001 Block Group 1, Census Tract 102, Androscoggin County, Maine
## 4  230010102002 Block Group 2, Census Tract 102, Androscoggin County, Maine
## 5  230010102003 Block Group 3, Census Tract 102, Androscoggin County, Maine
## 6  230010103001 Block Group 1, Census Tract 103, Androscoggin County, Maine
## 7  230010103002 Block Group 2, Census Tract 103, Androscoggin County, Maine
## 8  230010104001 Block Group 1, Census Tract 104, Androscoggin County, Maine
## 9  230010104002 Block Group 2, Census Tract 104, Androscoggin County, Maine
## 10 230010105001 Block Group 1, Census Tract 105, Androscoggin County, Maine
##    total_popE total_popM median_incomeE median_incomeM
## 1         759        224          16553           2966
## 2         957        185          20048           5005
## 3         917        308          34271          32407
## 4        1242        433          38981          15574
## 5        2525        467          64278          20639
## 6        1033        259          36786          13312
## 7        1096        228          31504           6639
## 8        1801        235          66250          26003
## 9         607        181          46845          33415
## 10        926        202          35152           8685
##                          geometry
## 1  MULTIPOLYGON (((-70.23121 4...
## 2  MULTIPOLYGON (((-70.23687 4...
## 3  MULTIPOLYGON (((-70.23516 4...
## 4  MULTIPOLYGON (((-70.23305 4...
## 5  MULTIPOLYGON (((-70.29863 4...
## 6  MULTIPOLYGON (((-70.23377 4...
## 7  MULTIPOLYGON (((-70.23815 4...
## 8  MULTIPOLYGON (((-70.25513 4...
## 9  MULTIPOLYGON (((-70.24524 4...
## 10 MULTIPOLYGON (((-70.22536 4...

And now since we’ve got the data and geometry in an sf object, we might as well make a quick map using the tmap package!

# define a little helper function to format dollars for map
make_dollar <- function(x, digits = 0) {
  paste0("$", formatC(x, digits = digits, format = "f", big.mark = ","))
}

tmap_mode("view")
tm_shape(multi_state_bg_geo) +
  tm_fill(
    col = "median_incomeE",
    palette = "Greens",
    style = "jenks",
    contrast = c(0.3, 1),
    title = "Median HH Income",
    textNA = "Not Available",
    id = "NAME",
    popup.vars = c(
      "Median HH Income" = "median_incomeE",
      "Total Population" = "total_popE"
    ),
    popup.format = list(
      median_incomeE = list(fun = make_dollar),
      total_popE = list(format = "f", digits = 0)
      ),
    legend.format = list(fun = make_dollar)
  ) +
  tm_borders(col = "darkgray") +
  tm_view(
    alpha = 0.85,
    basemaps = "Stamen.TonerLite",
    view.legend.position = c("right", "bottom")
    )

Matt Herman