If you work with US Census data, you probably already know about the excellent tidycensus
package. It provides simple and intuitive access to data from the American Community Survey and Decennial Census through the Census API.
There are some great vignettes that walk through the basic usage of tidycensus
, but in this post I’ll demonstrate how to download Census data (and geometry!) for different geographies in multiple states. The inspiration for this post was this issue posted on the tidycensus
GitHub repository.
If you want to download tract (or county) data for multiple states, you can call get_acs()
as you would for a single state, but specify a vector of state abbreviations or FIPS codes to the state
parameter. tidycensus
takes care of making multiple calls to the Census API and combines the results for you.
library(tidyverse)
library(tidycensus)
library(sf)
library(tmap) # we'll use this later to make a map
my_states <- c("NH", "ME", "VT")
my_vars <- c(
total_pop = "B01003_001",
median_income = "B19013_001"
)
multi_state_tract <- get_acs(
geography = "tract",
variables = my_vars,
state = my_states,
year = 2016,
survey = "acs5",
geometry = FALSE
) %>%
print()
## # A tibble: 1,674 x 5
## GEOID NAME variable estimate moe
## <chr> <chr> <chr> <dbl> <dbl>
## 1 33001965… Census Tract 9651, Belknap County, … total_pop 3280 19
## 2 33001965… Census Tract 9651, Belknap County, … median_inc… 73967 7959
## 3 33001965… Census Tract 9652, Belknap County, … total_pop 3196 228
## 4 33001965… Census Tract 9652, Belknap County, … median_inc… 52596 17474
## 5 33001965… Census Tract 9653, Belknap County, … total_pop 3158 225
## 6 33001965… Census Tract 9653, Belknap County, … median_inc… 69199 9189
## 7 33001965… Census Tract 9654, Belknap County, … total_pop 2976 27
## 8 33001965… Census Tract 9654, Belknap County, … median_inc… 72721 7287
## 9 33001965… Census Tract 9655.98, Belknap Count… total_pop 3567 18
## 10 33001965… Census Tract 9655.98, Belknap Count… median_inc… 51619 8567
## # … with 1,664 more rows
Similarly, if you want to download county subdivision data for all counties in a single state, you can just set the geography
parameter to "county subdivision"
.
single_state_subdiv <- get_acs(
geography = "county subdivision",
variables = my_vars,
state = "VT",
year = 2016,
survey = "acs5",
geometry = FALSE
) %>%
print()
## # A tibble: 510 x 5
## GEOID NAME variable estimate moe
## <chr> <chr> <chr> <dbl> <dbl>
## 1 5000100325 Addison town, Addison County, Ver… total_pop 1447 146
## 2 5000100325 Addison town, Addison County, Ver… median_inco… 77946 5322
## 3 5000108575 Bridport town, Addison County, Ve… total_pop 1226 183
## 4 5000108575 Bridport town, Addison County, Ve… median_inco… 63516 8194
## 5 5000109025 Bristol town, Addison County, Ver… total_pop 3907 16
## 6 5000109025 Bristol town, Addison County, Ver… median_inco… 53662 9153
## 7 5000116000 Cornwall town, Addison County, Ve… total_pop 1078 113
## 8 5000116000 Cornwall town, Addison County, Ve… median_inco… 76989 9870
## 9 5000126300 Ferrisburgh town, Addison County,… total_pop 2750 44
## 10 5000126300 Ferrisburgh town, Addison County,… median_inco… 65956 6950
## # … with 500 more rows
But, if you try to get county subdivision data for more than one state in a single get_acs()
call, you get an error.
get_acs(
geography = "county subdivision",
variables = my_vars,
state = my_states,
year = 2016,
survey = "acs5",
geometry = FALSE
)
## Getting data from the 2012-2016 5-year ACS
## Error: Your API call has errors. The API message returned is error: CSV not allowed for ``state'' in geography heirarchy.
The solution to this problem is to call get_acs()
for each state you need. One of the easiest way to setup a loop like this is by using the map
family of functions from the purrr
package. In this case, we’ll use map_dfr()
to combine the results of calling get_acs()
for each state into one data frame.
multi_state_subdiv <- map_dfr(
my_states,
~ get_acs(
geography = "county subdivision",
variables = my_vars,
state = .,
year = 2016,
survey = "acs5",
geometry = FALSE
)
) %>%
print()
## # A tibble: 2,096 x 5
## GEOID NAME variable estimate moe
## <chr> <chr> <chr> <dbl> <dbl>
## 1 33001010… Alton town, Belknap County, New Ham… total_pop 5285 16
## 2 33001010… Alton town, Belknap County, New Ham… median_inc… 76676 11670
## 3 33001032… Barnstead town, Belknap County, New… total_pop 4631 18
## 4 33001032… Barnstead town, Belknap County, New… median_inc… 70037 6457
## 5 33001047… Belmont town, Belknap County, New H… total_pop 7281 19
## 6 33001047… Belmont town, Belknap County, New H… median_inc… 60938 7012
## 7 33001106… Center Harbor town, Belknap County,… total_pop 1011 187
## 8 33001106… Center Harbor town, Belknap County,… median_inc… 70625 20060
## 9 33001287… Gilford town, Belknap County, New H… total_pop 7103 36
## 10 33001287… Gilford town, Belknap County, New H… median_inc… 63125 10372
## # … with 2,086 more rows
This approach works well for county subdivisions, but currently will fail if you try to get block group data. It will also fail when requesting county subdivisions from the Decennial Census (get_decennial()
) instead of the American Community Survey. In both of these cases, we must specify the counties we want as well as the states to the Census API.
Conveniently, tidycensus
provides a built-in table of counties and states along with their FIPS codes.
head(tidycensus::fips_codes)
## state state_code state_name county_code county
## 1 AL 01 Alabama 001 Autauga County
## 2 AL 01 Alabama 003 Baldwin County
## 3 AL 01 Alabama 005 Barbour County
## 4 AL 01 Alabama 007 Bibb County
## 5 AL 01 Alabama 009 Blount County
## 6 AL 01 Alabama 011 Bullock County
We can filter the fips_codes
data frame to include only counties in our selected states, and then use map2_dfr()
to loop over each of the state/county FIPS code combinations in this table. This will make a separate call to get_acs()
and the Census API for each county in our three states.
my_counties <- fips_codes %>%
filter(state %in% my_states)
multi_state_bg <- map2_dfr(
my_counties$state_code, my_counties$county_code,
~ get_acs(
geography = "block group",
variables = my_vars,
state = .x,
county = .y,
year = 2016,
survey = "acs5",
geometry = FALSE
)
) %>%
print()
## # A tibble: 5,060 x 5
## GEOID NAME variable estimate moe
## <chr> <chr> <chr> <dbl> <dbl>
## 1 230010101… Block Group 1, Census Tract 101, An… total_pop 759 224
## 2 230010101… Block Group 1, Census Tract 101, An… median_in… 16553 2966
## 3 230010101… Block Group 2, Census Tract 101, An… total_pop 957 185
## 4 230010101… Block Group 2, Census Tract 101, An… median_in… 20048 5005
## 5 230010102… Block Group 1, Census Tract 102, An… total_pop 917 308
## 6 230010102… Block Group 1, Census Tract 102, An… median_in… 34271 32407
## 7 230010102… Block Group 2, Census Tract 102, An… total_pop 1242 433
## 8 230010102… Block Group 2, Census Tract 102, An… median_in… 38981 15574
## 9 230010102… Block Group 3, Census Tract 102, An… total_pop 2525 467
## 10 230010102… Block Group 3, Census Tract 102, An… median_in… 64278 20639
## # … with 5,050 more rows
Lastly, one of the great features of tidycensus
is that it links with the tigris
package to get simple feature geometries along with Census estimates. This makes it quite easy to make a choropleth map with Census data. So we’ll repeat our previous call and specify geometry = TRUE
in get_acs()
.
But to do this for multiple states, we have to modify our previous approach because it is currently not possible to use bind_rows()
with sf
objects (map2_dfr()
uses bind_rows()
to combine the results of the loop). One way around this is to use map2()
and then combine the list of data frames with the rbind.sf
method.
multi_state_bg_geo_list <- map2(
my_counties$state_code, my_counties$county_code,
~ get_acs(
geography = "block group",
variables = my_vars,
state = .x,
county = .y,
year = 2016,
survey = "acs5",
geometry = TRUE,
output = "wide" # get data in wide format for easier mapping
)
)
multi_state_bg_geo <- reduce(multi_state_bg_geo_list, rbind) %>%
print()
## Simple feature collection with 2530 features and 6 fields
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: -73.43774 ymin: 42.69699 xmax: -66.9499 ymax: 47.45969
## epsg (SRID): 4269
## proj4string: +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs
## First 10 features:
## GEOID NAME
## 1 230010101001 Block Group 1, Census Tract 101, Androscoggin County, Maine
## 2 230010101002 Block Group 2, Census Tract 101, Androscoggin County, Maine
## 3 230010102001 Block Group 1, Census Tract 102, Androscoggin County, Maine
## 4 230010102002 Block Group 2, Census Tract 102, Androscoggin County, Maine
## 5 230010102003 Block Group 3, Census Tract 102, Androscoggin County, Maine
## 6 230010103001 Block Group 1, Census Tract 103, Androscoggin County, Maine
## 7 230010103002 Block Group 2, Census Tract 103, Androscoggin County, Maine
## 8 230010104001 Block Group 1, Census Tract 104, Androscoggin County, Maine
## 9 230010104002 Block Group 2, Census Tract 104, Androscoggin County, Maine
## 10 230010105001 Block Group 1, Census Tract 105, Androscoggin County, Maine
## total_popE total_popM median_incomeE median_incomeM
## 1 759 224 16553 2966
## 2 957 185 20048 5005
## 3 917 308 34271 32407
## 4 1242 433 38981 15574
## 5 2525 467 64278 20639
## 6 1033 259 36786 13312
## 7 1096 228 31504 6639
## 8 1801 235 66250 26003
## 9 607 181 46845 33415
## 10 926 202 35152 8685
## geometry
## 1 MULTIPOLYGON (((-70.23121 4...
## 2 MULTIPOLYGON (((-70.23687 4...
## 3 MULTIPOLYGON (((-70.23516 4...
## 4 MULTIPOLYGON (((-70.23305 4...
## 5 MULTIPOLYGON (((-70.29863 4...
## 6 MULTIPOLYGON (((-70.23377 4...
## 7 MULTIPOLYGON (((-70.23815 4...
## 8 MULTIPOLYGON (((-70.25513 4...
## 9 MULTIPOLYGON (((-70.24524 4...
## 10 MULTIPOLYGON (((-70.22536 4...
And now since we’ve got the data and geometry in an sf
object, we might as well make a quick map using the tmap
package!
# define a little helper function to format dollars for map
make_dollar <- function(x, digits = 0) {
paste0("$", formatC(x, digits = digits, format = "f", big.mark = ","))
}
tmap_mode("view")
tm_shape(multi_state_bg_geo) +
tm_fill(
col = "median_incomeE",
palette = "Greens",
style = "jenks",
contrast = c(0.3, 1),
title = "Median HH Income",
textNA = "Not Available",
id = "NAME",
popup.vars = c(
"Median HH Income" = "median_incomeE",
"Total Population" = "total_popE"
),
popup.format = list(
median_incomeE = list(fun = make_dollar),
total_popE = list(format = "f", digits = 0)
),
legend.format = list(fun = make_dollar)
) +
tm_borders(col = "darkgray") +
tm_view(
alpha = 0.85,
basemaps = "Stamen.TonerLite",
view.legend.position = c("right", "bottom")
)