I think the tidycensus package is the easiest way to access U.S. Census data in R. Using familiar R syntax, you specify the variables and geography you want, and tidycensus pings the Census API and returns the estimates in a tidy data frame, with the option of including geographic data for easy mapping.
But it’s easy to run into errors when using tidycensus and not be sure of the source of the problem. Sometimes the Census API is down. Sometimes estimates are not available at the geographic level you requested. Sometimes there is a bug in the tidycensus source code. Sometimes the Census changes the API endpoints or variable names for certain years and not others!
That’s why I submitted a PR to make it easier to diagnose the problem by having tidycensus print the Census API call it makes. Just add show_call = TRUE to get_acs(), get_decennial(), or get_estimates(). The PR has been merged into the master branch of the dev version of tidycensus, so to try this out, install from GitHub with remotes::install_github("walkerke/tidycensus/").
library(tidycensus)
library(magrittr) # for the pipe
get_acs(
  geography = "county",
  state = "VT",
  variables = "B01003_001",
  show_call = TRUE
) %>%
  head()
## Getting data from the 2013-2017 5-year ACS
## Census API call: https://api.census.gov/data/2017/acs/acs5?get=B01003_001E%2CB01003_001M%2CNAME&for=county%3A%2A&in=state%3A50
## # A tibble: 6 x 5
## GEOID NAME variable estimate moe
## <chr> <chr> <chr> <dbl> <dbl>
## 1 50001 Addison County, Vermont B01003_001 36825 NA
## 2 50003 Bennington County, Vermont B01003_001 36054 NA
## 3 50005 Caledonia County, Vermont B01003_001 30576 NA
## 4 50007 Chittenden County, Vermont B01003_001 160985 NA
## 5 50009 Essex County, Vermont B01003_001 6203 NA
## 6 50011 Franklin County, Vermont B01003_001 48816 NA
In the second line of the output, you see the call that tidycensus makes to the Census API (with your API key removed) to retrieve this data. You could copy this URL into a web browser or check out the JSON response using httr.
httr::GET("https://api.census.gov/data/2017/acs/acs5?get=B01003_001E%2CB01003_001M%2CNAME&for=county%3A%2A&in=state%3A50")
## Response [https://api.census.gov/data/2017/acs/acs5?get=B01003_001E%2CB01003_001M%2CNAME&for=county%3A%2A&in=state%3A50]
## Date: 2019-11-17 02:06
## Status: 200
## Content-Type: application/json;charset=utf-8
## Size: 921 B
## [["B01003_001E","B01003_001M","NAME","state","county"],
## ["59676","-555555555","Rutland County, Vermont","50","021"],
## ["26951","-555555555","Orleans County, Vermont","50","019"],
## ["6950","-555555555","Grand Isle County, Vermont","50","013"],
## ["28901","-555555555","Orange County, Vermont","50","017"],
## ["160985","-555555555","Chittenden County, Vermont","50","007"],
## ["25191","-555555555","Lamoille County, Vermont","50","015"],
## ["55485","-555555555","Windsor County, Vermont","50","027"],
## ["48816","-555555555","Franklin County, Vermont","50","011"],
## ["30576","-555555555","Caledonia County, Vermont","50","005"],
## ...
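If you want to work with that raw response in R rather than just look at it, one option is to parse the JSON yourself. The following is a minimal sketch, not how tidycensus does it internally. It assumes the jsonlite package is installed, pastes in three rows of the response above as a string so it runs offline, and treats -555555555 as a missing margin of error, which is my reading of why the moe column is NA in the tidycensus output.

```r
library(jsonlite)

# Three rows copied from the response above, so no network call is needed.
body <- '[["B01003_001E","B01003_001M","NAME","state","county"],
["59676","-555555555","Rutland County, Vermont","50","021"],
["26951","-555555555","Orleans County, Vermont","50","019"]]'

# An array of equal-length string arrays is simplified to a character matrix;
# the first row holds the column names.
raw <- fromJSON(body)
df <- as.data.frame(raw[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(df) <- raw[1, ]

# Build the county GEOID from the state and county codes.
df$GEOID <- paste0(df$state, df$county)

# Everything arrives as text, so convert the estimate and margin of error.
df$B01003_001E <- as.numeric(df$B01003_001E)
df$B01003_001M <- as.numeric(df$B01003_001M)

# Assumption: -555555555 marks a margin of error that does not apply.
df$B01003_001M[df$B01003_001M == -555555555] <- NA

df
```

Comparing a data frame like this with what tidycensus returns is a quick way to tell whether a surprising value came from the API or from later processing.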
This isn’t immensely useful for a call that’s returned correctly, but how about when you get an error that is hard to interpret? Here, for example, is code from a tidycensus issue that was opened on GitHub that returns a confusing error message.
get_decennial(
  geography = "tract",
  variables = "H0050001",
  state = "WA",
  county = "Spokane",
  year = 2010
)
## Getting data from the 2010 decennial Census
## Error : Your API call has errors. The API message returned is <html><head><title>Error report</title></head><body><h1>HTTP Status 404 - /data/2010/dec/sf3</h1></body></html>.
## Error in gather_(data, key_col = compat_as_lazy(enquo(key)), value_col = compat_as_lazy(enquo(value)), : unused argument (-NAME)
If you run this same code, but include show_call = TRUE, you get the API call that leads to the error.
get_decennial(
  geography = "tract",
  variables = "H0050001",
  state = "WA",
  county = "Spokane",
  year = 2010,
  show_call = TRUE
)
## Getting data from the 2010 decennial Census
## Census API call: https://api.census.gov/data/2010/dec/sf1?get=H0050001%2CNAME&for=tract%3A%2A&in=state%3A53%2Bcounty%3A063
## Error : Your API call has errors. The API message returned is <html><head><title>Error report</title></head><body><h1>HTTP Status 404 - /data/2010/dec/sf3</h1></body></html>.
## Error in gather_(data, key_col = compat_as_lazy(enquo(key)), value_col = compat_as_lazy(enquo(value)), : unused argument (-NAME)
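The printed call is percent-encoded, which makes it hard to read at a glance. As a rough sketch using only base R, you can decode it and split the query string into its parameters to see exactly which variables and geography were requested. The object names here (call_url, params) are just for illustration.

```r
# The API call printed by show_call = TRUE, copied from the output above.
call_url <- "https://api.census.gov/data/2010/dec/sf1?get=H0050001%2CNAME&for=tract%3A%2A&in=state%3A53%2Bcounty%3A063"

# Turn %2C, %3A, %2A and %2B back into , : * and +
decoded <- utils::URLdecode(call_url)
decoded
## [1] "https://api.census.gov/data/2010/dec/sf1?get=H0050001,NAME&for=tract:*&in=state:53+county:063"

# Drop everything up to the "?" and split the rest into name = value pairs.
query <- sub("^[^?]*\\?", "", decoded)
pairs <- strsplit(strsplit(query, "&", fixed = TRUE)[[1]], "=", fixed = TRUE)
params <- setNames(
  vapply(pairs, `[`, character(1), 2),
  vapply(pairs, `[`, character(1), 1)
)
params
```

Read this way, the call asks for the variables H0050001 and NAME, for every tract, in state 53 and county 063.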
Next, check out this URL in your browser (or use httr if you don’t want to leave R).
httr::GET("https://api.census.gov/data/2010/dec/sf1?get=H0050001%2CNAME&for=tract%3A%2A&in=state%3A53%2Bcounty%3A063") %>%
  httr::content()
## [1] "error: error: unknown variable 'H0050001'"
Aha! We asked for a variable (H0050001) the API can’t find. And, as Kyle notes in his response to the issue, some of the variable names in the API have changed, so the correct variable is now H005001.
get_decennial(
  geography = "tract",
  variables = "H005001",
  state = "WA",
  county = "Spokane",
  year = 2010,
  show_call = TRUE
) %>%
  head()
## Getting data from the 2010 decennial Census
## Census API call: https://api.census.gov/data/2010/dec/sf1?get=H005001%2CNAME&for=tract%3A%2A&in=state%3A53%2Bcounty%3A063
## # A tibble: 6 x 4
## GEOID NAME variable value
## <chr> <chr> <chr> <dbl>
## 1 53063000200 Census Tract 2, Spokane County, Washington H005001 177
## 2 53063000300 Census Tract 3, Spokane County, Washington H005001 133
## 3 53063000400 Census Tract 4, Spokane County, Washington H005001 112
## 4 53063000500 Census Tract 5, Spokane County, Washington H005001 68
## 5 53063000600 Census Tract 6, Spokane County, Washington H005001 55
## 6 53063000700 Census Tract 7, Spokane County, Washington H005001 99
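If you find yourself checking API calls like this often, the pattern can be wrapped in a small helper. This is a sketch under a couple of assumptions: parse_census_body() is a hypothetical function of my own, not part of tidycensus, and it assumes (based on the example above) that the API reports problems as plain text rather than JSON. It uses jsonlite::validate() to tell the two apart.

```r
library(jsonlite)

# Hypothetical helper: takes the text of a Census API response and either
# parses it or stops with the API's own message.
parse_census_body <- function(body) {
  # Assumption: a body that is not valid JSON is an error message from the API.
  if (!validate(body)) {
    stop("Census API message: ", body, call. = FALSE)
  }
  fromJSON(body)
}

# The error text returned for the misspelled variable above.
bad <- "error: error: unknown variable 'H0050001'"
msg <- tryCatch(parse_census_body(bad), error = function(e) conditionMessage(e))
msg

# A tiny valid body parses to a character matrix.
good <- '[["H005001","NAME"],["177","Census Tract 2, Spokane County, Washington"]]'
parsed <- parse_census_body(good)
parsed
```

With a live call, you could pass it the text of the response, for example httr::content(httr::GET(url), as = "text").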
I hope this small feature is useful for debugging tidycensus error messages and helping users to better understand the Census API.