A new option to display Census API calls from tidycensus
Nov 18 2019

I think the tidycensus package is the easiest way to access U.S. Census data in R. Using familiar R syntax, you specify the variables and geography you want, and tidycensus pings the Census API and returns the estimates in a tidy data frame, with the option of including geographic data for easy mapping.

But it’s easy to run into errors when using tidycensus and not be sure of the source of the problem. Sometimes the Census API is down. Sometimes estimates are not available at the geographic level you requested. Sometimes there is a bug in the tidycensus source code. Sometimes the Census changes the API endpoints or variable names for certain years and not others!

That’s why I submitted a PR that makes it easier to diagnose the problem by having tidycensus print the Census API call it makes. Just add show_call = TRUE to get_acs(), get_decennial(), or get_estimates(). The PR has been merged into the master branch of tidycensus on GitHub, so to try this out, install the development version with remotes::install_github("walkerke/tidycensus").

library(tidycensus)
library(magrittr)  # for the pipe

get_acs(
  geography = "county",
  state = "VT",
  variables = "B01003_001",
  show_call = TRUE
  ) %>% 
  head()
## Getting data from the 2013-2017 5-year ACS
## Census API call: https://api.census.gov/data/2017/acs/acs5?get=B01003_001E%2CB01003_001M%2CNAME&for=county%3A%2A&in=state%3A50
## # A tibble: 6 x 5
##   GEOID NAME                       variable   estimate   moe
##   <chr> <chr>                      <chr>         <dbl> <dbl>
## 1 50001 Addison County, Vermont    B01003_001    36825    NA
## 2 50003 Bennington County, Vermont B01003_001    36054    NA
## 3 50005 Caledonia County, Vermont  B01003_001    30576    NA
## 4 50007 Chittenden County, Vermont B01003_001   160985    NA
## 5 50009 Essex County, Vermont      B01003_001     6203    NA
## 6 50011 Franklin County, Vermont   B01003_001    48816    NA

In the second line of output, you see the call that tidycensus makes to the Census API (with your API key removed) to retrieve this data. You could copy this URL into a web browser or check out the JSON response using httr.

httr::GET("https://api.census.gov/data/2017/acs/acs5?get=B01003_001E%2CB01003_001M%2CNAME&for=county%3A%2A&in=state%3A50")
## Response [https://api.census.gov/data/2017/acs/acs5?get=B01003_001E%2CB01003_001M%2CNAME&for=county%3A%2A&in=state%3A50]
##   Date: 2019-11-17 02:06
##   Status: 200
##   Content-Type: application/json;charset=utf-8
##   Size: 921 B
## [["B01003_001E","B01003_001M","NAME","state","county"],
## ["59676","-555555555","Rutland County, Vermont","50","021"],
## ["26951","-555555555","Orleans County, Vermont","50","019"],
## ["6950","-555555555","Grand Isle County, Vermont","50","013"],
## ["28901","-555555555","Orange County, Vermont","50","017"],
## ["160985","-555555555","Chittenden County, Vermont","50","007"],
## ["25191","-555555555","Lamoille County, Vermont","50","015"],
## ["55485","-555555555","Windsor County, Vermont","50","027"],
## ["48816","-555555555","Franklin County, Vermont","50","011"],
## ["30576","-555555555","Caledonia County, Vermont","50","005"],
## ...
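
The raw response is an array of arrays whose first row holds the column names, and the margin-of-error column contains -555555555 where the tibble above shows NA. As a rough sketch of turning such a response into a data frame by hand (this is not the tidycensus implementation; the names raw_json, m, and df are made up for illustration, and the JSON is pasted in as a string so no API call is needed):

```r
library(jsonlite)

# Two data rows copied from the response above, plus the header row
raw_json <- '[["B01003_001E","B01003_001M","NAME","state","county"],
["59676","-555555555","Rutland County, Vermont","50","021"],
["26951","-555555555","Orleans County, Vermont","50","019"]]'

# fromJSON() simplifies equal-length arrays of strings to a character matrix
m <- fromJSON(raw_json)

# Drop the header row from the data and use it for the column names
df <- as.data.frame(m[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(df) <- m[1, ]

# Estimates and margins of error arrive as strings
df$B01003_001E <- as.numeric(df$B01003_001E)
df$B01003_001M <- as.numeric(df$B01003_001M)

# Assumption: -555555555 is a placeholder for "no margin of error", so treat it as NA
df$B01003_001M[df$B01003_001M == -555555555] <- NA

df
```

Seeing the raw values next to the tidied ones can help you tell whether a surprising NA comes from the API itself or from later processing.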

This isn’t immensely useful for a call that’s returned correctly, but how about when you get an error that is hard to interpret? Here, for example, is code from a tidycensus issue that was opened on GitHub that returns a confusing error message.

get_decennial(
  geography = "tract",
  variables = "H0050001",
  state = "WA",
  county = "Spokane",
  year = 2010
  )
## Getting data from the 2010 decennial Census
## Error : Your API call has errors.  The API message returned is <html><head><title>Error report</title></head><body><h1>HTTP Status 404 - /data/2010/dec/sf3</h1></body></html>.
## Error in gather_(data, key_col = compat_as_lazy(enquo(key)), value_col = compat_as_lazy(enquo(value)), : unused argument (-NAME)

If you run this same code, but include show_call = TRUE, you see the API call that leads to the error.

get_decennial(
  geography = "tract",
  variables = "H0050001",
  state = "WA",
  county = "Spokane",
  year = 2010,
  show_call = TRUE
  )
## Getting data from the 2010 decennial Census
## Census API call: https://api.census.gov/data/2010/dec/sf1?get=H0050001%2CNAME&for=tract%3A%2A&in=state%3A53%2Bcounty%3A063
## Error : Your API call has errors.  The API message returned is <html><head><title>Error report</title></head><body><h1>HTTP Status 404 - /data/2010/dec/sf3</h1></body></html>.
## Error in gather_(data, key_col = compat_as_lazy(enquo(key)), value_col = compat_as_lazy(enquo(value)), : unused argument (-NAME)
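
The printed call is percent-encoded, which makes it hard to scan. A small sketch using only base R (the names call_url, decoded, and params are just for illustration) decodes it and splits the query string so the requested variable and geography are easier to read:

```r
# The call printed by show_call = TRUE above
call_url <- "https://api.census.gov/data/2010/dec/sf1?get=H0050001%2CNAME&for=tract%3A%2A&in=state%3A53%2Bcounty%3A063"

# Turn %2C, %3A, %2A and %2B back into , : * and +
decoded <- utils::URLdecode(call_url)
decoded
## [1] "https://api.census.gov/data/2010/dec/sf1?get=H0050001,NAME&for=tract:*&in=state:53+county:063"

# Drop everything up to the "?" and split the query into its parameters
params <- strsplit(sub("^[^?]*\\?", "", decoded), "&")[[1]]
params
## [1] "get=H0050001,NAME"     "for=tract:*"           "in=state:53+county:063"
```
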

Next, check out this URL in your browser (or with httr if you don’t want to leave R).

httr::GET("https://api.census.gov/data/2010/dec/sf1?get=H0050001%2CNAME&for=tract%3A%2A&in=state%3A53%2Bcounty%3A063") %>% 
  httr::content()
## [1] "error: error: unknown variable 'H0050001'"

Aha! We asked for a variable (H0050001) the API can’t find. And, as Kyle notes in his response to the issue, some of the variable names in the API have changed, so the correct variable is now H005001.

get_decennial(
  geography = "tract",
  variables = "H005001",
  state = "WA",
  county = "Spokane",
  year = 2010,
  show_call = TRUE
  ) %>% 
  head()
## Getting data from the 2010 decennial Census
## Census API call: https://api.census.gov/data/2010/dec/sf1?get=H005001%2CNAME&for=tract%3A%2A&in=state%3A53%2Bcounty%3A063
## # A tibble: 6 x 4
##   GEOID       NAME                                       variable value
##   <chr>       <chr>                                      <chr>    <dbl>
## 1 53063000200 Census Tract 2, Spokane County, Washington H005001    177
## 2 53063000300 Census Tract 3, Spokane County, Washington H005001    133
## 3 53063000400 Census Tract 4, Spokane County, Washington H005001    112
## 4 53063000500 Census Tract 5, Spokane County, Washington H005001     68
## 5 53063000600 Census Tract 6, Spokane County, Washington H005001     55
## 6 53063000700 Census Tract 7, Spokane County, Washington H005001     99
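
If you suspect a variable name is the problem, one way to check is to look it up in the variable list that tidycensus can download with load_variables(). This is only a sketch: find_vars() is a hypothetical helper written for this example, not part of tidycensus, and the lookup needs network access.

```r
library(tidycensus)

# Hypothetical helper: keep rows whose variable name matches a pattern
find_vars <- function(vars, pattern) {
  vars[grepl(pattern, vars$name), , drop = FALSE]
}

# Download the variable list for the 2010 decennial Census, Summary File 1
sf1_vars <- load_variables(2010, "sf1")

# List the variables whose names start with H005
find_vars(sf1_vars, "^H005")
```

If the name you asked for is not in the result, the mismatch is a likely source of the "unknown variable" error.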

I hope this small feature is useful for debugging tidycensus error messages and helping users to better understand the Census API.


