[Stata] getcensus package for American Community Survey datasets

Exciting news for Stata nerds! A package for retrieving American Community Survey (ACS) datasets is available not only in R (tidycensus) but also in Stata (getcensus), which was just launched in October 2021. Here are the steps to use it!

Get your own API key

First, you need to request an API key to retrieve census data via Stata or R. Signing up is quick, and there is no wait for approval.

Here is the link for signup: Key Signup (census.gov)

Shortly after signing up, you will receive an email containing your key. Copy the key and activate it.

Install packages and set your API key

ssc install getcensus
ssc install jsonio
global censuskey putyourkeyhere // replace putyourkeyhere with your key; once the packages are installed, this is the only line you need to run in later sessions

These are the commands to run the first time you use the getcensus package. The two ssc install lines are needed only once.
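A quick sanity check can confirm that the setup worked. The sketch below uses only built-in Stata commands and assumes the key was stored in the global macro censuskey, as in the lines above.

```stata
* Confirm that both packages are installed (which reports where each file lives)
which getcensus
which jsonio

* Confirm that the key is set in this session (prints an empty line if it is not)
display "$censuskey"
```

Global macros are cleared when Stata closes, so one option is to place the global censuskey line in your profile.do, which Stata runs at startup.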

Browse the codebook

To browse the available variables, use the catalog.

* Browse catalog
getcensus catalog, year(2021) sample(1) // you can change the year and the sample: 1, 3, or 5 for 1-, 3-, or 5-year estimates (3-year estimates were discontinued after the 2011-2013 release)

Then open the Stata Data Editor (Browse) to view the result.

It lists each table_id and variable_id along with its name. I recommend copying and pasting the list into an Excel sheet to browse it with ease (Ctrl+A -> Ctrl+C in Stata -> Ctrl+V in Excel).
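As an alternative to copying and pasting, the catalog can be written straight to a workbook with Stata's built-in export excel command. This is a minimal sketch: it assumes the catalog is the dataset currently in memory, and the file name acs_catalog.xlsx is a placeholder.

```stata
* Assumes the result of the getcensus catalog call is the dataset in memory

* Check which columns the catalog contains
describe

* Write every row to an Excel workbook, with variable names as the header row
* (acs_catalog.xlsx is a placeholder file name; it is saved in the working directory)
export excel using "acs_catalog.xlsx", firstrow(variables) replace
```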

You can also browse the variables on the Explore Census Data website. Here is an example: S1701: POVERTY STATUS IN THE PAST 12… – Census Bureau Table

How to use state- and county-level variables

*get population by county: B01003 is total population 
getcensus B01003, year(2021) sample(1) geography(county) clear // you cannot get multiple tables at once 
getcensus B01001A_001 B01001A_017 B25047_001, year(2021) sample(1) geography(county) clear // you can get multiple variables at once 

Once you have identified the table or variable you want to download, write the command to get it!

Put the ID of the table or variable after the getcensus command. Note that you cannot download multiple tables at once. However, you can download multiple variables from different tables at once, so I recommend using variable_id instead of table_id whenever you need variables from more than one table.

Then specify the year of the ACS dataset, the sample (1 for 1-year, 3 for 3-year, or 5 for 5-year estimates), and the geographic unit (state, county, metro, tract, zcta, and so on).

You can see the list of supported geographies here: getcensus – Supported Geographies (centeronbudget.github.io)
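The examples above are all county-level. A state-level request should differ only in the geography() value. The sketch below reuses the table and options already shown, then inspects the result with built-in Stata commands, since the names of the returned variables are not listed here.

```stata
* Total population (table B01003) for every state, 2021 1-year estimates
getcensus B01003, year(2021) sample(1) geography(state) clear

* Inspect what came back: variable names and labels, then the first few rows
describe
list in 1/5
```

Keep in mind that ACS 1-year estimates are published only for areas with a population of 65,000 or more, so a county request with sample(1) returns only the larger counties. Use sample(5) when you need every county.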

How to use zip-code level variables

getcensus B01003, year(2020) sample(5) geography(zcta) clear 

ZIP-code-level variables (strictly speaking, ZIP Code Tabulation Areas, or ZCTAs) are available only in the 5-year estimates. The latest release is 2020 as of this writing.

There seems to be a glitch in retrieving the state name along with the ZIP code. However, the estimates for each ZIP code are still correct.

How to match zip code with state, county, and city names

How, then, can you add state, county, or city names to the ZIP-code-level tables we just downloaded?

A helpful blog post by Edel Alon provides an Excel sheet for this: Zipcode to City, State Excel Spreadsheet • Edel Alon (updated 2020)

  1. Download the Zip-Codes-to-City-County-State-2020 (xlsx) – Updated April 2020
  2. Convert the Excel sheet of ZIP codes into dta format for Stata
  3. Match each ZIP code with its state, county, and city names using the merge command in Stata.
* zipcodetabulationarea is the key variable; it must have the same name and type in both datasets
merge 1:1 zipcodetabulationarea using "zipcodematching data name.dta"
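The three steps above can be sketched end to end as follows. Treat this as an illustration under stated assumptions: the workbook file name, the column name zip, and the output name zip_crosswalk.dta are hypothetical, and the code assumes that the key in the getcensus data is a string variable called zipcodetabulationarea. Check the actual names with describe before running it.

```stata
* Step 2: read the downloaded workbook (file name assumed) and save it as .dta
import excel using "Zip-Codes-to-City-County-State-2020.xlsx", firstrow clear

* "zip" is a hypothetical column name; rename it to match the key in the ACS data
rename zip zipcodetabulationarea

* If Excel stored ZIP codes as numbers, convert to 5-character strings
* so leading zeros (e.g. 02134) are kept
capture confirm numeric variable zipcodetabulationarea
if !_rc {
    tostring zipcodetabulationarea, replace format(%05.0f)
}

* A 1:1 merge needs one row per ZIP code; isid stops with an error otherwise
isid zipcodetabulationarea
save "zip_crosswalk.dta", replace

* Step 3: download the ZCTA-level table and merge the names onto it
getcensus B01003, year(2020) sample(5) geography(zcta) clear
merge 1:1 zipcodetabulationarea using "zip_crosswalk.dta"

* See how many ZIP codes matched (_merge == 3) and how many did not
tabulate _merge
```

If isid reports duplicates, the spreadsheet has more than one row per ZIP code, and a 1:m merge (or deduplicating the crosswalk first) would be the more appropriate choice.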

🔗Useful Links

GitHub – CenterOnBudget/getcensus: Load American Community Survey data from the U.S. Census Bureau API into Stata

Documentation of getcensus STATA package (centeronbudget.github.io)

Understanding Geographic Identifiers (GEOIDs) (census.gov)

🗺️How to graph the variable: using the maptile and bimap packages

I will post the details on how to use the maptile and bimap packages in the next post. Here is a spoiler from Todd Jones and Asjad Naqvi.

  • November 3, 2022