[Stata] getcensus package for American Community Survey datasets
Exciting news for Stata nerds! There is a package for retrieving American Community Survey datasets not only in R (tidycensus) but also in Stata (getcensus), which was launched in October 2021. Here are the steps to use it!
Get your own API key
First, you need to request an API key to get the census data via Stata or R. Signing up is effortless, and there is no wait for approval.
Here is the link for signup: Key Signup (census.gov)
Shortly after signing up, you will receive an email containing your key. Copy the key and activate it.
Install packages and put your API key
ssc install getcensus
ssc install jsonio
global censuskey putyourkeyhere // after installation, this is the only line you need to run in later sessions
This code installs the packages and sets your key. You only need to run the two ssc install lines the first time you use the getcensus package.
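If you keep these lines at the top of a do-file, a small guard can save you from reinstalling every run. The following is only a sketch: it assumes each package provides a command with the same name as the package, so that which can find it, and it reuses the putyourkeyhere placeholder from above.

```stata
* Install each package only if Stata cannot already find it
* (assumes the package name is also the name of an installed command)
foreach pkg in getcensus jsonio {
    capture which `pkg'
    if _rc != 0 {
        ssc install `pkg'
    }
}

* Stop with a clear message if the key has not been set in this session
if "$censuskey" == "" {
    display as error "censuskey is empty; run: global censuskey putyourkeyhere"
    exit 198
}
```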
Browse the codebook
To browse the available variables, you can see them using the catalog.
* Browse catalog
getcensus catalog, year(2021) sample(1) // you can change the year and sample number (1-year estimates, 3-year estimates, and 5-year estimates are available)
Then let’s go to the Stata Data Editor (Browse).
It provides the list of table_id and variable_id with names. I recommend copying and pasting them into an Excel sheet to browse with ease (Ctrl + A -> Ctrl + C in Stata -> Ctrl + V in an Excel sheet).
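As an alternative to copying and pasting, you could write the catalog straight to an Excel file with Stata's built-in export excel command. This is a minimal sketch, assuming the catalog is loaded into memory as a regular dataset (which is what the Data Editor shows); the output file name is made up for illustration.

```stata
* Start from an empty dataset, then load the catalog into memory
clear
getcensus catalog, year(2021) sample(1)

* List the columns the catalog actually contains
describe

* Write the whole catalog to an Excel file (hypothetical file name),
* with the Stata variable names in the first row
export excel using "acs_catalog_2021_1yr.xlsx", firstrow(variables) replace
```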
You can also browse the variables on Explore Census Data website. Here is an example: S1701: POVERTY STATUS IN THE PAST 12… – Census Bureau Table
How to use state- and county-level variables
* get population by county: B01003 is total population
getcensus B01003, year(2021) sample(1) geography(county) clear // you cannot get multiple tables at once
getcensus B01001A_001 B01001A_017 B25047_001, year(2021) sample(1) geography(county) clear // you can get multiple variables at once
Once you figure out the table or variable you would like to download, let’s write the code to get it!
Put the id of the table or variable after the getcensus command. Note that you cannot download multiple tables at once. However, you can download multiple variables from different tables at once. I recommend downloading by variable_id instead of table_id if you need two or more variables across tables.
Then specify the year of the ACS dataset, the sample (1-, 3-, or 5-year estimates), and the geography unit (county, zcta, and so on).
You can see the list of supported geographies here: getcensus – Supported Geographies (centeronbudget.github.io)
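If you do need whole tables rather than a handful of variables, one workaround is to request them one at a time in a loop. This is only a sketch under assumptions: B25047 is used as a second example table because the variable B25047_001 appears above, and the output file names are made up.

```stata
* Download several tables one at a time, since a single call
* cannot request multiple tables
foreach t in B01003 B25047 {
    getcensus `t', year(2021) sample(1) geography(county) clear
    * hypothetical file name: one .dta file per table
    save "acs_`t'_county_2021.dta", replace
}
```

Each saved file can then be opened with use, or combined with merge once you have checked which geography variables the files share.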
How to use zip-code level variables
getcensus B01003, year(2020) sample(5) geography(zcta) clear
Zip-code-level variables are supported only in the 5-year estimates. The latest is 2020 as of today.
There seems to be a glitch in retrieving the state name along with the zip code. However, the numbers for each zip code are still correct.
How to match zip code with state, county, and city names
So how can you add the state, county, or city names to the zip-code-level tables we just downloaded?
The nice blog by Edel Alon provides an Excel sheet here: Zipcode to City, State Excel Spreadsheet • Edel Alon (updated 2020)
- Download the Zip-Codes-to-City-County-State-2020 (xlsx) – Updated April 2020
- Convert the Excel sheet with zip codes into .dta format for Stata
- Match zip codes with state, county, and city names using the merge command in Stata.
merge 1:1 zipcodetabulationarea using "zipcodematching data name.dta" // zipcodetabulationarea is the key variable; it must exist in both datasets
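The three steps above can be sketched end to end as follows. Treat it as an illustration rather than a tested recipe: the Excel file name, the column name zipcode, and the output file name zip_lookup.dta are all assumptions, so check the downloaded sheet with describe and adjust them.

```stata
* Step 1-2: read the Excel sheet and save it as a .dta file
* (file name and the column name "zipcode" are hypothetical)
import excel using "Zip-Codes-to-City-County-State-2020.xlsx", firstrow clear
describe

* The key must have the same name and the same type (string or numeric)
* in both datasets before merging
rename zipcode zipcodetabulationarea

* 1:1 merging requires one row per zip code; isid stops with an error if not
isid zipcodetabulationarea
save "zip_lookup.dta", replace

* Step 3: download the zip-code-level table and attach the names
getcensus B01003, year(2020) sample(5) geography(zcta) clear
merge 1:1 zipcodetabulationarea using "zip_lookup.dta"

* See how many zip codes matched
tabulate _merge
```

If isid fails because a zip code appears on several rows of the lookup sheet, you would need to drop the duplicates first or use merge m:1 with a deduplicated lookup instead.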
Documentation of getcensus STATA package (centeronbudget.github.io)
🗺️How to graph the variable: using bimap
I will post the details on how to use the bimap package in the next post. Here is a spoiler from Todd Jones and Asjad Naqvi.