[Stata] Chi-square test & Post-hoc analysis using tab and tabchi commands

Copyright: UCLA IDRE

The chi-square test is an analysis used when both the independent and dependent variables are categorical variables.

In Stata, the relationship between two categorical variables can be examined with a cross-tabulation, and the chi-square test can be requested as an option of the same command.

tab y x, chi column 

As above, list the two categorical variables after the tab command (short for tabulate) and add “chi” (short for chi2) as an option to perform the chi-square test. Categorical variables here include binary variables.

Here, x is the independent variable and y is the dependent variable. For example, let’s say I want to analyze the difference in nativity according to sub-ethnic groups in the dataset. Here, the sub-ethnic group is defined as a categorical variable with 1=East Asian, 2=South Asian, and 3=Southeast Asian. Meanwhile, nativity was defined as a binary variable: 0=Not born in the US, 1=Born in the US. Therefore, it is appropriate to perform the chi-square test at this time.

tab usborn eth_cat, chi column

The variable is named eth_cat to indicate that it has been coded as a categorical variable. If you add the column option after chi, column percentages are also reported.

According to the results, the proportion born in the US differs significantly across sub-ethnic groups at the 1% significance level (p=0.006, p<0.01).
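The test statistic and p-value shown under the table can also be retrieved after the command runs. The following is a minimal sketch, not the author's workflow: it uses the auto dataset that ships with Stata as a stand-in, because the usborn and eth_cat data are not available here, and the variables foreign and rep78 are chosen only for illustration.

```stata
* Illustrative data only: the auto dataset ships with Stata
sysuse auto, clear

* Two-way table with chi-square test, column percentages, and expected counts
tabulate foreign rep78, chi2 column expected

* tabulate leaves the test results in r()
display "chi2 = " r(chi2) ",  p = " r(p) ",  N = " r(N)
```

The expected option prints the expected count in each cell, which helps when checking whether any cells are too sparse for the chi-square approximation. In that case, tabulate's exact option requests Fisher's exact test instead.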

Post-hoc analysis

ssc install tab_chi

You may want to explore which specific cells drive the significant difference. This is called post hoc analysis. Since Stata does not provide this as a default option, you must install a user-written package called tab_chi (developed by Nicholas Cox), as in the command above.

tabchi usborn eth_cat, adj noo noe

Afterward, post hoc analysis for the chi-square test can be performed with the above syntax. Here, the adj option is an abbreviation for adjusted, which means “adjusted residuals, Pearson residuals divided by an estimate of their standard error”. The noo and noe options suppress the observed and expected frequencies, so that only the residuals are displayed.
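To make the definition concrete, the adjusted residuals can be sketched by hand. This is an illustration under stated assumptions rather than the tabchi source code: it assumes the usual formula (O - E) / sqrt(E * (1 - row total/N) * (1 - column total/N)), uses the built-in auto dataset as a stand-in for the original data, and the variable names obs, rowtot, coltot, n_total, expected, and adjres are invented for this example.

```stata
* Illustrative data only: the auto dataset ships with Stata
sysuse auto, clear
keep if !missing(rep78)

* One row per cell of the cross-tabulation, with the observed count in obs
contract foreign rep78, zero freq(obs)

* Row totals, column totals, and the grand total
egen rowtot  = total(obs), by(foreign)
egen coltot  = total(obs), by(rep78)
egen n_total = total(obs)

* Expected count under independence, then the adjusted residual
gen double expected = rowtot * coltot / n_total
gen double adjres = (obs - expected) / ///
    sqrt(expected * (1 - rowtot/n_total) * (1 - coltot/n_total))

list foreign rep78 obs expected adjres, sepby(foreign)
```

If tab_chi is installed, the adjres column can be compared with the output of tabchi foreign rep78, adj noo noe on the same data.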

Because no p-values are reported for the residuals, you need to judge their significance against the standard normal (Z) distribution.

You can read the significance level from the Z-score distribution table above. If the adjusted residual is greater than 1.96 or less than -1.96, it is significant at the p<0.05 level (two-sided). In the same vein, values greater than 2.576 or less than -2.576 are significant at the p<0.01 level.
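The same cutoffs and an exact two-sided p-value can be obtained with Stata's normal() and invnormal() functions instead of a printed table. This is a minimal sketch. The residual value stored in z_adj is made up for illustration and does not come from the results discussed here.

```stata
* Two-sided critical values of the standard normal distribution
* (these round to 1.96 and 2.576)
display invnormal(0.975)
display invnormal(0.995)

* Two-sided p-value for a single adjusted residual
* z_adj is a hypothetical value, not taken from the table in this post
scalar z_adj = -2.10
display 2 * (1 - normal(abs(z_adj)))
```

A residual of -2.10 lies beyond 1.96 but not beyond 2.576 in absolute value, so its p-value falls between 0.01 and 0.05.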

In the result table, the absolute value of the adjusted residual in the East Asian cells is 2.576 or more, and in the South Asian cells it is 1.96 or more, so these cells are significant at the p<0.01 and p<0.05 levels, respectively. The adjusted residuals can be reported as shown in the table above. Significance means “they are more extreme than what would be expected if the null hypothesis of independence was true”.

Please refer to the two papers below for more details on how to report the results in tables and how to interpret them.

Bisdorff, B., Schauer, B., Taylor, N., Rodríguez-Prieto, V., Comin, A., Brouwer, A., … & Häsler, B. (2017). Active animal health surveillance in European Union Member States: gaps and opportunities. Epidemiology & Infection, 145(4), 802-817.

To examine the relationship between the provider-trained group and scoring on the knowledge items, chi-square tests were conducted to test for independence of the accuracy of responses to knowledge questions and whether a respondent had received training or counseling in FABM from a provider.

Chi-square tests were then used to test for independence of the groupings and how users rated attraction and functionality and ranked types of evidence. The adjusted residuals were calculated to determine where the largest differences between observed and expected counts arose, while accounting for sample size of each of the three respondent groups. Statistical analyses were conducted in Stata v. 15 (StataCorp LLC; College Station, TX).
