Stata command: bfmcorr

Stata command for correcting survey data using tax data with the method described in:

The Weight of the Rich: Improving Surveys Using Tax Data

(with Thomas Blanchet and Ignacio Flores)

Installation and Documentation

Users should install the package via ssc by typing:

ssc install bfmcorr

For the description of the command, type help bfmcorr. For the description of postestimation tools, type help postbfm.

The sub-command bfmtoy allows users to parametrically simulate income distributions to see how bfmcorr works in practice. See help bfmtoy for instructions.

All the original .ado files can be accessed here (the version distributed through ssc may lag behind when updates are made).
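The installation and help steps above can be collected into one short session. This is a minimal sketch: the replace option and the which command are standard Stata features rather than anything specific to this package, and what which prints depends on the version header of the installed file.

```stata
* Install from SSC, or refresh an existing installation
ssc install bfmcorr, replace

* Show where the installed ado-file lives, plus its version line if it has one
which bfmcorr

* Open the help files mentioned above
help bfmcorr
help postbfm
help bfmtoy
```

Comparing the output of which against the original .ado files is one way to tell whether the copy installed from ssc is behind.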


  • Reweights observations in the survey, enforces consistency with the tax data, and replaces observations at the top of the distribution to increase precision.

  • Automatically determines the “merging point” between the tax and the survey data, and extrapolates the shape of the nonresponse function if the tax data does not cover a large enough fraction of the distribution.

  • Can maintain the survey's representativeness in terms of other sociodemographic variables (age, gender, etc.).

  • Preserves survey microdata, including the household structure, assuming no reranking of observations.

  • Can use tax data with different statistical units (households or individuals).

  • Can work with two different income variables: a comprehensive income variable assumed to drive nonresponse in the survey, and a taxable income variable that corresponds to the tax data.

  • Can correct the survey to also match the composition of income by income bracket (e.g. share of labor vs. capital) and/or the composition of the population by income bracket (e.g. frequency of men vs. women).

  • Provides several diagnostic tools to analyse the correction.


See the file example for a complete illustration of the method with Brazilian data for 2014.

bfmcorr using "$dir_dta/gpinter-brazil-2014.xlsx", ///
    weight(person_weight) income(yhh) households(hid) ///
    taxu(i) trust(0.8) holdmargins(age_group male)

// Show the shape of the bias

postbfm biasplot

// Compare the Lorenz curves before and after correction

postbfm lorenz
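Before running the call above, it can help to confirm that the inputs it refers to are in place. The sketch below uses only built-in Stata commands and the variable names from the example. These checks are assumptions about sensible inputs, not requirements documented by bfmcorr, and the global macro dir_dta is assumed to have been defined earlier in the session.

```stata
* Assumption: the survey microdata are already in memory and use the
* variable names from the example (person_weight, yhh, hid, age_group, male).

* Stop with an error if the tax tabulation file cannot be found
confirm file "$dir_dta/gpinter-brazil-2014.xlsx"

* Stop with an error if any variable used in the call is absent
confirm variable person_weight yhh hid age_group male

* Weight and income should be numeric
confirm numeric variable person_weight yhh

* Missing values compare as larger than any number in Stata,
* so they are tested explicitly alongside the sign check
assert !missing(person_weight) & person_weight > 0
assert !missing(yhh)

* Every observation should carry a household identifier
assert !missing(hid)
```

Each confirm or assert stops a do-file at the first failure, so a problem in the inputs surfaces before the correction is attempted.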