Documentation
Overview
We digitize data from historical publications of the Internal Revenue Service (IRS). The IRS has a longstanding tradition of collecting detailed statistics for individuals and businesses going back to the Revenue Act of 1916, and the Statistics of Income (SOI) was first published in 1918 (with data for 1916).
Statistics of Income
Initially the SOI included only basic statistics on corporations, but over the years the section on corporations has become increasingly detailed, with more cross-tabulations and variables. In addition to data on receipts and net income, the SOI contains balance sheet data, which derive from the end-of-fiscal-year balance sheets that corporations submit with their tax returns. Using micro data from these submissions, the SOI provides tabulations of businesses by size of net income and sector starting in 1918 (a series that ended in the 1970s), by size of assets and sector starting in 1931, and by size of business receipts and sector starting in 1959. We use these size tabulations to study trends in corporate concentration over the long run. Corporate businesses in the tabulations by size include both C-corporations and S-corporations; the underlying data come from corporate tax returns (Form 1120 or 1120-S). For noncorporate businesses (partnerships and nonfarm sole proprietorships), size bins are published in some years in separate SOI publications. We transcribed these data as well whenever available.
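As an illustration of how size tabulations can be turned into a concentration measure, the sketch below approximates the share of assets held by the largest firms from binned data. It is a minimal example, not the procedure used for this dataset: the `SizeBin` fields, the `top_share` function, and the numbers are all hypothetical, and firms within the bin that straddles the cutoff are assumed to be of equal size, which tends to understate the top share.

```python
from dataclasses import dataclass


@dataclass
class SizeBin:
    lower_bound: float   # lower edge of the asset bracket (hypothetical units)
    n_firms: int         # number of returns in the bracket
    total_assets: float  # total assets reported in the bracket


def top_share(bins, top_fraction):
    """Approximate share of assets held by the largest `top_fraction` of firms.

    Assumption: firms inside the bracket that straddles the cutoff are
    treated as equally sized, so that bracket is allocated pro rata.
    """
    populated = [b for b in bins if b.n_firms > 0]
    total_firms = sum(b.n_firms for b in populated)
    total_assets = sum(b.total_assets for b in populated)

    remaining = top_fraction * total_firms  # firms still to be counted
    held = 0.0
    # Walk from the largest bracket downward until the target count is reached.
    for b in sorted(populated, key=lambda b: b.lower_bound, reverse=True):
        if remaining <= 0:
            break
        take = min(b.n_firms, remaining)
        held += b.total_assets * take / b.n_firms
        remaining -= take
    return held / total_assets


# Made-up tabulation: 100 firms, 1,000 units of assets in total.
bins = [
    SizeBin(0, 90, 100.0),
    SizeBin(1_000, 9, 300.0),
    SizeBin(10_000, 1, 600.0),
]
print(round(top_share(bins, 0.01), 3))
print(round(top_share(bins, 0.10), 3))
```

The same function applies to tabulations by net income or business receipts if the bracket totals are swapped accordingly.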
The SOI publications are accompanied by the Corporation Source Book, a series of initially unpublished volumes containing tabulations with more detailed classifications than the published reports. The Corporation Source Book is digitally available from 1964 onward through the IRS and the Electronic Records Division at the U.S. National Archives and Records Administration. Its advantage is that it includes more granular sector data and additional income and balance sheet items. We use the Corporation Source Book whenever available.
The earliest SOI publications were based on all submitted corporate tax returns. In later years, the SOI used estimates from sample data. Starting in 1951, the IRS began to use a stratified probability sample to provide estimates for the whole population. In these samples the IRS varied the sampling rate by size (measured by total assets or net income) to guarantee reliable totals. Accordingly, the sample usually included the universe of businesses in the top brackets. The transition to sample data should therefore not have a large effect on measured corporate concentration.
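To make the sampling argument concrete, the sketch below shows a standard stratified expansion estimator, in which each sampled return is weighted by the inverse of its stratum's sampling rate. The strata, population counts, and values are invented for illustration and do not reflect the actual IRS sample design. When the top stratum is sampled at a rate of one, its contribution to the estimated total is exact, which is why sampling should matter little for the largest firms.

```python
from dataclasses import dataclass, field


@dataclass
class Stratum:
    name: str
    population: int                              # returns filed in the stratum
    sample: list = field(default_factory=list)   # sampled values (e.g. assets)


def stratified_total(strata):
    """Estimate the population total by weighting each stratum's sample sum
    by population / sample size (the inverse sampling rate)."""
    total = 0.0
    for s in strata:
        if not s.sample:
            continue  # no sampled returns: the stratum contributes nothing here
        weight = s.population / len(s.sample)
        total += weight * sum(s.sample)
    return total


# Hypothetical design: small firms sampled at 4 in 1,000, large firms at 100%.
strata = [
    Stratum("small", population=1000, sample=[1.0, 2.0, 3.0, 2.0]),
    Stratum("large", population=3, sample=[500.0, 700.0, 900.0]),
]

estimate = stratified_total(strata)
large_exact = sum(strata[1].sample)  # fully enumerated, so no sampling error
print(estimate, large_exact)
```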