Descriptive Data Analysis

Descriptive techniques often include constructing tables of means and quantiles, computing measures of dispersion such as the variance or standard deviation, and building cross-tabulations, or "crosstabs," that can be used to examine many disparate hypotheses. Those hypotheses are often about observed differences across subgroups. Specialized descriptive techniques are used to measure segregation, discrimination, and inequality. Discrimination is often measured using audit studies or decomposition methods. Greater segregation by type or inequality of outcomes need not be wholly good or bad in itself, but it is often considered a marker of unfair social processes; accurately measuring its level across time and space is a prerequisite to understanding those processes.
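
As a minimal sketch of such a summary table, the Python below computes the count, mean, sample standard deviation, and quartiles for each subgroup using only the standard library. The subgroup labels, the earnings values, and the `summarize_by_group` helper are hypothetical and chosen purely for illustration.

```python
import statistics
from collections import defaultdict

# Hypothetical records, for illustration only: (subgroup, annual earnings in dollars).
records = [
    ("A", 30000), ("A", 34000), ("A", 38000), ("A", 42000),
    ("B", 40000), ("B", 46000), ("B", 52000), ("B", 58000),
]

def summarize_by_group(rows):
    """Return n, mean, standard deviation, and quartiles for each subgroup."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)

    summary = {}
    for group, values in sorted(groups.items()):
        # statistics.quantiles with n=4 returns the three quartile cut points.
        q1, median, q3 = statistics.quantiles(values, n=4)
        summary[group] = {
            "n": len(values),
            "mean": statistics.mean(values),
            "sd": statistics.stdev(values),  # sample standard deviation
            "q1": q1,
            "median": median,
            "q3": q3,
        }
    return summary

summary = summarize_by_group(records)
for group, stats in summary.items():
    print(group, stats)
```

Reporting a dispersion measure and quantiles next to each mean makes it easier to see whether a gap in means reflects a shift of the whole distribution or only of part of it.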

A table of means by subgroup can show important differences across subgroups, and this kind of descriptive analysis often invites causal inference. When we see a gap in earnings, for example, we naturally want to infer the reasons the gap exists. But that moves into the province of measuring impacts, where different techniques are needed. Means also often differ simply because of random variation, so statistical inference is needed to determine whether an observed difference could have arisen by chance alone.
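
One simple way to ask whether a gap in means could be due to chance is an exact permutation test, sketched below under stated assumptions. This is only one of several possible inference techniques, not a method prescribed here; the earnings values and the `permutation_test_mean_diff` helper are hypothetical. The test reassigns the pooled observations to the two groups in every possible way and reports the share of reassignments whose gap is at least as large as the observed one. Full enumeration is practical only for small samples.

```python
from itertools import combinations

# Hypothetical earnings for two subgroups, for illustration only.
group_a = [30000, 34000, 38000, 42000]
group_b = [40000, 46000, 52000, 58000]

def permutation_test_mean_diff(a, b):
    """Exact two-sided permutation test for a difference in means.

    Returns (observed difference mean(b) - mean(a), p-value).
    """
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    total = sum(pooled)
    observed = sum(b) / n_b - sum(a) / n_a

    extreme = 0
    count = 0
    # Enumerate every way of choosing which pooled values are labeled "a".
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = (total - sum_a) / n_b - sum_a / n_a
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-9:
            extreme += 1
        count += 1
    return observed, extreme / count

observed, p_value = permutation_test_mean_diff(group_a, group_b)
print(f"observed gap: {observed:.0f}, permutation p-value: {p_value:.4f}")
```

A small p-value says only that chance alone is an unlikely explanation for the gap; it says nothing about what caused it.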

A crosstab, or two-way tabulation, shows the proportion of units with each combination of values of two variables; these are the cell proportions. For example, we might ask what proportion of the population has a high school degree and receives food or cash assistance, which requires a crosstab of education against receipt of assistance. We might then examine row proportions, the fraction of each education group that receives assistance, and perhaps see receipt fall sharply at higher education levels.
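
The distinction between cell and row proportions can be sketched in a few lines of Python. The counts below are hypothetical, invented only to illustrate the arithmetic, and the helper names `cell_proportions` and `row_proportions` are likewise assumptions. Education is treated as the row variable and receipt of assistance as the column variable.

```python
# Hypothetical counts, for illustration only:
# (education level, receives assistance) -> number of people.
counts = {
    ("less than high school", True): 30,
    ("less than high school", False): 70,
    ("high school", True): 40,
    ("high school", False): 160,
    ("college", True): 40,
    ("college", False): 360,
}

def cell_proportions(table):
    """Each cell as a share of the whole population."""
    total = sum(table.values())
    return {cell: n / total for cell, n in table.items()}

def row_proportions(table):
    """Each cell as a share of its row (here, its education group)."""
    row_totals = {}
    for (row, _col), n in table.items():
        row_totals[row] = row_totals.get(row, 0) + n
    return {(row, col): n / row_totals[row] for (row, col), n in table.items()}

cells = cell_proportions(counts)
rows = row_proportions(counts)
for education in ("less than high school", "high school", "college"):
    print(
        f"{education:>22}: "
        f"cell share receiving = {cells[(education, True)]:.3f}, "
        f"row share receiving = {rows[(education, True)]:.3f}"
    )
```

Cell proportions sum to one over the whole table, while row proportions sum to one within each education group, which is why only the latter can be read as a rate of receipt for that group.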

We could also look at column proportions, the fraction of recipients at each level of education, but these condition on the outcome and so run in the opposite direction from any causal effect of education on receipt. We might see a surprisingly high number or proportion of recipients with a college education, but this may simply reflect that there are many more college graduates than people with less than a high school degree in the total population (the marginal distribution of education, without regard to receipt of assistance).
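
To make the point about group sizes concrete, the sketch below compares column proportions among recipients with each education group's share of the total population. The counts are hypothetical and the helper names `column_proportions` and `row_marginals` are assumptions made for illustration.

```python
# Hypothetical counts, for illustration only:
# (education level, receives assistance) -> number of people.
counts = {
    ("less than high school", True): 30,
    ("less than high school", False): 70,
    ("high school", True): 40,
    ("high school", False): 160,
    ("college", True): 40,
    ("college", False): 360,
}

def column_proportions(table):
    """Each cell as a share of its column (here, recipients or non-recipients)."""
    col_totals = {}
    for (_row, col), n in table.items():
        col_totals[col] = col_totals.get(col, 0) + n
    return {(row, col): n / col_totals[col] for (row, col), n in table.items()}

def row_marginals(table):
    """Share of the total population in each row (education group)."""
    total = sum(table.values())
    marginals = {}
    for (row, _col), n in table.items():
        marginals[row] = marginals.get(row, 0) + n / total
    return marginals

cols = column_proportions(counts)
marginals = row_marginals(counts)
for education in ("less than high school", "high school", "college"):
    print(
        f"{education:>22}: "
        f"share of recipients = {cols[(education, True)]:.3f}, "
        f"share of population = {marginals[education]:.3f}"
    )
```

In these made-up counts, college graduates are a larger share of recipients than people with less than a high school degree, even though their rate of receipt is only a third as high, because there are four times as many of them.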