Urban Wire
We Need Better Zoning Data. Data Science Can Help.
Erika Tyagi, Graham MacDonald
Construction workers put up scaffolding in Washington, DC on October 8, 2019

Exclusionary zoning policies restrict housing supply, drive up costs, and perpetuate the racial and economic disparities that lead to unequal access to opportunity. Although zoning touches a wide range of critical policy issues, we remain relatively uninformed about its basic building blocks and their impacts.

To inform better zoning policies, we need better data. Data science can help unlock these data, as our work at Urban shows.

Last summer, we used machine learning to predict zoning density limits in Washington, DC. This summer, we conducted follow-on research to determine whether this approach would apply to other jurisdictions. Our results are promising, but more work remains to demystify zoning.

Why don’t we have better zoning data?

The authority for zoning and land-use regulation in the United States rests largely with local governments. Because these regulations are mostly under local control, so are the data describing them. These data are often locked away in the complex text descriptions of zoning ordinances. Parsing these documents and interpreting zoning codes—actually understanding the nature of what, where, and how much can be built in an individual zone—requires significant time and background knowledge.

As a result, no current, comprehensive, and comparable national database of zoning limits exists.

How can we use machine learning to predict zoning limits?

Our machine-learning approach attempts to use information about zones’ underlying properties to predict permitted density limits. This approach has the potential to create a first-of-its-kind national database of zoning restrictions, empowering researchers, policymakers, and communities.

Machine learning involves feeding data and answers to a model, letting the model learn a set of rules to infer those answers from those data, and then applying those rules to unseen data.

Here, our data are characteristics of properties in a zone, such as the size of the average lot or the most recent year in which residences were built. Our answers are the permitted density limits for each zone, manually coded from local zoning ordinances.
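To make the data-and-answers framing concrete, here is a minimal Python sketch of how property-level assessment records might be rolled up into zone-level features and paired with manually coded density limits. The record layout, the zone names (Z1 through Z4), and every number are hypothetical. The post does not describe Urban's actual feature pipeline. The density limit is expressed here as a floor area ratio (FAR), the measure the post uses later.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical property assessment records: (zone, lot_size_sqft, year_built).
# Zone names and values are illustrative only.
PROPERTIES = [
    ("Z1", 7500, 1948),
    ("Z1", 8200, 1955),
    ("Z2", 2100, 1910),
    ("Z2", 1800, 1925),
    ("Z3", 900, 2015),
    ("Z4", 5000, 1980),  # no coded label: an "unseen" zone to predict later
]

# Hypothetical answers: permitted FAR hand-coded from a zoning ordinance.
FAR_LABELS = {"Z1": 0.4, "Z2": 0.9, "Z3": 4.0}


def zone_features(properties):
    """Aggregate property-level records into one feature dict per zone."""
    lots = defaultdict(list)
    years = defaultdict(list)
    for zone, lot_size, year_built in properties:
        lots[zone].append(lot_size)
        years[zone].append(year_built)
    return {
        zone: {
            "avg_lot_size": mean(lots[zone]),         # size of the average lot
            "latest_year_built": max(years[zone]),   # most recent residential construction
            "n_properties": len(lots[zone]),
        }
        for zone in lots
    }


def training_rows(features, labels):
    """Pair each labeled zone's features (the data) with its FAR (the answer)."""
    return [
        (zone, feats, labels[zone])
        for zone, feats in sorted(features.items())
        if zone in labels
    ]
```

Zones without a coded label (Z4 here) drop out of the training rows; they are the unseen data the learned rules would later be applied to.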

We then asked two questions:

Can a model learn rules to predict density limits from property assessment data?

How well do the rules from one jurisdiction apply to other jurisdictions?

The answer to the first question is yes. Our model learned rules that predict density limits fairly well within a jurisdiction (if you’re interested, you can read our full technical appendix). When our seen and unseen data are both zones in Washington, DC, our predicted limits closely track the actual permitted density limits across the city.

This first result is promising. We fed our model data from fewer than a hundred zones, and it produced a set of rules to accurately predict density limits.

But to create a national dataset with this approach, we’ll need to understand how our model performs when our unseen data don’t come from the same jurisdiction as our seen data (the crux of our second question). By understanding the types of zones that our model can’t yet accurately predict, we can refine our approach with these gaps in mind.

To explore the second question, we used two cases outside of Washington, DC. One model uses data from DC and Arlington County to predict densities in Montgomery County, while another uses data from DC and Montgomery County to predict densities in Arlington County.
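One way to run this kind of test is a leave-one-jurisdiction-out split: fit on every jurisdiction but one, then predict the held-out one. The sketch below uses k-nearest-neighbors regression purely as a simple, dependency-free stand-in; the post does not say which model Urban used (the technical appendix covers that). The zones, the two features, and all FAR values are invented for illustration.

```python
import math
from statistics import mean, pstdev

# Hypothetical zone-level rows: (jurisdiction, zone, features, permitted_far),
# where features = (average lot size in sq ft, most recent year built).
ZONES = [
    ("DC", "A", (7800, 1955), 0.4),
    ("DC", "B", (1900, 1925), 0.9),
    ("DC", "C", (900, 2015), 4.0),
    ("Arlington", "D", (8100, 1960), 0.4),
    ("Arlington", "E", (2500, 1990), 1.5),
    ("Montgomery", "F", (7000, 1970), 0.5),
    ("Montgomery", "G", (1500, 2010), 2.5),
]


def fit_scaler(vectors):
    """Per-feature (mean, std) computed from the training vectors only."""
    return [(mean(col), pstdev(col) or 1.0) for col in zip(*vectors)]


def scale(vec, scaler):
    """Z-score a feature vector so lot size and year are on comparable scales."""
    return tuple((v - m) / s for v, (m, s) in zip(vec, scaler))


def knn_predict(train, query, k=2):
    """Average the FAR of the k training zones nearest to `query`."""
    nearest = sorted((math.dist(x, query), far) for x, far in train)[:k]
    return mean(far for _, far in nearest)


def hold_out_jurisdiction(rows, target, k=2):
    """Train on all jurisdictions except `target`; predict each `target` zone.

    Returns {zone: (predicted_far, permitted_far)}.
    """
    train_rows = [r for r in rows if r[0] != target]
    test_rows = [r for r in rows if r[0] == target]
    scaler = fit_scaler([vec for _, _, vec, _ in train_rows])
    train = [(scale(vec, scaler), far) for _, _, vec, far in train_rows]
    return {
        zone: (knn_predict(train, scale(vec, scaler), k), far)
        for _, zone, vec, far in test_rows
    }
```

Calling `hold_out_jurisdiction(ZONES, "Montgomery")` mirrors the first case (DC and Arlington predict Montgomery). Note that k-NN can only output FARs it has seen in training, so a zone unlike anything in the training jurisdictions simply inherits the limits of its least-dissimilar neighbors; in a toy setting, this shows how gaps in training coverage turn into prediction errors.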

In both cases, we can generally predict zones with a large number of residential properties and zones that “look like” zones in DC.

But in zones the model has little experience with, we observe more error. In Montgomery County, for example, our model predicts higher densities in rural zones than the zoning code actually permits: in the county’s Agricultural Reserve and Rural Cluster zones, it predicted more than twice the permitted densities.

Similarly, our model predicts higher densities in Arlington County’s commercial and mixed-use zones than the zoning code permits. In the Crystal City and Pentagon City neighborhoods, for instance, the model’s predicted densities for the C-O-1.5 and C-O-2.5 zones are more than triple their permitted densities.
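The “more than twice” and “more than triple” comparisons are ratios of predicted to permitted FAR. A small helper like the hypothetical one below could compute those ratios across held-out zones and flag the worst cases, which is also one natural way to decide where to hand-code more ordinances. The zone names and values are made up, not Urban’s results.

```python
# Hypothetical held-out results: zone -> (predicted_far, permitted_far).
PREDICTIONS = {
    "Rural-1": (0.25, 0.1),
    "Office-1": (6.0, 1.5),
    "Res-1": (0.45, 0.4),
}


def overprediction_ratios(predictions):
    """Map zone -> predicted FAR / permitted FAR (zones with a zero limit are skipped)."""
    return {
        zone: pred / actual
        for zone, (pred, actual) in predictions.items()
        if actual > 0
    }


def flag_overpredicted(predictions, threshold=2.0):
    """Return zones whose ratio exceeds `threshold`, worst first."""
    ratios = overprediction_ratios(predictions)
    flagged = [zone for zone, ratio in ratios.items() if ratio > threshold]
    return sorted(flagged, key=lambda zone: ratios[zone], reverse=True)
```

With these toy values, a threshold of 2.0 flags `Office-1` and `Rural-1`, while a threshold of 3.0 flags only `Office-1`.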

Nonetheless, our results are directionally accurate. Our model correctly predicts lower-density, single-family residential zones to have lower floor area ratios (FARs) than higher-density, multiple-family zones—even if we can’t yet predict zones’ precise FARs in areas where the model lacks sufficient experience from existing data.
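“Directionally accurate” can be checked with a rank correlation such as Spearman’s rho, which asks only whether predicted FARs put zones in the same order as permitted FARs, regardless of how far off the levels are. This is an illustrative choice of metric, not necessarily the one Urban used, and the example values below are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # i and j are 0-based positions
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when either input is constant")
    return cov / (vx * vy) ** 0.5


# Toy example: every zone is overpredicted, but the ordering from
# low-density to high-density zones is preserved, so rho is 1.0.
permitted = [0.4, 0.9, 1.8, 4.0]
predicted = [0.9, 1.2, 4.5, 12.0]
rho = spearman(permitted, predicted)
```

A rho near 1 with large ratio errors is exactly the pattern described above: right direction, wrong magnitude.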

What are our next steps?

Our approach has promise. But to move forward on a broader scale, we need help.

First, we need more local jurisdictions to release open and structured geospatial zoning data. Ideally, these data would also contain relevant zoning restrictions. At the very least, we need data on where zoning boundaries lie within municipalities for our machine learning process.

Second, we need to feed our model more labeled data, or data with answers. We’ll need to continue manually coding the permitted density limits from zoning ordinances in areas where our model currently performs poorly.

Our model tended to predict higher densities in Montgomery County’s rural areas and Arlington’s commercial areas than the code specifies. Feeding our model information from more diverse zones will likely help it make better predictions as it encounters new built environments.

Our model also tended to predict higher densities than are allowed in the zoning code in areas where we believe developers may take advantage of optional methods to achieve higher density limits. By better understanding the zoning-to-development process—and finding ways to capture this nuance in a generalizable way—our approach can produce a more accurate and meaningful picture of zoning data.

Communities are realizing that reforming their local zoning policies is necessary for addressing housing affordability. By using innovative approaches to unlock zoning data, we hope to build the evidence needed to inform policies that create stronger and more equitable communities.
