Unlocking the potential of private data for public good
When we shop online or share content on social media, we’re contributing to today’s explosion of data.
Most of these data are collected, stored, and analyzed by private businesses, and some of those companies are exploring how these assets can serve the public good, a practice known as data philanthropy.
Data originally collected for private purposes can be more applicable to research and public policy than many realize. Private data have filled knowledge gaps in food insecurity, disease tracking, climate change, and countless other issues. Researchers are constantly finding innovative ways to answer public questions with private data.
But businesses that engage in data philanthropy must carefully weigh their goals, legal obligations, and needs before deciding how best to share their data.
In a new report, we found that private data providers use five primary pathways to share their data assets. The pathways differ in how much access they grant to people and organizations outside the business, ranging from no outside access to publicly available datasets.
As an example of pathway one, researchers at the Mastercard Center for Inclusive Growth used Mastercard’s anonymized and aggregated transaction data to produce the Donation Insights report, offering a view of philanthropic giving that analysts could not get from existing data sources.
At the other end of the spectrum is pathway five: Twitter provides a real-time sample of all public Tweets to the general public through its streaming application programming interface (API).
Some providers pursue multiple pathways at once. For example, alongside the Donation Insights report, the Mastercard Center for Inclusive Growth publishes the underlying charitable donation dataset, built from anonymized and aggregated Mastercard transaction data, which users can download to dig deeper and customize their analyses.
Choosing the right pathway
Businesses have several options for sharing their data while still safeguarding privacy and protecting their assets.
To identify the right pathway, would-be data providers should consider:
- Protecting privacy and ensuring data security: Do the data contain personally identifiable information, and if so, have people provided consent? Are appropriate data security measures in place?
- Minimizing transaction costs: Does the data provider have a trusted relationship with a data analyst, established legal language for data sharing, and systems for reviewing data for privacy concerns, including technical systems to clean, anonymize, and share the data?
- Understanding the data and metadata: Does the data provider fully understand the potential biases in the data, such as coverage rate, demographic characteristics, and other data-specific characteristics? Does it have well-documented codebooks and data collection procedures it can share with outside parties?
- Mitigating reputational risks: Have the data providers and data analysts disclosed potential reputational risks to one another, including editorial review processes, potentially harmful research topics, and disclosure rules? And does the data provider have materials or training to educate data analysts on required language and relevant data restrictions?
Businesses that are just dipping their toes into data philanthropy, have highly sensitive data, or lack the technical capacity to ensure proper public use of their data may want to consider a more restrictive pathway. A more accessible pathway might be appropriate for companies with more robust data science infrastructure and less sensitive data.
Sharing data without giving it away
The five pathways highlight an important point: data providers do not necessarily have to make data public to provide public value. Programs like the Google Visiting Faculty Program allow researchers to visit Google, collaborate with Google researchers, tap into Google’s resources, and publish new findings, all without sensitive data ever leaving Google’s campus.
As data providers innovate to keep personal data private while providing public value, they sometimes combine different pathways. Facebook recently announced that it will share data with researchers studying “the effects of social media on democracy and elections” by creating anonymized “synthetic” data. Researchers granted access can analyze the synthetic data and then submit their finished code for Facebook to run against the original underlying data, producing publishable findings. This approach is a clever combination of pathways two and four.
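The synthetic-data workflow described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Facebook’s actual system: the record fields, the column-resampling synthesizer, the `analysis` function, and the minimum-cell-size check in `run_on_confidential` are all hypothetical choices made for this example.

```python
import random
import statistics

# Hypothetical confidential records held by the provider:
# (age_group, weekly_hours_on_platform). These never leave the provider.
def make_confidential_data(n=1000, seed=0):
    rng = random.Random(seed)
    groups = ["18-29", "30-49", "50+"]
    return [(rng.choice(groups), round(rng.uniform(0, 40), 1)) for _ in range(n)]

# Toy synthesizer (an assumption, not Facebook's method): sample each column
# independently from its observed values. Each column's distribution is
# roughly preserved, but the links between fields in real records are broken.
def make_synthetic_data(confidential, seed=1):
    rng = random.Random(seed)
    groups = [g for g, _ in confidential]
    hours = [h for _, h in confidential]
    return [(rng.choice(groups), rng.choice(hours)) for _ in confidential]

# Researcher's analysis code, written and debugged on synthetic data only.
# Returns {group: (record_count, mean_hours)}.
def analysis(records):
    by_group = {}
    for group, hours in records:
        by_group.setdefault(group, []).append(hours)
    return {g: (len(v), round(statistics.mean(v), 2))
            for g, v in sorted(by_group.items())}

# Provider side: run the submitted code on the original data, then apply a
# simple disclosure check (hypothetical threshold) before releasing results.
def run_on_confidential(analysis_fn, confidential, min_cell=30):
    results = analysis_fn(confidential)
    return {g: stats for g, stats in results.items() if stats[0] >= min_cell}

confidential = make_confidential_data()
synthetic = make_synthetic_data(confidential)

draft = analysis(synthetic)                              # researcher iterates here
published = run_on_confidential(analysis, confidential)  # provider runs final code
print("draft (synthetic):", draft)
print("published (original data):", published)
```

Because this toy synthesizer samples columns independently, any real association between age group and hours in the original data would be erased in the synthetic draft. That is one reason the publishable numbers come from the provider’s run on the original data rather than from the synthetic copy.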
Though there are certainly challenges ahead, the nascent but rapidly evolving field of data philanthropy has the potential to inform solutions to social problems that have puzzled researchers for generations. Data providers and data consumers alike have much to gain by unlocking the power of private data for public good.
Illustration by Sorbetto/Getty Images.