
Data are an ever-present part of our day-to-day lives. Think of all the data generated by your purchases (via credit card records), your travels (via smartphone GPS records), your health (via wearable devices), and your relationships (via social media). All these data used to just float off into the ether, but now they are collected, organized, and analyzed. More than a running log of your daily activities, your everyday data can reveal insights about who you are (see what Google thinks it knows about you based on your browsing history).
Of course, the everyday data of our lives aren’t exclusively commercial, and they cover more than our own individual activities. The government accumulates huge amounts of everyday data on everything from consumer complaints to solar flares.
Everyday data are also becoming more available and more accessible to more people. President Obama’s 2013 executive order increased open government data. New online learning platforms (often free), powerful open-source data analysis software, and low-cost computing power are putting data-driven insights within reach of anyone interested in grabbing them.
This is largely a good thing. Knowledge is most useful when it is shared openly. Open data have already been used to accomplish great things, like assessing air quality and improving health care options. The benefits of our everyday data can be amplified by connecting multiple datasets. Some insights become apparent only in the connections between diverse data sources, a phenomenon known as the “mosaic effect,” just as a mosaic image is recognizable only from a larger combination of colors and shapes.
Combining everyday data—both individual and public—can provide tremendous value. George Mason University’s Risk-Needs-Responsivity Simulation Tool uses data on released prisoners’ individual needs and data on available services to make the optimal match.
But combining datasets can also produce less desirable outcomes. By connecting several open data sources with some ingenuity, one analyst identified precise addresses for patrons of a strip club in New York City.
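To make the mechanics of the mosaic effect concrete, here is a minimal sketch of how two datasets that each look harmless can be linked on shared fields. It is not the method used in any case mentioned here. Every record, field name, and the `link` helper are invented for illustration, and the choice of ZIP code, birth year, and sex as the linking fields is an assumption.

```python
# Hypothetical toy records; every name and value here is invented for illustration.
# Dataset A: a "de-identified" release that still carries a few descriptive fields.
visits = [
    {"zip": "10001", "birth_year": 1980, "sex": "F", "venue": "clinic"},
    {"zip": "10002", "birth_year": 1975, "sex": "M", "venue": "gym"},
    {"zip": "10003", "birth_year": 1990, "sex": "M", "venue": "library"},
]

# Dataset B: a public directory that lists names alongside the same fields.
directory = [
    {"name": "Person A", "zip": "10001", "birth_year": 1980, "sex": "F"},
    {"name": "Person B", "zip": "10002", "birth_year": 1975, "sex": "M"},
    {"name": "Person C", "zip": "10002", "birth_year": 1975, "sex": "M"},
    {"name": "Person D", "zip": "10004", "birth_year": 1962, "sex": "F"},
]

# Assumed linking fields shared by both datasets.
KEYS = ("zip", "birth_year", "sex")


def link(records, lookup, keys=KEYS):
    """Attach to each record the names in `lookup` that share all key fields."""
    index = {}
    for row in lookup:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])
    return [
        {**rec, "candidates": index.get(tuple(rec[k] for k in keys), [])}
        for rec in records
    ]


linked = link(visits, directory)

# A record is re-identified when exactly one name matches its key fields.
reidentified = [r for r in linked if len(r["candidates"]) == 1]

for r in linked:
    print(r["venue"], r["candidates"])
```

Neither dataset alone connects a name to a venue. Only the join does, and only for records whose combination of fields is unique in the directory. That is why the risk is so hard to judge from any single dataset in isolation.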
As more of our everyday data are collected, shared, and analyzed, who bears the responsibility for the consequences?
Is it the responsibility of those collecting the data?
In the past, only particularly important everyday activities produced any form of data. You’d get a receipt for something you purchased. An application you completed would be kept on file. Most everyday activities and occurrences left no record at all. And, of course, without those records (i.e., data) there wasn’t any way to produce data-driven insights. Today, with new technologies collecting data that previously went unrecorded, there are new opportunities to benefit from the insights those data contain, but there are also new opportunities to suffer consequences that were simply impossible in the past because the data didn’t exist.
Is it the responsibility of those sharing the data?
There is an inherent tension between wanting to share as much data as possible and ensuring that those data are used and published responsibly. Perhaps no single entity feels this tension more acutely than the federal government. With so much data and so many different agencies, opening the government’s data requires a delicate balance between an obligation to share analytically useful data and the need to adhere to federal laws, regulations, and policies regarding personal data. That is a pretty challenging assignment. Given all the creative ways data can be used, and how quickly the possibilities multiply when datasets are connected to one another, it’s basically impossible to anticipate every way data might be joined and analyzed, and therefore to guard against every misuse.
Is it the responsibility of those analyzing the data?
The credibility and livelihood of professional researchers rely on adherence to principles, guidelines, and explicit human subject protections that ensure responsible data use. However, those who are now able to analyze data outside the field of professional research may not have that experience or feel that obligation. Three quarters of respondents interested in data issues said that data science training should include some component of ethics training. While it is encouraging that the majority of respondents thought this was an important issue, it is also revealing that roughly a quarter did not.
So who is responsible for the consequences our everyday data may produce?
If responsibility means having some control over the outcome, then responsibility for our everyday data belongs to everyone. By contributing our everyday data to the ecosystem, we all stand to benefit tremendously from the insights that are created; we also share the responsibility for the potential consequences our data may produce. Regulating data collection, sharing, or analysis may be part of taking responsibility, but this has already proved to be a troublingly blunt approach. Each of us should develop a better understanding of, and more control over, how the data we produce feed into the collection, sharing, and analysis that make up this ecosystem. In that way, as data producers, we can do a better job of embracing the benefits and mitigating the consequences by remaining conscious of our everyday data at their source, in our daily lives.