I’m reading the book Beautiful Data (edited by Toby Segaran and Jeff Hammerbacher), a collection of essays on “beautiful” data by people who work with data – data that tells a story, pretty ways to present data, smart ways to manipulate and analyze data, and the like. It’s a good book, though as with all edited collections, some chapters are better than others.
But the beginning of one chapter jumped out at me, and I think it’s an observation that bears repeating:
(From Chapter 12: “The Design of Sense.us”, by Jeffrey Heer)
I must confess that I don’t believe in beautiful data. At least not without context.
Prior to World War II, the government of the Netherlands collected detailed civil records cataloguing the demographics of Dutch citizenry. A product of good intentions, this population register was collected to inform the administration of government services. After the German invasion, however, the same data was used to effectively target minority populations (Croes 2006). Of the approximately 140,000 Jews that lived in the Netherlands prior to 1940, only about 35,000 survived.
Though perhaps extreme, for me this sobering tale underscores a fundamental insight: the “beauty” of data is determined by how it is used. Data holds the potential to improve understanding and inform decision-making for the better, thereby becoming “beautiful” in action. Achieving value from data requires that the right data be collected, protected, and made accessible and interpretable to the appropriate audience.
As a group, scientists are among the biggest producers of data on Earth (the biggest, depending on your definition of “scientist”), and though we sometimes like to believe that we are producing this data in a moral vacuum of pure knowledge-seeking, that is a thin fantasy stretched over a harsh reality. The reality is that our data has consequences, and it will be interpreted and used to make decisions and inform opinion whatever we think about the matter. From climate change, to evolution versus creation, to green power generation – we create the data that fuels the debates and policies that shape our world.
My words might be taken as a thinly veiled suggestion to censor the data we produce, but nothing could be further from the truth. The truth is that because we produce the data, we have the responsibility to interpret it, to make it beautiful for those who need to use it and understand it. We have to make sure that our data is not used in an ugly way. I’ll admit it: a call for scientists to be more active in making their data understandable is hardly novel. But like Jeffrey’s observation above, it bears repeating.