Beautiful data isn’t always beautiful.

I’m reading the book Beautiful Data (edited by Toby Segaran and Jeff Hammerbacher), which is a collection of essays by people who work with data on “beautiful” data – data that tells a story, pretty ways to present data, smart ways to manipulate and analyze data, and the like. It’s a good book, though as with all edited collections some chapters are better than others.

But the beginning of one chapter jumped out at me, and I think it’s an observation that bears repeating:

(From Chapter 12: “The Design of”, by Jeffrey Heer)

I must confess that I don’t believe in beautiful data. At least not without context.

Prior to World War II, the government of the Netherlands collected detailed civil records cataloguing the demographics of Dutch citizenry. A product of good intentions, this population register was collected to inform the administration of government services. After the German invasion, however, the same data was used to effectively target minority populations (Croes 2006). Of the approximately 140,000 Jews that lived in the Netherlands prior to 1940, only about 35,000 survived.

Though perhaps extreme, for me this sobering tale underscores a fundamental insight: the “beauty” of data is determined by how it is used. Data holds the potential to improve understanding and inform decision-making for the better, thereby becoming “beautiful” in action. Achieving value from data requires that the right data be collected, protected, and made accessible and interpretable to the appropriate audience.

As a group, scientists are among the biggest producers of data on Earth (the biggest, depending on your definition of “scientist”), and though we sometimes like to believe that we are producing this data in a moral vacuum of pure knowledge-seeking, that is a thin fantasy stretched over a harsh reality. The reality is that our data has consequences, and it will be interpreted and used to make decisions and inform opinion whatever we think about the matter. From climate change, to evolution versus creation, to green power generation – we create the data that fuels the debates and policies that shape our world.

My words might be taken as a thinly veiled suggestion to censor the data we produce, but nothing could be further from the truth. The truth is that because we produce the data, we have the responsibility to interpret it, to make it beautiful for those who need to use it and understand it. We have to make sure that our data is not used in an ugly way. I’ll admit it: a call for scientists to be more active in making their data understandable is hardly novel. But like Jeffrey’s observation above, it bears repeating.

I’m in the final few months of my Ph.D., which means that I’ve been at this nearly four years. If you count the two years for my M.Sc. before that, it’s a total of six years as a postgraduate student. Being a student is, in many ways, a fairly straightforward affair. You (should!) have a clearly defined role, a sense of where you fit in the hierarchy. As a modeller in a lab of empiricists I’ve worked on my own a fair bit, but I’ve also been the go-to for a couple of collaborations now; however, whether I was setting my own project or working on someone else’s, the goals were fairly clear and the work was well-delineated. I got to work hard on something I loved, and I did most of the work myself. It was great: I produced the work, we talked about it, I helped write the paper, my supervisor approved it, we shipped it out the door. Rinse, repeat.

In the past couple of months, though, I’ve noticed a shift which has left me a little off-balance. It started with a side-project with my M.Sc. advisor, where I ended up pitching in on a paper that he was working on with a student of his. For various reasons, I ended up in something of a liaison position, where I spent time working with his student before we took what we had to the big kahuna. It was hardly a major shift – I’ve collaborated before – but there was a faintly different air about the whole thing. It’s not that I was in charge, but because I’d done previous work with my advisor on this topic and we were revisiting it to an extent, I ended up in an advisory position while my advisor’s student did most of the heavy lifting that I would normally have had to do. Similarly, I’ve just started a new side-project in my current lab in which, for reasons including a time crunch, I’ve had to hand off a large part of the work to a new Ph.D. student in the lab.

Before you say it, I know that this isn’t exactly a ground-breaking revelation. In fact, in many labs I know that it’s a regular part of daily life; there are plenty of labs out there with a much more rigid hierarchy and levels of responsibility that start at the advisor and flow downwards by seniority. But both my master’s and doctorate labs have been fairly egalitarian, and it’s not something I’ve really experienced before. Combined with the impending end of my own degree, it was an interesting bucket of cold water that reminded me that – hopefully! – I’ll be a postdoc soon, and as I continue upwards in academia, I’ll be faced with ever-growing responsibility. If things go according to plan, I’ll have students of my own to guide in the coming years, a lab to run, duties to those looking upwards like I am now. And as my supervisor said the other day: nothing in the Ph.D., which is supposed to be all about how to do science as a faculty member somewhere, actually teaches you how to do any of those things. It’s a scary feeling.

What makes the feeling even scarier is that I hardly feel like I know what I’m doing yet, myself. I can’t be graduating, surely! I’m still so ignorant! But the truth is: I’ll never know enough, and yet I have to keep moving forward. I’ve been hiding under the mantle of student for long enough – it’s time for me to get out there and make my contribution while the candle is still burning. I just hope that I can be as helpful to my future students as my advisors were to me. It’s a hell of a thing to have to live up to.

(Anyone with similar experiences is welcome to share their stories in the comments!)

A bit of my childhood…

When I was a kid, one of my favourite books was called The Wizardry Compiled, a piece of light fantasy that crossed the technology of systems programming with a world of magic and adventure. I’d like to put into words how I felt about the books, but I find it hard. Perhaps rather than describing my feelings, I can simply admit that it was the only book that I ever stole; the only copy I could find was in my junior high school library, and I discovered one day that I had walked away with it in my backpack without realizing. Maybe it was an unconscious slip, though, because I always found a way to avoid taking it back until I finally just left the money for a replacement copy as an anonymous donation one day. Not something I’m proud of, but it’s the only thing I’ve ever stolen, accidentally or not.

I was thinking of the books the other day, and hoping against hope that they had been released as Kindle e-books. Sadly, they haven’t, but my Googling led me to the blog of the author, Rick Cook. It seems that Rick hasn’t had a great amount of luck, health-wise, and he hasn’t been overly active in recent years. But he’s hanging in there, and if his books ever do end up on the Kindle, I know that my wallet will be more than a few dollars lighter. And if you still like your books in the dead tree format, I’d strongly recommend them. So head on out there into the interwebs, or to your local dead tree retailer, and pick some up. And wish Rick luck, while you’re at it.