Libraries and Open Civic Data

Last year, Ed Summers wrote a post called “Inside Out Library” about how, rather than trying to pull in data from all over the world to present to local users, libraries should be finding and making accessible local data, both for use by their local community and by the rest of the world. Rather than compete with Google, libraries should focus on their strength: authentic connection with, and knowledge of, local users and communities.

I’ve been thinking about this idea for a while, and one application that seems obvious to me, but which I don’t see many libraries taking up, is getting involved with open civic data efforts. Cities, counties, states, and even countries around the world are (slowly) starting to embrace the idea of publishing budget data, crime data, land-holding data, and a multitude of other datasets in publicly accessible and consumable ways. Governments see open data as an excellent opportunity to promote growth and commerce, spur entrepreneurship and innovation, and meet transparency objectives.

Now, for all the best intentions of providing transparency by publishing this data, many things remain murky. Some datasets are published in their raw, system-specific format, making them hard to understand for those not already initiated in the workings of that civic body. Trying to make connections across datasets is difficult at best, as labels vary wildly between datasets, let alone between different agencies/departments within the same civic jurisdiction. This is only meant as the mildest of rebukes for these civic data publishers. For civic bodies with already stretched budgets, getting data out is hard enough; getting it into some (as of yet non-existent) standard is a Sisyphean task.

So where can libraries fit into this? Where they always have: promoting information literacy (civic information, in this case) for the community, and providing stewardship for information’s discovery and long-term preservation. And maybe in new places: offering advice to civic bodies about ways to organize data so it can be easily discovered by users, advocating for the use of data standards across datasets, and maybe teaching users the technical skills required to understand and transform civic data into information.

I think there are lots of possibilities for libraries to get involved here. Just a few off the top of my head:

  • Run a class on how to pull civic datasets into Excel and manipulate them.
  • Take the next step and teach users the basics of OpenRefine to extract the data they need.
  • Create documentation about the datasets showing where they overlap, guides explaining the types of data, and other “dejargonification” to make it more comprehensible for the public.
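Even without Excel or OpenRefine, a surprising amount of that manipulation can be done with standard Unix tools — exactly the kind of trick a hands-on library workshop could teach. Here is a minimal sketch; the budget.csv below is entirely made up for illustration (a real open data portal would give you a CSV you could download with curl):

```shell
# Work in a scratch directory; the data below is fabricated
cd "$(mktemp -d)"

# Stand-in for a downloaded civic dataset
cat > budget.csv <<'EOF'
department,year,amount
Parks,2013,120000
Parks,2014,135000
Library,2013,450000
Library,2014,465000
Police,2013,900000
EOF

# Total spending per department: skip the header, sum column 3 grouped by column 1
tail -n +2 budget.csv \
  | awk -F, '{sum[$1] += $3} END {for (d in sum) print d, sum[d]}' \
  | sort
# → Library 915000
#   Parks 255000
#   Police 900000
```

The same three-line pipeline works unchanged on a file with millions of rows, which is where spreadsheets start to struggle.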

There is already a small but growing movement of civic hackers, data journalists, and non-profit organizations working to realize the benefits of open civic data, and libraries can certainly make a contribution to that effort.

Learning by Breaking: What Heidegger taught me about Git

During my undergrad philosophy degree, I was particularly taken with Being and Time by Martin Heidegger. My favorite section (perhaps the only part I understood) is where he sets out a very simple but effective analysis of the way we interact with the tools we use on a daily basis. His example can be summarized something like this:

“I have a hammer. I never really contemplate the hammer; I just pick it up when I need it, use it, then put it back when I am done. It is the means to accomplish my end. It drives nails where they need to go. However, when that hammer breaks, and I can no longer complete the task at hand, then I start to really think about the hammer.”

In those terms, I had a Heideggerian moment with Git earlier this week. Like most developers, I use Git (or some other version control system) on a daily basis in my work. Once I learned the basic commands I needed for my workflow, I ceased thinking about Git and just used its commands. I push, pull, add, merge, and commit on autopilot most of the time. I always knew there was a lot more to learn, but…you know…deadlines.
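For readers who don’t live in a terminal, that autopilot loop looks roughly like the sketch below. It builds a throwaway sandbox so it can run anywhere; the file name, commit message, and the bare repository standing in for a remote (like GitHub) are all invented for illustration:

```shell
set -e
work=$(mktemp -d)

# A bare repository plays the role of the remote server
git init -q --bare "$work/origin.git"
git clone -q "$work/origin.git" "$work/project"
cd "$work/project"
git config user.email dev@example.com
git config user.name "Dev"

# The daily rhythm: edit, add, commit, push...
echo "hello" > notes.txt
git add notes.txt
git commit -qm "add notes"
branch=$(git branch --show-current)
git push -q origin "$branch"

# ...and pull to pick up anyone else's changes
git pull -q origin "$branch"
```

Once this loop is muscle memory, the tool disappears — which is precisely the point of the story.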

However, earlier this week I ran into a problem with two branches that had diverged around some code that had since been moved into a submodule. I won’t go into details, but the important point is that my usual workflow broke. Git was throwing error after error. I could not merge in my changes, or even switch to some older feature branches. In short, I could no longer do my job.

I was forced to explore how Git actually works. How was it handling directory structures when files were deleted? What the hell were submodules actually doing? What does that cherry-pick thing do again? It took half a day’s work to figure out, but I finally cleaned up my branches and got back to a working state. The best part is, I actually understand (a bit more about) what Git is doing with my code, and how to interact with Git to make my life easier.
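If cherry-pick is a mystery for you too, a scratch repository is a safe place to see it in action. This sketch (branch name, file, and commit messages all invented) replays a single commit from a feature branch onto the original branch:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.email dev@example.com
git config user.name "Dev"

echo "base" > app.txt
git add app.txt
git commit -qm "initial commit"
base_branch=$(git branch --show-current)

# Diverge: a feature branch picks up a fix we also want on the base branch
git checkout -qb feature
echo "fix" >> app.txt
git commit -qam "important fix"
fix_commit=$(git rev-parse HEAD)

# cherry-pick copies just that one commit onto the current branch
git checkout -q "$base_branch"
git cherry-pick "$fix_commit"
git log --oneline   # the initial commit plus the replayed fix
```

Being able to break things in a sandbox like this, rather than in a real project, takes most of the fear out of the exploration.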

*

This little story highlights two points that I feel were crucial in my conversion from a non-technical humanities undergraduate to a web developer. The first is Heidegger’s lesson: in everyday life we don’t actually know everything about the tools we use, and yet they are useful anyway. You don’t have to know all the ins and outs of Git to use it, despite what snarky responses to questions on Stack Overflow or mailing lists can imply. A big part of why Git is such a good tool is that it does the job you need it to do, and (mostly) just gets out of the way. That is a hallmark of a great developer tool: it allows you to be productive very quickly with a minimal initial learning curve.

Now, without a doubt, when you use a tool in your everyday workflow but have only a shallow knowledge of it, you are working on borrowed time. The tool will inevitably “break” at some point, and you’ll have to put aside your own task to consider the tool itself. That break point, however, is a crucial part of the learning process, and my second point.

Some of the most fruitful learning experiences happen when stuff breaks down (unless it’s production stuff). You are forced to dig below the surface to see why the nicely packaged tool you have been using has suddenly stopped serving your needs. It gives you the opportunity to understand what is going on under the hood, and in so doing, you learn new ways that tool can be used.

From my own experience, I’m actually pretty convinced that this break point isn’t just a happy accident, but an essential part of the learning process. At some point I’ve busted all of the software I now use on a daily basis. Each break, while frustrating at the time, was ultimately illuminating. Putting in the time to fix those tools is the reason I now know them so well. And, as I move more and more towards a test-driven development style, it becomes clearer and clearer that this applies to my own code as well as other people’s. It turns out breaking my own code is sometimes the best way to figure out what I’m actually doing.