Science Commons - Openness and Sharing, but what about Trust?

In a fascinating interview preliminary to the upcoming ETech conference, John Wilbanks of Science Commons expounds on some of his ideas on the future of scientific information. Science Commons is an offshoot of Larry Lessig's Creative Commons, where Wilbanks works (as VP for Science). As such, part of their focus is on the legal barriers (copyright in particular) to sharing of scientific information - both publications and data. They are big fans of the existing open-access journals like Public Library of Science, but they have a much grander vision than that. Some of my favorite quotes from the interview follow.

Wilbanks on why science will naturally move in this direction:

[...] we think that the innate nature of science, which is publishing, which is community-based, which is about sharing information and remixing information, those norms are going to take over if we can simply get the resistance out of the way.

So what we are trying to do is to intervene in places where legal tools, technical tools, policy tools can lower those barriers that are currently preventing the real emergence of the traditional scholarly norms on the internet.

Wilbanks on the scientific article as a "compressed" version of, or "advertisement" for, the actual science:

I think there's a little bit of a Guild mentality, you know, in terms of the language and structure and flow of these papers. It's taken me some time to learn how to read them. And it's artificially idealized I think. Because you're trying to present what happened in the lab one day as some fundamental truth. And the reality is much more ambiguous. It's much more vague. But this is an artifact of the pre-network world.

There was no other way to communicate this kind of knowledge other than to compress it. Now when we think about compression, we think about zip algorithms. But in the old days, compression meant writing it down on physical paper and mailing it to someone. That was the only way to condense and convey the knowledge in anything representing an effective way. But the problem is, we've digitized that artifact of the analog world. And so the PDF is basically a digital version of paper with all of the attendant technical benefits.

[...] if we think about the article as simply a token for several years of research, which include data, which include lab notebooks, which include research materials, software. All of these things that are put into the advertisement that is the paper. Those things can be really very powerful on the Internet.

On the natural resistance to change of the existing science system:

[...] the computer enables all of these things, but the scientific practices are so stable that they're really resisting change. And so I think of it as the way that we communicate scientifically has evolved as this formal system. And like most good systems, it's stable against disruption. And it's stable against bad disruption. But it's also stable against good disruption.

On cultural resistance to innovation in science communication:

it depresses me that we have so much more innovative programming researchers going into Facebook than we do into clinical trial data. And into the life sciences. But in many ways, that's a function of the culture, of trying to attach value to every datum and every article, financial value. Instead of thinking about the collective value we could get if that stuff was a common resource.

On working with existing publishers and (especially) scientific societies:

This is no less fundamental of a change than the change from the mainframe to the microcomputer in terms of models for these publishers.

And so it's very scary. And I think we have to be open and honest and accept that as a valid emotion, and try to work with the existing publishers and especially with the scholarly societies who don't have the million dollars to invest in R&D on scientific publishing that a big publisher might have. These are operating to the bone. And so we need to work with them, and help them find ways to make the transition in a way that doesn't destroy them at their core.

[...] We want robust publishing houses inside scholar societies. But we want to move that into a direction that allows this sort of remix and mash up of information in science. And so we just have to help them find models, publishing models, business models, legal models that help them make that transition and be part of the solution with them.

On large datasets and the relationship to taxonomy:

[...] the communities involved have got to come to some agreement on meaning. And by meaning, I mean sort of standard names for things and relationships between things. Ontologies. Hierarchies. Taxonomies.

Things like data models for the SQL database but at a global web scale. Because in the absence of those, these are just piles of numbers and letters. And that's hard to do. It's really hard if you expect there to ever be sort of a final agreement on that list of names and relationships. So we've got to find both technical ways and social ways to have lots of different points of view represented and evolving and integrating.

And back to their efforts to change things:

[...we are] trying to get to a point where scientists see the value of sharing and indeed, believe that they can out compete other scientists if they share. But that's the biggest challenge because you've got to do so many things simultaneously. You've got to deal with legal problems, both contract and intellectual property problems. You've got to deal with incentive problems. You've got to deal with workload and labor problems. You've got to deal with the Guild culture and the Guild communication systems, all of that at once. And that's really hard. So getting through this collective action problem is probably the hardest thing we've had to do from the beginning and will continue to be the hardest thing we have to do.

They have a pretty impressive vision of federated science data with universal access. Wilbanks gets into some of the privacy and national security concerns in the interview, but I think those are side issues to the really fundamental question: how do you ensure that the data, the articles, and the reports are trustworthy? Perhaps "ensure" is too strong a word - peer review is no guarantee of that now. But it does provide a first-cut rating of the trustworthiness of a piece of information. It also provides a means of roughly rating importance, clarity of presentation, and the other qualities a particular journal looks for in its articles. There needs to be a whole regulatory system surrounding this open database - the taxonomy or namespace needs to be reliable in some fashion, for instance. How would that be built, and by whom?