Open Access - my comment on the federal OSTP blog

The US Office of Science and Technology Policy has been hosting a discussion on federal open-access publishing policy on their blog. A lot of interesting commentary, although a little dominated by Stevan Harnad so far (but this is exactly up his alley). I think the choices that have largely been looked at so far, though, are too narrow in scope - basically doing something like a big "arxiv.org" or PubMed project on the one hand, fostering institutional repositories on the other, or some combination of the two. What we really want is not just to open future research, but also to expand access to the vast body of existing research articles - and not just for federally funded US research but, as far as possible, for all of it.

I also feel this bears on the larger questions surrounding the integrity of science that have been in the air lately. The following is the comment I came up with - not completely thought out yet, but I think directly involving existing publishers and secondary publishers, as well as opening up competitive avenues for new entities of that sort, is the direction we need to head in to make this work as scientists would like.

-------

First, by way of disclosure, I work in the IT department of a scholarly publisher, and have participated in these debates on open access off and on for roughly 15 years. My thoughts here are my own as a citizen and former scientist, and in no way intended to represent the interests of my employer.

In a world awash with information, peer-reviewed scientific articles provide a bastion of rational evidence-based thinking. Individual articles can be critically important building blocks for understanding past and present, for predicting the future; they are the bedrock of the edifice of science that tells us as far as possible what “is”, and help guide decision making on what we “ought” to do to make things better, whether at the personal, corporate, or national level.

They are also under threat. No, not from “open access”. Open access may be a very good thing, but the important question is whether it helps the real battle: over the integrity and trustworthiness of science itself. This is not a new battle, but as with many extended wars, with time has come greater and greater capability on the opposing side, and it is critically important for science to build sufficient defenses against attacks that have already occurred, and those which can be anticipated to come. Examples of this new order of battle are quite widespread (and some several decades old):

* Tobacco industry manipulation of research on the effects of smoking
* Pharmaceutical company sponsorship of distorted research
* Other major corporation-sponsored or ideological attacks on environmental and health-related science
* The creation of entire apparently peer-reviewed journals whose purpose was advertising for a particular company’s products
* A plethora of small journals with extremely dubious review practices - for example the “Open Information Science Journal” case.
* Even relatively established journals somehow publishing “crank” papers on such things as creationism or denying the greenhouse effect in its entirety.
* Many much smaller examples where ideology, corruption, or simple pressure from an uncomprehending media or public has apparently allowed papers with dubious conclusions into the literature, or suppressed papers with contrary conclusions
* And all this is happening while a very large fraction of professional science journalists have lost their jobs with major newspapers and media companies in the last few years.

The questions raised here by OSTP regarding open access may help: exposure to sunlight is often cleansing. But there is also a danger: with so much information freely available, how do we avoid Borges’ “Library of Babel”, where you simply cannot tell what information is reliable - when the information sources you regard as stamps of authority may themselves have become corrupted in some way? Who do we, finally, trust?

Barring conspiracy theories, what seems to prove most trustworthy in the world at large is having a large number of independent entities provide, or even better compete over, essentially duplicated information sources. We must absolutely avoid single points of failure in the system, where any one entity could (or could be accused of acting to) distort this fundamental bedrock of science. Mirrors of a single site are not enough. Even settling on a single standard for article content or data storage may not be a good idea, if that standard itself can be controlled or attacked (for example with software viruses). Diversity is important.

There remains considerable diversity in the current peer-reviewed journal system, though some of that has been reduced in recent years through corporate mergers and consolidations; the journals we have now do compete for researchers’ attention based on the quality of what they publish, and that is an important factor in maintaining their integrity.

With that context out of the way - the first question in the OSTP Federal Register article asks how this might change under an open access policy. To the extent that a unified open-access service becomes the primary means for researchers to find scientific articles, it would inevitably weaken the association of articles with the journals that published them, and this in turn would reduce the competitive incentive that ensures quality in the journals. Furthermore, if authors become less interested in getting their articles into the highest-quality journals, the quality of the articles themselves will suffer - this time from a loss of competitive motivation on the part of the scientists.

Rather than a giant bin of undifferentiated (except by subject) articles, as the current examples of open-access archives seem to be, a more structured collection that fosters real competition - among journals and/or among the authors themselves - could resolve this problem. With the ongoing revolutions in social networking software there are surely ways to harness human motivations to make the best articles, and journals, stand out clearly from the rest.

Moreover, rather than having the federal government sponsor a single archive of this sort, it would almost certainly be better to encourage multiple independent entities to compete in creating such archives, with largely duplicated information from the primary publishers. To some extent this is what secondary publishers do now - but their information is proprietary and not public. The issue then becomes one of funding models: perhaps we should require institutions receiving federal funds for research to include payment for the secondary publisher of their choice in overhead costs, but then require all the secondary publishers to make at least a minimal collection of their data available freely. Regulation of this sort over secondary publishers might be a very quick way to open all the literature while preserving the competitive incentives that ensure quality.

On question 3 from the Register article, the benefits of opening the literature have already been well discussed here, and I fully agree - having more people able to read the best scientific papers and understand current issues in science is surely a good thing. The resulting innovation and better decision-making all around will be a great benefit to the country.

On question 5 - to ensure compliance you will need to recruit either primary or secondary publishers somehow or other. On question 6 - for the integrity of the literature there needs to be only one version clearly identified as the “final published” one, with a clear mechanism for handling later corrections or “errata”. It may be fine to include other versions and supplementary data etc. along with it. Perhaps we’ll figure out new modes of communication that parse scientific articles into their logical statements and claims and those will be the more fundamental entities of interest, but for now the “final published” article needs to be a fixed immutable object essentially in perpetuity.

On question 7 - if funding etc. were not an issue, there should be no delay at all between publication and opening the article. A scientific article is almost always of greatest interest when it is news, the day it is published; other work can then build on it. Delay it by months or years and that other work is itself delayed, building unnecessary friction into innovation.

On question 9 - usability is indeed key, and is essentially the guarantor of quality I discussed in addressing question 1. Community participation, as suggested in the question, will also be important; that will require intelligent moderation to avoid wasting legitimate researchers' time with crank/troll activity. All of these are things that need to be paid for.

Open access will cost the taxpayer money one way or another. But it will be worth it, especially if it can help address the central issues in the larger battle over the integrity of science itself.

Comments

And they have a new discussion up now, on standards. My comments on that:

One of the banes of scholarly publishing is the plethora of document format standards for text, images, and other content, as well as for metadata, that all provide essentially the same functionality. That makes interoperability among the various repositories much more complicated: computationally intensive, lossy, and failure-prone format conversions require significant investments of human effort before much useful can be done with all that textual and other data. By setting at least a preferred standard, a federal repository requirement could help bring about a convergence and simplification that would in itself be of great benefit.

Various other comments here have mentioned XML as a natural standard - and that makes sense, but just requiring an XML format is not sufficient; more or less standard document structure components should also be specified. Open Document Format (ODF) is one option, already widely supported in word processing software. SVG is a natural format for most images (an XML flavor that handles vector graphics). But the best solution may be to use what seems to be becoming the standard for electronic books: "EPUB":

http://www.idpf.org/
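To give a sense of the kind of structure EPUB standardizes: an EPUB file is a zip archive whose META-INF/container.xml points to a package document carrying the metadata, manifest, and reading order. A minimal sketch of that container file (the OEBPS path and content.opf name here are conventional examples, not mandated by the spec):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0"
           xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <!-- Points to the OPF package document, which holds the
         publication metadata, file manifest, and spine -->
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
```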

One important consideration should be accessibility to the visually impaired - DAISY has a close association with EPUB, though I'm not entirely clear on the relationship (EPUB seems to include some DAISY specifications) - more info on that here:

http://www.daisy.org/

As far as metadata goes, important standards are DOI (http://doi.org/) for citations to published literature, and I'd suggest standardizing on the relatively new ISNI for author identification:

http://www.isni.org/
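As an aside on how machine-checkable these identifiers are: ISNI check characters use the ISO 7064 MOD 11-2 algorithm, the same scheme ORCID adopted. A minimal sketch, using a widely cited ORCID example identifier (0000-0002-1825-0097) as the test value:

```python
def mod_11_2_check(digits):
    """Compute the ISO 7064 MOD 11-2 check character for a string
    of digits, as used by ISNI (and ORCID) identifiers."""
    total = 0
    for d in digits:
        total = (total + int(d)) * 2
    check = (12 - total % 11) % 11
    return "X" if check == 10 else str(check)

# The first 15 digits of the example identifier 0000-0002-1825-0097
# should yield the final check character "7".
print(mod_11_2_check("000000021825009"))
```

A repository can thus reject a mistyped author identifier before it ever enters the metadata record, which matters when records are duplicated across many independent archives.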

This may encompass geographic identification, or you may want to specify national and sub-national components of origin separately. Subject classification is another important area - NLM (PubMed) has standard keywords for their subject areas, and perhaps that should be combined with keyword classification schemes from the physical sciences, economics, etc. to produce a sufficiently broad and flexible system. Or that may be something for independent entities like the existing secondary publishers to work on as added value they can provide.

In general, federal standard-setting in scholarly publishing would be a very good thing, and I hope you will consider some of these as recommendations for the planned repository(ies).