A schema for OA policies

There are now many OA policies, from research funders and universities, listed in Sherpa/Juliet, ROARMAP and MELIBEA. This can lead to some confusion, especially for an author who is subject to more than one, as neatly illustrated in a slide used by John Norman of Cambridge at an ALPSP seminar earlier this year (available here, PDF, slide 8). While potential alignment of policies is an ambition of the EC PASTEUR4OA project, and there are specific calls for some alignment between RCUK and REF OA policies in the UK, a first step might be simply to have policies expressed in a relatively consistent way.

It turns out that this is not as straightforward as it might sound. A group of us, including Alma Swan, Stevan Harnad, Bill Hubbard, Mafalda Picarra and myself, have been working on a draft schema for a while now. It remains a draft, and we are very interested in feedback on it: Proposed schema for OA policies 20141117 [MS Word]. There is a balance to be struck between precise description, complexity of expression, and difficulty in actually using the schema. While the draft is quite long, we think that – on the basis of an analysis of a range of real OA policies – it needs to be long, to avoid too much ambiguity. And, of course, the schema would only need to be filled out when a policy is issued or revised, which we hope would not be too often.

Anyway, we are now asking for feedback, both via comments on this blog post, and more directly in some cases. We hope, at the very least, that the schema will provide a framework for a systematic and informed debate on where and why policies differ.

By Neil Jacobs

JISC Programme Director, Digital Infrastructure (Information Environment)

13 replies on “A schema for OA policies”

This schema is far too complex, and yet omits very important information. What’s missing? This schema asks about institutional open access funds. It appears to assume that the UK approach of paying APCs is how agencies would fund open access. Outside the UK, however, direct and indirect subsidy of publication per se is also used (either editorial support through journal subsidies or indirect technical support through library or university presses). Another important but missing element is the basics of the faculty permissions (Harvard, MIT) policies, i.e. questions about whether the policy involves transmission of nonexclusive sharing rights to a repository.

By asking for so much detail, this schema is likely to get wrong answers, or to leave people trying their best to give correct answers when “it depends” is the most accurate response to many of the questions. For example, one of the questions is “maximum allowable publisher embargo”. This is likely to depend on the discipline (sciences shorter than humanities and social sciences) and/or the type of publication (definite embargo periods for journals, more flexibility with respect to theses, books or book chapters, for example).

Another concern I’d like to raise is whether asking such specific questions, if people take this seriously and aim to answer the questions, could have the unfortunate effect of tending to freeze an area that is in a state of evolution. For example, I would hope that the OA embargo maximum periods would shorten over time, but I could see where having to fill out this form more than once would discourage anyone from reviewing their OA policies.

Thanks very much to everyone who is doing this work; I wish I could be more supportive and less critical. I understand that people want the simplicity of having this information filled out, but sometimes by trying to specify everything we actually end up creating artificial and unnecessary limits that inhibit the kinds of changes we really want to see. My suggestion is to stick with a limited amount of information and a link to the policy per se. I think encouragement to actually get the policies on the web would be particularly useful.

Thanks Heather, some really good points here. I’ll respond personally below; I expect others involved in the schema drafting might want to add further comments.

On the complexity generally, please remember that this is a schema. It is perfectly possible, and I hope likely, that enterprising people will be able to build tools to simplify providing the information, so this schema would sit behind those tools without necessarily being seen except by those analysing the declared policies. But even so, there is probably room for some simplification, for example, the focus of most policies is on articles and conference proceedings, so perhaps the schema should focus on those too, and seek to be less comprehensive.

On maximum allowable embargo, yes it is likely to depend on the discipline area or other factors. The way we felt we had to deal with that was to see those differences as essentially different policies. So, if your policy says a 12 month embargo is the maximum for STEM, but 24 months is the maximum for humanities and perhaps social sciences, then effectively you have different policies for those two areas. It might be possible to simplify how that information is filled in, as noted above.
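
As a rough illustration of treating discipline-specific embargoes as separate policies, here is a minimal Python sketch. The record type `PolicyRecord`, its field names, the helper `expand_by_discipline` and the agency name are all hypothetical and are not taken from the draft schema; the 12 and 24 month figures are the example values from the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRecord:
    """One schema entry. All field names here are hypothetical."""
    agency: str
    scope: str               # discipline area the record applies to
    max_embargo_months: int  # maximum allowable publisher embargo


def expand_by_discipline(agency: str, embargo_by_discipline: dict[str, int]) -> list[PolicyRecord]:
    """Turn one policy with per-discipline embargoes into one record per discipline."""
    return [
        PolicyRecord(agency=agency, scope=discipline, max_embargo_months=months)
        for discipline, months in embargo_by_discipline.items()
    ]


# Example values from the text: 12 months for STEM, 24 for humanities and social sciences.
records = expand_by_discipline(
    "Example Funder",  # hypothetical agency name
    {"STEM": 12, "Humanities and social sciences": 24},
)

for record in records:
    print(record)
```

A tool sitting in front of the schema could ask for the embargo table once and generate the separate records behind the scenes, which is one way the filling-in might be simplified.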

On the possibility that the effort of filling in this schema might put organisations off revising their policies, for example reducing allowable embargoes over time: I would hope that such decisions are made at a sufficiently senior level within an organisation that any additional effort to document those revisions properly would be a minor consideration.

I think we would certainly take your point about the need to represent alternative journal support mechanisms, outside APCs. The challenge then is how to represent a policy that offered such support, without further adding to the complexity of the schema. And, yes, Harvard-style policies need to be expressible as well, so I think we’ll need to think more about that.

Once again, thanks for your very helpful feedback.

I call this an “essential variable analysis” and have studied the logic thereof in the context of measuring complexity. I first got on to it in the 1980s, looking at the design of consumer banking products.

Consider this. If there are 36 variables, with just three variations allowed for each variable, then the number of possible designs (the combinations) is 3 to the 36th power, or roughly 150 million billion. With just 10 variables there are still around 60,000 design options.
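
The arithmetic is easy to check. The short Python sketch below assumes, as the comment does, that every variable independently takes one of the same number of values; the function name `design_count` is only illustrative. Under that assumption 3 to the 36th power comes to about 1.5 × 10^17, and 3 to the 10th is 59,049.

```python
def design_count(variables: int, variations: int) -> int:
    """Number of distinct designs when each variable independently
    takes one of `variations` values."""
    return variations ** variables


# 36 variables with three variations each.
print(f"{design_count(36, 3):,}")

# 10 variables with three variations each.
print(f"{design_count(10, 3):,}")
```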

The OA schema appears to fall somewhere between. I have not yet counted the variables and variations, but it can be done. I doubt that what they have is complete.

Clearly confusion due to complexity is inevitable in such cases. The impressive thing is that we muddle through pretty well, as this sort of complexity is fairly common. (I think it is best to understand the situation, but some do not agree, so I am accused of negativity.)

Some things follow from these big numbers. There is no way to consider all the possibilities. (Herb Simon called this “bounded rationality,” a term I dislike.) There is probably no way to optimize or pick the best. Having an organization might help. Etc.

Fun stuff, this.


The multiplicity of policy components simply reflects the diversity of existing policies. The purpose is to make the OA policies of funders and institutions comparable and to make it easy to update their components to reflect what empirical evidence shows to be effective or ineffective in generating compliance and OA.

Hi there

Is it possible you can explain in a bit more detail the use case(s) for the schema? For example, if lots of institutions and funders spent time compiling this information, where would it reside? Is the plan to use it in SHERPA FACT – if so, is there any corresponding schema for publishers?

Thanks very much

J. Smith

Eventually, yes, services such as Sherpa-FACT would be able to use this information to support operational workflows, providing better information for example to guide those implementing OA policies. This is especially true when we consider that many authors will need to refer to more than one OA policy with respect to any particular publication – they might have multiple funders, or a single funder but work in an institution with an OA policy. I reference John Norman’s slide in the post, which neatly summarises the degree of complexity that soon emerges for the author in such circumstances. So, the schema might be a way to shift part of the burden from the author to the organisation issuing the policy and, perhaps in time, make it easier for policies to become more aligned, if those issuing the policies wish to do that.
And publishers? Well, through the Sherpa-FACT Advisory Group, which has representatives from the major UK research funders, from institutions and publishers, there is a very active discussion on this very topic. I realise that is rather UK-specific, but Sherpa-FACT perhaps shows most clearly why this is an important question.

As you can see from others’ comments, there’s some confusion over the intended use of the schema. It might be helpful to add a column to the Word document in which, for each field, you explain what use you would make of the field. This would both make intended uses clearer and lead you to realize if you’ve added any fields that won’t serve any particular purpose. For example, I’m wondering who would use field 5 (“Country or region in which agency is based”). You probably know this, but I’ll say it anyway: when designing an ontology, it’s tempting to create a way to document everything that comes to mind, but that path leads to madness.

The two primary purposes of the OA policy template are: (1) to harmonize OA policies across institutions and funders worldwide, to make it easy to compare them as well as to integrate them; (2) to optimize OA policies across institutions and funders worldwide, so that as empirical studies reveal which policy components are effective or ineffective, it will make it easy for institutions and funders worldwide to do evidence-based updates of their OA policies.

– 15, 32) research data should be added as a possible item covered by the policy
– 15, 32) should be mandatory
– 31, 34, 38) should be mandatory. If these points are not addressed then the crucial question of whether formal OA publication is actually required remains quite unclear. ‘Not specified’ can still be selected, but the schema should not invite people to skip those points.
– 38) should be mandatory. ‘Not specified’ can still be selected, but the schema should not invite people to skip this important point.

Thanks Frank,
We had a discussion about whether to include research data but decided against it, as there are just too many differences in what a research data policy would say (eg, the need for data management plans of particular kinds, the relevance of privacy, etc, to the release of some data).
On making more fields mandatory, I agree that the fields you’ve identified should be filled in if they are relevant to the policy being described. However, unless the policy actually covers deposit or OA publishing, the fields you suggest are not relevant, and so they cannot really be mandatory in all cases.

Optimizing and Harmonizing Institutional, Funder and National OA Policies

The various different components are modular, and most of them can be required or recommended. A component is not intrinsically mandatory (required): that is one of the parameters of the policy.

But once all the components and parameters of all institutional and funder policies have been made explicit in a standardized, comparable way, institutions and funders are then in a position (1) to optimize their policies, based on the comparative empirical evidence about which components are effective or ineffective in generating compliance and OA, and (2) to integrate their policies with one another, to make them more interoperable (especially funder and institutional policies).

On a general note – do you have any completed examples to see if this approach is useful? I concur with the suggestion about showing how it is to be used. I have just finished working on SCAPE (which was concerned with digital preservation), where I was involved in creating guidance for organisations to create preservation policy (deliverable: – warning this runs to over 100 pages!). One of the important approaches in this was to attempt to explain why it was important to consider making policy on the topic concerned, which helps to set it in context. I realise that the schema is doing something slightly different, but I think the thinking behind it is similar to the SCAPE policy aims.
Here are some specific comments about particular fields within the schema – I hope you find them helpful:
Section 1, item 3: Type of agency: I’m not sure if you are looking for one-word values, but I really don’t think using “institution” for research performing institution is terribly helpful, as it is not clear without the note. I also don’t like “other” without the option of describing what other is, as it is meaningless and just a method of ensuring the field can be made mandatory.
Section 2, item 12: the note says “indicate to whom within the agency….” For funders, the policy may apply to people who are not part of the agency issuing the policy, so this advice needs rewording.
Section 2, item 16: refers to item types in field 14. I think this is now field 15.
Section 2, item 18: I would separate funder from subject repository, as these are not synonymous terms.
Section 5: If this policy schema were to be widely adopted, would you analyse what is in this section to enable further specific additions to the schema?
