Open access briefing paper: The potential of global identifiers to support more efficient workflows for all kinds of OA

This paper was written by Chris Brown, Katie Shamash, Helen Blanchett and Balviar Notay. It was published as part of Open Access Week 2018 and can also be downloaded as a PDF document.

Introduction

This document describes the potential of Persistent Identifier (PID) registries, in particular for researchers and organisations, and how, if properly used, they can ease the administrative burden of any open access (OA) policy and improve workflows.

Background

Research has increasingly become international as institutions look beyond national borders to work collaboratively. This has been helped by an open science agenda, which has created a more open research environment where data and outputs are shared and made more freely available. However, much of the information that supports research, through its workflows and lifecycle, remains closed or, if open, difficult to discover or access. For example, the pain points for administrators include not being able to see clearly what research is connected to particular authors and the institution, or not being able to validate and resolve the various versions of an article in the publication lifecycle. This is particularly onerous and problematic when demonstrating compliance with funder policies. For authors, the pain points include not being able to cite a piece of research confidently if several versions exist in different places: which version do you trust and cite, and how do you understand how the versions are connected if DOIs (Digital Object Identifiers) are not present? Another issue for researchers is having to go to multiple systems (funder, institutional, publisher) to input or update the same key identifier information, ideally without introducing human error. For funders, it is not easy to track research from their grant funding across various scholarly systems because these identifiers are not consistently present.

To open up research, to enable and support more transparency and accountability, and to ensure that we are supporting research effectively, we must be able to access the whole research landscape. That means recognising more kinds of contributions to research and acknowledging a broader, more diverse range of career paths. In order to do so, we need tools to help us to fill in the gaps, or make better use of the tools already available to access this information. Persistent identifiers (PIDs) are helping in this endeavour and providing global solutions to the barriers preventing the efficient and effective management of research information.

Connections through Persistent Identifiers

With the creation of PIDs, and their adoption by the research community, we can improve the discovery of connections between people, ideas, organisations, funding, employment, publications, activities and more. Identifiers act as coordinates on the research map and tell us where something is located. Although open, community-governed identifier systems are already part of the scholarly world, the onus is on funders and governments to ensure policies are in place to encourage the use of PIDs and reduce the administrative burden for researchers and research professionals.

The following areas have been selected as examples of how PIDs can impact on and improve the efficiency of workflows, particularly around the research lifecycle.

Connecting researchers and outputs

ORCID

ORCID is an open, non-profit, community-driven effort to create and maintain a registry of unique researcher identifiers and a transparent method of linking research activities and outputs to these identifiers. ORCID is unique in its ability to reach across disciplines, research sectors and national boundaries. It is a hub that connects researchers and research through the embedding of ORCID identifiers in key workflows, such as research profile maintenance, manuscript submissions, grant applications, and patent applications.

ORCID provides two core functions:

a registry to obtain a unique identifier and manage a record of activities, and
APIs that support system-to-system communication and authentication.

ORCID makes its code available under an open source licence, and will post an annual public data file under a CC0 waiver for free download.

As well as the obvious efficiency gains, the use of a unique identifier (as provided by ORCID) enables individuals with the same name to be uniquely identified across multiple systems integrated with the ORCID registry, giving a far more robust picture of the researchers working across different areas.

The ORCID site also provides a neutral harbour for information that a researcher can take with them when they transfer from one Research Organisation (RO) to another (in particular, their publications information). The new RO’s system can draw down the relevant information from the ORCID site once the new member of staff provides their ORCID ID, so the information does not need to be re-entered, for example in order to appear on their profile page on the new RO’s website.

Increasing ORCID take up

An analysis undertaken by Jisc of Crossref metadata found that of the 14.6 million non-unique authors of journal articles published in 2017, fewer than 900,000 (6%) had an ORCID associated with their record. In many cases, this is because publishers do not have a workflow for capturing ORCID IDs for all authors and passing these on to Crossref: this is evidenced by the fact that some publishers pass on no ORCID IDs whatsoever, or pass on IDs for the corresponding author only. In the case of publishers who do have a workflow for passing on ORCID IDs, however, the low number of ORCID IDs is likely the result of slow take up by authors. A handful of publishers who signed an open letter ^[1] in support of ORCID and who mandated ORCID IDs for all authors have astonishing success rates: the most successful, JMIR, had ORCID IDs for all authors for 95% of its journal articles published in 2017.

Increasing ORCID take up therefore needs a multi-pronged approach: 1) working with publishers to improve their workflow for capturing and passing on ORCID IDs (75% of new ORCID registrations occur because journals are asking authors to include their ORCID in new submissions [2]), 2) supporting institutions who are ORCID members in communication and advocacy with researchers, 3) working with institutional system vendors to embed support for ORCID, and 4) working with authors directly to increase engagement. ORCID IDs should also be authenticated (checked against ORCID’s registry) in order to prevent errors.
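As a complement to authentication against the registry, a system can catch many typing errors locally before an ID is passed on. The sketch below assumes the published ORCID identifier structure (16 characters in four hyphenated groups, with a final ISO 7064 MOD 11-2 check character). The function names are illustrative, and passing this check shows only that an ID is well formed, not that it belongs to the person who supplied it.

```python
import re

# Four groups of four characters; only the last character may be "X".
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid: str) -> bool:
    """Return True if the string has the ORCID layout and a matching check character."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_character(compact[:15]) == compact[15]


if __name__ == "__main__":
    # 0000-0002-1825-0097 is the ID ORCID uses for its fictitious example researcher.
    print(is_well_formed_orcid("0000-0002-1825-0097"))
    # A single mistyped final digit is rejected.
    print(is_well_formed_orcid("0000-0002-1825-0098"))
```

A check like this is cheap enough to run on every form submission, leaving the authenticated registry check to confirm ownership.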

The take up of ORCID IDs would be improved by funders integrating its use into their funding systems, from application stage through to research outcomes collection, and in the systems for making information available for re-use. The Wellcome Trust has made ORCID a mandatory requirement on grant applications [3] and RCUK (now part of UKRI) has integrated ORCID into its own and related systems, including Je-S, Gateway to Research and Researchfish [4].

In the UK, there are currently 121 member organisations and over 200,000 ORCID IDs registered ^[5]. Jisc runs the UK ORCID consortium and support service for research support staff, managers, practitioners and developers considering or already implementing ORCID. Our 2017 member survey highlighted the benefits members have seen since implementing ORCID IDs including:

“Much more accurate information on publications and research activity. We have found it a useful source of truth for checking against other sources of information.”
“Reduction in staff time spent entering publication details manually.”
“Better engagement from researchers – helped me to build relationships as I offer to import people’s publications.”
“Improved visibility of authors and outputs.”
“Better awareness of academic publishing requirements.”
“Better onboarding for new academic staff.”
“Increased repository deposit.”

PIDs for publications

Crossref’s DOIs are widely used as PIDs for publications. Crossref makes metadata about publications available through its free API; this data can in turn be used for discovery of research articles, for populating institutional repositories, for text and data mining by researchers, and more.
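To illustrate how a repository might reuse this metadata, the sketch below builds the address of a single work on the Crossref REST API and summarises a response. It makes no network call: the JSON is a hand-written sample that follows the general shape of a Crossref work record, and the DOI, funder DOI and award number in it are placeholders. The function names are illustrative.

```python
import json
from urllib.parse import quote

CROSSREF_WORKS_URL = "https://api.crossref.org/works/"

# Hand-written sample for illustration only; the identifiers are placeholders,
# except the ORCID iD, which is ORCID's fictitious example researcher.
SAMPLE_RESPONSE = """
{
  "status": "ok",
  "message-type": "work",
  "message": {
    "DOI": "10.5555/example",
    "title": ["An illustrative article"],
    "author": [
      {"given": "Josiah", "family": "Carberry",
       "ORCID": "http://orcid.org/0000-0002-1825-0097",
       "affiliation": [{"name": "Example University"}]},
      {"given": "Ben", "family": "Sample", "affiliation": []}
    ],
    "funder": [
      {"name": "Example Funder", "DOI": "10.13039/00000000", "award": ["EX/0001"]}
    ]
  }
}
"""


def works_url(doi: str) -> str:
    """Build the URL of a single work record for the given DOI."""
    return CROSSREF_WORKS_URL + quote(doi, safe="/")


def summarise_work(response_text: str) -> dict:
    """Pull out the fields a repository is most likely to need."""
    message = json.loads(response_text)["message"]
    authors = []
    for author in message.get("author", []):
        parts = (author.get("given"), author.get("family"))
        authors.append({
            "name": " ".join(p for p in parts if p),
            "orcid": author.get("ORCID"),  # absent for many authors
            "affiliations": [a["name"] for a in author.get("affiliation", []) if "name" in a],
        })
    funders = [
        {"name": f.get("name"), "doi": f.get("DOI"), "awards": f.get("award", [])}
        for f in message.get("funder", [])
    ]
    return {
        "doi": message.get("DOI"),
        "title": (message.get("title") or [None])[0],
        "authors": authors,
        "funders": funders,
    }


if __name__ == "__main__":
    print(works_url("10.5555/example"))
    print(summarise_work(SAMPLE_RESPONSE))
```

Every field is read defensively because, as noted elsewhere in this paper, many records lack ORCID IDs, affiliations or funding data.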

Crossref supports registering DOIs for preprints ^[6], extending the persistence of a DOI to preprints. The preprint DOI can be linked to the published version of the article. This linking allows notifications at the release of different versions of the preprint, and facilitates discoverability of both the preprint and the published article it is linked to. The metadata for the preprint can include funding data, allowing funders to more easily track research outcomes even for non-published work. Since Crossref introduced support for preprints in November 2016, DOIs have been registered for 42,000 preprints, 13,000 of which have been linked to a DOI for a published work (as of 30 April 2018). Crossref’s work with preprints should be supported in order to ensure access to the scholarly record over time, enable discoverability of preprint content, and help funders to better track research outcomes.

In January 2016, Crossref proposed ^[7] extending the API to allow early content registration for accepted manuscripts. The proposed workflow ^[8] would allow publishers to register a DOI at the time of acceptance. The DOI would point to a personalized landing page that displays at minimum the DOI, the acceptance date, and an ‘intent to publish’ statement, but which could also include the title, author information and ORCID, funder and grant IDs, and/or licence information. At publication, the publisher would deposit the full metadata to Crossref and the DOI would resolve to the published manuscript on the publisher’s website. The response from the community was overwhelmingly positive ^[9]. Registering DOIs at acceptance could greatly help institutions to meet REF requirements by allowing them to pull metadata about articles and conference proceedings directly from Crossref at the point of acceptance. This would require not only that the DOI is registered on Crossref at acceptance, but also that publishers include in the metadata sufficient information to enable an institution to identify that a manuscript is associated with one of its authors: author names and affiliations are best for this, but ORCID IDs and grant numbers could also be used. Previous Jisc work has established that many publishers are not providing this information to Crossref even for DOIs for published work. Further work needs to be done both to encourage Crossref’s initiative for registering content at acceptance and to encourage publishers to provide rich enough metadata to Crossref to enable it to be used by institutions for compliance.
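The matching step described above, in which an institution decides whether a newly registered record belongs to one of its authors, could be sketched as follows. This assumes records shaped like Crossref work metadata (an `author` list with optional `affiliation` and `ORCID` entries, and a `funder` list with optional `award` numbers). The function names and all sample values are hypothetical, and the exact-name affiliation match is deliberately naive.

```python
def bare_orcid(value: str) -> str:
    """Reduce an ORCID URL such as http://orcid.org/0000-... to the bare iD."""
    return value.rstrip("/").rsplit("/", 1)[-1].upper()


def match_reasons(record, institution_names, staff_orcids, grant_numbers):
    """Return the sorted list of reasons a record appears to belong to the institution."""
    reasons = set()
    wanted_names = {name.casefold() for name in institution_names}

    for author in record.get("author", []):
        # 1. Affiliation strings: the best signal, when publishers supply them.
        for affiliation in author.get("affiliation", []):
            if affiliation.get("name", "").casefold() in wanted_names:
                reasons.add("affiliation")
        # 2. ORCID iDs of known staff.
        orcid = author.get("ORCID")
        if orcid and bare_orcid(orcid) in staff_orcids:
            reasons.add("orcid")

    # 3. Grant numbers held by the institution.
    for funder in record.get("funder", []):
        if set(funder.get("award", [])) & set(grant_numbers):
            reasons.add("grant")

    return sorted(reasons)


if __name__ == "__main__":
    record = {
        "author": [{"family": "Carberry",
                    "ORCID": "http://orcid.org/0000-0002-1825-0097",
                    "affiliation": []}],
        "funder": [{"name": "Example Funder", "award": ["EX/0001"]}],
    }
    print(match_reasons(record,
                        institution_names={"Example University"},
                        staff_orcids={"0000-0002-1825-0097"},
                        grant_numbers={"EX/0001"}))
```

Returning the reasons rather than a simple yes or no lets repository staff see why a record was claimed, which matters when affiliation data is missing and the match rests on an ORCID ID or grant number alone.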

Other PIDs for publications are the PubMed IDs (PMIDs) and PubMed Central IDs (PMCIDs). PubMed is an important source for discovery and access to literature in the life sciences and biomedicine. As such, PMIDs and PMCIDs should continue to be supported in institutional repositories and elsewhere.

PIDs for datasets

According to FORCE11, “Sound, reproducible scholarship rests upon a foundation of robust, accessible data. For this to be so in practice as well as theory, data must be accorded due importance in the practice of scholarship and in the enduring scholarly record. In other words, data should be considered legitimate, citable products of research. Data citation, like the citation of other evidence and sources, is good research practice and is part of the scholarly ecosystem supporting data reuse.” ^[10] In support of this assertion they produced a number of guiding principles for citing data within scholarly literature, another dataset, or any other research object. One of these principles deals with persistence and states “Unique identifiers, and metadata describing the data, and its disposition, should persist – even beyond the lifespan of the data they describe” ^[11].

A globally unique identifier allows the tracking of the impact of a particular dataset and a citation is the ideal place to provide the information needed to locate and access the dataset. Providing the link between open access papers and datasets is an important step towards creating a culture of data sharing and the transparency of research. The registration agency for research datasets is DataCite ^[12] and its DOIs help further research and assure reliable, predictable, and unambiguous access to research data. This allows data to be discovered and to conform to the FAIR principles ^[13]. The DataCite service is recommended in the “European Commission’s Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020” ^[14], which states that when providing open access to publications in repositories “where possible, contributors should also be uniquely identifiable, and data uniquely attributable, through identifiers which are persistent, non-proprietary, open and interoperable (e.g. through leveraging existing sustainable initiatives such as ORCID for contributor identifiers and DataCite for data identifiers).”

A standard approach to identifying organisations

Organisation identifiers

The identification of organisations is much more fluid than that of individuals – organisations merge with each other, or split, over time, as well as often consisting of multiple legal entities underneath what might seem to be a single external identity. However, the need to adopt a solution here is vital.

The Jisc-CASRAI Organisation Identifiers working group [15] investigated the landscape, developed a number of use cases and generated a set of recommendations. The conclusion was that while no single candidate would fulfil all the criteria, it would be useful to separate the infrastructure element (the provision and maintenance of the OrgID itself) and the service element (the services offered both to registrants and to end users of the services). The most desirable vision for the future would be for ISNI to emerge as a strong, sustainable and internationally well supported baseline or, in their own words, “bridging” ID with a few commercial players, and perhaps some non-commercial ones such as the British Library and HEFCE, acting as registration agencies and holding crosswalks or equivalence tables to their own IDs.

A review of the recommendations of the report, in 2016, concluded that they were still valid, but any solution could be recommended only if concerns are addressed in the key areas of sustainability, efficiency, support, governance and openness.

In 2016 the three organisations of ORCID, DataCite and Crossref ran collaborative workshops to discuss the issues around existing OrgID solutions. This resulted in the setting up of a working group [16] to look at the structure, principles, and technology specifications for an open, independent, non-profit organisation identifier registry. Set up in January 2017, this group ran for one year and included members from a broad range of organisations, including Jisc. The use cases from the earlier Jisc-CASRAI group were used within this group as a basis for discussion. After setting up the framing principles and governance recommendations, the working group issued a Request for Information on 9 October 2017 to solicit comment and expressions of interest from the broader research community in developing the registry. These were presented at a stakeholder meeting [17] prior to the second PIDapalooza in January 2018.

It was not until September 2018 that the “Research Organization Registry” [18] was launched. This was set up by Crossref, DataCite, Digital Science and the California Digital Library. Despite being involved in the original working group, ORCID decided not to participate directly in this initiative for reasons expressed in their blog post “Next Steps for ORCID and Organization Identifiers” [19].

While participating in the ORCID, Crossref, DataCite working group, Jisc also kept abreast of pilot work between the British Library and RCUK (UKRI) to apply ISNIs to fundable organisations. The grant application system (Je-S) has problems with duplications, mergers, de-mergers and keeping up to date with the changing world of organisations. The use of UKPRN [20] has limitations as a national identifier, but with planning for a new grants system there is an opportunity for it to adopt a new PID solution, if available.

Following discussions with Jisc and the British Library, RCUK (UKRI) defined a number of use cases for implementing an organisation identifier solution and the services required to support it:

An organisational ID that helps us keep our information up to date and reduce duplication in order to improve the quality of our internal reporting. We would require a tool to upload organisation data which could transform it into the specified format and that makes it easy to keep our information up to date.
An automated look-up in our grants system to populate organisational IDs and reduce user error that minimises manual effort and is fast and easy to use.
An ID that is publicly out there and that others are using so that we can link systems and data across the research information landscape in order to inform strategy, investment and operational decisions.

The optimal workflow for a grants service would be that an external user setting up a new account would access an easy look-up for their organisation that would auto-populate an organisational ID. In the background there might need to be tables of comparison drawing down different organisational IDs from different sources but that should be invisible to the user. The look-up tables should be populated through an API link to the source data rather than a static file to ensure that the information is kept up to date. Where an organisational ID doesn’t currently exist then it should be generated automatically at the time of use. It would not be acceptable for this to be a manual process that takes several days for a reply. Selecting an international, bridging identifier as a solution has the potential to improve the processes and workflows around the research lifecycle, particularly when it comes to satisfying open access requirements.
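The workflow above can be sketched in a few lines, under several assumptions. The `fetch_records` callable stands in for a live API link to the source registry, the crosswalk to other identifiers is a plain dictionary held on each record, and the `LOCAL-` identifier format is invented for illustration. None of these names come from ISNI, UKRI or any existing grants system.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable


@dataclass
class Organisation:
    bridging_id: str                 # the single ID shown to the user
    name: str
    other_ids: Dict[str, str] = field(default_factory=dict)  # crosswalk, hidden from the user


class OrganisationLookup:
    """Name-based look-up that refreshes from a source and mints missing IDs."""

    def __init__(self, fetch_records: Callable[[], Iterable[Organisation]]):
        self._fetch_records = fetch_records   # stand-in for an API call
        self._by_name: Dict[str, Organisation] = {}
        self._counter = itertools.count(1)
        self.refresh()

    @staticmethod
    def _key(name: str) -> str:
        return name.strip().casefold()

    def refresh(self) -> None:
        """Re-read the source so the table does not go stale like a static file."""
        for org in self._fetch_records():
            self._by_name[self._key(org.name)] = org

    def find_or_create(self, name: str) -> Organisation:
        """Return the matching organisation, generating an ID immediately if none exists."""
        key = self._key(name)
        org = self._by_name.get(key)
        if org is None:
            new_id = f"LOCAL-{next(self._counter):06d}"  # hypothetical ID format
            org = Organisation(bridging_id=new_id, name=name.strip())
            self._by_name[key] = org
        return org


def example_source():
    # Placeholder data; a real deployment would query the registry's API here.
    return [Organisation("BRIDGE-0001", "Example University", {"ukprn": "00000000"})]


if __name__ == "__main__":
    lookup = OrganisationLookup(example_source)
    print(lookup.find_or_create("example university").bridging_id)
    print(lookup.find_or_create("New Research Institute").bridging_id)
```

In practice an automatically generated ID would probably also be queued for later reconciliation with the registry, so that duplicates created at the point of use can be merged.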

In August 2017, there was an announcement [21] from ISNI that they were making “changes to its infrastructure focused on providing open identifiers for organizations working in the field of scholarly communications.” This involved segmenting the organisation records from the ISNI database, making the identifiers and associated core metadata available under a CC0 licence, and providing an API for the retrieval and resolution of ISNI IDs. These were issues that had been raised by ORCID/DataCite/Crossref prior to the setting up of their working group. The ISNI announcement stated that a new Advisory Board would be set up with representatives of the scholarly communications community to guide the efforts of the “ISNI Organizations Registry”.

Jisc met with UKRI and the British Library, in September 2018, to discuss the potential of further work following the original pilot and plans for the “ISNI Organizations Registry”. The Advisory Board for the registry is currently in the process of being set up.

Prior to the announcement of the “Research Organization Registry” and the meeting with the British Library and UKRI, Jisc published a blog post [22] summarising past initiatives and highlighting some of the issues that need to be addressed by any registry. There are two potential registries looking to solve the organisation identifier problem but, for any registry, the recommendations from the Jisc-CASRAI group still apply. In addition to these, it is worth stressing that the following must be addressed for any registry to succeed and be adopted by the community:

The ownership and governance of an organisation identifier registry in the academic domain should reflect the range of organisations that would be identified in the registry. It is not appropriate for a body governed by only one set of organisations to have excessive influence.
There must be trust in the process for defining governance and host arrangements with an open and transparent process.

Without addressing these issues it’s unlikely that the community will trust or adopt a registry solution for organisation identifiers. Jisc will continue to engage with the relevant groups and organisations to ensure the needs of the UK (and international) community are met and that these issues are raised.

Funder Registry

Crossref maintains a Funder Registry [23]. The registry assigns a DOI as a unique identifier to each funder. The list is searchable online or via an API, and is updated and reviewed on an ongoing basis. Funder DOIs, along with funder names and grant numbers, can be associated with the metadata for a publication in Crossref. The data is fully openly available under a CC0 licence.

The Funder Registry meets the criteria set out above and is a useful organisational identifier for funders. It allows funders to more easily track the outcomes of their funding, research organisations to monitor their academics’ output, and the public to understand how funds are used. Open funding data can also aid compliance by making it easier to associate a published work with a particular funder policy.