What is a repository?

“What is a repository?” was one of the first questions I faced on joining Jisc back in 2005. It was not the first time it had been asked even then, and it has come up periodically since. With many OA (and some open data) policies explicitly mentioning “repositories”, it seems timely to revisit the question and see whether we have a consensus on an answer or, if not, where the major points of discussion lie. To help explore this, I have drafted a short (three-page) document that I hope lays out some of the things that might be considered in answering it. I’d be very interested in your reactions to this thought piece.

Here it is:

What is a repository May 2016

By Neil Jacobs

JISC Programme Director, Digital Infrastructure (Information Environment)

7 replies on “What is a repository?”

The basic definition of a repository used in a document I created for the certification of repositories in Hungary contains the following criteria:

– provides Open Access
– contains full text documents
– supports OAI-PMH
– has a stated goal of sustainability of service and long-term (or at least mid-term) preservation

The document later on tackles other desirable features, similar to other criteria collected here.

Andras Holl
Library and Information Centre, Hungarian Academy of Sciences

Thanks Andras, those are good features, but I wonder whether we want to tie the definition to a particular technical standard, i.e. OAI-PMH?

Three thoughts:

1. To what extent does/should an OA IR aim to provide OA? In my experience, when trying to access papers/documents in an OA IR I am often hit with a login wall, and this does not always appear to be due to a publisher embargo.

2. To what extent should an OA IR aim to host the content itself, or is it sufficient to point to a paper/document hosted elsewhere (i.e. on the publisher’s site)?

3. Para. 4 states “there is author control over any removal or alteration to their work”. What about author control over deposit? Should this not at least be addressed?

Richard Poynder

Thanks Richard, interesting questions. My initial reactions would be:
1. That shouldn’t happen in an OA repository.
2. Many repositories hold metadata-only records. I wonder whether a consensus would form around the idea that an OA repository can do that if the record points to an OA copy somewhere else?
3. Yes, thank you for pointing that out.

On point 2, I’d be cautious about the idea of saying a repository should rely on OA material hosted offsite. While it’s reasonable to say that the majority of OA papers will remain accessible, there is a very long tail of small OA journals with less reliable infrastructure. A recent study of pre-2002 OA titles found that ~40% were closed, and a further ~10% completely vanished – no digital copies of the material were available online. A repository which had included pointers to these titles, rather than copies of the paper, would find it now had nothing beyond the metadata record.

The growth in repositories may help offset this problem in future – there are a lot of repositories that now hold the back issues of a small locally-produced journal – but it’s still not a great solution. If the repository is aiming to hold a digital archive of the institution’s output, then mirroring OA material where possible seems the best way to go, even if it feels redundant for things like PLOS papers.

One complicating factor is “unlicensed OA” – material that is free to read on the publisher’s website but not obviously permitted for posting on repositories, etc. This is still pretty common with smaller titles – i.e. the more vulnerable ones – and I’m not really sure how we should best deal with it. Keep a restricted-access copy locally in case of problems, and point people towards the offsite one? Feels a bit convoluted…

In conversations, a couple of additional criteria for a repository have been suggested, which I think are worth considering:

a) The repository should have publicly accessible APIs so that others can programmatically access the content and re-purpose it.

b) All content should have item-specific licenses, expressed in a machine readable way.

While (a) would seem unproblematic (OAI-PMH is an API), (b) is trickier. However, note that the RIOXX metadata profile does require a license for each record. While it is not straightforward in many cases to populate this field, it is surely where we would want to get to?
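To make (a) and (b) a little more concrete, here is a rough sketch of what a third party might do with them: build an OAI-PMH `ListRecords` request, then check which harvested records carry a license a machine could act on. Everything specific in it is an assumption for illustration: the base URL `https://repo.example.org/oai` and the sample response are made up, the helper names are hypothetical, and it treats a URL in Dublin Core `dc:rights` as "machine readable", which is only one possible convention (RIOXX uses its own license element, not shown here).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Standard namespaces for OAI-PMH 2.0 responses carrying unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL (hypothetical helper, not a library call)."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def licenses_by_record(xml_text):
    """Map each record identifier to the dc:rights values that look like URLs.

    Assumption: a license counts as machine readable only if it is given as a URL.
    """
    root = ET.fromstring(xml_text)
    result = {}
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        rights = [el.text.strip() for el in record.iterfind(".//dc:rights", NS) if el.text]
        result[identifier] = [r for r in rights if r.startswith(("http://", "https://"))]
    return result


# Made-up response from a made-up repository, trimmed to the relevant elements.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Paper with a license URL</dc:title>
          <dc:rights>https://creativecommons.org/licenses/by/4.0/</dc:rights>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:repo.example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Paper with free-text rights only</dc:title>
          <dc:rights>All rights reserved</dc:rights>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://repo.example.org/oai"))
    for identifier, licenses in licenses_by_record(SAMPLE).items():
        print(identifier, licenses or "no machine-readable license")
```

The second sample record is there to show the hard part of (b): a free-text rights statement is present, but nothing a harvester could reliably act on.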

Just a couple of points to add in response. Readers may already be aware of the summary by Nancy Pontika on DNA Digest ‘What Open Access is and what it is not’ (16 Feb 2016) which flags up some differences between OA repositories and companies such as ResearchGate and Mendeley, and advocates that researchers “should not limit themselves to these sites only, but deposit their papers primarily in institutional and subject repositories that conform to the OA concept, which is inclusive to all and not exclusive to those who have an account with the service or own special software”.

Re the principles for open scholarly infrastructure discussed in your paper, for instance around ‘mission’ – institutions may be looking to their repositories (and possibly CRIS’s, which may also facilitate OA) to provide more information in terms of metrics or information specific to their institution, which it is not always possible to obtain easily from other databases or repositories.

They are a place where initiatives around reuse and retransmission of research and metadata could be implemented to help many stakeholders – clearer licences for publisher versions and author accepted manuscripts, and the expression of them, being a case in point, ORCID another. Dissemination and reuse in an online environment depend on standards being agreed and widely used, and the openness of the infrastructure could be seen as being as important as the availability/persistence of the content, to allow for new possibilities.

Jennifer Smith
