Blog post by Petr Knoth and Nancy Pontika at the CORE team (Open University) and Balviar Notay (Jisc)
CORE, a global aggregator of full text open access scientific content from repositories and journals, has been growing at a fast pace over the last few months. As of May 2018, CORE has aggregated over 131 million article metadata records, 93 million abstracts, 11 million hosted and validated full texts and over 78 million direct links to research papers hosted on other websites. Our dataset of full text papers has reached 49TB. CORE is run jointly by the Open University and Jisc.
To see how CORE is doing in this field, we have compared our dataset with those of other relevant services, as shown in the table below. The comparison shows that CORE has become the world’s largest aggregator according to several criteria. In addition, CORE is unique in its endeavour to aggregate and expose not only metadata, but also full texts of open access research papers. No other service in our list provides this capability.
See comparison table: How CORE compares – May 2018 [PDF]
Because CORE is the only service in the list that also hosts a large number of open access documents, it is particularly important to those interested in text and data analytics or other computational tasks on a large global collection of full texts of research papers. CORE provides access to its collection of enriched full text content via its public API, through its data dumps and through CORE FastSync (premium API). This means that third party services built on top of CORE content don’t need to pull full text documents from many different places at the time of access. Doing so is non-trivial (and often results in blocked access), slow and error prone (e.g. if resources move to a different URL), and it cannot guarantee service performance. Instead, they can pull preprocessed and validated data from CORE using any of these three services, confident that they have access to the widest possible range of open access content.
One of the reasons why CORE has been able to put together such a large collection of content is that it supports a wide range of mechanisms for gathering data. For example, while aggregators such as BASE rely on OAI-PMH harvesting, CORE can pull content using OAI-PMH, ResourceSync and custom-built connectors (some of which make use of the CrossRef TDM API) to a variety of publishers as well as subject and preprint repositories.
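To give a flavour of what OAI-PMH harvesting involves, here is a minimal Python sketch of the standard `ListRecords` request and its paging via `resumptionToken`. It is an illustration of the protocol only, not CORE's harvester: the function names, the repository URL and the sample response are all made up for this example, and no network call is made.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper for this sketch)."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it must not be combined with metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty or missing resumptionToken means the list is complete.
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Made-up response fragment standing in for what a repository would return.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    base = "https://repository.example.org/oai"  # hypothetical endpoint
    print(build_list_records_url(base))
    ids, token = parse_list_records(SAMPLE_RESPONSE)
    print(ids, token)
    print(build_list_records_url(base, resumption_token=token))
```

A real harvester would fetch each URL, repeat until no token is returned, and handle errors, rate limits and moved resources, which is where much of the complexity described in this post comes from.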
Most of all, becoming the world’s largest aggregator can be seen as a success indicator for CORE. Finding, aggregating and processing full text open access content is not trivial, especially if this needs to be done at scale. We are proud to announce that the growth of content in CORE demonstrates that we are able to meet CORE’s mission to “aggregate all open access articles across relevant data sources worldwide, enrich this content and provide seamless access to it through a set of data services.”