Steve Byford writes about this new plug-in.
A new plug-in for EPrints, now available on the EPrints Bazaar, helps you spot, weed out or merge duplicate entries in your EPrints repository. It makes it much easier to capture the best information from repeated entries that really describe the same article, and to safely discard any that are redundant.
To duplicate is human, to de-dup is hard
It’s all too easy to end up with multiple instances of the same article on your institutional repository. Co-authors of the same article may each have uploaded it separately, and you may have received notifications and full text of the same articles from services like Jisc’s Publications Router, or duplicate metadata from other sources. How can you spot that this has happened? And how can you tell if two entries are merely similar but genuinely about different articles?
EPrints already offered some rudimentary functionality to try to deal with this, within the “search issues” section of its administration area, but this is not often used: it’s not well known, and can be difficult to use, requiring an understanding of relatively obscure search syntax.
Finding out what users really need
To work out what functionality repository managers and administrators would find helpful, Jisc commissioned Key Perspectives to do some market research. They worked with staff from a range of institutions that are known to Jisc because they receive content from Publications Router, and so also have experience of capturing content by more than one method.
This helped identify how best to improve upon the “search issues” function and how to make its user interface more intuitive.
Delivering better functionality
Guided by these insights and based on their detailed knowledge of the software, EPrints Services’ developers at Southampton produced a solution that delivered the functionality that users had said would be helpful. The resulting Jisc-funded plug-in offers the following features:
- It provides a straightforward user interface within the administrator’s area, offering simple searching for similar entries across a number of metadata fields, most importantly the article’s title and its DOI.
- Searches then result in a list of possible duplicate records. From there, you can open a pop-up box from which you can quickly amend or retire a duplicate record.
- You can view summaries of two records side-by-side in a pop-up window to compare any fields that differ between them – and decide which of them you wish to retain or discard.
- If you decide that apparent duplicates are in fact different articles, you can flag them as such so that they don’t reappear every time you run further searches.
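To make the idea behind these features more concrete, here is a minimal sketch in Python of how matching on title and DOI, plus a "not a duplicate" flag, could work in principle. It is an illustration only and not the plug-in's actual code. The record layout (the keys "id", "title" and "doi"), the function names and the normalisation rules are all assumptions made for this example.

```python
import re
from collections import defaultdict
from itertools import combinations


def normalise_title(title):
    """Lower-case a title and collapse punctuation and whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", title.lower()).split())


def normalise_doi(doi):
    """Strip a common resolver prefix and lower-case the DOI."""
    return re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi.strip().lower())


def candidate_duplicates(records, not_duplicates=frozenset()):
    """Return sorted pairs of record ids that share a normalised DOI or title.

    `records` is a list of dicts with the hypothetical keys "id", "title", "doi".
    `not_duplicates` holds frozenset pairs already flagged as genuinely different.
    """
    groups = defaultdict(set)
    for rec in records:
        if rec.get("doi"):
            groups[("doi", normalise_doi(rec["doi"]))].add(rec["id"])
        if rec.get("title"):
            groups[("title", normalise_title(rec["title"]))].add(rec["id"])

    pairs = set()
    for ids in groups.values():
        for a, b in combinations(sorted(ids), 2):
            # Skip pairs a human has already marked as different articles.
            if frozenset((a, b)) not in not_duplicates:
                pairs.add((a, b))
    return sorted(pairs)


# Made-up sample records, for illustration only.
records = [
    {"id": 1, "title": "Open Access: A Review", "doi": "10.1234/abc"},
    {"id": 2, "title": "Open access - a review", "doi": "https://doi.org/10.1234/ABC"},
    {"id": 3, "title": "Open Access: A Review", "doi": ""},
    {"id": 4, "title": "Something else entirely", "doi": "10.1234/xyz"},
]

print(candidate_duplicates(records))
print(candidate_duplicates(records, not_duplicates={frozenset((1, 3))}))
```

In this sketch, exact matching on normalised values keeps the logic easy to follow. A real tool might well use fuzzier comparison, and the final decision to merge or retire a record stays with the repository administrator.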
Installing the plug-in
You can find out more about the plug-in, including technical information and screenshots of its user interface, on the EPrints wiki at https://wiki.eprints.org/w/Issues2.
The plug-in itself is available for download from the EPrints Bazaar at http://bazaar.eprints.org/523/.