8th International Workshop on Mining Scientific Publications (WOSP), 2020

Owing to the global pandemic, the 8th International Workshop on Mining Scientific Publications (WOSP 2020) was held entirely online this year. The workshop ran for a single day and comprised four sessions featuring keynote talks, presentations of accepted papers, and a shared task on citation context classification. More details are given in the workshop programme. The workshop was organised by CORE, The Open University, UK, in collaboration with Oak Ridge National Laboratory (ORNL), Tennessee, US.

The main highlight of the event was the live-streamed keynote sessions by five experts, whose talks were aligned with the general theme of the workshop. In the first keynote, Anne Lauscher explained methods for combining argument extraction with rhetorical classification tasks using neural multi-task architectures. Allan Hanbury's talk covered the automatic generation of systematic reviews in medicine and the challenges involved in the process. The third session started with the talk “Mitigating document collection biases with citations: A case study on CORD-19” by Kuansan Wang, which discussed approaches for identifying biases in the COVID-19 Open Research Dataset using bibliographic data. The fourth keynote speaker, Neil Smalheiser, presented some of the recent methods and tools developed by his team for information extraction and visualisation of biomedical articles, and discussed challenges and novel ideas for literature-based discovery. The WOSP sessions concluded with a final talk by David Jurgens on citation function classification, which was of great interest to our shared task participants.

WOSP 2020 featured two tracks: (1) a research track and (2) a shared task track. After peer review, 3 long papers and 4 short papers were accepted to the research track. The shared task track consisted of 5 short papers, submitted by 4 teams who participated in the two subtasks.

The research track centred mainly on three themes: (1) infrastructure for mining scholarly publications, (2) information extraction and mining, and (3) citation analysis for research trend identification. The papers “Synthetic vs. Real Reference Strings for Citation Parsing, and the Importance of Re-training and Out-Of-Sample Data for Meaningful Evaluations: Experiments with GROBID, GIANT and CORA” and “SmartCiteCon: Implicit Citation Context Extraction from Academic Literature Using Supervised Learning” focussed on citation parsing and citation context extraction, respectively. The long paper “Citations Beyond Self Citations: Identifying Authors, Affiliations, and Nationalities in Scientific Papers” presented an extensive study of how author identities can leak in the blind review system. The paper “Representing and Reconstructing PhySH: Which Embedding Competent?” explored whether word embeddings can be used to recreate complex hierarchical systems, while the paper “Term-Recency for TF-IDF, BM25 and USE Term Weighting” presented a new term-weighting strategy that weights terms according to their recency and usage in the corpus. The workshop also included a paper on topic extraction (“The Normalized Impact Index for Keywords in Scholarly Papers to Detect Subtle Research Topics”) and a paper addressing the limitations of direct citation-based relatedness calculations for documents that receive few or no citations (“Virtual Citation Proximity (VCP): Learning Citation Proximity and Citation-Based Relatedness for Uncited Documents”).

This year, WOSP also included paper presentations by the participants in the new “3C Citation Context Classification” task, which was managed using Kaggle InClass Competitions. A total of four presentations based on the team system descriptions were delivered by the participants of the two subtasks: (1) Citation Classification based on Purpose (subtask A) and (2) Citation Classification based on Influence (subtask B). All the methods presented were based on simple machine learning models using TF-IDF and word-embedding feature representations. The teams UFMG and Paul Larmuseau emerged as the winners of subtask A and subtask B, respectively.
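To give a rough idea of what a simple TF-IDF-based citation context classifier can look like, here is a minimal sketch in plain Python. It is not any participating team's system: the nearest-centroid classifier, the toy sentences, the two labels (`background`, `uses`) and all function names are assumptions made for illustration only, and they are not the shared task's data or label set.

```python
import math
from collections import Counter, defaultdict

# Toy, made-up citation contexts with illustrative labels (not shared task data).
TRAIN_TEXTS = [
    "previous work has studied citation analysis",
    "earlier studies have explored this problem",
    "we use the parser released by the authors",
    "we adopt the dataset and use their toolkit",
]
TRAIN_LABELS = ["background", "background", "uses", "uses"]


def tokenize(text):
    """Lowercase whitespace tokenisation; real systems would do more."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every term seen in training."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(text, idf):
    """L2-normalised sparse TF-IDF vector; terms unseen in training are dropped."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def train_centroids(texts, labels):
    """Average the TF-IDF vectors of each class into one centroid per label."""
    idf = fit_idf(texts)
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for text, label in zip(texts, labels):
        for term, weight in tfidf(text, idf).items():
            sums[label][term] += weight
    centroids = {
        label: {t: w / counts[label] for t, w in vec.items()}
        for label, vec in sums.items()
    }
    return idf, centroids


def predict(text, idf, centroids):
    """Return the label whose centroid has the largest dot product with the text."""
    vec = tfidf(text, idf)

    def score(label):
        return sum(w * centroids[label].get(t, 0.0) for t, w in vec.items())

    # sorted() makes tie-breaking deterministic (alphabetical by label).
    return max(sorted(centroids), key=score)


if __name__ == "__main__":
    idf, centroids = train_centroids(TRAIN_TEXTS, TRAIN_LABELS)
    print(predict("we use their dataset", idf, centroids))
```

In practice, a sketch like this would be replaced by a library vectoriser and a trained classifier, and evaluated on held-out data; the point here is only to show how TF-IDF features can feed a very simple model.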

We believe the participants of WOSP 2020 enjoyed all four sessions, and we hope to organise another exciting workshop in the future.

