Citation analysis for research evaluation has attracted increasing interest over the past few decades. The 3C shared task aims to provide a platform that encourages researchers to work in this area.
The first edition of the shared task, organised by researchers at CORE, Knowledge Media Institute (KMi), The Open University, UK, featured the classification of citations for research impact analysis. The new shared task, known as the 3C Citation Context Classification task and organised as part of the 8th International Workshop on Mining Scientific Publications (WOSP), 2020, was hosted on Kaggle InClass, a free platform for hosting data science competitions.
The shared task involved two subtasks: (1) citation context classification based on purpose (Subtask A) and (2) citation context classification based on influence (Subtask B). The multi-class purpose subtask has six categories: BACKGROUND, COMPARES_CONTRASTS, MOTIVATION, USES, EXTENSION and FUTURE, representing a comprehensive set of functional citation roles. The second, binary subtask classifies citations as INCIDENTAL or INFLUENTIAL based on their importance. The shared task used a portion of the new multi-disciplinary ACT dataset: 3,000 instances for training and 1,000 for evaluation. Because of the class imbalance problem, we used the macro F-score to evaluate the final submissions. Both subtasks were hosted as separate competitions on Kaggle. The shared task lasted 43 days, from May 11, 2020, until June 22, 2020.
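To illustrate why the macro F-score suits imbalanced data, here is a minimal pure-Python sketch of the metric (this is not the organisers' evaluation script, and the labels below are hypothetical examples from the Subtask A scheme):

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much
    as frequent ones (unlike accuracy or micro-averaged F1)."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        f1s.append(f1)
    return sum(f1s) / len(f1s)

# Hypothetical gold labels and predictions for six citation contexts.
y_true = ["BACKGROUND", "USES", "MOTIVATION", "BACKGROUND", "EXTENSION", "FUTURE"]
y_pred = ["BACKGROUND", "USES", "BACKGROUND", "BACKGROUND", "EXTENSION", "USES"]

print(round(macro_f1(y_true, y_pred), 4))  # → 0.4933
```

A classifier that always predicts the majority class scores near zero on this metric, which is why the macro F-score gives a more meaningful ranking here than accuracy.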
A total of four teams took part in this shared task. The teams UFMG, SCUBED and AMRITA_CEN_NLP participated in both subtasks, whereas team Paul Larmuseau competed only in subtask B. Based on the final evaluation criteria, the teams UFMG and Paul Larmuseau emerged as winners, with scores of 0.19425 for subtask A and 0.55565 for subtask B, respectively. All teams beat the majority-class baseline models for subtasks A and B. Despite the recent advances in deep learning methods, the participating teams submitted systems based on simple machine learning models for both subtasks. Moreover, these easy-to-use models beat the more sophisticated transfer learning-based BERT model, which the organisers submitted as the competition progressed. The final scores obtained by the teams on the private leaderboard can be found here for both subtasks: subtask A and subtask B.
The following are some of the highlights of the 3C Shared Task:
- The use of an online classification approach by the winning team UFMG.
- The highest number of submissions to this shared task was made by Paul Larmuseau.
- SCUBED experimented with different variants of Multi-Layer Perceptrons.
- AMRITA_CEN_NLP used a cost-sensitive learning approach to tackle the class imbalance issue.
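A common way to make a simple classifier cost-sensitive is to reweight the loss by inverse class frequency. The sketch below shows this with scikit-learn's `class_weight="balanced"` option; it is only an illustration under assumed data, not AMRITA_CEN_NLP's actual system, and the citation contexts and labels are hypothetical:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny imbalanced sample: three BACKGROUND contexts, one USES context.
texts = [
    "prior work has studied citation analysis",
    "earlier surveys describe this field",
    "we use the tool released by the authors",
    "background studies on scholarly data abound",
]
labels = ["BACKGROUND", "BACKGROUND", "USES", "BACKGROUND"]

# class_weight="balanced" weights each class by n_samples / (n_classes *
# class_count): here BACKGROUND gets 4/(2*3) ~= 0.67 and USES 4/(2*1) = 2.0,
# so mistakes on the rare class are penalised more during training.
clf = make_pipeline(
    TfidfVectorizer(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
clf.fit(texts, labels)
print(clf.predict(["we use their released tool"])[0])
```

Without the class weights, a model trained on such skewed data tends to collapse to the majority class, which is exactly what the macro F-score penalises.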
The summary of the 3C shared task and its key findings are compiled in an overview paper. The full proceedings of the WOSP 2020 workshop can be found here. All the source code submitted by the teams has been made public by the organisers. The competing systems will therefore serve as a benchmark for future research; we believe this will allow head-to-head comparison of participating and future systems on the same data and the same task.
The CORE Team