Multi-label Document Classification for COVID-19 Literature using Keyword Based Enhancement and Few-Shot Learning

Abstract

Multi-label text classification consists of assigning one or more labels to each text document. The task is often considered more challenging than other classification problems because the number of labels to assign to each document is not known in advance. For text documents, the difficulty generally stems from blurred boundaries between the lexical fields of the labels or from the underrepresentation of some labels. In this paper, we seek to automatically assign categories to scientific articles related to COVID-19. We address this multi-label classification problem by integrating an original keyword enhancement method into TARS, a transformer-based approach designed for few-shot learning. Experiments conducted during the BioCreative challenge on the multi-label classification task show that our approach outperforms the baseline (ML-Net) on every metric considered.

Yanis Labrak
Research Scientist

I am a Research Scientist in the Natural Language Processing department at the University of Avignon, France.