Divide and Denoise: Empowering Simple Models for Robust Semi-Supervised Node Classification against Label Noise

Abstract

Graph neural networks (GNNs) based on message passing have achieved remarkable performance in graph machine learning. Combining them with pseudo labeling can further improve performance on semi-supervised node classification. However, most existing works assume that the training node labels are noise-free, a strong assumption that usually does not hold in practice. GNNs overfit noisy training labels, and the adverse effects of mislabeled nodes can be amplified as they propagate to the remaining nodes through the graph structure, which worsens model failure. Worse still, without special treatment, noisy pseudo labels can also substantially undermine the model’s reliability. In this paper, we revisit the roles of (1) message passing and (2) pseudo labels in this problem and address two denoising subproblems, from the model architecture perspective and the algorithm perspective, respectively. Specifically, we first develop a label-noise-robust GNN that discards the coupled message-passing scheme. Despite its simple architecture, this learning backbone prevents overfitting to noisy labels and inherently avoids the noise propagation issue. Moreover, we propose a novel reliable graph pseudo labeling algorithm that effectively leverages the knowledge of unlabeled nodes while mitigating the adverse effects of noisy pseudo labels. With these designs, we attain exceptional effectiveness and efficiency on the studied problem. We conduct extensive experiments on benchmark datasets for semi-supervised node classification under different levels of label noise and show new state-of-the-art performance.

Publication
ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Aug 25, 2024 - Aug 29, 2024, Barcelona, Spain (CORE A*)
Yixin Liu
ARC Research Fellow

My research interests include machine learning, graph analysis and audio processing.

Shirui Pan
Professor and ARC Future Fellow

My research interests include data mining, machine learning, and graph analysis.