GraphGuard: Detecting and Counteracting Training Data Misuse in Graph Neural Networks

Abstract

The emergence of Graph Neural Networks (GNNs) in graph data analysis and their deployment on Machine Learning as a Service (MLaaS) platforms have raised critical concerns about data misuse during model training. The lack of transparency in local training processes exacerbates the problem, potentially leading to the unauthorised accumulation of large volumes of graph data and thereby infringing on the intellectual property rights of data owners. Existing methodologies typically address either misuse detection or mitigation, but not both, and are primarily designed for local GNN models rather than cloud-based MLaaS platforms. These limitations call for an effective and comprehensive solution that detects and mitigates data misuse without requiring the exact training data, while respecting the proprietary nature of such data. This paper introduces GraphGuard, a pioneering approach to tackle these challenges. We propose a training-data-free method that not only detects graph data misuse but also mitigates its impact via targeted unlearning, all without relying on the original training data. Our misuse detection technique employs membership inference with radioactive data, which enhances the discernibility between member and non-member data distributions. For mitigation, we utilise synthetic graphs that emulate the characteristics previously learned by the target model, enabling effective unlearning even in the absence of the exact graph data. We conduct comprehensive experiments on four real-world graph datasets to demonstrate the efficacy of GraphGuard in both detection and unlearning. We show that GraphGuard attains a near-perfect detection rate (approximately 100%) across these datasets with various GNN models. Additionally, it accomplishes unlearning by eliminating the impact of the unlearned graph with only a marginal decrease in accuracy (less than 5%).
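To make the idea of membership inference concrete, the following is a minimal sketch of a generic confidence-threshold test, not GraphGuard's actual detection procedure. It assumes that a model tends to be more confident on nodes it was trained on. The function names, the threshold of 0.9, and the posterior values are all hypothetical and chosen only for illustration; the radioactive-data step described in the abstract is not modelled here.

```python
def infer_membership(posteriors, threshold=0.9):
    """Flag a node as a likely training member when the model's
    top-class probability reaches the (assumed) threshold."""
    return [max(p) >= threshold for p in posteriors]


def misuse_score(posteriors, threshold=0.9):
    """Return the fraction of queried nodes flagged as likely members."""
    flags = infer_membership(posteriors, threshold)
    return sum(flags) / len(flags)


# Hypothetical posteriors returned by an MLaaS model for two sets of nodes.
suspect = [[0.97, 0.02, 0.01], [0.95, 0.03, 0.02], [0.60, 0.30, 0.10]]
unseen = [[0.50, 0.30, 0.20], [0.40, 0.35, 0.25], [0.92, 0.05, 0.03]]

suspect_score = misuse_score(suspect)
unseen_score = misuse_score(unseen)
```

In this toy setting a data owner would compare the score on their own graph against the score on data the model cannot have seen; a clearly higher score on the owner's graph is weak evidence of use in training. The overlap between the two groups in the example illustrates why the abstract stresses improving the separation between member and non-member distributions.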

Publication
The Network and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 26 February–1 March 2024 (CORE A*)
Bang Wu
Postdoc @ CSIRO’s Data61

My research interests include machine learning and the security and privacy of machine learning.

He Zhang
PhD

My research interests include data mining, machine learning, and graph analysis.

Shirui Pan
Professor and ARC Future Fellow

My research interests include data mining, machine learning, and graph analysis.