Mining top-k minimal redundancy frequent patterns over uncertain databases

Abstract

Frequent pattern mining from uncertain data has received close attention because most real-life databases contain uncertain data. Several approaches have been proposed for mining highly significant frequent itemsets over uncertain data. However, previous algorithms yield many redundant frequent itemsets and require users to set an appropriate threshold, which is difficult in practice. In this paper, we formally define the problem of top-k minimal redundancy probabilistic frequent pattern mining, which aims to identify the top-k patterns that are simultaneously highly significant and minimally redundant in uncertain data. We first design an uncertain pattern correlation measure, based on the Pearson correlation coefficient, that takes pattern uncertainty into account. We then present a new algorithm, UTFP, that mines the top-k minimal redundancy frequent patterns whose length is no less than a given minimum length, without requiring a threshold to be set. We further propose a set of strategies to prune and reduce the search space. Experimental results demonstrate that the proposed algorithm performs well at finding top-k frequent patterns with low redundancy on probabilistic data. Our method represents the first research effort on top-k correlated pattern mining over probabilistic data.
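The abstract does not spell out UTFP's correlation measure or its pruning strategies, so the following is only a rough, hypothetical sketch of the general idea. It assumes the common attribute-level uncertainty model (each item in a transaction carries an independent existence probability), computes expected support, uses the Pearson correlation between two patterns' per-transaction occurrence probabilities as a stand-in redundancy measure, and selects k patterns greedily with a maximal-marginal-relevance style score. The toy database `UDB`, all function names, and the weight `lam` are illustrative assumptions; the search is brute force with no pruning, and this is not the UTFP algorithm.

```python
from itertools import combinations
from math import prod, sqrt

# Hypothetical toy uncertain database: each transaction maps an item to the
# probability that it is present (items are assumed to be independent).
UDB = [
    {"a": 0.9, "b": 0.8, "c": 0.5},
    {"a": 0.7, "b": 0.6},
    {"a": 0.4, "c": 0.9, "d": 0.8},
    {"b": 0.5, "c": 0.7, "d": 0.9},
]


def occurrence_probs(pattern, udb):
    """Per-transaction probability that every item of `pattern` is present."""
    return [prod(t.get(i, 0.0) for i in pattern) for t in udb]


def expected_support(pattern, udb):
    """Expected support: sum of the per-transaction occurrence probabilities."""
    return sum(occurrence_probs(pattern, udb))


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def greedy_top_k(udb, k, min_len, max_len, lam=1.0):
    """Greedily pick k patterns of length >= min_len (hypothetical MMR-style
    heuristic): normalized expected support minus lam * max correlation
    with any pattern already chosen."""
    items = sorted({i for t in udb for i in t})
    candidates = [c for n in range(min_len, max_len + 1)
                  for c in combinations(items, n)]
    probs = {c: occurrence_probs(c, udb) for c in candidates}
    # Expected support divided by |D| so it is on a scale comparable to correlation.
    sig = {c: sum(p) / len(udb) for c, p in probs.items()}

    def redundancy(c, chosen):
        return max((pearson(probs[c], probs[s]) for s in chosen), default=0.0)

    chosen = []
    while candidates and len(chosen) < k:
        best = max(candidates, key=lambda c: sig[c] - lam * redundancy(c, chosen))
        chosen.append(best)
        candidates.remove(best)
    return chosen


print(greedy_top_k(UDB, k=3, min_len=2, max_len=3))
```

With an empty selection the first pick is simply the pattern with the highest expected support; every later pick is penalized for correlating with earlier picks, which is one simple way to trade significance against redundancy without a support threshold.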

Publication
Neural Information Processing
Shirui Pan