Multi-document summarization based on sentence cluster using Non-negative Matrix Factorization

Abstract

Multi-document summarization aims to produce a concise summary that contains the salient information from a set of source documents. Many approaches use statistical and machine learning techniques to extract sentences from documents. In this paper, we propose a new multi-document summarization framework based on sentence clustering with Non-negative Matrix Tri-Factorization (NMTF). The proposed framework employs NMTF to cluster sentences using the inter-type relationships among documents, sentences, and terms, and incorporates intra-type information through manifold regularization. The most informative sentences are then selected from each sentence cluster to form the summary. When evaluated on the DUC2004 and TAC2008 datasets, the performance of the proposed framework is comparable to that of the top three systems.
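
As a rough illustration of the clustering step described in the abstract, the Python sketch below factorizes a toy sentence-term matrix X into F S G^T using standard multiplicative updates, then picks one sentence per cluster. It is a minimal sketch under several assumptions, not the paper's implementation: it leaves out the document dimension and the manifold regularization term, it minimizes a plain squared reconstruction error, and its selection rule (highest membership weight in F) is only a stand-in for the paper's informativeness criterion. The names nmtf and pick_sentences and the toy matrix are hypothetical.

```python
import numpy as np


def nmtf(X, k, l, n_iter=200, eps=1e-9, seed=0):
    """Approximate X (sentences x terms) as F @ S @ G.T with non-negative factors.

    F: sentence-cluster weights (n x k)
    S: cluster association matrix (k x l)
    G: term-cluster weights (m x l)
    Assumption: unregularized squared-error objective, multiplicative updates.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k)) + eps
    S = rng.random((k, l)) + eps
    G = rng.random((m, l)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so factors stay >= 0.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
    return F, S, G


def pick_sentences(F):
    """Assign each sentence to its strongest cluster; pick one sentence per cluster.

    Assumption: the sentence with the largest weight in a cluster represents it.
    Empty clusters contribute no sentence.
    """
    labels = F.argmax(axis=1)
    picks = []
    for c in range(F.shape[1]):
        members = np.flatnonzero(labels == c)
        if members.size:
            picks.append(int(members[F[members, c].argmax()]))
    return labels, picks


# Toy sentence-term counts: sentences 0-1 share terms 0-1, sentences 2-3 share terms 2-3.
X = np.array(
    [[2, 1, 0, 0],
     [1, 2, 0, 0],
     [0, 0, 1, 2],
     [0, 0, 2, 1]],
    dtype=float,
)

F, S, G = nmtf(X, k=2, l=2)
labels, picks = pick_sentences(F)
error = float(np.linalg.norm(X - F @ S @ G.T) ** 2)
print("cluster labels:", labels.tolist())
print("selected sentence indices:", picks)
print("squared reconstruction error:", round(error, 4))
```

In a fuller version, X would typically hold TF-IDF weights rather than raw counts, and a graph-based penalty over sentence similarities would be added to the objective to play the role of the manifold regularization mentioned above.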

Publication
Journal of Intelligent and Fuzzy Systems
Shirui Pan
Professor and ARC Future Fellow

My research interests include data mining, machine learning, and graph analysis.