Wei Fan, Erheng Zhong, et al.
SDM 2010
This work examines under what conditions compression methodologies can retain the outcome of clustering operations. We focus on the popular k-Means clustering algorithm and demonstrate how a properly constructed compression scheme based on post-clustering quantization is capable of maintaining the global cluster structure. Our analytical derivations indicate that a 1-bit moment-preserving quantizer per cluster is sufficient to retain the original data clusters. Merits of the proposed compression technique include: a) reduced storage requirements with clustering guarantees, b) data privacy on the original values, and c) shape preservation for data visualization purposes. We evaluate the quantization scheme on various high-dimensional datasets, including 1-dimensional and 2-dimensional time-series (shape datasets), and demonstrate the cluster preservation property. We also compare with previously proposed simplification techniques in the time-series area and show significant improvements in both the clustering and the shape preservation of the compressed datasets. © 2009 IEEE.
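The abstract does not spell out the quantizer, so the following is only a minimal sketch of one way a 1-bit moment-preserving quantizer could be applied after clustering. It assumes the quantizer is run independently on each dimension within each cluster, thresholds at the mean, and picks the two output levels so that the mean and variance are unchanged. The function names `one_bit_moment_quantize` and `quantize_clusters` are hypothetical and do not come from the paper.

```python
import numpy as np


def one_bit_moment_quantize(values):
    """Map a 1-D array to two levels that keep its mean and variance.

    Assumption: the threshold is the mean, so each value costs one bit
    (above or not above the mean) plus two stored levels per array.
    """
    values = np.asarray(values, dtype=float)
    n = values.size
    mean = values.mean()
    std = values.std()  # population standard deviation
    high = values > mean
    n_high = int(high.sum())
    n_low = n - n_high
    if n_high == 0 or n_low == 0:
        # All values are identical; a single level reproduces them.
        return np.full(n, mean)
    # Levels chosen so the first and second moments are preserved.
    low_level = mean - std * np.sqrt(n_high / n_low)
    high_level = mean + std * np.sqrt(n_low / n_high)
    return np.where(high, high_level, low_level)


def quantize_clusters(data, labels):
    """Quantize each dimension of each cluster independently (assumed layout:
    rows are objects, columns are dimensions, labels come from k-Means)."""
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    out = np.empty_like(data)
    for c in np.unique(labels):
        members = labels == c
        block = data[members]  # boolean indexing returns a copy
        for d in range(data.shape[1]):
            block[:, d] = one_bit_moment_quantize(block[:, d])
        out[members] = block
    return out
```

Under these assumptions, each cluster keeps its per-dimension mean, so its centroid is unchanged, and it keeps its per-dimension variance, so its within-cluster sum of squared distances to that centroid is unchanged as well. Whether this matches the paper's exact construction is not confirmed by the abstract.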
Alain Biem, Eric Bouillet, et al.
SIGMOD 2010
Nicholas Mastronarde, Deepak S. Turaga, et al.
IEEE Journal on Selected Areas in Communications
Pascal Frossard, Olivier Verscheure, et al.
ICASSP 2006