1. Hernandez, Real-world data is dirty, Data Mining and Knowledge Discovery, 1998.
2. A. McCallum, K. Nigam, L.H. Ungar, Efficient clustering of high-dimensional data sets with application to reference matching, in: KDD '00: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, NY, USA, 2000, pp. 169–178.
3. P. Christen, A survey of indexing techniques for scalable record linkage and deduplication, IEEE Transactions on Knowledge and Data Engineering 99 (PrePrints). 〈http://dx.doi.org/10.1109/TKDE.2011.127〉.
4. Grahne, Fast algorithms for frequent itemset mining using FP-trees, IEEE Transactions on Knowledge and Data Engineering, 2005.
5. L. Parsons, Evaluating subspace clustering algorithms, in: Workshop on Clustering High Dimensional Data and its Applications, SIAM International Conference on Data Mining (SDM 2004), 2004, pp. 48–56.