1. Broder, A.Z., Glassman, S.C., Manasse, M.S., Zweig, G.: Syntactic clustering of the web. Comput. Netw. ISDN Syst. 29(8–13), 1157–1166 (1997)
2. Chowdhury, A., Frieder, O., Grossman, D., McCabe, M.C.: Collection statistics for fast duplicate document detection. ACM Trans. Inf. Syst. 20(2), 171–191 (2002)
3. Lin, Y.S., Liao, T.Y., Lee, S.J.: Detecting near-duplicate documents using sentence-level features and supervised learning. Expert Syst. Appl. 40, 1467–1476 (2013)
4. Wang, J.-H.: Lecture Notes in Computer Science (2009)
5. Shivakumar, N., Garcia-Molina, H.: SCAM: a copy detection mechanism for digital documents. In: Proceedings of the International Conference on Theory and Practice of Digital Libraries (1995)