A parallel algorithm for record clustering

Published:1990-12 Issue:4 Volume:15 Page:599-624
ISSN:0362-5915
Container-title:ACM Transactions on Database Systems
language:en
Short-container-title:ACM Trans. Database Syst.

Author:

Omiecinski Edward¹, Scheuermann Peter²

Affiliation:

1. Georgia Institute of Technology, Atlanta

2. Northwestern Univ., Evanston, IL

Abstract

We present an efficient heuristic algorithm for record clustering that can run on a SIMD machine. We introduce the P-tree, and its associated numbering scheme, which in the split phase allows each processor independently to compute the unique cluster number of a record satisfying an arbitrary query. We show that by restricting ourselves in the merge phase to combining only sibling clusters, we obtain a parallel algorithm whose speedup ratio is optimal in the number of processors used. Finally, we report on experiments showing that our method produces substantial savings in an environment with relatively little overlap among the queries.

Publisher

Association for Computing Machinery (ACM)

Subject

Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/99935.99947

References: 24 articles.

1. Concepts and capabilities of a database computer

2. Optimal Sorting Algorithms for Parallel Computers

3. A taxonomy of parallel sorting

4. Parallel algorithms for the execution of relational database operations

Cited by 7 articles.

1. pPOP: Fast yet accurate parallel hierarchical clustering using partitioning;Data & Knowledge Engineering;2007-06

2. Research issues in automatic database clustering;ACM SIGMOD Record;2005-03

3. Information retrieval on the web;ACM Computing Surveys;2000-06

4. Perspectives on operations research in data and knowledge management;European Journal of Operational Research;1998-11

5. Two techniques for on-line index modification in shared nothing parallel databases;ACM SIGMOD Record;1996-06