Abstract
Partial orders and directed acyclic graphs are commonly recurring data structures that arise naturally in numerous domains and applications and are used to represent ordered relations between entities in the domains. Examples are task dependencies in a project plan, transaction order in distributed ledgers and execution sequences of tasks in computer programs, just to mention a few. We study the problem of order preserving hierarchical clustering of this kind of ordered data. That is, if we have $a<b$ in the original data and denote their respective clusters by $[a]$ and $[b]$, then we shall have $[a]<[b]$ in the produced clustering. The clustering is similarity based and uses standard linkage functions, such as single- and complete linkage, and is an extension of classical hierarchical clustering. To achieve this, we develop a novel theory that extends classical hierarchical clustering to strictly partially ordered sets. We define the output from running classical hierarchical clustering on strictly ordered data to be partial dendrograms; sub-trees of classical dendrograms with several connected components. We then construct an embedding of partial dendrograms over a set into the family of ultrametrics over the same set. An optimal hierarchical clustering is defined as the partial dendrogram corresponding to the ultrametric closest to the original dissimilarity measure, measured in the p-norm. Thus, the method is a combination of classical hierarchical clustering and ultrametric fitting. A reference implementation is employed for experiments on both synthetic random data and real world data from a database of machine parts. When compared to existing methods, the experiments show that our method excels both in cluster quality and order preservation.
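The order preservation condition stated in the abstract can be illustrated with a small check. The sketch below is not the paper's reference implementation; the function name `is_order_preserving` and the input representation (a list of pairs for the relation, a dict for cluster membership) are assumptions made for illustration. It reads the condition as follows: comparable elements must not share a cluster, and the relation induced on the clusters must be acyclic, so that its transitive closure is again a strict partial order.

```python
def is_order_preserving(relation, cluster_of):
    """Check that a < b in the data implies [a] < [b] on the clusters.

    relation   -- iterable of pairs (a, b), each meaning a < b (hypothetical input format)
    cluster_of -- dict mapping every element to its cluster label
    """
    # Relation induced on clusters: [a] < [b] whenever a < b.
    edges = {(cluster_of[a], cluster_of[b]) for a, b in relation}

    # Two comparable elements in the same cluster would give [a] < [a].
    if any(u == v for u, v in edges):
        return False

    # Build adjacency lists for the induced cluster graph.
    successors = {}
    for u, v in edges:
        successors.setdefault(u, set()).add(v)

    # Depth-first search with three colours to detect a directed cycle.
    WHITE, GREY, BLACK = 0, 1, 2
    state = {}

    def has_cycle(u):
        state[u] = GREY
        for v in successors.get(u, ()):
            colour = state.get(v, WHITE)
            if colour == GREY:
                return True  # back edge: the induced relation is cyclic
            if colour == WHITE and has_cycle(v):
                return True
        state[u] = BLACK
        return False

    return not any(
        state.get(u, WHITE) == WHITE and has_cycle(u) for u in list(successors)
    )


# Toy data: a < b and c < d.
order = [("a", "b"), ("c", "d")]
print(is_order_preserving(order, {"a": 0, "b": 1, "c": 0, "d": 1}))
print(is_order_preserving(order, {"a": 0, "b": 1, "c": 1, "d": 0}))
```

In the toy data, merging a with c and b with d keeps the order, while merging a with d and b with c makes the two clusters precede each other, which violates the condition.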
Publisher
Springer Science and Business Media LLC
Subject
Artificial Intelligence, Software
Cited by
5 articles.