On Cluster-Aware Supervised Learning: Frameworks, Convergent Algorithms, and Applications

Published: 2021-03-09
ISSN: 1091-9856
Container-title: INFORMS Journal on Computing
Language: en
Short-container-title: INFORMS Journal on Computing

Author:

Chen Shutong¹, Xie Weijun²

Affiliation:

1. School of Business and Management, Donghua University, 200051 Shanghai, China;

2. Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, Virginia 24061

Abstract

This paper proposes a cluster-aware supervised learning (CluSL) framework, which integrates clustering analysis with supervised learning. The objective of CluSL is to simultaneously find the best clusters of the data points and minimize the sum of loss functions within each cluster. This framework has many potential applications in healthcare, operations management, manufacturing, and other fields. Because CluSL is, in general, nonconvex, we develop a regularized alternating minimization (RAM) algorithm to solve it; at each iteration, the algorithm penalizes the distance between the current clustering solution and the one from the previous iteration. By choosing a proper penalty function, we show that each iteration of the RAM algorithm can be computed efficiently. We further prove that the proposed RAM algorithm always converges to a stationary point within a finite number of iterations. This is the first known convergence result in the cluster-aware learning literature. Furthermore, we extend CluSL to high-dimensional data sets, yielding the F-CluSL framework. In F-CluSL, we cluster features and minimize the loss function at the same time. Similarly, to solve F-CluSL, a variant of the RAM algorithm (i.e., F-RAM) is developed and proven to converge to an [Formula: see text]-stationary point. Our numerical studies demonstrate that the proposed CluSL and F-CluSL can outperform existing methods such as random forests and support vector classification, both in the interpretability of learning results and in prediction accuracy.

Summary of Contribution: Aligned with the mission and scope of the INFORMS Journal on Computing, this paper proposes a cluster-aware supervised learning (CluSL) framework, which integrates clustering analysis with supervised learning. Because CluSL is, in general, nonconvex, a regularized alternating projection algorithm is developed to solve it and is proven to always find a stationary solution. We further generalize the framework to high-dimensional data sets, termed F-CluSL. Our numerical studies demonstrate that the proposed CluSL and F-CluSL can deliver more interpretable learning results and outperform existing methods such as random forests and support vector classification in computational time and prediction accuracy.
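
The alternating scheme described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' RAM algorithm: the abstract does not specify the loss, the penalty function, or the update rules, so everything below is an assumption made for illustration. The sketch assumes a squared loss with one linear model per cluster, a small ridge term for numerical stability, and a flat penalty `lam` charged whenever a point leaves the cluster it had in the previous iteration. The function name `fit_clusterwise` and all of its parameters are hypothetical.

```python
import numpy as np


def fit_clusterwise(X, y, k=2, lam=0.1, ridge=1e-6, max_iter=100, seed=0):
    """Alternate between fitting one linear model per cluster and
    reassigning points, charging `lam` for each point that switches cluster.

    Illustrative assumptions (not from the paper): squared loss, linear
    models with intercept, ridge-stabilized least squares, flat switch penalty.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])   # append an intercept column
    z = rng.integers(k, size=n)            # random initial cluster labels
    W = np.zeros((k, d + 1))               # one weight vector per cluster

    for _ in range(max_iter):
        # Model step: refit each non-empty cluster by ridge least squares.
        for c in range(k):
            idx = z == c
            if idx.any():
                A = Xb[idx].T @ Xb[idx] + ridge * np.eye(d + 1)
                W[c] = np.linalg.solve(A, Xb[idx].T @ y[idx])

        # Assignment step: per-point loss under every cluster's model,
        # plus a penalty for moving away from the previous assignment.
        loss = (Xb @ W.T - y[:, None]) ** 2               # shape (n, k)
        penalty = lam * (np.arange(k)[None, :] != z[:, None])
        z_new = np.argmin(loss + penalty, axis=1)

        if np.array_equal(z_new, z):       # no point moved: stop
            break
        z = z_new

    return W, z
```

In this sketch a point changes cluster only when doing so lowers its loss by at least `lam`, so the total loss never increases between iterations and the loop stops once the assignment repeats. A larger `lam` makes the clustering more stable across iterations at the cost of accepting fewer reassignments.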

Publisher

Institute for Operations Research and the Management Sciences (INFORMS)

Subject

General Engineering

References: 55 articles.

1. Regionalization of climate over the Argentine Pampas

2. Analysis of Clustering and Classification Methods for Actionable Knowledge

3. Proximal Alternating Minimization and Projection Methods for Nonconvex Problems: An Approach Based on the Kurdyka-Łojasiewicz Inequality

4. An algorithm for clusterwise linear regression based on smoothing techniques

5. On the Convergence of Alternating Minimization for Convex Programming with Applications to Iteratively Reweighted Least Squares and Decomposition Schemes

Cited by 8 articles.

1. Clustering Then Estimation of Spatio-Temporal Self-Exciting Processes; INFORMS Journal on Computing; 2024-09-05

2. Prototype-based Models for Real Estate Valuation: A Machine Learning Model That Explains Prices; SSRN Electronic Journal; 2024

3. Clustering then Estimation of Spatio-Temporal Self-Exciting Processes; 2024

4. Automated Vehicle Identification Based on Car-Following Data With Machine Learning; IEEE Transactions on Intelligent Transportation Systems; 2023-12

5. py-irt: A Scalable Item Response Theory Library for Python; INFORMS Journal on Computing; 2023-01