sparsesurv: a Python package for fitting sparse survival models via knowledge distillation

Published: 2024-08-23 Issue: 9 Volume: 40
ISSN: 1367-4811
Container-title: Bioinformatics
Language: en

Author:

Wissel David¹²³, Janakarajan Nikita¹⁴, Schulte Julius¹, Rowson Daniel¹³, Yuan Xintian¹, Boeva Valentina¹³⁵

Affiliation:

1. Department of Computer Science, ETH Zurich, Zurich, 8092, Switzerland

2. Department of Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland

3. SIB Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland

4. IBM Research Europe, Zurich, 8803, Switzerland

5. Université de Paris, UMR-S1016, Institut Cochin, Paris, 75014, France

Abstract

Motivation: Sparse survival models are statistical models that select a subset of predictor variables while modeling the time until an event occurs, which can aid interpretability and transportability. The subset of important features is often obtained with regularized models, such as the Cox Proportional Hazards model with Lasso regularization, which limit the number of non-zero coefficients. However, such models can be sensitive to the choice of regularization hyperparameter.

Results: In this work, we develop a software package and demonstrate how knowledge distillation, a powerful technique in machine learning that aims to transfer knowledge from a complex teacher model to a simpler student model, can be leveraged to learn sparse survival models while mitigating this challenge. For this purpose, we present sparsesurv, a Python package that contains a set of teacher–student model pairs, including the semi-parametric accelerated failure time and the extended hazards models as teachers, which currently do not have Python implementations. It also contains in-house survival function estimators, removing the need for external packages. Sparsesurv is validated against R-based Elastic Net regularized linear Cox proportional hazards models as implemented in the commonly used glmnet package. Our results reveal that knowledge distillation-based approaches achieve competitive discriminative performance relative to glmnet across the regularization path while making the choice of the regularization hyperparameter significantly easier. All of these features, combined with a sklearn-like API, make sparsesurv an easy-to-use Python package that enables survival analysis for high-dimensional datasets through fitting sparse survival models via knowledge distillation.

Availability and implementation: sparsesurv is freely available under a BSD 3 license on GitHub (https://github.com/BoevaLab/sparsesurv) and the Python Package Index (PyPI) (https://pypi.org/project/sparsesurv/).
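The teacher–student idea described in the abstract can be illustrated with a minimal NumPy sketch. It does not use the sparsesurv API. The function names `fit_ridge_cox` and `fit_lasso`, the simulated data, and all hyperparameter values are hypothetical and chosen only for illustration. A ridge-penalized Cox model stands in for the teacher, and the student is assumed to be a Lasso linear regression fitted to the teacher's linear predictor, which is one common way to set up distillation into a sparse linear model.

```python
import numpy as np

def fit_ridge_cox(X, time, event, alpha=0.1, lr=0.1, n_iter=500):
    """Hypothetical teacher: ridge-penalized Cox model fit by gradient descent.

    Assumes no tied event times.
    """
    n, p = X.shape
    order = np.argsort(-time)  # descending time: rows 0..i form the risk set of row i
    X, event = X[order], event[order]
    beta = np.zeros(p)
    for _ in range(n_iter):
        eta = X @ beta
        w = np.exp(eta - eta.max())                 # shift for numerical stability
        cum_w = np.cumsum(w)                        # risk-set sums of exp(eta)
        cum_wx = np.cumsum(w[:, None] * X, axis=0)  # risk-set sums of exp(eta) * x
        # Gradient of the averaged negative log partial likelihood plus ridge term
        score = (event[:, None] * (X - cum_wx / cum_w[:, None])).sum(axis=0)
        grad = -score / n + alpha * beta
        beta -= lr * grad
    return beta

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def fit_lasso(X, y, lam, n_iter=200):
    """Hypothetical student: Lasso regression via cyclic coordinate descent.

    Minimizes (1 / 2n) * ||y - X b||^2 + lam * ||b||_1 for centered X and y.
    """
    n, p = X.shape
    resid = y - y.mean()
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]   # remove feature j's current contribution
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]   # add the updated contribution back
    return beta

# Simulated right-censored data (illustrative only): 3 informative features out of 20
rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
true_beta = np.zeros(p)
true_beta[:3] = [1.0, -1.0, 0.5]
event_time = rng.exponential(1.0 / np.exp(X @ true_beta))
censor_time = rng.exponential(2.0, size=n)
event = (event_time <= censor_time).astype(float)
observed = np.minimum(event_time, censor_time)

# Step 1: fit the dense teacher on the survival data
teacher_beta = fit_ridge_cox(X, observed, event)
# Step 2: distill by regressing the teacher's linear predictor with a sparse student
student_beta = fit_lasso(X, X @ teacher_beta, lam=0.1)

print("non-zero teacher coefficients:", np.count_nonzero(teacher_beta))
print("non-zero student coefficients:", np.count_nonzero(student_beta))
```

In this setup the student never sees the censored outcomes directly. It only has to reproduce the teacher's risk scores, so the sparsity level is controlled by an ordinary regression penalty (`lam` here).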

Funder

Swiss National Science Foundation

Publisher

Oxford University Press (OUP)

Link

https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btae521/58902371/btae521.pdf
