Finding Biomarkers from a High-Dimensional Imbalanced Dataset Using the Hybrid Method of Random Undersampling and Lasso

Published:2020-12-16 Issue:2 Volume:11 Page:75-81
ISSN:2476-907X
Container-title:ComTech: Computer, Mathematics and Engineering Applications
Short-container-title:ComTech

Author:

Rochayani Masithoh Yessi, Sa'adah Umu, Astuti Ani Budi

Abstract

The research applied undersampling and gene selection as a starting point for cancer classification on gene expression datasets that are high-dimensional and class-imbalanced. It investigated whether applying undersampling before gene selection gave better results than gene selection alone. The undersampling method was Random Undersampling (RUS), and the gene selection method was Lasso. The selected genes were then validated against theory. To explore the effectiveness of applying RUS before gene selection, the researchers used two gene expression datasets. Both datasets consisted of two classes, 1,545 observations, and 10,935 genes, but had different imbalance ratios. The results show that both gene selection methods, Lasso and RUS + Lasso, can identify several important biomarkers and yield models with high accuracy. However, the models are complicated because they involve too many genes. The research also finds that undersampling has no noticeable effect when it is applied to a less imbalanced dataset. Meanwhile, when the dataset is highly imbalanced, undersampling can remove a lot of information from the majority class. Overall, the effectiveness of undersampling remains unclear. Simulation studies can be carried out in future research to investigate when undersampling should be applied.
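The two pipelines compared in the abstract, Lasso alone and RUS followed by Lasso, can be illustrated with a small sketch. The abstract does not say how the authors implemented either step, so everything below is an assumption: Lasso is taken to mean L1-penalized logistic regression, fitted here with plain proximal gradient descent, and the data are a synthetic toy set (120 observations, 15 "genes", of which only genes 0 and 1 carry signal) rather than the paper's datasets. The function names `random_undersample` and `lasso_logistic` and all parameter values are hypothetical.

```python
import math
import random


def random_undersample(X, y, seed=0):
    """Randomly drop rows so every class keeps as many rows as the smallest class."""
    rng = random.Random(seed)
    idx_by_class = {}
    for i, label in enumerate(y):
        idx_by_class.setdefault(label, []).append(i)
    n_min = min(len(idx) for idx in idx_by_class.values())
    keep = []
    for idx in idx_by_class.values():
        keep.extend(rng.sample(idx, n_min))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def lasso_logistic(X, y, lam=0.05, lr=0.1, n_iter=300):
    """L1-penalized logistic regression via proximal gradient descent.

    Assumed stand-in for the paper's Lasso step; the intercept is unpenalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # guard against overflow in exp
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


# Synthetic, imbalanced toy data (assumption): 100 majority vs 20 minority rows.
rng = random.Random(42)
n_major, n_minor, n_genes = 100, 20, 15
X, y = [], []
for label, count in ((0, n_major), (1, n_minor)):
    for _ in range(count):
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        if label == 1:
            row[0] += 2.0  # gene 0 is higher in the minority class
            row[1] -= 2.0  # gene 1 is lower in the minority class
        X.append(row)
        y.append(label)

# Pipeline A: Lasso on the full imbalanced data.
w_full, _ = lasso_logistic(X, y)
selected_full = [j for j, wj in enumerate(w_full) if wj != 0.0]

# Pipeline B: RUS first, then Lasso on the balanced subset.
X_bal, y_bal = random_undersample(X, y, seed=1)
w_rus, _ = lasso_logistic(X_bal, y_bal)
selected_rus = [j for j, wj in enumerate(w_rus) if wj != 0.0]

print("rows before / after RUS:", len(X), "/", len(X_bal))
print("genes selected by Lasso:", selected_full)
print("genes selected by RUS + Lasso:", selected_rus)
```

In this toy setup RUS discards 80 of the 100 majority rows, which shows concretely how much majority-class information is lost when the imbalance is strong. For real data, a library implementation of both steps would be the more practical choice.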

Publisher

Universitas Bina Nusantara

Subject

General Medicine

Cited by 2 articles.

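1. Performance Comparison of Lasso and Ridge Regression with Boosting Algorithms in Modeling Rare Events with Multicollinearity (Çoklu Doğrusal Bağlantılı Nadir Olayların Modellenmesinde Lasso ve Ridge Regresyon ile Boosting Algoritmalarının Performans Karşılaştırması);Sinop Üniversitesi Fen Bilimleri Dergisi;2024-06-29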

2. Cart method approach and high dimension simulation data selection method in stunting case in the Covid-19 era;THE 10TH INTERNATIONAL BASIC SCIENCE INTERNATIONAL CONFERENCE (BASIC) 2022;2023