Application of Dimension Reduction Methods to High-Dimensional Single-Cell 3D Genomic Contact Data
Author:
Wang Zilin1, Zhang Ping1, Sun Weicheng1, Li Dongxu2
Affiliation:
1. College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
2. School of Computer, BaoJi University of Arts and Sciences, Baoji 721016, China
Abstract
The volume and complexity of data in many fields, particularly biology, are increasing exponentially, and existing analytical methods often struggle with high-dimensional data such as single-cell Hi-C data. To address this, we apply two unsupervised methods, Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE), to reduce the dimensionality of the data for visualization. We then assess how much information the reduced components retain by training a Linear Discriminant Analysis (LDA) classifier on them. Our findings indicate that these dimensionality reduction techniques capture and present structure that is not readily apparent in the original high-dimensional data, which makes complex biological data easier to visualize and interpret. The LDA classifier's performance suggests that PCA and t-SNE preserve the information needed for accurate classification. We conclude that PCA and t-SNE are effective tools for visualizing and analyzing high-dimensional biological data, giving researchers insights that are difficult to obtain with traditional approaches.
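The workflow summarized in the abstract (reduce with PCA, embed with t-SNE, then score the reduced components with an LDA classifier) might look roughly like the following scikit-learn sketch. It is an illustration under stated assumptions, not the authors' implementation: the synthetic data stand in for flattened single-cell Hi-C contact features, and the helper names `make_synthetic_cells` and `reduce_and_score`, the number of principal components, the perplexity, and the 5-fold cross-validation are all hypothetical choices that do not come from the paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.manifold import TSNE
from sklearn.model_selection import cross_val_score


def make_synthetic_cells(n_per_type=40, n_features=200, n_types=3, seed=0):
    """Hypothetical stand-in for scHi-C data: one row per cell, one column
    per (flattened) contact feature, with a distinct mean per cell type."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(0.0, 1.0, size=(n_types, n_features))
    X = np.vstack(
        [c + rng.normal(0.0, 1.0, size=(n_per_type, n_features)) for c in centers]
    )
    y = np.repeat(np.arange(n_types), n_per_type)
    return X, y


def reduce_and_score(X, y, n_pca=10, perplexity=30.0, seed=0):
    """Reduce X with PCA, embed the PCs with t-SNE, and estimate how well
    an LDA classifier separates the labels in each reduced space."""
    # Linear reduction to the leading principal components.
    pcs = PCA(n_components=n_pca, random_state=seed).fit_transform(X)

    # Nonlinear 2-D embedding for visualization, computed on the PCs.
    emb = TSNE(
        n_components=2, perplexity=perplexity, init="pca", random_state=seed
    ).fit_transform(pcs)

    # Cross-validated LDA accuracy as a proxy for information retention.
    lda = LinearDiscriminantAnalysis()
    acc_pca = cross_val_score(lda, pcs, y, cv=5).mean()
    acc_tsne = cross_val_score(lda, emb, y, cv=5).mean()
    return pcs, emb, acc_pca, acc_tsne
```

Calling `reduce_and_score(*make_synthetic_cells())` returns the PCA scores, the 2-D t-SNE coordinates, and the two mean accuracies. Because t-SNE has no out-of-sample transform, both reductions in this sketch are fitted on all cells before cross-validation, so the reported accuracies describe how separable the embedding is rather than how well the pipeline would generalize to unseen cells.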
Publisher
Institute of Emerging and Computer Engineers Inc