Flexible model-based non-negative matrix factorization with application to mutational signatures
Author:
Laursen Ragnhild1, Maretty Lasse2, Hobolth Asger1
Affiliation:
1. Department of Mathematics, Aarhus University, Aarhus, Denmark 2. Department of Clinical Medicine and Bioinformatics Research Center, Aarhus University, Aarhus, Denmark
Abstract
Somatic mutations in cancer can be viewed as a mixture distribution of several mutational signatures, which can be inferred using non-negative matrix factorization (NMF). Mutational signatures have previously been parametrized using either simple mono-nucleotide interaction models or general tri-nucleotide interaction models. We describe a flexible and novel framework for identifying biologically plausible parametrizations of mutational signatures, and in particular for estimating di-nucleotide interaction models. Our estimation procedure is based on the expectation–maximization (EM) algorithm and regression in the log-linear quasi-Poisson model. We show that di-nucleotide interaction signatures are statistically stable and sufficiently complex to fit the mutational patterns. Di-nucleotide interaction signatures often strike the right balance between appropriately fitting the data and avoiding over-fitting. They provide a better fit to the data and are biologically more plausible than mono-nucleotide interaction signatures, and their parametrization is more stable than that of the parameter-rich tri-nucleotide interaction signatures. We illustrate our framework in a large simulation study, where we compare it to state-of-the-art methods, and show results for three data sets of somatic mutation counts from patients with cancer in the breast, liver and urinary tract.
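To make the mixture view concrete, the following is a minimal sketch of plain, unconstrained Poisson NMF on a count matrix. It is not the di-nucleotide parametrization or the estimation procedure of the paper; it uses the standard multiplicative updates for the generalized Kullback-Leibler divergence, which are commonly described as equivalent to EM updates under a Poisson model. The names `poisson_nmf` and `gkl_divergence`, the matrix layout (patients in rows, mutation types in columns), and all default values are assumptions made for illustration.

```python
import numpy as np


def gkl_divergence(V, WH, eps=1e-12):
    """Generalized Kullback-Leibler divergence between counts V and fit WH."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))


def poisson_nmf(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Approximate V (patients x mutation types) by W @ H.

    W holds non-negative exposures (patients x rank) and H holds the
    signatures (rank x mutation types). This is an illustrative sketch
    with hypothetical defaults, not the authors' implementation.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Update exposures while holding the signatures fixed.
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1) + eps)
        # Update signatures while holding the exposures fixed.
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
    # Rescale so each signature is a probability distribution over
    # mutation types; the product W @ H is unchanged.
    scale = H.sum(axis=1)
    H = H / scale[:, None]
    W = W * scale
    return W, H


# Small synthetic example: 30 patients, 96 mutation types, 3 signatures.
rng = np.random.default_rng(1)
H_true = rng.dirichlet(np.ones(96), size=3)
W_true = rng.gamma(shape=2.0, scale=200.0, size=(30, 3))
V = rng.poisson(W_true @ H_true).astype(float)

W_hat, H_hat = poisson_nmf(V, rank=3)
print("divergence of fit:", gkl_divergence(V, W_hat @ H_hat))
```

In this unconstrained form each signature has one free parameter per mutation type minus one, which corresponds to the parameter-rich end of the spectrum described above; a parametrized model would restrict the rows of `H_hat` to a lower-dimensional family.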
Funder
Novo Nordisk Foundation
Publisher
Walter de Gruyter GmbH