Abstract
Past research in systems biology has taken for granted the Euclidean geometry of biological space. This assumption has not only drawn parallels to other fields but has also been convenient, given the ample statistical and numerical optimization tools available to address the core task and downstream machine learning problems. However, emerging theoretical studies now demonstrate that biological databases exhibit hierarchical topology, characterized by a heterogeneous degree distribution and a high degree of clustering, thus contradicting the flat geometry assumption. Namely, since the number of nodes in hierarchical structures grows exponentially with node depth, biological networks naturally reside in a hyperbolic space, where the circle circumference and disk area are exponential functions of the radius. To test these claims and assess the potential benefits of applications grounded in this hypothesis, we have developed a mathematical framework and an accompanying computational procedure for matrix factorization and implied biological relationship inference in hyperbolic space. Our study not only demonstrates a significant increase in the accuracy of hyperbolic embedding compared to Euclidean embedding, but also shows that the latent dimension of an optimal hyperbolic embedding is more than an order of magnitude smaller than that of an optimal Euclidean embedding. We see this as additional evidence that hyperbolic geometry, rather than Euclidean, underlies the biological system.
Publisher
Cold Spring Harbor Laboratory