Abstract
Similar clinical characteristics are observed across a broad spectrum of diseases, which suggests that these diseases share underlying molecular mechanisms. Identifying these mechanisms is critical for uncovering the molecular roots of disease and for developing new therapeutic strategies. However, finding the common genes that mediate similar phenotypes in different diseases usually requires integrating multiple sequencing datasets with clinical data, and both the batch effects among these datasets and the complexity of the clinical data make this difficult. We developed a framework named “clGENE” to uncover the molecular mechanisms behind similar phenotypes across different diseases. By combining data normalization, cosine similarity analysis, and principal component analysis (PCA), the framework identifies shared molecular mechanisms associated with a specific phenotype and then selects key shared genes. We verified the efficacy and reliability of clGENE on a pan-cancer dataset. We also established a dataset on immune cell infiltration and used clGENE to identify key patterns of immune cell infiltration across lymph node metastasis stages in different cancers, further supporting its potential application in biomedical research.
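The three ingredients named in the abstract (normalization, cosine similarity, and PCA) can be illustrated with a minimal sketch. This is not the clGENE implementation. The simulated data, the per-dataset z-scoring, the "phenotype profile" definition, and every function name below are assumptions chosen for illustration only.

```python
import numpy as np

def zscore(X):
    """Standardize each gene (column) to zero mean and unit variance."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant genes
    return (X - mu) / sd

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 or nb == 0:
        return 0.0
    return float(a @ b / (na * nb))

def top_genes_by_pc1(X, k):
    """Indices of the k genes with the largest absolute loading on PC1."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt are PCs
    return np.argsort(-np.abs(Vt[0]))[:k]

# Hypothetical toy data: two "diseases" measured with different batch offsets.
# Genes 0-2 are shifted in phenotype-positive samples of both diseases.
rng = np.random.default_rng(0)
n_genes, n_samples = 20, 100
shared = np.zeros(n_genes)
shared[:3] = 4.0

def simulate(offset):
    pheno = np.repeat([1, 0], n_samples // 2)  # first half phenotype-positive
    X = rng.normal(size=(n_samples, n_genes)) + offset + np.outer(pheno, shared)
    return X, pheno

def phenotype_profile(Z, pheno):
    """Mean expression difference between phenotype-positive and -negative samples."""
    return Z[pheno == 1].mean(axis=0) - Z[pheno == 0].mean(axis=0)

Xa, pa = simulate(offset=0.0)
Xb, pb = simulate(offset=5.0)        # large additive batch effect
Za, Zb = zscore(Xa), zscore(Xb)      # per-dataset normalization removes the offset

similarity = cosine_similarity(phenotype_profile(Za, pa), phenotype_profile(Zb, pb))
key_genes = sorted(int(i) for i in top_genes_by_pc1(np.vstack([Za, Zb]), 3))

print("cosine similarity of phenotype profiles:", round(similarity, 3))
print("genes with the largest PC1 loadings:", key_genes)
```

In this toy setting, a high cosine similarity between the two profiles indicates that the phenotype shifts the same genes in both diseases, and the PC1 loadings of the pooled, normalized data point to the genes that drive that shared variation. Real datasets would need a more careful treatment of batch effects and clinical covariates than this sketch provides.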
Publisher
Cold Spring Harbor Laboratory