Enhanced Data Mining and Visualization of Sensory-Graph-Modeled Datasets through Summarization
Author:
Hashmi Syed Jalaluddin (1), Alabdullah Bayan (2), Al Mudawi Naif (3), Algarni Asaad (4), Jalal Ahmad (5), Liu Hui (6)
Affiliation:
1. School of Computing, National University of Computer and Emerging Science, Islamabad 44000, Pakistan
2. Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
3. Department of Computer Science, College of Computer Science and Information System, Najran University, Najran 55461, Saudi Arabia
4. Department of Computer Sciences, Faculty of Computing and Information Technology, Northern Border University, Rafha 91911, Saudi Arabia
5. Faculty of Computing and AI, Air University, E-9, Islamabad 44000, Pakistan
6. Cognitive Systems Lab, University of Bremen, 28359 Bremen, Germany
Abstract
The acquisition, processing, mining, and visualization of sensory data for knowledge discovery and decision support has recently become a popular area of research. Its value stems from its role in the continuous improvement of healthcare and related disciplines. As a result, a huge amount of data has been collected and analyzed. These data are made available to the research community in various shapes and formats, and their representation and study as graphs or networks is itself an active area of research. However, the large size of such graph datasets poses challenges for data mining and visualization. For example, knowledge discovery from the Bio–Mouse–Gene dataset, which has over 43 thousand nodes and 14.5 million edges, is a non-trivial task. Summarizing such large graphs is a useful alternative: graph summarization aims to enable the efficient analysis of large, complex data. During summarization, nodes that have similar structural properties are merged. In doing so, traditional methods often overlook the importance of personalizing the summary, which would help to highlight certain targeted nodes. Personalized or context-specific scenarios require a more tailored approach to accurately capture distinct patterns and trends. Personalized graph summarization therefore aims to obtain a concise depiction of the graph that emphasizes connections closer to a given set of target nodes. In this paper, we present a faster algorithm for the personalized graph summarization (PGS) problem, named IPGS, designed to facilitate effective data mining and visualization of datasets from various domains, including biosensors.
Our objective is to obtain a compression ratio similar to that of the state-of-the-art PGS algorithm, but in less time. To achieve this, we reduce the execution time of the current state-of-the-art approach by using weighted locality-sensitive hashing. Experiments on eight large publicly available datasets demonstrate the effectiveness and scalability of IPGS while providing a compression ratio similar to that of the state-of-the-art approach. In this way, our research contributes to the study and analysis of sensory datasets from the perspective of graph summarization. We also present a detailed study of the Bio–Mouse–Gene dataset, conducted to investigate the effectiveness of graph summarization in the domain of biosensors.
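To make the idea of weighted locality-sensitive hashing for personalized summarization more concrete, the following is a minimal illustrative sketch and not the IPGS algorithm itself. It assumes an undirected graph stored as a dictionary of neighbor sets in which every node appears as a key, and it assumes a personalization weight of `decay ** d` for a node at hop distance `d` from the nearest target. All function names, the `decay` parameter, and the toy graph are hypothetical. Nodes whose weighted neighborhoods are similar tend to receive the same signature and so become candidates for merging.

```python
import hashlib
import math
from collections import defaultdict, deque


def hop_distances(adj, targets):
    """Multi-source BFS: hop distance from each node to its nearest target."""
    dist = {t: 0 for t in targets}
    queue = deque(targets)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def unit_hash(seed, item):
    """Deterministic pseudo-random value in (0, 1] for a (seed, item) pair."""
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / (2**64 + 2)


def weighted_minhash(items, weight, seed):
    """Select one item by an exponential race; heavier items win more often.

    Two sets select the same item with probability equal to the weight of
    their intersection divided by the weight of their union.
    """
    return min(items, key=lambda x: -math.log(unit_hash(seed, x)) / weight[x])


def candidate_groups(adj, targets, decay=0.5, num_hashes=2):
    """Bucket nodes by the weighted min-hash signature of their neighbor sets."""
    dist = hop_distances(adj, targets)
    far = max(dist.values(), default=0) + 1  # distance used for unreachable nodes
    # Assumed personalization: neighbors close to a target carry more weight.
    weight = {v: decay ** dist.get(v, far) for v in adj}
    buckets = defaultdict(list)
    for v, neighbors in adj.items():
        if not neighbors:
            continue  # isolated nodes have no neighborhood to hash
        signature = tuple(
            weighted_minhash(neighbors, weight, seed) for seed in range(num_hashes)
        )
        buckets[signature].append(v)
    return [sorted(group) for group in buckets.values() if len(group) > 1]


# Toy graph: "a" and "b" share the neighbor set {"x", "y"};
# "x" and "y" share the neighbor set {"a", "b", "t"}.
adj = {
    "t": {"x", "y", "z"},
    "x": {"t", "a", "b"},
    "y": {"t", "a", "b"},
    "z": {"t"},
    "a": {"x", "y"},
    "b": {"x", "y"},
}
groups = candidate_groups(adj, targets=["t"])
```

In this toy graph, nodes with identical neighbor sets always land in the same bucket, while nodes with only partly overlapping neighborhoods collide with a probability that grows with their weighted overlap. Using more hash functions per signature makes the buckets stricter. A full summarizer would still need to decide which candidate pairs are actually worth merging.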
Funder
Open Access Initiative of the University of Bremen; DFG via SuUB Bremen; Deanship of Scientific Research at Najran University, under the Research Group Funding program; Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia