Abstract
Background
Histopathological image analysis plays a crucial role in the diagnosis and prognosis of various diseases, including cancer. In lung cancer diagnosis, accurate classification of histopathological images into subtypes such as Adenocarcinoma (ACA), Squamous Cell Carcinoma (SCC) and Benign (BNT) tumors is essential for personalized treatment planning and patient management. However, manual interpretation of these images by pathologists is time-consuming and subjective, which highlights the need for automated and reliable image analysis techniques. Here, we propose a novel framework for histopathological image classification using graph-based representation learning techniques.
Methods
First, an image patch extraction module extracts patches from the input images, and optionally saves them, for subsequent feature analysis. The framework then uses Gray-Level Co-occurrence Matrix (GLCM) features for texture analysis, capturing spatial relationships between pixel intensities in histopathological images. From the GLCM features computed for each image, a graph representation is constructed in which nodes represent images and edges capture pairwise similarities between images based on their texture characteristics. To learn low-dimensional representations of images within the constructed graph, DeepWalk, a widely used graph embedding technique, is employed. DeepWalk explores the graph structure through random walks and learns embeddings that capture the underlying relationships between images. These learned embeddings serve as discriminative features for image classification, enabling the model to differentiate between histological subtypes of lung cancer. The performance of the proposed framework is evaluated on the publicly available LC25000 dataset, which includes histopathological images of lung tissue samples.
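The pipeline described above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the toy 4x4 "images", the single horizontal pixel offset, the three GLCM statistics, the Euclidean k-nearest-neighbour rule for edges, and all function names are assumptions made for illustration. Only the random-walk stage of DeepWalk is shown; in DeepWalk the resulting walks are then passed to a skip-gram model to obtain the embeddings.

```python
import math
import random

def glcm_features(img, levels, dx=1, dy=0):
    """Contrast, homogeneity and angular second moment (ASM) of a
    symmetric, normalised GLCM for one pixel offset (dx, dy)."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(img), len(img[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = img[r][c], img[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count both directions (symmetric GLCM)
    total = sum(map(sum, counts))
    contrast = homogeneity = asm = 0.0
    for i in range(levels):
        for j in range(levels):
            p = counts[i][j] / total
            contrast += p * (i - j) ** 2
            homogeneity += p / (1 + (i - j) ** 2)
            asm += p * p
    return [contrast, homogeneity, asm]

def knn_graph(features, k):
    """Undirected graph linking each image to its k nearest neighbours in
    GLCM feature space (assumed similarity rule, Euclidean distance)."""
    n = len(features)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: math.dist(features[i], features[j]))
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def random_walks(adj, walks_per_node, walk_length, seed=0):
    """Uniform random walks over the graph (DeepWalk's sampling stage)."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in adj:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = sorted(adj[walk[-1]])
                if not neighbours:
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks

# Toy data: two smooth textures and two checkerboard textures, 4 gray levels.
smooth_a = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1]]
smooth_b = [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]
checker_a = [[((r + c) % 2) * 3 for c in range(4)] for r in range(4)]
checker_b = [[((r + c + 1) % 2) * 3 for c in range(4)] for r in range(4)]
images = [smooth_a, smooth_b, checker_a, checker_b]

features = [glcm_features(img, levels=4) for img in images]
adj = knn_graph(features, k=1)
walks = random_walks(adj, walks_per_node=2, walk_length=5)
```

In this toy setting the similar textures end up connected to each other, so every walk stays within one texture group. With real images the features would normally be standardised before computing distances, since contrast has a much larger range than the other two statistics.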
Results
Experimental results demonstrate the effectiveness of the proposed approach in classifying lung cancer subtypes. Classification performance is assessed using per-class precision, recall, and F1-score. For ACA, the achieved metrics are precision 0.8478, recall 0.8636, and F1-score 0.8556. For BNT tumors, the corresponding metrics are precision 0.8905, recall 0.8500, and F1-score 0.8162. For SCC, the metrics are precision 0.8875, recall 0.8364, and F1-score 0.8579. Furthermore, we examine the interpretability of the learned embeddings, which provides insights into the underlying relationships between histopathological images.
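For readers unfamiliar with the metrics, the following sketch shows how per-class precision, recall, and F1-score are conventionally computed from true and predicted labels. The label lists are made-up toy data, not results from this study, and the function name is hypothetical.

```python
def per_class_metrics(y_true, y_pred, label):
    """Precision, recall and F1-score for one class, treating it as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)  # true positives
    fp = sum(t != label and p == label for t, p in pairs)  # false positives
    fn = sum(t == label and p != label for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy labels for illustration only.
y_true = ["ACA", "ACA", "ACA", "SCC", "SCC", "BNT", "BNT", "BNT"]
y_pred = ["ACA", "ACA", "SCC", "SCC", "ACA", "BNT", "BNT", "BNT"]

metrics = {c: per_class_metrics(y_true, y_pred, c) for c in ("ACA", "BNT", "SCC")}
```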
Conclusion
In conclusion, our framework shows promising results in automating the classification of histopathological lung cancer images. By combining GLCM features and DeepWalk embeddings through graph-based learning, it provides a method for distinguishing between lung cancer subtypes. Precision, recall, and F1-score range from approximately 0.82 to 0.89 across adenocarcinoma, squamous cell carcinoma, and benign tumors, highlighting the framework's potential to aid pathologists in accurate diagnosis. Moreover, the interpretability of the learned embeddings can improve our understanding of disease pathology. Future research can explore scalability and the integration of additional data for more personalized approaches, contributing to the advancement of AI- and ML-assisted cancer diagnosis.