Exploration of Biodegradable Substances Using Machine Learning Techniques
Published: 2023-08-23
Issue: 17
Volume: 15
Page: 12764
ISSN: 2071-1050
Container-title: Sustainability
Language: en
Short-container-title: Sustainability
Author:
Elsayad Alaa M. (1), Zeghid Medien (1, 2), Ahmed Hassan Yousif (1), Elsayad Khaled A. (3)
Affiliation:
1. Department of Electrical Engineering, College of Engineering in Wadi Alddawasir, Prince Sattam Bin Abdulaziz University, Wadi Alddawasir 11991, Saudi Arabia
2. Electronics and Micro-Electronics Laboratory, Faculty of Sciences, University of Monastir, Monastir 5000, Tunisia
3. Pharmacy Department, Cairo University Hospitals, Cairo University, Cairo 11662, Egypt
Abstract
Ready biodegradability is a key criterion in evaluating the potential effects of chemical substances on ecosystems and in conducting environmental risk assessments. Substances that readily biodegrade are generally associated with lower environmental persistence and reduced environmental risk compared to those that do not easily degrade. Developing accurate quantitative structure–activity relationship (QSAR) models for biodegradability prediction therefore plays a critical role in the design of sustainable chemicals. In this paper, we report the results of our investigation into the use of classification and regression trees (CARTs) for classifying biodegradable substances and selecting features based on 2D molecular descriptors. CARTs are a well-known machine learning approach valued for their simplicity, scalability, and built-in feature selection, which makes them well suited to the analysis of large datasets. Curvature and interaction tests were employed to construct efficient and unbiased trees, while Bayesian optimization (BO) and repeated cross-validation were used to improve the generalization and stability of the trees. The main objective was to classify substances as either readily biodegradable (RB) or non-readily biodegradable (NRB). We compared the performance of the proposed CARTs with support vector machine (SVM), k-nearest neighbor (kNN), and regularized logistic regression (RLR) models in terms of overall accuracy, sensitivity, specificity, and the receiver operating characteristic (ROC) curve. The experimental findings showed that the proposed CART model, which integrated curvature–interaction tests, outperformed the other models in classifying the test subset. It achieved an accuracy of 85.63%, a sensitivity of 87.12%, a specificity of 84.94%, and a comparable area under the ROC curve of 0.87.
The model also identified the ten most important descriptors for prediction, with the SpMaxB(m) and SpMin1_Bh(v) descriptors standing out as notably superior to the remaining descriptors.
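The four evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy labels and scores, the 0.5 decision threshold, and the choice of RB as the positive class are all assumptions made for illustration. The area under the ROC curve is computed with the rank-based (Mann-Whitney) formulation, in which ties count as half a win.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                   positive: str = "RB") -> Dict[str, float]:
    """Accuracy, sensitivity and specificity from true and predicted labels.

    Treating "RB" as the positive class is an assumption of this sketch.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }


def roc_auc(y_true: Sequence[str], scores: Sequence[float],
            positive: str = "RB") -> float:
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present in y_true")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true classes and a classifier's scores for "RB".
y_true = ["RB", "RB", "RB", "NRB", "NRB", "NRB", "NRB", "NRB"]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = ["RB" if s >= 0.5 else "NRB" for s in scores]  # assumed threshold

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, round(auc, 4))
```

On this toy data the threshold yields two true positives, one false negative, one false positive, and four true negatives. Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds.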
Funder
Deanship of Scientific Research at Prince Sattam Bin Abdulaziz University
Subject
Management, Monitoring, Policy and Law; Renewable Energy, Sustainability and the Environment; Geography, Planning and Development; Building and Construction
Cited by
1 article.