Affiliation:
1. LRDSI Laboratory, Computer Science Department, Universite Saad Dahlab Blida, Blida, Algeria
2. Department of Computer and Information Sciences, Arkansas Tech University, Russellville, AR, USA
Abstract
COVID-19 has become the world’s worst pandemic, claiming over six million lives as of March 2022, and now ranks alongside cancer as one of the most common causes of death. Moreover, there is no definitive or unique treatment for COVID-19 beyond a select few drugs approved by the Food and Drug Administration (FDA). While Artificial Intelligence (AI) can be used to generate molecules that target Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), the virus responsible for COVID-19, such molecules are novel and are not yet available on the market. With the emergence and availability of several drug datasets related to COVID-19 (tests, images, graphs, and ChEMBL data), recent works have employed Deep Learning (DL) techniques to generate molecules and to check the effectiveness of existing molecules against COVID-19. In our study, we investigated the benefits of an Encoder–Decoder (ED) architecture based on Long Short-Term Memory (LSTM) cells. In this architecture, each molecule is converted into a vector during the encoding phase and then decoded back into a SMILES string. We propose an approach that incorporates four features of Principal Component Analysis (PCA) into the Encoder–Decoder Long Short-Term Memory (ED-LSTM) model for regularization; that is, rather than avoiding a linear mapping, we assumed that the data could be linearly separable. We found that the ED-LSTM with a unit-norm constraint achieved the best reconstruction accuracy for molecule generation. The resulting dataset was then used, together with virtual screening and convolutional neural networks, to identify the drugs with the best binding affinity for SARS-CoV-2. We achieved an accuracy of 87.35% on the test set.
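As a rough illustration of the encode/decode round trip and of what "reconstruction accuracy" measures, the sketch below one-hot encodes SMILES strings character by character and decodes them back. It is not the authors' ED-LSTM: no network is trained here, the lossless round trip simply stands in for an ideal encoder and decoder, and the function names (`build_vocab`, `one_hot_encode`, `decode`, `reconstruction_accuracy`) and the three example molecules are assumptions made for this example. Character-level tokenization is also a simplification, since two-character atoms such as Cl or Br would need a proper tokenizer.

```python
# Illustrative sketch only: character-level SMILES encoding, not the paper's pipeline.

def build_vocab(smiles_list):
    """Map each character seen in the SMILES strings to an integer index."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i for i, ch in enumerate(chars)}

def one_hot_encode(smiles, vocab):
    """Turn a SMILES string into a list of one-hot vectors, one per character."""
    vectors = []
    for ch in smiles:
        row = [0] * len(vocab)
        row[vocab[ch]] = 1
        vectors.append(row)
    return vectors

def decode(vectors, vocab):
    """Recover the SMILES string by taking the arg-max of each vector."""
    inverse = {i: ch for ch, i in vocab.items()}
    return "".join(inverse[row.index(max(row))] for row in vectors)

def reconstruction_accuracy(originals, reconstructions):
    """Fraction of strings that were reproduced exactly."""
    matches = sum(o == r for o, r in zip(originals, reconstructions))
    return matches / len(originals)

# Hypothetical inputs: ethanol, benzene, acetic acid.
molecules = ["CCO", "c1ccccc1", "CC(=O)O"]
vocab = build_vocab(molecules)
roundtrip = [decode(one_hot_encode(s, vocab), vocab) for s in molecules]
accuracy = reconstruction_accuracy(molecules, roundtrip)
```

In a trained model the decoder output would be a probability distribution per position rather than an exact one-hot vector, so the same arg-max decoding and exact-match metric would generally yield an accuracy below 1.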
Publisher
World Scientific Pub Co Pte Ltd