Affiliation:
1. Department of Physics and Sungkyunkwan Advanced Institute of Nanotechnology (SAINT) Sungkyunkwan University Suwon 16419 Republic of Korea
2. DNASTech Industry‐Academic Cooperation Center Sungkyunkwan University Suwon 16419 Republic of Korea
3. Department of Artificial Intelligence Sungkyunkwan University Suwon 16419 Republic of Korea
Abstract
This study develops two deoxyribonucleic acid (DNA) lossy compression models, Models A and B, that encode grayscale images into DNA sequences, enhance information density, and enable high‐fidelity image recovery. The models, which differ in how they partition pixels into domains and in the interpolation method used for recovery, offer a novel approach to DNA‐based data storage. Model A processes pixels in overlapped domains using linear interpolation (LI), whereas Model B uses non‐overlapped domains with nearest‐neighbor interpolation (NNI). In a comparative analysis with Joint Photographic Experts Group (JPEG) compression, the DNA lossy compression models are competitive in both information density and restored image quality. Applying the models to the Modified National Institute of Standards and Technology (MNIST) dataset demonstrates their efficiency and the recognizability of the decompressed images, as validated by convolutional neural network (CNN) performance. In particular, Model B2, a version of Model B, emerges as an effective method for balancing high information density (more than 20 times the typical density of two bits per nucleotide) with reasonably good image quality. These findings highlight the potential of DNA‐based data storage systems for high‐density, efficient compression and indicate a promising future for biological data storage solutions.
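To make the idea of non‐overlapped domains with nearest‐neighbor recovery concrete, the following is a minimal illustrative sketch and not the authors' Model B. It assumes each k × k domain is summarized by its rounded mean, that each 8‐bit value is written as four nucleotides using a hypothetical two‐bit mapping (A=00, C=01, G=10, T=11), and that decoding simply copies each stored value back over its domain. The function names `encode`, `decode`, and `density` are invented for this example.

```python
import numpy as np

BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11 (2 bits per nucleotide)


def encode(img, k=2):
    """Average each non-overlapping k x k domain and write it as 4 nucleotides."""
    h, w = img.shape
    if h % k or w % k:
        raise ValueError("image sides must be multiples of k")
    # Split into (h/k, k, w/k, k) blocks and average within each block.
    means = img.reshape(h // k, k, w // k, k).mean(axis=(1, 3))
    vals = np.rint(means).astype(np.uint8).ravel().tolist()
    # Emit each 8-bit value as four 2-bit symbols, most significant pair first.
    return "".join(BASES[(v >> s) & 3] for v in vals for s in (6, 4, 2, 0))


def decode(seq, shape, k=2):
    """Rebuild the image by nearest-neighbor upsampling of the stored values."""
    h, w = shape
    idx = [BASES.index(c) for c in seq]
    vals = [
        (idx[i] << 6) | (idx[i + 1] << 4) | (idx[i + 2] << 2) | idx[i + 3]
        for i in range(0, len(idx), 4)
    ]
    small = np.array(vals, dtype=np.uint8).reshape(h // k, w // k)
    # Repeat each stored value k times along both axes.
    return np.repeat(np.repeat(small, k, axis=0), k, axis=1)


def density(shape, seq, bits_per_pixel=8):
    """Original image bits represented per nucleotide."""
    return shape[0] * shape[1] * bits_per_pixel / len(seq)


img = np.array(
    [[0, 0, 255, 255],
     [0, 0, 255, 255],
     [10, 20, 100, 100],
     [30, 40, 100, 100]],
    dtype=np.uint8,
)
seq = encode(img, k=2)
restored = decode(seq, img.shape, k=2)
```

In this toy setting, larger domains raise the number of original image bits represented per nucleotide at the cost of coarser recovered images, which is the trade‐off the abstract describes between information density and image quality.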
Funder
National Research Foundation
Sungkyunkwan University