Abstract
Lensless computational imaging, a technique that combines optically
modulated measurements with task-specific algorithms, has recently
benefited from the application of artificial neural networks. Conventionally,
lensless imaging techniques rely on prior knowledge to deal with the
ill-posed nature of unstructured measurements, which requires costly
supervised approaches. To address this issue, we present a
self-supervised learning method that learns semantic representations
for the modulated scenes from implicitly provided priors. A
contrastive loss function is designed to train the target
extractor (measurements) from a source extractor (structured natural
scenes) to transfer cross-modal priors in the latent space. The
effectiveness of the new extractor was validated by classifying
mask-modulated scenes on unseen datasets, showing accuracy comparable
to that of the source modality (a contrastive language-image
pre-trained [CLIP] network). The proposed multimodal representation
learning method avoids costly data annotation, adapts better to unseen
data, and is applicable to a variety of downstream vision tasks with
unconventional imaging settings.
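The abstract describes transferring cross-modal priors by contrasting embeddings of lensless measurements against embeddings of the corresponding structured scenes. The following is a minimal sketch, not the authors' implementation: it assumes an InfoNCE-style symmetric contrastive loss, a frozen CLIP image encoder as the source extractor, and a trainable measurement encoder as the target extractor; the function and variable names (e.g. `cross_modal_contrastive_loss`, `measurement_encoder`) and the temperature value are illustrative assumptions.

```python
# Sketch of a cross-modal contrastive objective (assumed, not from the paper).
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(z_meas, z_scene, temperature=0.07):
    """InfoNCE loss between measurement and scene embeddings paired by index."""
    z_meas = F.normalize(z_meas, dim=-1)    # (N, D) target-extractor embeddings
    z_scene = F.normalize(z_scene, dim=-1)  # (N, D) frozen source-extractor embeddings
    logits = z_meas @ z_scene.t() / temperature  # (N, N) cosine-similarity matrix
    targets = torch.arange(z_meas.size(0), device=z_meas.device)
    # Symmetric cross-entropy: measurements->scenes and scenes->measurements.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Usage sketch (hypothetical names): the source embeddings come from a frozen
# CLIP image encoder applied to the natural scenes, while the measurement
# encoder is trained on the mask-modulated measurements.
# z_scene = clip_model.encode_image(scenes).detach()
# z_meas = measurement_encoder(measurements)
# loss = cross_modal_contrastive_loss(z_meas, z_scene)
```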
Funder
National Science Council, Taiwan