Author:
Bai Yuanchao, Yang Xu, Liu Xianming, Jiang Junjun, Wang Yaowei, Ji Xiangyang, Gao Wen
Abstract
We propose an end-to-end image compression and analysis model with Transformers, targeting cloud-based image classification applications. Instead of placing an existing Transformer-based image classification model directly after an image codec, we aim to redesign the Vision Transformer (ViT) model to perform image classification from the compressed features and to facilitate image compression with the long-term information from the Transformer. Specifically, we first replace the patchify stem (i.e., image splitting and embedding) of the ViT model with a lightweight image encoder modelled by a convolutional neural network. The compressed features generated by the image encoder carry a convolutional inductive bias and are fed to the Transformer for image classification, bypassing image reconstruction. Meanwhile, we propose a feature aggregation module that fuses the compressed features with selected intermediate features of the Transformer and feeds the aggregated features to a deconvolutional neural network for image reconstruction. The aggregated features obtain long-term information from the self-attention mechanism of the Transformer, which improves the compression performance. The rate-distortion-accuracy optimization problem is finally solved by a two-step training strategy. Experimental results demonstrate the effectiveness of the proposed model in both the image compression and the classification tasks.
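The rate-distortion-accuracy trade-off named in the abstract can be pictured as a weighted sum of three terms. The form below is only a generic sketch, not the paper's stated loss: the weights $\lambda_d$ and $\lambda_c$, the distortion measure $D$, and the use of cross-entropy for the classification term are all assumptions, since the abstract does not specify them.

```latex
% Assumed generic form; symbols and weights are illustrative, not taken from the paper.
\mathcal{L}
  = \underbrace{R(\hat{y})}_{\text{rate}}
  + \lambda_d \, \underbrace{D(x, \hat{x})}_{\text{distortion}}
  + \lambda_c \, \underbrace{\mathrm{CE}(c, \hat{c})}_{\text{classification loss}}
```

Here $x$ is the input image, $\hat{y}$ the quantized compressed features, $\hat{x}$ the reconstructed image, $c$ the ground-truth label, and $\hat{c}$ the prediction of the Transformer. The abstract states only that a two-step training strategy is used; how the three terms are divided between the two steps is not given there.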
Publisher
Association for the Advancement of Artificial Intelligence (AAAI)
Cited by
24 articles.
1. A Coding Framework and Benchmark Towards Low-Bitrate Video Understanding;IEEE Transactions on Pattern Analysis and Machine Intelligence;2024-08
2. Unified and Scalable Deep Image Compression Framework for Human and Machine;ACM Transactions on Multimedia Computing, Communications, and Applications;2024-07-17
3. Region-of-Interest-Based Video Coding for Machines;2024 IEEE International Conference on Multimedia and Expo Workshops (ICMEW);2024-07-15
4. Compressed-Domain Vision Transformer for Image Classification;IEEE Journal on Emerging and Selected Topics in Circuits and Systems;2024-06
5. Deep Learning Guided Video Compression for Machine Vision Tasks;2024-05-27