3D Tensor Auto-encoder with Application to Video Compression
Published:2021-06
Issue:2
Volume:17
Page:1-18
ISSN:1551-6857
Container-title:ACM Transactions on Multimedia Computing, Communications, and Applications
language:en
Short-container-title:ACM Trans. Multimedia Comput. Commun. Appl.
Author:
Li Yang (1),
Liu Guangcan (2),
Sun Yubao (2),
Liu Qingshan (2),
Chen Shengyong (3)
Affiliation:
1. School of IoT Engineering (School of Information Security), Jiangsu Vocational College of Information Technology, Wuxi, China
2. Nanjing University of Information Science and Technology, Nanjing, China
3. Key Laboratory of Computer Vision and System, Ministry of Education, Tianjin University of Technology, Tianjin, China
Abstract
Auto-encoders have been widely used to compress high-dimensional data such as images and videos. However, a traditional auto-encoder network needs to store a large number of parameters: when the input data is of dimension n, the number of parameters in an auto-encoder is in general O(n). In this article, we introduce a network structure called the 3D Tensor Auto-Encoder (3DTAE). Unlike the traditional auto-encoder, in which a video is represented as a vector, 3DTAE treats videos as 3D tensors and passes tensor objects directly through the network. The weights of each layer are represented by three small matrices, and thus the number of parameters in 3DTAE is just O(n^{1/3}). The compact nature of 3DTAE fits the needs of video compression well. Given an ensemble of high-dimensional videos, we represent them as 3DTAE networks plus some small core tensors, and we further quantize the network parameters and the core tensors to obtain the final compressed data. Experimental results verify the efficiency of 3DTAE.
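The parameter saving described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not give the exact layer form, so the sketch assumes a layer that multiplies the input tensor by one small matrix along each of its three modes and then applies tanh. The names tensor_layer, dense_param_count, and tensor_param_count, the tensor shapes, and the choice of tanh are all hypothetical, and biases are omitted.

```python
import numpy as np


def tensor_layer(x, u1, u2, u3):
    """Assumed layer form: three mode products followed by tanh.

    x has shape (n1, n2, n3); each u_k has shape (m_k, n_k).
    The output (a small "core" tensor) has shape (m1, m2, m3).
    """
    core = np.einsum("ai,bj,ck,ijk->abc", u1, u2, u3, x)
    return np.tanh(core)


def dense_param_count(in_shape, out_shape):
    """Weights of a fully connected layer acting on the vectorized input."""
    return int(np.prod(in_shape)) * int(np.prod(out_shape))


def tensor_param_count(in_shape, out_shape):
    """Weights of the three small per-mode matrices."""
    return sum(m * n for m, n in zip(out_shape, in_shape))


# Hypothetical shapes: a 32x32 clip with 16 frames mapped to an 8x8x4 core.
in_shape, out_shape = (32, 32, 16), (8, 8, 4)
rng = np.random.default_rng(0)
x = rng.standard_normal(in_shape)
u1, u2, u3 = (0.1 * rng.standard_normal((m, n))
              for m, n in zip(out_shape, in_shape))

core = tensor_layer(x, u1, u2, u3)
print(core.shape)
print(dense_param_count(in_shape, out_shape),
      tensor_param_count(in_shape, out_shape))
```

With these shapes the vectorized layer would need 16384 x 256 weights, while the three per-mode matrices need 8 x 32 + 8 x 32 + 4 x 16. When the three side lengths are each about n^{1/3} and the output sizes are fixed, the per-mode count grows as O(n^{1/3}), which matches the scaling stated in the abstract.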
Funder
National Natural Science Foundation of China
Higher Vocational Education Teaching Fusion Production Integration Platform Construction Projects of Jiangsu Province
Research Project of Jiangsu Vocational College of Information Technology
New Generation AI Major Project of Ministry of Science and Technology of China
High Level of Jiangsu Province Key Construction Project Fund
“Qing Lan Project” Teaching Team in Colleges and Universities of Jiangsu Province
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Networks and Communications, Hardware and Architecture
Cited by
1 article.