Vehicle classification based on audio-visual feature fusion with low-quality images and noise

Published: 2023-11-04 Issue: 5 Volume: 45 Page: 8931-8944
ISSN: 1064-1246
Container-title: Journal of Intelligent & Fuzzy Systems
Short-container-title: IFS

Author:

Zhao Yiming¹, Zhao Hongdong¹, Zhang Xuezhi¹, Liu Weina¹

Affiliation:

1. School of Electronic Information and Engineering, Hebei University of Technology, Tianjin, P.R. China

Abstract

In Intelligent Transport Systems (ITS), vision is the primary mode of perception. However, vehicle images captured by low-cost traffic cameras in adverse weather often suffer from poor resolution and insufficient detail. Vehicle noise, by contrast, provides complementary auditory features that are robust to environmental conditions and usable at long recognition distances. An effective audio-visual feature fusion method is therefore crucial for improving the accuracy of vehicle classification and identification in low-quality traffic surveillance. This paper establishes an Urban Road Vehicle Audio-visual (URVAV) dataset of low-quality images and noise recorded in complex weather conditions. For low-quality vehicle image classification, it proposes a simple Convolutional Neural Network (CNN)-based model called Low-quality Vehicle Images Net (LVINet). To further improve classification accuracy, it introduces an audio-visual feature fusion method based on spatial channel attention. The method converts one-dimensional acoustic features into a two-dimensional audio Mel-spectrogram so that auditory and visual features can be fused. Exploiting the high correlation between these features strengthens the representation of vehicle characteristics. Experimental results show that LVINet achieves a classification accuracy of 93.62% with fewer parameters than existing CNN models. Furthermore, the proposed audio-visual feature fusion method improves classification accuracy by 7.02% and 4.33% over using audio or visual features alone, respectively.
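
The conversion of a one-dimensional audio signal into a two-dimensional Mel-spectrogram, as mentioned in the abstract, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no parameters, so the frame length (`n_fft`), hop size (`hop`), number of Mel bands (`n_mels`), the Hann window, and the common 2595·log10(1 + f/700) Mel formula are all assumptions chosen for illustration. The sketch uses only the Python standard library and a naive DFT, so it is suited to short signals rather than production use.

```python
import cmath
import math


def hz_to_mel(f):
    # Common Mel-scale formula (an assumption; the paper does not specify one).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the Mel scale."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge frequencies: each filter spans three consecutive edges.
    edges = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_freqs:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Power spectrum of one Hann-windowed frame via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_spectrogram(signal, sample_rate, n_fft=256, hop=128, n_mels=16):
    """Return a log-Mel spectrogram as a list of n_mels rows, one column per frame."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        power = power_spectrum(signal[start:start + n_fft])
        energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
        # Convert to decibels; the small constant avoids log(0).
        frames.append([10.0 * math.log10(e + 1e-10) for e in energies])
    # Transpose from (frames, bands) to (bands, frames), i.e. an image-like grid.
    return [list(col) for col in zip(*frames)]
```

Calling `mel_spectrogram(signal, 8000)` on a list of 1024 samples yields a grid of 16 rows (Mel bands) by 7 columns (frames). A grid of this kind can be treated like a single-channel image, which is what makes it possible to process audio with the same convolutional machinery as the visual branch.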

Publisher

IOS Press

Subject

Artificial Intelligence, General Engineering, Statistics and Probability

Cited by 1 article.

1. Enhancing Emergency Vehicle Detection: A Deep Learning Approach with Multimodal Fusion; Mathematics, 2024-05-13