A Hybrid Model Combining Depthwise Separable Convolutions and Vision Transformers for Traffic Sign Classification Under Challenging Weather Conditions

Authors:

Parse Milind Vijay1, Pramod Dhanya2, Kumar Deepak3

Affiliation:

1. Symbiosis International (Deemed University)

2. Symbiosis International (Deemed University)

3. Amity University Greater Noida

Abstract

This research presents a novel deep-learning framework for classifying traffic sign images under adverse conditions, including rain, shadows, haze, codec errors, and dirty lenses. To balance accuracy against the number of trainable parameters, the approach combines depthwise and pointwise convolutions (together referred to as depthwise separable convolutions) with a Vision Transformer (ViT) for subsequent feature extraction. The framework's initial block comprises two pairs of depthwise and pointwise convolutional layers followed by a batch normalization layer. Depthwise convolution processes each input channel independently, applying a separate filter to each channel, which reduces computational cost and parameter count while preserving spatial structure. Pointwise (1×1) convolutional layers then combine information across channels, enabling complex cross-channel feature interactions. Batch normalization stabilizes training, and a max pooling layer at the end of the block downsamples the spatial dimensions. This block is repeated four times, with skip connections preserving crucial information. Inter-block skip connections and global average pooling (GAP) are used to extract global context, with GAP reducing dimensionality while retaining vital information. A ViT integrated into the final layers captures long-range dependencies and relations within the feature maps. The framework concludes with two fully connected layers: a bottleneck layer with 1024 neurons and a second layer with softmax activation that outputs a probability distribution over 14 classes. By combining convolution blocks and skip connections with carefully tuned ViT hyperparameters, the proposed framework achieves a validation accuracy of 99.3%.
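To make the parameter-saving claim concrete, here is a minimal NumPy sketch of a depthwise separable convolution. It assumes stride 1, "valid" padding, no bias, and no activation; these settings, the helper names (`depthwise_conv2d`, `pointwise_conv2d`, `standard_conv2d`, `param_counts`), and the channel sizes in the usage lines are illustrative assumptions, not the authors' configuration. It also checks that the depthwise-then-pointwise composition equals an ordinary convolution whose kernel factors as W[o, c] = pw[o, c] · dw[c], which is where the saving comes from: k·k·C_in + C_in·C_out weights instead of k·k·C_in·C_out.

```python
import numpy as np


def depthwise_conv2d(x, dw_kernels):
    """One k x k filter per input channel; channels are never mixed.

    x: (C, H, W); dw_kernels: (C, k, k). Returns (C, H-k+1, W-k+1).
    Uses cross-correlation, as deep-learning frameworks do.
    """
    c, h, w = x.shape
    k = dw_kernels.shape[1]
    out = np.zeros((c, h - k + 1, w - k + 1))
    for ch in range(c):
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                out[ch, i, j] = np.sum(x[ch, i:i + k, j:j + k] * dw_kernels[ch])
    return out


def pointwise_conv2d(x, pw_weights):
    """1x1 convolution: a linear mix of channels at every pixel.

    x: (C_in, H, W); pw_weights: (C_out, C_in). Returns (C_out, H, W).
    """
    return np.einsum("oc,chw->ohw", pw_weights, x)


def depthwise_separable_conv2d(x, dw_kernels, pw_weights):
    """Depthwise (spatial, per channel) followed by pointwise (cross-channel)."""
    return pointwise_conv2d(depthwise_conv2d(x, dw_kernels), pw_weights)


def standard_conv2d(x, kernels):
    """Reference full convolution; kernels: (C_out, C_in, k, k)."""
    c_out, _, k, _ = kernels.shape
    _, h, w = x.shape
    out = np.zeros((c_out, h - k + 1, w - k + 1))
    for o in range(c_out):
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                out[o, i, j] = np.sum(x[:, i:i + k, j:j + k] * kernels[o])
    return out


def param_counts(k, c_in, c_out):
    """Weight counts (biases ignored): (standard k x k conv, separable conv)."""
    standard = k * k * c_in * c_out
    separable = k * k * c_in + c_in * c_out
    return standard, separable


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 8, 8))   # hypothetical 4-channel 8x8 input
    dw = rng.standard_normal((4, 3, 3))  # one 3x3 filter per channel
    pw = rng.standard_normal((6, 4))     # mix 4 channels into 6
    y = depthwise_separable_conv2d(x, dw, pw)
    print(y.shape)  # (6, 6, 6)
    # The equivalent full kernel is the outer product pw[o, c] * dw[c].
    full = pw[:, :, None, None] * dw[None, :, :, :]
    print(np.allclose(y, standard_conv2d(x, full)))  # True
    print(param_counts(3, 64, 128))  # (73728, 8768)
```

With a 3×3 kernel mapping 64 to 128 channels, the separable form needs 8,768 weights versus 73,728, a ratio of exactly 1/C_out + 1/k². Real frameworks expose the depthwise step directly (for example, as a grouped convolution with one group per input channel), so the explicit loops here are only for readability.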

Publisher

Research Square Platform LLC
