A fine-tuned YOLOv5 deep learning approach for real-time house number detection

Published: 2023-07-03 Volume: 9 Page: e1453
ISSN: 2376-5992
Container-title: PeerJ Computer Science
Language: en

Authors:

Taşyürek Murat¹, Öztürk Celal²

Affiliation:

1. Department of Computer Engineering, Kayseri University, Kayseri, Turkey

2. Department of Computer Engineering, Erciyes University, Kayseri, Turkey

Abstract

Detecting small objects in natural scene images is difficult because of the blur and varying depth present in such images. Detecting house numbers in natural scene images in real time is one such computer vision problem. Meanwhile, convolutional neural network (CNN) based deep learning methods have been widely used for object detection in recent years. In this study, classical CNN-based approaches are first used to detect house numbers, together with their locations, in natural images in real time. Five commonly used CNN models were applied: Faster R-CNN, MobileNet, YOLOv4, YOLOv5 and YOLOv7. However, satisfactory results could not be obtained because the house number plate objects are small and appear at varying depths. A new approach based on fine-tuning is therefore proposed to improve the performance of CNN-based deep learning models. Experimental evaluations were carried out on real data from Kayseri province. The classic Faster R-CNN, MobileNet, YOLOv4, YOLOv5 and YOLOv7 models yield F1 scores of 0.763, 0.677, 0.880, 0.943 and 0.842, respectively, while the proposed fine-tuned versions achieve F1 scores of 0.845, 0.775, 0.932, 0.972 and 0.889, respectively. The proposed fine-tuning thus increased the F1 score of every model. In terms of run time, detection takes 0.603 seconds with classic Faster R-CNN and 0.633 seconds with fine-tuned Faster R-CNN, 0.046 and 0.048 seconds with classic and fine-tuned MobileNet, and 0.235 and 0.240 seconds with classic and fine-tuned YOLOv4. Classic and fine-tuned YOLOv5 both take 0.015 seconds, and classic and fine-tuned YOLOv7 both take 0.009 seconds. YOLOv7 was therefore the fastest model, with an average running time of 0.009 seconds, whereas the proposed fine-tuned YOLOv5 approach achieved the highest performance, with an F1 score of 0.972.
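The abstract reports its results only as prose, so the trade-off between accuracy and speed is easy to lose. The short sketch below tabulates the reported F1 scores and detection times and derives the absolute F1 gain from fine-tuning and an approximate throughput. It assumes that each reported time is the average detection time per image (the abstract calls it the "average running time" but does not state the unit of work); the frames-per-second figures are therefore an illustrative conversion, not numbers reported by the authors. The names `REPORTED`, `f1_gains` and `throughput_fps` are hypothetical helpers introduced here.

```python
# F1 scores and detection times copied from the abstract.
# ASSUMPTION: each time is the average detection time per image, in seconds.
REPORTED = {
    # model: (classic F1, fine-tuned F1, classic time s, fine-tuned time s)
    "Faster R-CNN": (0.763, 0.845, 0.603, 0.633),
    "MobileNet": (0.677, 0.775, 0.046, 0.048),
    "YOLOv4": (0.880, 0.932, 0.235, 0.240),
    "YOLOv5": (0.943, 0.972, 0.015, 0.015),
    "YOLOv7": (0.842, 0.889, 0.009, 0.009),
}


def f1_gains(results):
    """Hypothetical helper: absolute F1 improvement from fine-tuning, rounded to 3 decimals."""
    return {model: round(ft - cl, 3) for model, (cl, ft, _, _) in results.items()}


def throughput_fps(seconds_per_image):
    """Hypothetical helper: convert an average per-image time into frames per second."""
    return 1.0 / seconds_per_image


if __name__ == "__main__":
    gains = f1_gains(REPORTED)
    for model, gain in gains.items():
        ft_time = REPORTED[model][3]
        print(f"{model:12s} F1 gain {gain:+.3f}  ~{throughput_fps(ft_time):.1f} FPS (fine-tuned)")
    # Highest fine-tuned F1 and lowest fine-tuned detection time.
    best = max(REPORTED, key=lambda m: REPORTED[m][1])
    fastest = min(REPORTED, key=lambda m: REPORTED[m][3])
    print("highest fine-tuned F1:", best)
    print("fastest fine-tuned model:", fastest)
```

Read this way, every gain is positive, which matches the claim that fine-tuning improved all models, and the largest gains go to the weaker baselines (MobileNet and Faster R-CNN) while YOLOv5 improves the least in absolute terms yet still ends up with the highest F1.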

Funder

The Scientific Research Projects Coordination Unit of Kayseri University within the scope of project

Publisher

PeerJ

Subject

General Computer Science

Link

https://peerj.com/articles/cs-1453.pdf
