Author:
Sumit Shahriar Shakir, Watada Junzo, Roy Anurava, Rambli DRA
Abstract
Deep learning concepts and algorithms play a pivotal role in solving complicated problems such as playing games, forecasting future economic values, and detecting objects in images. They can break through the bottlenecks of conventional neural network and artificial intelligence methods. This paper compares two influential deep learning algorithms for image processing and object detection: Mask R-CNN and YOLO. Detection tasks become more complex when they must cope with the many variations in humans' perceived appearance, formation, attire, and reasoning, and with the dynamic nature of their behaviour. It is also challenging to capture subtle details of their surroundings, for instance illumination conditions, background clutter, and partial or full occlusion. When a machine interacts with humans or takes pictures of them, it is hard for it to magnify the details of a human's surroundings. This study focuses on detecting humans effectively. The main objective is to compare the performance of YOLO and Mask R-CNN. The comparison reveals that Mask R-CNN failed to detect tiny human figures among other, more prominent human figures, whereas YOLO detected most of the human figures in an image with higher accuracy. The paper therefore evaluates YOLO against Mask R-CNN on two points: (1) detection ability and (2) computation time. Since machine learning algorithms are mostly data-specific, the authors believe that the presented results may vary with the nature of the data under observation. Alternatively, the presented data can be seen as a counterexample that exposes the detection inaccuracy of Mask R-CNN.
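The two comparison criteria named in the abstract, detection ability and computation time, can be illustrated with a small sketch. This is not the authors' evaluation code: the abstract does not state how detections were scored, so the IoU threshold of 0.5, the greedy matching, and every name below (`iou`, `recall`, `evaluate`, and the two stub detectors) are assumptions made purely for illustration. The stubs return fixed boxes and stand in for real models.

```python
import time
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall(predicted: List[Box], ground_truth: List[Box],
           threshold: float = 0.5) -> float:
    """Fraction of ground-truth boxes matched by a prediction.

    Greedy one-to-one matching; the 0.5 threshold is an assumed value.
    """
    unused = list(predicted)
    hits = 0
    for gt in ground_truth:
        best = max(unused, key=lambda p: iou(p, gt), default=None)
        if best is not None and iou(best, gt) >= threshold:
            unused.remove(best)
            hits += 1
    return hits / len(ground_truth) if ground_truth else 1.0


def evaluate(detector: Callable[[object], List[Box]], image: object,
             ground_truth: List[Box]) -> Tuple[float, float]:
    """Return (recall, elapsed seconds) for one detector on one image."""
    start = time.perf_counter()
    predicted = detector(image)
    elapsed = time.perf_counter() - start
    return recall(predicted, ground_truth), elapsed


# Hypothetical scene: one prominent person and one tiny person.
LARGE: Box = (0.0, 0.0, 100.0, 200.0)
TINY: Box = (300.0, 50.0, 310.0, 70.0)
GROUND_TRUTH = [LARGE, TINY]


def detector_all(image: object) -> List[Box]:
    """Stub that finds both figures (not a real model)."""
    return [LARGE, TINY]


def detector_large_only(image: object) -> List[Box]:
    """Stub that misses the tiny figure (not a real model)."""
    return [LARGE]


recall_all, time_all = evaluate(detector_all, None, GROUND_TRUTH)
recall_large, time_large = evaluate(detector_large_only, None, GROUND_TRUTH)
```

With real models, `detector` would wrap each network's inference call, and the timing would normally be averaged over many images rather than taken from a single run.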
Subject
General Physics and Astronomy