Abstract
This research tackles the challenge of achieving efficient and accurate real-time object detection on low-end devices, such as single-board computers and embedded hardware. Departing from the conventional choice of SSD MobileNetV2, the study adopts the YOLOv4 architecture, which is known for its superior balance of speed and accuracy. The model is trained on a dataset of surveillance-camera images captured in densely populated public spaces, with the aim of distinguishing between individuals and crowds, a capability critical for surveillance and crowd-management applications. To demonstrate real-world feasibility, the trained model is deployed on a Raspberry Pi 3B, a widely accessible single-board computer. Beyond showcasing improved stability, precision, and accuracy, the research conducts an in-depth analysis of architectural choices, the relevance of training data, inference speed, and resource utilization, with the goal of developing machine learning models tailored to low-resource platforms. This holistic approach seeks to balance computational affordability with real-time performance, contributing to advances in surveillance systems for practical deployment.
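To make the deployment setup concrete, the following is a minimal sketch of how a YOLOv4 Darknet model could be run on a CPU-only device such as a Raspberry Pi 3B using OpenCV's DNN module; the file names, input size, and thresholds are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch: CPU-only YOLOv4 inference with OpenCV's DNN module.
# "yolov4.cfg" / "yolov4.weights" and the thresholds below are illustrative.
import cv2

net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)  # plain CPU backend on the Pi
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

cap = cv2.VideoCapture(0)  # surveillance camera stream
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Detect people/crowds in the frame with confidence and NMS thresholds
    class_ids, confidences, boxes = model.detect(frame, confThreshold=0.4, nmsThreshold=0.4)
    for (x, y, w, h) in boxes:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    # The annotated frame can then be displayed, logged, or streamed for crowd monitoring.
```

On hardware like the Raspberry Pi 3B, selecting the CPU backend and a modest input resolution is the kind of trade-off between inference speed and accuracy that the paper's analysis addresses.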