Collaborative Multi-task Learning for Multi-Object Tracking and Segmentation

Published:2023-11-10
ISSN:2833-0528
Container-title:ACM Journal on Autonomous Transportation Systems
language:en
Short-container-title:ACM J. Auton. Transport. Syst.

Author:

Cui Yiming¹, Han Cheng², Liu Dongfang²

Affiliation:

1. University of Florida, USA

2. Rochester Institute of Technology, USA

Abstract

The advancement of computer vision has pushed visual analysis tasks from still images to the video domain. In recent years, video instance segmentation, which aims to track and segment multiple objects in video frames, has drawn much attention for its potential applications in emerging areas such as autonomous driving, intelligent transportation, and smart retail. In this paper, we propose an effective framework for instance-level visual analysis on video frames, which can simultaneously conduct object detection, instance segmentation, and multi-object tracking. The core idea of our method is collaborative multi-task learning, achieved through a novel structure called associative connections, which link the detection, segmentation, and tracking task heads in an end-to-end learnable CNN. These additional connections allow information to propagate across the related tasks, so that all of them benefit simultaneously. We evaluate the proposed method extensively on the KITTI MOTS and MOTS Challenge datasets and obtain encouraging results.

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3632181

References: 107 articles.

1. PoseTrack: A Benchmark for Human Pose Estimation and Tracking

2. Ali Athar, Sabarinath Mahadevan, Aljoša Ošep, Laura Leal-Taixé, and Bastian Leibe. 2020. STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos. arXiv preprint arXiv:2003.08429 (2020).

3. Gedas Bertasius and Lorenzo Torresani. 2020. Classifying, Segmenting, and Tracking Object Instances in Video with Mask Propagation. In CVPR. IEEE, Virtual, 9739–9748.

4. Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. 2020. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv preprint arXiv:2004.10934 (2020).

5. Daniel Bolya, Chong Zhou, Fanyi Xiao, and Yong Jae Lee. 2019. YOLACT: Real-time Instance Segmentation. In ICCV. IEEE, Seoul, Korea, 9157–9166.

Cited by 2 articles.

1. A framework for robotic grasping of 3D objects in a tabletop environment; Multimedia Tools and Applications; 2024-09-04

2. Stepwise Spatial Global-local Aggregation Networks for Autonomous Driving; ACM Journal on Autonomous Transportation Systems; 2024-06-20