1. Our tracking framework's Pseudo-Frames fusion module resolves such issues, as highlighted in Table 3. Compared with their RGB-only versions, our OSTrack384 and ODTrack-B consistently achieve higher AUC, P, and P_Norm. Specifically, on the LaSOT dataset, our OSTrack384 and ODTrack-B improve AUC by 2.9% and 2%, respectively, over their RGB-only counterparts. Similar gains are observed on the LaSOTExtSub dataset, where our OSTrack384 and ODTrack-B improve AUC by 1.8% and 0.2%, respectively; RGB frames alone
2. For example, OSTrack384 and ODTrack-B improved by 0.6% and 0.8%, respectively, on SR_0.75. For the D-PTUAC dataset, our approach exhibited substantial improvements; ODTrack-B); bigger improvements were observed in terms of
3. Our findings in Table 3 show that language supervision can be fused with the RGB and Events modalities to provide even stronger results. For instance, compared with their RGB-only versions, our JointNLT and CiteTracker consistently achieve higher AUC, P, and P_Norm. Specifically, on the LaSOT dataset, our JointNLT and CiteTracker improve AUC by 6.9% and 3.7%, respectively, over their RGB-only counterparts. Similar gains are observed on the LaSOTExtSub dataset, where our JointNLT and CiteTracker improve AUC by 3.5% and 3.6%, respectively; On the TrackingNet dataset
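
To make the reported metrics concrete, the following is a minimal sketch of how a success AUC and a success rate at an overlap threshold of 0.75 are commonly computed from per-frame IoU, and how a gain over an RGB-only baseline can be read as a difference in percentage points. It assumes the common success-plot convention (mean success rate over 21 evenly spaced overlap thresholds); the evaluation toolkit actually used for Table 3 is not specified here and may differ. The function names, the `(x, y, w, h)` box format, and the per-frame IoU values are hypothetical and serve only as an illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_rate(ious, threshold):
    """Fraction of frames whose IoU strictly exceeds the threshold."""
    return sum(1 for v in ious if v > threshold) / len(ious)


def success_auc(ious, num_thresholds=21):
    """Mean success rate over evenly spaced thresholds in [0, 1].

    Assumption: 21 thresholds, a common success-plot convention.
    """
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    return sum(success_rate(ious, t) for t in thresholds) / num_thresholds


# Hypothetical per-frame IoUs (not taken from the paper).
ious_rgb = [0.80, 0.60, 0.40, 0.00]    # RGB-only baseline
ious_fused = [0.85, 0.70, 0.55, 0.30]  # same tracker with fused inputs

# Gains expressed as differences in percentage points.
gain_auc = 100 * (success_auc(ious_fused) - success_auc(ious_rgb))
gain_sr75 = 100 * (success_rate(ious_fused, 0.75) - success_rate(ious_rgb, 0.75))
print(f"AUC gain: {gain_auc:.1f} points, SR_0.75 gain: {gain_sr75:.1f} points")
```

Under this reading, a statement such as "a 2.9% increase in AUC" corresponds to `gain_auc` being 2.9, that is, an absolute difference between the two AUC scores rather than a relative change.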