1. LXMERT: Learning Cross-Modality Encoder Representations from Transformers
2. Learning AND-OR Templates for Object Recognition and Detection
3. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Ren et al., Advances in Neural Information Processing Systems, 2015
4. Learning Human-Object Interactions by Graph Parsing Neural Networks. Qi et al., ECCV, 2018
5. Detecting Unseen Visual Relations Using Analogies