RESTHT: relation-enhanced spatial–temporal hierarchical transformer for video captioning
Author:
Funder
Fundamental Research Funds for the Central Universities
National Natural Science Foundation of China
Publisher
Springer Science and Business Media LLC
Link
https://link.springer.com/content/pdf/10.1007/s00371-024-03350-1.pdf
References (46 articles)
1. Venugopalan, S., Rohrbach, M., Donahue, J., et al.: Sequence to sequence - video to text. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4534–4542 (2015). https://doi.org/10.1109/iccv.2015.515
2. Pan, P., Xu, Z., Yang, Y., et al.: Hierarchical recurrent neural encoder for video representation with application to captioning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1029–1038 (2016). https://doi.org/10.1109/cvpr.2016.117
3. Peng, Y., Wang, C., Pei, Y., et al.: Video captioning with global and local text attention. Vis. Comput. 38(12), 4267–4278 (2022). https://doi.org/10.1007/s00371-021-02294-0
4. Hu, Y., Chen, Z., Zha, Z.J., et al.: Hierarchical global-local temporal modeling for video captioning. In: Proceedings of the 27th ACM International Conference on Multimedia, pp. 774–783 (2019). https://doi.org/10.1145/3343031.3351072
5. Yan, C., Tu, Y., Wang, X., et al.: STAT: Spatial-temporal attention mechanism for video captioning. IEEE Trans. Multim. 22(1), 229–241 (2019). https://doi.org/10.1109/tmm.2019.2924576