Affiliation:
1. School of Mechatronical Engineering, Beijing Institute of Technology, Beijing 100081, China
2. Science and Technology on Electromechanical Dynamic Control Laboratory, Xi’an 710065, China
Abstract
Low-altitude unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs), which boast high-resolution imaging and agile maneuvering capabilities, are widely utilized in military scenarios and generate a vast amount of image data that can be leveraged for textual intelligence generation to support military decision-making. Military image captioning (MilitIC), as a visual-language learning task, provides innovative solutions for military image understanding and intelligence generation. However, the scarcity of military image datasets hinders the advancement of MilitIC methods, especially those based on deep learning. To overcome this limitation, we introduce an open-access benchmark dataset, termed the Military Objects in Real Combat (MOCO) dataset. It features real combat images captured from the perspective of low-altitude UAVs or UGVs, along with a comprehensive set of captions. Furthermore, we propose a novel encoder–augmentation–decoder image-captioning architecture with a map augmentation embedding (MAE) mechanism, MAE-MilitIC, which leverages both image and text modalities as a guiding prefix for caption generation and bridges the semantic gap between visual and textual data. The MAE mechanism maps both image and text embeddings onto a semantic subspace constructed from relevant military prompts, and augments the military semantics of the image embeddings with attribute-explicit text embeddings. Finally, extensive experiments demonstrate that MAE-MilitIC outperforms existing models on two challenging datasets, providing strong support for intelligence warfare based on military UAVs and UGVs.
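To make the MAE idea concrete, the following is a minimal illustrative sketch, not the authors' released implementation: it assumes CLIP-style image/text embeddings and hypothetical prompt embeddings (prompt_emb), projects both modalities onto the subspace spanned by the military prompts, and mixes the projected text embedding into the image embedding to form the guiding prefix. The projection method and the mixing weight alpha are assumptions for exposition only.

```python
# Illustrative sketch of a map-augmentation-embedding (MAE) step.
# NOT the paper's code: prompt_emb, alpha, and the QR-based projection
# are assumptions chosen to mirror the description in the abstract.
import torch
import torch.nn.functional as F


def project_onto_prompt_subspace(emb: torch.Tensor, prompt_emb: torch.Tensor) -> torch.Tensor:
    """Project an embedding onto the subspace spanned by prompt embeddings.

    emb:        (d,)   image or text embedding
    prompt_emb: (k, d) embeddings of k military-related prompts
    """
    # Orthonormal basis of the prompt subspace via reduced QR decomposition.
    q, _ = torch.linalg.qr(prompt_emb.T)      # q: (d, k)
    return q @ (q.T @ emb)                    # projection of emb onto span(prompts)


def map_augment(image_emb: torch.Tensor,
                text_emb: torch.Tensor,
                prompt_emb: torch.Tensor,
                alpha: float = 0.5) -> torch.Tensor:
    """Map both modalities onto the prompt subspace, then augment the image
    embedding with the attribute-explicit text embedding (convex mix)."""
    img_p = project_onto_prompt_subspace(image_emb, prompt_emb)
    txt_p = project_onto_prompt_subspace(text_emb, prompt_emb)
    fused = (1 - alpha) * img_p + alpha * txt_p   # augmented prefix embedding
    return F.normalize(fused, dim=-1)
```

In this sketch the fused embedding would serve as the prefix handed to the caption decoder; how the actual architecture constructs the subspace and fuses modalities is specified in the paper itself.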
Funder
National Natural Science Foundation of China