1. Arnab, A., Dehghani, M., Heigold, G., Sun, C., Lučić, M., Schmid, C., 2021. ViViT: A video vision transformer. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6836–6846.
2. Neural machine translation by jointly learning to align and translate;Bahdanau;arXiv preprint,2014
3. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer;Bejnordi;JAMA,2017
4. Explicit metric-based multiconcept multi-instance learning with triplet and superbag;Chi;IEEE Trans. Neural Netw. Learn. Syst.,2021
5. Chu, X., Tian, Z., Zhang, B., Wang, X., Wei, X., Xia, H., Shen, C., 2021. Conditional positional encodings for vision transformers. arXiv preprint arXiv:2102.10882.