1. BERT: Pre-training of deep bidirectional transformers for language understanding;devlin;arXiv preprint arXiv:1810.04805,2018
2. Attention is all you need;vaswani;Advances in neural information processing systems,2017
3. ImageNet: A large-scale hierarchical image database
4. Deep Residual Learning for Image Recognition
5. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics;kendall;Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,2018