1. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling;Bai,2018
2. Language models are few-shot learners;Brown,2020
3. Anomaly detection: A survey;Chandola;ACM Computing Surveys (CSUR),2009
4. BERT: Pre-training of deep bidirectional transformers for language understanding;Devlin;Conference of the North American Chapter of the Association for Computational Linguistics,2019
5. An image is worth 16x16 words: Transformers for image recognition at scale;Dosovitskiy,2020