1. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter; Sanh, 2020
2. HuggingFace's Transformers: State-of-the-art natural language processing; Wolf, 2020
3. Analysis and design of optimization algorithms via integral quadratic constraints
4. Aggregated momentum: Stability through passive damping; Lucas, 2018
5. GLUE: A multi-task benchmark and analysis platform for natural language understanding; Wang; Proceedings of ICLR, 2019