1. K. Clark et al., "What Does BERT Look at? An Analysis of BERT's Attention," Proc. ACL Workshop BlackboxNLP, 2019.
2. B. van Aken et al., "How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations," Proc. 28th ACM Int. Conf. Inf. Knowl. Manage. (CIKM), 2019.
3. M. Mosbach et al., "On the Stability of Fine-Tuning BERT: Misconceptions, Explanations, and Strong Baselines," Proc. 9th Int. Conf. Learn. Represent. (ICLR), 2021.
4. T. Lin et al., "A Survey of Transformers," arXiv:2106.04554, 2021.
5. R. Xiong et al., "On Layer Normalization in the Transformer Architecture," Proc. Int. Conf. Mach. Learn. (ICML), 2020.