1. BERT: Pre-training of deep bidirectional transformers for language understanding;devlin;Conf N Am Chapter Assoc Comput Linguist Human Language Technol,0
2. Adam: A method for stochastic optimization;kingma;Proc Int Conf Learn Representations,0
3. Dropout: A simple way to prevent neural networks from overfitting;srivastava;J Mach Learn Res,2014
4. BERT post-training for review reading comprehension and aspect-based sentiment analysis;xu;Conf N Am Chapter Assoc Comput Linguist Human Language Technol,0
5. Enhanced aspect level sentiment classification with auxiliary memory;zhu;Proc 27th Int Conf Computat Linguistics,0