1. Attention is all you need;Vaswani;Proc Adv Neural Inf Process Syst (NIPS),2017
2. On layer normalization in the transformer architecture;Xiong;Proc Int Conf Mach Learn (ICML),2020
3. Dropout: A simple way to prevent neural networks from overfitting;Srivastava;J Mach Learn Res,2014
4. Bridging nonlinearities and stochastic regularizers with Gaussian error linear units;Hendrycks;Proc Int Conf Learn Represent (ICLR),2017
5. Over-the-Air Deep Learning Based Radio Signal Classification