1. Learning representations by maximizing mutual information across views;Bachman P.;arXiv preprint arXiv:1906.00910,2019
2. Semi‐supervised sequence learning;Dai A.M.;Proc. of NIPS,2015
3. Attentive language models beyond a fixed‐length context;Dai Z.;Proc. of ACL,2019