1. ASR is All You Need: Cross-Modal Distillation for Lip Reading
2. Monotonic Infinite Lookback Attention for Simultaneous Machine Translation
3. Yusuf Aytar, Carl Vondrick, and Antonio Torralba. 2016. SoundNet: Learning Sound Representations from Unlabeled Video. In Advances in Neural Information Processing Systems (NeurIPS), D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett (Eds.), Vol. 29.
4. Alexei Baevski, Yuhao Zhou, Abdelrahman Mohamed, and Michael Auli. 2020. wav2vec 2.0: A framework for self-supervised learning of speech representations. Advances in Neural Information Processing Systems (NeurIPS) 33 (2020), 12449--12460.
5. mmSpy: Spying Phone Calls using mmWave Radars