1. PyTorch: An imperative style, high-performance deep learning library;paszke;Advances in Neural Information Processing Systems,2019
2. TorchAudio: Building blocks for audio and speech processing;yang,2021
3. A comparison of transformer, convolutional, and recurrent neural networks on phoneme recognition;kyuhong shim,2022
4. An image is worth 16x16 words: Transformers for image recognition at scale;dosovitskiy,2021
5. rVAD: An unsupervised segment-based robust voice activity detection method