1. FlashAttention: Fast and memory-efficient exact attention with IO-awareness; Dao, 2022
2. NVIDIA DGX A100 datasheet, 2023
3. Amazon EC2 P4 instances, 2023
4. GPU-accelerated Viterbi exact lattice decoder for batched online and offline speech recognition; Braun; CoRR, 2019
5. Decoding with Finite-State Transducers on GPUs