1. [n. d.]. CUDA C++ Programming Guide. https://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf
2. An Updated Set of Basic Linear Algebra Subprograms (BLAS). ACM Trans. Math. Softw., 2002.
3. The input/output complexity of sorting and related problems
4. Berkin Akin, Franz Franchetti, and James C. Hoe. 2014. FFTs with near-optimal memory access through block data layouts. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 3898–3902. https://doi.org/10.1109/ICASSP.2014.6854332
5. ANDES Technology. 2020. AndesCore NX27V Processor. https://www.andestech.com/en/products-solutions/andescore-processors/riscv-nx27v/, Last accessed on 2021-11-03.