1. Large language models for software engineering: A systematic literature review;Hou;arXiv preprint arXiv:2308.10620,2023
2. The Stack: 3 TB of permissively licensed source code;Kocetkov;arXiv preprint arXiv:2211.15533,2022
3. StarCoder 2 and The Stack v2: The next generation;Lozhkov,2024
4. LLM-assisted code cleaning for training accurate code generators;Jain;arXiv preprint arXiv:2311.14904,2023
5. TinyStories: How small can language models be and still speak coherent English?;Eldan;arXiv preprint arXiv:2305.07759,2023