1. SemDeDup: Data-efficient learning at web-scale through semantic deduplication;Abbas;arXiv E-prints,2023
2. Natural language model for automatic identification of intimate partner violence reports from twitter;Al-Garadi;Array,2022
3. Alizadeh, M., Kubli, M., Samei, Z., Dehghani, S., Bermeo, J.D., Korobeynikova, M., & Gilardi, F. (2023). Open-source large language models outperform crowd workers and approach ChatGPT in text-annotation tasks. arXiv Preprint arXiv:2307.02179.
4. Anil, R., Dai, A.M., Firat, O., Johnson, M., Lepikhin, D., Passos, A., Shakeri, S., Taropa, E., Bailey, P., Chen, Z., Chu, E., Clark, J.H., Shafey, L.E., Huang, Y., Meier-Hellstern, K., Mishra, G., Moreira, E., Omernick, M., Robinson, K., Wu, Y. (2023). Palm 2 Technical report. arXiv Preprint arXiv:2305.10403.
5. On the dangers of stochastic parrots: can language models be too big?;Bender,2021