Publisher
Springer Nature Switzerland