1. Arrow: A cross-language development platform for in-memory data. https://arrow.apache.org (2023)
2. EasyOCR. https://github.com/JaidedAI/EasyOCR (2023)
3. ISO 639-3 code set. sil.org (2023)
4. UDHR in Unicode. https://unicode.org/udhr/ (2023)
5. Abadji, J., Ortiz Suarez, P., Romary, L., Sagot, B.: Towards a Cleaner Document-Oriented Multilingual Crawled Corpus. arXiv e-prints arXiv:2201.06642 (Jan 2022)