1. Abadji, J., Ortiz Suarez, P., Romary, L., & Sagot, B. (2022). Towards a cleaner document-oriented multilingual crawled corpus. arXiv preprint arXiv:2201.06642 [cs.CL]
2. Alatrash, R., Schlechtweg, D., Kuhn, J., & Schulte im Walde, S. (2020). CCOHA: Clean Corpus of Historical American English. In: Proceedings of the Twelfth Language Resources and Evaluation Conference, pp. 6958–6966. European Language Resources Association, Marseille, France. https://aclanthology.org/2020.lrec-1.859. Accessed: 23-09-2023
3. Arquivo dos Açores. https://hdl.handle.net/21.11129/0000-000D-F8C0-2. Accessed: 16-05-2023
4. ARQUIVO PESSOA. http://arquivopessoa.net/. Accessed: 15-05-2023
5. As Memórias Paroquiais de 1758. http://www.cidehusdigital.uevora.pt/portugal1758. Accessed: 15-05-2023