1. Déjean, H. (1998). Morphemes as necessary concept for structures discovery from untagged corpora. In Proceedings of the Joint Conference on New Methods in Language Processing and Computational Natural Language Learning (pp. 295–298). Macquarie University.
2. Gage, P. (1994). A new algorithm for data compression. The C Users Journal, 12(2), 23–38.
3. Google (2019). WordPieceTokenizer in BERT. Retrieved February 05, 2023, from https://github.com/google-research/bert/blob/master/tokenization.py#L300-L359
4. Karpathy, A. (2022). minGPT. https://github.com/karpathy/minGPT/blob/master/mingpt/bpe.py
5. Kudo, T. (2017). SentencePiece. Retrieved June 02, 2023, from https://github.com/google/sentencepiece