1. GPT-SW3: An Autoregressive Language Model for the Nordic Languages;Ekgren,2023
2. TextTiling: Segmenting text into multi-paragraph subtopic passages;Hearst;Computational Linguistics,1997
3. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing;Kudo;CoRR,2018
4. Compilers: principles, techniques, and tools;Lam;Pearson Education,2006
5. Neural Machine Translation of Rare Words with Subword Units;Sennrich,2016