Tokenization in the Theory of Knowledge


Robert Friedman 1


1. Department of Biological Sciences, University of South Carolina, Columbia, SC 29208, USA


Tokenization is a procedure for recovering the elements of interest in a sequence of data. The term commonly describes an initial step in the processing of programming languages, as well as the preparation of input data for artificial neural networks. However, it is a generalizable concept that applies to reducing a complex form to its basic elements, whether in computer science or in natural processes. This entry defines the general concept of a token and its attributes, and describes the role of tokens in different contexts, such as deep learning methods. It also suggests further theoretical and empirical analysis of tokenization, particularly regarding its use in deep learning. There, tokenization is a rate-limiting step and a possible bottleneck when the results do not meet expectations.
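To make the two uses named above concrete, the following is a minimal Python sketch that is not taken from the entry itself. `lex` splits program text into typed tokens, in the manner of a compiler's lexical analysis stage. `subword_tokenize` maps a word to integer ids with a simplified greedy longest-match scheme, similar in spirit to the subword tokenizers used to prepare text for neural networks. The token classes, the toy vocabulary and both function names are hypothetical, chosen only for illustration.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical token classes for a toy arithmetic language (illustrative only).
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def lex(source: str) -> List[Tuple[str, str]]:
    """Split source text into (kind, lexeme) pairs, discarding whitespace."""
    tokens = []
    for match in MASTER_RE.finditer(source):
        kind = match.lastgroup  # name of the token class that matched
        text = match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise ValueError(f"unexpected character {text!r} at {match.start()}")
        tokens.append((kind, text))
    return tokens


def subword_tokenize(word: str, vocab: Dict[str, int], unk_id: int = 0) -> List[int]:
    """Greedy longest-match-first segmentation of one word into vocabulary ids."""
    ids = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate piece until it is found in the vocabulary.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # No piece matched: emit the unknown id and skip one character.
            ids.append(unk_id)
            start += 1
        else:
            ids.append(vocab[word[start:end]])
            start = end
    return ids


if __name__ == "__main__":
    print(lex("x = 3.5 * (y + 42)"))
    # Hypothetical toy vocabulary; real tokenizers learn theirs from data.
    toy_vocab = {"[UNK]": 0, "token": 1, "iz": 2, "ation": 3, "ize": 4, "s": 5}
    print(subword_tokenize("tokenization", toy_vocab))  # → [1, 2, 3]
    print(subword_tokenize("tokenizes", toy_vocab))     # → [1, 4, 5]
```

The segmentation produced by `subword_tokenize` is fixed entirely by the vocabulary. Changing the vocabulary therefore changes what a downstream model receives as input, even for identical text.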





