Floating-Point Embedding: Enhancing the Mathematical Comprehension of Large Language Models-Reference-Cited by-同舟云学术

Floating-Point Embedding: Enhancing the Mathematical Comprehension of Large Language Models

Published:2024-04-15 Issue:4 Volume:16 Page:478
ISSN:2073-8994
Container-title:Symmetry
language:en
Short-container-title:Symmetry

Author:

Jin Xiaoxiao¹^ORCID,Mao Chenyang¹,Yue Dengfeng¹,Leng Tuo¹^ORCID

Affiliation:

1. Department of Computer Engineering and Science, Shanghai University, Shanghai 200444, China

Abstract

The processing and comprehension of numerical information in natural language represent pivotal focal points of scholarly inquiry. Across diverse applications spanning text analysis to information retrieval, the adept management and understanding of the numerical content within natural language are indispensable in achieving task success. Specialized encoding and embedding techniques tailored to numerical data offer an avenue toward improved performance in tasks involving masked prediction and numerical reasoning, inherently characterized by numerical values. Consequently, treating numbers in text merely as words is inadequate; their numerical semantics must be underscored. Recent years have witnessed the emergence of a range of specific encoding methodologies designed explicitly for numerical content, demonstrating promising outcomes. We observe similarities between the Transformer architecture and CPU architecture, with symmetry playing a crucial role. In light of this observation and drawing inspiration from computer system theory, we introduce a floating-point representation and devise a corresponding embedding module. The numerical representations correspond one-to-one with their semantic vector values, rendering both symmetric regarding intermediate transformation methods. Our proposed methodology facilitates the more comprehensive encoding and embedding of numerical information within a predefined precision range, thereby ensuring a distinctive encoding representation for each numerical entity. Rigorous testing on multiple encoder-only models and datasets yielded results that stand out in terms of competitiveness. In comparison to the default embedding methods employed by models, our approach achieved an improvement of approximately 3.8% in Top-1 accuracy and a reduction in perplexity of approximately 0.43. These outcomes affirm the efficacy of our proposed method. Furthermore, the enrichment of numerical semantics through a more comprehensive embedding contributes to the augmentation of the model’s capacity for semantic understanding.

Funder

National Natural Science Foundation of China

Publisher

MDPI AG

Link

https://www.mdpi.com/2073-8994/16/4/478/pdf

Reference35 articles.

1. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. Adv. Neural Inf. Process. Syst., 30.

2. Radford, A., Narasimhan, K., Salimans, T., and Sutskever, I. (2024, April 03). Improving Language Understanding by Generative Pre-Training. Available online: https://api.semanticscholar.org/CorpusID:49313245.

3. Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019, January 2–7). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the NAACL-HLT, Minneapolis, MN, USA.

4. Patil, R., and Gudivada, V. (2024). A Review of Current Trends, Techniques, and Challenges in Large Language Models (LLMs). App. Sci., 14.

5. Cheng, J. (2024). Applications of Large Language Models in Pathology. Bioengineering, 11.