Affiliation:
1. Bucharest University of Economic Studies, Bucharest, Romania
Abstract
The article studies current text processing tools based on Artificial Intelligence. A literature review is conducted, emphasizing the dynamic evolution of AI-powered text analytics, with ChatGPT and its capabilities as the central tool. The focus is on techniques and methods that use embeddings to improve large language models (LLMs).
This paper analyzes the current state of the literature on text processing with Retrieval-Augmented Generation and highlights the potential of this technology to enhance interpretability and trust in critical applications, such as those related to education or business. AI has revolutionized natural language processing (NLP), enabling machines to interpret and generate text efficiently and accurately. In addition, large language models have been combined with external knowledge bases to produce more accurate and contextually relevant text responses. This approach, called Retrieval-Augmented Generation (RAG), is one of the most significant recent advancements in the field.
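As a rough illustration of the retrieval-augmented generation flow described above (not the authors' implementation; the document texts, the hard-coded embedding vectors, and the retrieve/build_prompt helpers are hypothetical placeholders), the following Python sketch shows how passages retrieved by embedding similarity are prepended to the prompt before a language model generates an answer:

```python
import numpy as np

# Hypothetical pre-computed document embeddings (text, vector); in practice these
# would come from an embedding model applied to education or business IT documents.
DOCS = [
    ("Course syllabus for the AI module ...", np.array([0.1, 0.9, 0.0])),
    ("Internal IT incident handling guide ...", np.array([0.8, 0.1, 0.1])),
]

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec, k=1):
    # Rank stored documents by similarity to the query embedding and keep the top k.
    ranked = sorted(DOCS, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, query_vec):
    # Retrieval-augmented generation: ground the model on the retrieved context.
    context = "\n".join(retrieve(query_vec))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"

# The augmented prompt would then be passed to an LLM (generation step not shown).
print(build_prompt("How are incidents escalated?", np.array([0.7, 0.2, 0.1])))
```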
Based on our study, two use cases are implemented to show its applicability: one related to education and one related to business IT documents. The methodology describes the techniques used, including retrieval-augmented generation and embeddings stored in vector databases. Our custom models are evaluated against general models without embeddings and show superior performance.
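The paper does not specify which vector database is used, so the following minimal sketch only approximates the idea of storing embeddings and querying them by similarity; the ToyVectorStore class and its sample entries are assumptions for illustration:

```python
import numpy as np

class ToyVectorStore:
    """Minimal in-memory stand-in for a vector database (illustrative only)."""

    def __init__(self):
        self.texts, self.vectors = [], []

    def add(self, text, vector):
        # Store the raw text alongside its embedding vector.
        self.texts.append(text)
        self.vectors.append(np.asarray(vector, dtype=float))

    def query(self, vector, k=3):
        # Nearest-neighbour search over stored embeddings by cosine similarity.
        q = np.asarray(vector, dtype=float)
        sims = [float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
                for v in self.vectors]
        top = np.argsort(sims)[::-1][:k]
        return [(self.texts[i], sims[i]) for i in top]

store = ToyVectorStore()
store.add("Lecture 3: gradient descent basics", [0.2, 0.8, 0.1])
store.add("Ticket workflow for server outages", [0.9, 0.1, 0.2])
print(store.query([0.85, 0.15, 0.2], k=1))
```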
The article highlights remarkable progress in Retrieval-Augmented Generation (RAG) for AI text processing, with a focus on the business and education fields. The most significant contributions presented in the paper include a scalable framework for AI applications, a novel integration of Retrieval-Augmented Generation and embeddings, practical application demonstrations, bridging gaps in AI text analysis, significant improvements in AI performance, and the optimization of educational and business processes.