Author:
Golikov Aleksei,Akimov Dmitrii,Romanovskii Maksim,Trashchenkov Sergei
Abstract
The article describes various ways to use generative pre-trained language models to build a corporate question-and-answer system. A significant limitation of the current generative pre-trained language models is the limit on the number of input tokens, which does not allow them to work "out of the box" with a large number of documents or with a large document. To overcome this limitation, the paper considers the indexing of documents with subsequent search query and response generation based on two of the most popular open source solutions at the moment – the Haystack and LlamaIndex frameworks. It has been shown that using the open source Haystack framework with the best settings allows you to get more accurate answers when building a corporate question-and-answer system compared to the open source LlamaIndex framework, however, requires the use of an average of several more tokens. The article used a comparative analysis to evaluate the effectiveness of using generative pre-trained language models in corporate question-and-answer systems using the Haystack and Llamaindex frameworks. The evaluation of the obtained results was carried out using the EM (exact match) metric. The main conclusions of the conducted research on the creation of question-answer systems using generative pre-trained language models are: 1. Using hierarchical indexing is currently extremely expensive in terms of the number of tokens used (about 160,000 tokens for hierarchical indexing versus 30,000 tokens on average for sequential indexing), since the response is generated by sequentially processing parent and child nodes. 2. Processing information using the Haystack framework with the best settings allows you to get somewhat more accurate answers than using the LlamaIndex framework (0.7 vs. 0.67 with the best settings). 3. Using the Haystack framework is more invariant with respect to the accuracy of responses in terms of the number of tokens in the chunk. 4. On average, using the Haystack framework is more expensive in terms of the number of tokens (about 4 times) than the LlamaIndex framework. 5. The "create and refine" and "tree summarize" response generation modes for the LlamaIndex framework are approximately the same in terms of the accuracy of the responses received, however, more tokens are required for the "tree summarize" mode.
Subject
Colloid and Surface Chemistry,Physical and Theoretical Chemistry
Reference17 articles.
1. Simmons R. F., Klein S., McConlogue K. Indexing and dependency logic for answering English questions // American Documentation. – 1964. – T. 15. – №. 3. – S. 196-204.
2. Luo M. et al. Choose your qa model wisely: A systematic study of generative and extractive readers for question answering // arXiv preprint arXiv:2203.07522. – 2022.
3. Zhou C. et al. A comprehensive survey on pretrained foundation models: A history from bert to chatgpt // arXiv preprint arXiv:2302.09419. – 2023.
4. Lewis P. et al. Retrieval-augmented generation for knowledge-intensive nlp tasks //Advances in Neural Information Processing Systems. – 2020. – T. 33. – S. 9459-9474.
5. Maslyukhin S. M. Dialogovaya sistema na osnove ustnykh razgovorov s dostupom k nestrukturirovannoi baze znanii // Nauchno-tekhnicheskii vestnik informatsionnykh tekhnologii, mekhaniki i optiki. – 2023. – T. 23. – №. 1. – S. 88-95.