Affiliation:
1. University of Notre Dame, Notre Dame, USA
2. Purdue University, West Lafayette, USA
Abstract
Querying structured databases with natural language (NL2SQL) has remained a difficult problem for years. Recently, advances in machine learning (ML), natural language processing (NLP), and large language models (LLMs) have led to significant improvements in performance, with the best model achieving ∼85% accuracy on the benchmark Spider dataset. However, there is still no systematic understanding of the types and causes of errors in erroneous queries, or of the effectiveness of error-handling mechanisms. To bridge this gap, a taxonomy of errors made by four representative NL2SQL models was first built in this work, along with an in-depth analysis of the errors. Second, the causes of model errors were explored by analyzing the model-human attention alignment to the natural language query. Last, a within-subjects user study with 26 participants was conducted to investigate the effectiveness of three interactive error-handling mechanisms in NL2SQL. The findings shed light on the future design of model structures and of error discovery and repair strategies for natural language data query interfaces.
Publisher
Association for Computing Machinery (ACM)