Affiliation:
1. IBM Research, Bangalore, India
2. IBM Research, Almaden
Abstract
In this paper, we present ATHENA, an ontology-driven system for natural language querying of complex relational databases. Natural language interfaces to databases give users easy access to data without the need to learn a complex query language such as SQL. ATHENA uses domain-specific ontologies, which describe the semantic entities and their relationships in a domain. We propose a unique two-stage approach, where the input natural language query (NLQ) is first translated into an intermediate query language over the ontology, called OQL, and subsequently translated into SQL. Our two-stage approach allows us to decouple the physical layout of the data in the relational store from the semantics of the query, providing physical independence. Moreover, ontologies provide richer semantic information, such as inheritance and membership relations, that is lost in a relational schema. By reasoning over the ontologies, our NLQ engine is able to accurately capture the user intent. We study the effectiveness of our approach using three different workloads on top of geographical (GEO), academic (MAS), and financial (FIN) data. ATHENA achieves 100% precision on the GEO and MAS workloads, and 99% precision on the FIN workload, which operates on a complex financial ontology. Moreover, ATHENA attains 87.2%, 88.3%, and 88.9% recall on the GEO, MAS, and FIN workloads, respectively.
Subject
General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development
Cited by
100 articles.
1. Intelligent Search Engine Tool for Querying Database Systems;International Journal of Mathematical, Engineering and Management Sciences;2024-08-01
2. MedT5SQL: a transformers-based large language model for text-to-SQL conversion in the healthcare domain;Frontiers in Big Data;2024-06-26
3. Large Language Models: Principles and Practice;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13
4. Metasql: A Generate-Then-Rank Framework for Natural Language to SQL Translation;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13
5. PURPLE: Making a Large Language Model a Better SQL Writer;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13