Affiliation:
1. Department of Computer Science and Information Engineering, National Taitung University, Taitung 950309, Taiwan
2. Department of Information Science and Management Systems, National Taitung University, Taitung 950309, Taiwan
3. Interdisciplinary Bachelor’s Program, National Taitung University, Taitung 950309, Taiwan
Abstract
In ethnographic research, data collected through surveys, interviews, or questionnaires in sociology and anthropology often appear in diverse forms and languages. Building a robust database system to store and process such data, and to query it efficiently, is challenging. This paper surveys modern database technologies to identify which are best suited to storing these varied, heterogeneous datasets. The study examines several database categories: traditional relational databases, the NoSQL family (key-value, graph, and document databases), object-oriented databases, and vector databases, the last of which are crucial for the latest artificial intelligence solutions. The results indicate that, for field data, the NoSQL family is the most appropriate, especially document and graph databases. The simplicity and flexibility of document databases, together with the ability of graph databases to handle complex queries and rich data relationships, make these two NoSQL types the ideal choice when large amounts of data must be processed. Advancements in vector databases that embed custom metadata offer new possibilities for detailed analysis and retrieval. However, converting content into vector data remains challenging, especially in regions with unique oral traditions and languages. Constructing such databases is labor-intensive and requires domain experts to define metadata and relationships, which places a significant burden on research teams with extensive data collections. To this end, this paper proposes using Generative AI (GenAI) to assist in the data-transformation process, a recommendation supported by tests in which GenAI proved to be a strong supplement to document and graph databases. It also discusses two currently viable methods of vector database support, each with its own benefits and drawbacks.
Funder
National Science and Technology Council