Affiliation:
1. Research & Analytics Division, Analyttica Datalab, India
2. SAG Lab, Defense Research & Development Organization, India
Abstract
Contextual understanding is key to learning a new domain effectively through web search and to making informed decisions. With the advent of machine learning approaches, such learning becomes faster and more robust, enabling collaboration between machine algorithms and humans. However, human expertise still holds the key for a new domain, and this study proposes it as a key step in the unsupervised learning approach of k-means clustering. The domain search term and context terms for the new domain are added to the clustering technique, and the relevance of the resulting groups is tested. Context setting helps to analyse and understand the content of documents and other sources of information. For a new domain such as Algorithmic Government, which has few documents on the web, contextual learning was found to be up to 40% more relevant than the normal learning approach. Experts rated the qualitative aspect of the clusters much better than the quantitative aspect, owing to the small number of search documents available. Scientific research was also found to support the groups formed by the contextual learning approach. This approach should help governments better understand and respond to the needs and concerns of their citizens by deriving better data insights quickly, and make more informed, evidence-based decisions that are sensitive to the needs and values of different communities and stakeholders. Many stakeholders in the new domain can therefore use this approach for exploration, research, policy formulation, strategizing, implementing and testing the various learnt concepts. A total of 15 search engines were used in the experimental settings, with thousands of web crawls performed using the Carrot2 engine. Text embedding was done using the bag-of-words technique, and k-means clustering was implemented to produce 25 clusters across the two types of learning.
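The bag-of-words embedding and k-means clustering named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the study's implementation: the function names (`build_query`, `bag_of_words`, `kmeans`), the whitespace tokenisation, the random centroid initialisation, and the toy snippets are all assumptions made for illustration. In particular, the abstract does not say how the context terms enter the pipeline; here they are simply appended to the domain search term to form the query, which is only one possible reading.

```python
import random
from collections import Counter


def build_query(domain_term, context_terms):
    """Hypothetical helper: join the domain term and context terms into one query."""
    return " ".join([domain_term, *context_terms])


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for whitespace-tokenised documents."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in tokenised:
        vec = [0.0] * len(vocab)
        for tok, count in Counter(toks).items():
            vec[index[tok]] = float(count)
        vectors.append(vec)
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=50, seed=0):
    """Plain Lloyd-style k-means; returns one cluster label per vector."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k vectors sampled at random.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy usage with made-up snippets; the study reports 25 clusters,
# which would need far more documents than shown here.
query = build_query("algorithmic government", ["policy", "citizens"])
snippets = [
    "algorithmic government policy for citizens",
    "government policy and citizens services",
    "k-means clustering of search results",
    "clustering search results with k-means",
]
_, vectors = bag_of_words(snippets)
labels = kmeans(vectors, k=2)
print(query, labels)
```

With real search results, `k` would be set to the number of clusters wanted, and it must not exceed the number of documents.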
Publisher
Association for Computing Machinery (ACM)
Subject
Public Administration, Software, Information Systems, Computer Science Applications, Computer Networks and Communications