Unsupervised Learning-based Approach for Contextual Understanding of Web Material Around a New Domain of Algorithmic Government

Author:

Gupta Rajan (1), Pal Saibal K. (2)

Affiliation:

1. Research & Analytics Division, Analyttica Datalab, India

2. SAG Lab, Defense Research & Development Organization, India

Abstract

Contextual understanding is key to learning a new domain effectively through web search and to making informed decisions. With the advent of machine learning approaches, this learning becomes faster and more robust, enabling collaboration between machine algorithms and humans. However, human expertise still holds the key for a new domain, and this study proposes incorporating it as a key step in an unsupervised learning approach based on k-means clustering. A domain search term and context terms for the new domain are added to the clustering technique, and the relevance of the resulting groups is tested. Context setting helps to analyse and understand the content of documents and other sources of information. For a new domain like Algorithmic Government, which does not have many documents on the web, contextual learning was found to be up to 40% more relevant than the normal learning approach. The experts found the qualitative aspect of the clusters much better than the quantitative aspect, owing to the smaller number of search documents available. Scientific research was also found to support the groups formed by the contextual learning approach. This approach should help governments better understand and respond to the needs and concerns of their citizens by deriving better data insights quickly, and make more informed, evidence-based decisions that are sensitive to the needs and values of different communities and stakeholders. Many stakeholders in the new domain can therefore use this approach for exploration, research, policy formulation, strategizing, implementation and testing of the various learnt concepts. A total of 15 search engines were used in the experimental settings, with thousands of web crawls performed using the Carrot 2 engine. Text embedding used the bag-of-words technique, and k-means clustering was implemented to produce 25 clusters across the two types of learning.
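
The pipeline named in the abstract (a domain term plus context terms, bag-of-words vectors, then k-means) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the context terms are combined with the search term, so the `contextual_query` helper, the toy snippets, and the choice of 2 clusters (the study used 25) are all assumptions made only for illustration, and no web crawling or Carrot 2 call is shown.

```python
import random
from collections import Counter


def contextual_query(domain_term, context_terms):
    """Join the domain term and context terms into one search string.
    Hypothetical helper: the query format is an assumption."""
    return " ".join([domain_term, *context_terms])


def bag_of_words(docs):
    """Return (vocabulary, count vectors) using whitespace tokenization."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0.0] * len(vocab)
        for tok, count in Counter(toks).items():
            vec[index[tok]] = float(count)
        vectors.append(vec)
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100, seed=0):
    """Plain Lloyd-style k-means; requires k <= len(vectors)."""
    rng = random.Random(seed)
    # Initialise centroids from k randomly chosen input vectors.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each non-empty cluster's centroid to its mean.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy snippets invented for this sketch; they are not data from the study.
snippets = [
    "algorithmic government policy automation",
    "algorithmic government policy automation",
    "machine learning clustering text",
    "machine learning clustering text documents",
    "citizen services data insights",
]
query = contextual_query("algorithmic government", ["policy", "automation"])
vocab, vectors = bag_of_words(snippets)
labels = kmeans(vectors, k=2)
print(query)
print(labels)
```

In a comparison like the one the abstract describes, the same clustering would be run once on results for the bare domain term and once on results for the contextual query, and the relevance of the two sets of groups would then be judged by experts.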

Publisher

Association for Computing Machinery (ACM)

Subject

Public Administration, Software, Information Systems, Computer Science Applications, Computer Networks and Communications
