Analysing and Examining Taxonomy and Folksonomy Terms in the Hybrid Subject Device using Machine Learning Techniques

Author:

Chatterjee Swarnali, Das Rajesh

Abstract

An information retrieval system contains a list of subject terms (taxonomy), a list of collaborative tags (folksonomy), or both. When taxonomy and folksonomy are combined, the result is called a hybrid subject device. The main purpose of this paper is to apply machine learning techniques to a dataset from the library domain, as is done in other domains, and to analyse large quantities of data accurately for critical problems. The research performs exploratory data analysis (EDA), prediction analysis, and similarity measurement between folksonomy and taxonomy terms. Data science deals with big data: unstructured, messy data in large volumes, with sizes measured in gigabytes or terabytes. Spreadsheet packages such as Excel usually cannot handle files of that size, which is why machine learning tools and techniques are applied. The library science domain now also holds large amounts of data, such as 20 to 30 years of circulation data, subject descriptors, and collaborative tags, and library professionals can apply machine learning tools to analyse it. In this paper, the authors introduce the application of these tools and techniques in the library domain and test them on 2642 taxonomy and folksonomy terms. In the EDA part, the analysis covers the frequency of LCSH (Library of Congress Subject Headings, the taxonomy) terms, pair plots, joint plots, and a heat map of LCSH and folksonomy terms. A logistic regression (LR) model is used for prediction analysis on the folksonomy and taxonomy dataset.
The EDA was performed on the attributes in the dataset. The accuracy of the logistic regression model (F1-score) is 0.37 at a training percentage of 69. The similarity between LCSH terms and folksonomy terms is 30 per cent (0.30151134), and the angle between the two vectors is 27 degrees. The novelty of this research work is that library data has been analysed using machine learning techniques that had not been applied to it before.
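
The similarity measurement described above can be illustrated with a minimal Python sketch. It assumes the measure is cosine similarity between two term-count vectors, which the abstract suggests by reporting an angle between vectors but does not state outright. The bag-of-terms vectorisation, the function names, and the toy term lists are all hypothetical and are not taken from the paper's dataset.

```python
import math
from collections import Counter


def term_vector(terms):
    """Count how often each normalised term occurs (bag-of-terms vector)."""
    return Counter(t.strip().lower() for t in terms if t.strip())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction; treat as dissimilar
    return dot / (norm_a * norm_b)


def angle_degrees(similarity):
    """Angle in degrees that corresponds to a cosine similarity value."""
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    clamped = max(-1.0, min(1.0, similarity))
    return math.degrees(math.acos(clamped))


# Hypothetical toy terms, NOT the paper's 2642-term dataset.
lcsh_terms = ["Machine learning", "Information retrieval", "Libraries"]
folk_terms = ["machine learning", "ml", "libraries", "tagging"]

sim = cosine_similarity(term_vector(lcsh_terms), term_vector(folk_terms))
print(f"similarity = {sim:.4f}, angle = {angle_degrees(sim):.1f} degrees")
```

In this sketch the terms are lower-cased before counting, so a subject heading and a tag that differ only in capitalisation count as the same term. Whether the authors normalised terms in this way is not stated.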

Publisher

Defence Scientific Information and Documentation Centre

Subject

Library and Information Sciences
