Automated classification of web pages in hierarchical browsing

Authors

Golub Koraljka, Lykke Marianne

Abstract

Purpose: The purpose of this study is twofold: to investigate whether it is meaningful to use the Engineering Index (Ei) classification scheme for browsing and, if it proves useful, to investigate the performance of an automated classification algorithm based on the Ei classification scheme.

Design/methodology/approach: A user study was conducted in which users solved four controlled searching tasks. The users browsed the Ei classification scheme so that its suitability for browsing could be examined. The classification algorithm was evaluated by the users, who judged the correctness of the automatically assigned classes.

Findings: The study showed that the Ei classification scheme is suited for browsing. Automatically assigned classes were on average partly correct, with some classes working better than others. Browsing success was shown to correlate with, and depend on, classification correctness.

Research limitations/implications: Further research should address the problem of disparate evaluations of one and the same web page. Additional reasons behind browsing failures in the Ei classification scheme also need further investigation.

Practical implications: Improvements for browsing were identified: describing class captions and/or listing their subclasses from the start; allowing searches for words from class captions with synonym search (easily provided for Ei, since the classes are mapped to thesaurus terms); and, when searching for class captions, returning the hierarchical tree expanded around the class whose caption contains the search term. The need for improvements to classification schemes was also indicated.

Originality/value: A user-based evaluation of automated subject classification in the context of browsing has not been conducted before; hence the study also presents new findings concerning methodology.

Publisher

Emerald

Subject

Library and Information Sciences, Information Systems


Cited by 6 articles.

1. Automated Subject Indexing: An Overview;Cataloging & Classification Quarterly;2021-11-17

2. Enhancing Navigability: An Algorithm for Constructing Tag Trees;Journal of Data and Information Science;2017-03-21

3. A framework for evaluating automatic indexing or classification in the context of retrieval;Journal of the Association for Information Science and Technology;2015-10-22

4. Augmenting Dublin Core digital library metadata with Dewey Decimal Classification;Journal of Documentation;2015-09-14

5. References;Library Classification Trends in the 21st Century;2012
