A Keyphrase-Based Tag Cloud Generation Framework to Conceptualize Textual Data

Author:

Abulaish Muhammad (1), Anwar Tarique (2)

Affiliation:

1. Center of Excellence in Information Assurance, King Saud University, Riyadh, Saudi Arabia & Department of Computer Science, Jamia Millia Islamia (A Central University), New Delhi, India

2. Center of Excellence in Information Assurance, King Saud University, Riyadh, Saudi Arabia

Abstract

Tag clouds have become an effective tool for quickly perceiving the most prominent terms embedded within textual data. They help readers grasp the main theme of a corpus without exploring the pile of documents. However, the effectiveness of tag clouds in conceptualizing text corpora depends directly on the quality of the tags. In this paper, the authors propose a keyphrase-based tag cloud generation framework. In contrast to existing tag cloud generation systems, which use single words as tags and their frequency counts to determine the font size of the tags, the proposed framework identifies feasible keyphrases and uses them as tags. The font size of a keyphrase is determined as a function of its relevance weight. Instead of using partial or full parsing, which is inefficient for lengthy sentences and inaccurate for sentences that do not follow proper grammatical structure, the proposed method applies n-gram techniques followed by various heuristics-based refinements to identify candidate phrases from text documents. A rich set of lexical and semantic features is identified to characterize the candidate phrases and determine their keyphraseness and relevance weights. The authors also propose a font-size determination function, which uses the relevance weights of the keyphrases to determine their relative font sizes for tag cloud visualization. The efficacy of the proposed framework is established through experimentation and comparison with existing state-of-the-art tag cloud generation methods.
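The two mechanisms named in the abstract, n-gram candidate generation with heuristic refinement and a weight-to-font-size mapping, can be illustrated with a minimal Python sketch. Everything specific in it is an assumption rather than the authors' method: the stop-word list, the single boundary heuristic (drop n-grams that start or end with a stop word), the linear `font_size` mapping, and the pixel range are all hypothetical, and raw counts are used only as a placeholder for the feature-based relevance weights the paper describes.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "and", "is", "for", "on", "with"}

def candidate_phrases(text, max_n=3):
    """Count word n-grams (1..max_n) that neither start nor end with a stop word."""
    counts = Counter()
    # Split on punctuation so that n-grams do not cross phrase boundaries.
    for segment in re.split(r"[.,;:!?()]+", text.lower()):
        words = re.findall(r"[a-z][a-z\-]*", segment)
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                # Assumed refinement heuristic: reject stop-word boundaries.
                if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                    continue
                counts[" ".join(gram)] += 1
    return counts

def font_size(weight, w_min, w_max, min_px=12, max_px=48):
    """Linearly map a weight in [w_min, w_max] to a font size in pixels (assumed mapping)."""
    if w_max == w_min:
        return min_px
    ratio = (weight - w_min) / (w_max - w_min)
    return round(min_px + ratio * (max_px - min_px))

text = "Tag clouds summarize text. Tag clouds use tags. The quality of tags matters."
counts = candidate_phrases(text)

# Placeholder weights: raw counts stand in for the paper's relevance weights.
w_min, w_max = min(counts.values()), max(counts.values())
sizes = {phrase: font_size(c, w_min, w_max) for phrase, c in counts.items()}
```

Here multi-word candidates such as "tag clouds" compete with single words for display size, which is the contrast with single-word tag clouds that the abstract draws. Replacing the placeholder counts with learned relevance weights would leave the `font_size` call unchanged.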

Publisher

IGI Global

Subject

General Engineering

Cited by 2 articles.

1. A Layered Approach for Summarization and Context Learning from Microblogging Data;Proceedings of the 20th International Conference on Information Integration and Web-based Applications & Services;2018-11-19

2. A social graph based text mining framework for chat log investigation;Digital Investigation;2014-12
