Extended information inference model for unsupervised categorization of web short texts

Author:

Xu Tao¹, Peng Qinke¹

Affiliation:

1. Systems Engineering Institute, Xi’an Jiaotong University, China

Abstract

Traditional text-processing methods suffer significant performance degradation when applied to web short texts, whose inherent characteristics include feature sparseness, a lack of sufficient hand-labelled training examples, domain dependence, and asyntactic expression. In this paper we propose a modified information inference model that can mimic human cognitive behaviour to categorize various web short texts in an unsupervised manner. The model is based on conceptual space theory and the hyperspace analogue to language (HAL) model; its novelty is that it combines domain-specific knowledge and universal knowledge via a fusion mechanism for multiple HAL spaces. Moreover, in the realization of the conceptual space, a concept is represented geometrically by a two-tuple of property sets, which can improve the accuracy with which the information contained in combined concepts is represented. Two measurements of the relationship between concepts are used to implement information inference for web short texts. The model is evaluated experimentally on three different web short text categorization tasks, and the results indicate the applicability and usefulness of the proposed method.
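For readers unfamiliar with HAL, the following is a minimal Python sketch of how a HAL space is commonly built: a window slides over the token stream, and each preceding word within the window contributes a weight that decreases with distance. The function names `build_hal` and `fuse`, the window size, and the mixing parameter `alpha` are all illustrative assumptions. In particular, `fuse` is only a plain weighted sum standing in for the idea of combining a domain-specific space with a general one; the abstract does not describe the paper's actual fusion mechanism, and this is not it.

```python
from collections import defaultdict


def build_hal(tokens, window=5):
    """Build a HAL-style space: hal[target][context] accumulates a weight
    for each context word that precedes the target within the window.
    The weight is window - distance + 1, so nearer words count more."""
    hal = defaultdict(lambda: defaultdict(int))
    for i, target in enumerate(tokens):
        for d in range(1, window + 1):
            j = i - d
            if j < 0:
                break  # ran off the start of the token stream
            hal[target][tokens[j]] += window - d + 1
    return hal


def fuse(domain, general, alpha=0.7):
    """Hypothetical fusion of two HAL spaces by a weighted sum.
    This is an assumption for illustration, not the paper's mechanism."""
    fused = defaultdict(lambda: defaultdict(float))
    for space, weight in ((domain, alpha), (general, 1.0 - alpha)):
        for target, row in space.items():
            for context, value in row.items():
                fused[target][context] += weight * value
    return fused


if __name__ == "__main__":
    hal = build_hal("the cat sat on the mat".split(), window=2)
    # Context weights recorded for the word "sat"
    print(dict(hal["sat"]))
```

With a window of 2, the word immediately before a target receives weight 2 and the word two positions back receives weight 1, so each row acts as a distance-weighted context vector for its word.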

Publisher

SAGE Publications

Subject

Library and Information Sciences, Information Systems


Cited by 7 articles.

1. Short text classification using semantically enriched topic model;Journal of Information Science;2024-03-20

2. Summarization of Medical Document using Pointwise mutual information (PMI)-based for web document summarization;2022 10th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO);2022-10-13

3. Engineering Web resource summaries using Pointwise mutual information (PMI)-based for web document summarization;2022 International Conference on Innovations in Science and Technology for Sustainable Development (ICISTSD);2022-08-25

4. Information Intelligent Acquisition Generated by Matrix Reasoning of Inverse P-Set;Simulation Tools and Techniques;2021

5. ENHANCEMENT OF TEXT BASED EMOTION RECOGNITION PERFORMANCES USING WORD CLUSTERS;International Journal of Research -GRANTHAALAYAH;2019-01-31
