A tracking and summarization system for online Chinese news topics

Published:2015-11-16 Issue:6 Volume:67 Page:687-699
ISSN:2050-3806
Container-title:Aslib Journal of Information Management
language:en

Author:

Chang Hsien-Tsung,Liu Shu-Wei,Mishra Nilamadhab

Abstract

Purpose – The purpose of this paper is to design and implement new tracking and summarization algorithms for Chinese news content. Based on the proposed methods and algorithms, the authors extract the important sentences contained in topic stories and list those sentences in timestamp order to ensure ease of understanding and to visualize multiple news stories on a single screen.

Design/methodology/approach – This paper takes an investigational approach that implements a new Dynamic Centroid Summarization algorithm in addition to a Term Frequency (TF)-Density algorithm to empirically compute three target parameters, i.e., recall, precision, and F-measure.

Findings – The proposed TF-Density algorithm is implemented and compared with the well-known algorithms Term Frequency-Inverse Word Frequency (TF-IWF) and Term Frequency-Inverse Document Frequency (TF-IDF). Three test data sets are configured from Chinese news web sites for use during the investigation, and two important findings are obtained that help the authors provide more precision and efficiency when recognizing the important words in the text. First, the authors evaluate three topic tracking algorithms, i.e., TF-Density, TF-IDF, and TF-IWF, with the said target parameters and find that the recall, precision, and F-measure of the proposed TF-Density algorithm are better than those of the TF-IWF and TF-IDF algorithms. Second, the authors implement a blind test approach to obtain the results of topic summarizations and find that the proposed Dynamic Centroid Summarization process can select topic sentences more accurately than the LexRank process.

Research limitations/implications – The results show that the tracking and summarization algorithms for news topics can provide more precise and convenient results for users tracking the news. The analysis and implications are limited to Chinese news content from Chinese news web sites such as Apple Library, UDN, and well-known portals like Yahoo and Google.

Originality/value – The research provides an empirical analysis of Chinese news content through the proposed TF-Density and Dynamic Centroid Summarization algorithms. It focusses on improving the means of summarizing a set of news stories to appear for browsing on a single screen and carries implications for innovative word measurements in practice.

Publisher

Emerald

Subject

Library and Information Sciences,Information Systems

References: 25 articles.

1. Brants, T., Chen, F. and Farahat, A. (2003), “A system for new event detection”, Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, pp. 330-337.

2. Buitelaar, P., Cimiano, P. and Magnini, B. (2005), Ontology Learning from Text: Methods, Evaluation and Applications, IOS Press, Amsterdam, Dutch.

3. Chen, H.H., Kuo, J.J., Huang, S.J., Lin, C.J. and Wung, H.C. (2003), “A summarization system for Chinese news from multiple sources”, Journal of the American Society for Information Science and Technology, Vol. 54 No. 13, pp. 1224-1236.

4. Chen, K.-J. and Bai, M.-H. (1998), “Unknown word detection for Chinese by a corpus-based learning method”, International Journal of Computational Linguistics and Chinese Language Processing, Vol. 3 No. 1, pp. 27-44.

5. Chen, K.-J. and Liu, S.-H. (1992), “Word identification for Mandarin Chinese sentences”, Proceedings of the 14th Conference on Computational Linguistics-Volume 1, Association for Computational Linguistics, pp. 101-107.

Cited by 10 articles.

1. Mobile Cloud Computing Framework for an Android-Based Metaverse Ecosystem Platform;Advances in Computational Intelligence and Robotics;2023-06-30

2. A Deep Investigation on News Aggregation and Recommendation System: NARS;2022 Fourth International Conference on Emerging Research in Electronics, Computer Science and Technology (ICERECT);2022-12-26

3. Review on Knowledge-Centric Healthcare Data Analysis Case Using Deep Neural Network for Medical Data Warehousing Application;Digital Twins and Healthcare;2022-11-25

4. A Method of K-Means Clustering Based on TF-IDF for Software Requirements Documents Written in Chinese Language;IEICE Transactions on Information and Systems;2022-04-01

5. Automatic content curation of news events;Multimedia Tools and Applications;2022-02-16