Authors:
Chang Hsien-Tsung, Liu Shu-Wei, Mishra Nilamadhab
Abstract
Purpose
– The purpose of this paper is to design and implement new tracking and summarization algorithms for Chinese news content. Using the proposed methods and algorithms, the authors extract the important sentences contained in topic stories and list them in timestamp order, which makes the stories easier to understand and allows multiple news stories to be visualized on a single screen.
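The extraction-and-ordering step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' Dynamic Centroid Summarization, which the abstract does not define. It swaps in the classic static centroid approach (score each sentence by cosine similarity to the summed term vector of the whole topic), assumes the Chinese text has already been word-segmented, and uses a hypothetical `Sentence` record whose `timestamp` field holds the publication time of its story. The top-scoring sentences are then listed in timestamp order.

```python
import math
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Sentence:
    """Hypothetical sentence record used only for this illustration."""
    text: str
    tokens: list[str]      # assumed pre-segmented words (Chinese needs a word segmenter)
    timestamp: datetime    # publication time of the story the sentence came from


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def summarize(sentences: list[Sentence], k: int) -> list[Sentence]:
    """Pick the k sentences closest to the topic centroid, then order them by time."""
    # Static centroid: summed term counts over every sentence in the topic.
    centroid = Counter()
    for s in sentences:
        centroid.update(s.tokens)
    # Rank sentences by similarity to the centroid (not the paper's dynamic variant).
    ranked = sorted(sentences, key=lambda s: cosine(Counter(s.tokens), centroid), reverse=True)
    # List the selected sentences in timestamp order for readability.
    return sorted(ranked[:k], key=lambda s: s.timestamp)
```

Sorting the selection by timestamp, rather than by score, is what lets a reader follow a story's development chronologically on one screen; the abstract does not describe how the paper's dynamic centroid differs from this static one.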
Design/methodology/approach
– This paper takes an experimental approach: it implements a new Dynamic Centroid Summarization algorithm together with a Term Frequency (TF)-Density algorithm and evaluates them empirically on three target measures, i.e., recall, precision, and F-measure.
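The three target measures can be computed as below. This is a minimal sketch that assumes a set-based evaluation (for example, the stories a tracker assigns to a topic versus a hand-labeled gold set) and the balanced F-measure (F1); the abstract does not say which items are counted or which weighting of precision and recall is used.

```python
def precision_recall_f1(retrieved: set, relevant: set) -> tuple[float, float, float]:
    """Set-based precision, recall and balanced F-measure (F1)."""
    hits = len(retrieved & relevant)  # true positives: retrieved items that are relevant
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```

For instance, `precision_recall_f1({1, 2, 3, 4}, {2, 3, 5})` returns precision 0.5, recall 2/3 and an F-measure of 4/7 (about 0.571).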
Findings
– The proposed TF-Density algorithm is implemented and compared with the well-known Term Frequency-Inverse Word Frequency (TF-IWF) and Term Frequency-Inverse Document Frequency (TF-IDF) algorithms. Three test data sets are constructed from Chinese news web sites for the investigation, and two important findings emerge that help the authors recognize the important words in a text more precisely and efficiently. First, the authors evaluate three topic tracking algorithms, i.e., TF-Density, TF-IDF, and TF-IWF, on the target measures and find that the recall, precision, and F-measure of the proposed TF-Density algorithm are better than those of the TF-IWF and TF-IDF algorithms. Second, the authors use a blind test to assess the topic summarization results and find that the proposed Dynamic Centroid Summarization process selects topic sentences more accurately than the LexRank process.
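The two baseline weighting schemes can be sketched as follows. The abstract does not define TF-Density, so it is not shown. This sketch uses one common formulation of each baseline on pre-segmented token lists: TF-IDF multiplies a term's count in a document by log(N / df), where N is the number of documents and df the number of documents containing the term, and TF-IWF multiplies the count by log(total word occurrences in the corpus / occurrences of that term). Published variants differ (smoothed IDF, normalized TF, squared IWF), so treat these as illustrative rather than as the exact baselines used in the paper.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """TF-IDF: raw term count times log(N / document frequency)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: c * math.log(n_docs / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def tf_iwf(docs: list[list[str]]) -> list[dict[str, float]]:
    """TF-IWF: raw term count times log(total corpus words / corpus count of the term)."""
    # Corpus frequency: total occurrences of each term across all documents.
    corpus_counts = Counter(term for doc in docs for term in doc)
    total = sum(corpus_counts.values())
    return [
        {t: c * math.log(total / corpus_counts[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]
```

The difference lies in the denominator: IDF counts how many documents contain a term, while IWF counts how many times the term occurs across the corpus, so a word repeated heavily inside a few stories is down-weighted more by IWF than by IDF.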
Research limitations/implications
– The results show that the proposed tracking and summarization algorithms for news topics can provide more precise and convenient results for users who follow the news. The analysis and its implications are limited to Chinese news content from Chinese news web sites such as Apple Library and UDN and from well-known portals such as Yahoo and Google.
Originality/value
– The research provides an empirical analysis of Chinese news content through the proposed TF-Density and Dynamic Centroid Summarization algorithms. It focuses on improving how a set of news stories is summarized so that it can be browsed on a single screen, and it has practical implications for innovative measures of word importance.
Subject
Library and Information Sciences, Information Systems