Affiliation:
1. University of Science and Technology of China
Abstract
When data mining involves document processing, extracting an abstract from a document's content becomes an essential step. Classic abstract-extraction algorithms, represented by Luhn's, build the abstract solely from sentences that contain the document's frequent words. However, these algorithms do not analyze the full text at a deeper semantic level, which limits their accuracy. To improve accuracy, we propose a method that strengthens candidate keyword extraction by using substitute words and by taking the semantic meanings of the candidate keywords into account.
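To make the frequency-based baseline concrete, the following is a minimal sketch of a Luhn-style extractor, not the method proposed in this paper. It simplifies Luhn's original scoring: each sentence is scored as (number of frequent words in it)² divided by its length, without Luhn's clustering of significant words. The stop-word list, the function names, and the `top_n` and `num_sentences` parameters are all illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "or", "to",
              "is", "are", "it", "that", "this", "for", "with", "as", "by"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def frequent_words(sentences, top_n=5):
    """Return the top_n most frequent non-stop words across all sentences."""
    counts = Counter(
        w for s in sentences for w in tokenize(s) if w not in STOP_WORDS
    )
    return {w for w, _ in counts.most_common(top_n)}


def score_sentence(sentence, significant):
    """Simplified Luhn-style score: (frequent-word hits)^2 / sentence length."""
    words = tokenize(sentence)
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in significant)
    return hits * hits / len(words)


def extract_abstract(text, top_n=5, num_sentences=1):
    """Pick the highest-scoring sentences and return them in document order."""
    sentences = split_sentences(text)
    significant = frequent_words(sentences, top_n)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score_sentence(sentences[i], significant),
        reverse=True,
    )
    return [sentences[i] for i in sorted(ranked[:num_sentences])]


if __name__ == "__main__":
    sample = ("Cats chase mice. Cats and mice live in houses with cats. "
              "Dogs bark loudly.")
    print(extract_abstract(sample, top_n=2, num_sentences=1))
```

Because the score depends only on surface word counts, a sentence that expresses the main idea with synonyms or substitute words receives no credit, which is the limitation the abstract points to.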
Publisher
Trans Tech Publications, Ltd.
References (8 articles)
1. Hidalgo J M G. Evaluating cost-sensitive unsolicited bulk email categorization[C]. Proceedings of ACM Symposium on Applied Computing, 2002: 615-620.
2. Hobbs J R. Information extraction from biomedical text[J]. Journal of Biomedical Informatics, 2002, 35(4): 260-264.
3. Luhn H P. The automatic creation of literature abstracts[J]. IBM Journal of Research and Development, 1958, 2(2): 159-165.
4. Girvan M, Newman M E J. Community structure in social and biological networks[J]. Proceedings of the National Academy of Sciences, 2002, 99(12): 7821-7826.
5. Pedersen T, Banerjee S, Patwardhan S. Maximizing Semantic Relatedness to Perform Word Sense Disambiguation. Supercomputing Institute Research Report UMSI 2005/25, University of Minnesota, 2005.