Improving Classification of Protein Interaction Articles Using Context Similarity-Based Feature Selection

Published:2015 Issue: Volume:2015 Page:1-10
ISSN:2314-6133
Container-title:BioMed Research International
language:en
Short-container-title:BioMed Research International

Author:

Chen Yifei¹, Sun Yuxing¹, Han Bing-Qing¹

Affiliation:

1. School of Technology, Nanjing Audit University, 86 W. Yushan Road, Nanjing 211815, China

Abstract

Protein interaction article classification is a text classification task in the biological domain that determines which articles describe protein-protein interactions. Because the feature space in text classification is high-dimensional, feature selection is widely used to reduce dimensionality and speed up computation without sacrificing classification performance. Many existing feature selection methods are based on statistical measures of document frequency and term frequency. One potential drawback of these methods is that they treat features separately. We therefore first design a similarity measure between context information that takes word co-occurrences and phrase chunks around the features into account. We then introduce the similarity of context information into the importance measure of the features, substituting it for document and term frequency, and so propose new context similarity-based feature selection methods. Their performance is evaluated on two protein interaction article collections and compared against the frequency-based methods. The experimental results reveal that the context similarity-based methods perform better in terms of the F1 measure and the dimension reduction rate. Benefiting from the context information surrounding the features, the proposed methods can select distinctive features effectively for protein interaction article classification.
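
The abstract does not give the paper's formulas, so the following is only a rough sketch of the general idea of scoring a feature by the similarity of its contexts instead of by its frequency. Everything in it is an assumption made for illustration: the function names (`context_counts`, `cosine`, `select_features`), the fixed word window standing in for "word co-occurrences and phrase chunks", the cosine measure, and the scoring rule (one minus the similarity between a term's contexts in the positive and negative classes). It should not be read as the authors' method.

```python
import math
from collections import Counter, defaultdict


def context_counts(docs, window=2):
    """Map each term to a Counter of the words seen within `window` tokens of it.

    Assumption: whitespace tokenization and a symmetric word window are used
    here as a simple stand-in for the paper's context information.
    """
    contexts = defaultdict(Counter)
    for doc in docs:
        tokens = doc.lower().split()
        for i, term in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    contexts[term][tokens[j]] += 1
    return contexts


def cosine(a, b):
    """Cosine similarity between two Counters; 0.0 if either one is empty."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_features(pos_docs, neg_docs, k, window=2):
    """Return the k terms whose contexts differ most between the two classes.

    Assumption: score = 1 - cosine(contexts in positive docs, contexts in
    negative docs). A term that occurs in only one class gets the top score.
    """
    pos = context_counts(pos_docs, window)
    neg = context_counts(neg_docs, window)
    scores = {
        term: 1.0 - cosine(pos.get(term, Counter()), neg.get(term, Counter()))
        for term in set(pos) | set(neg)
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:k], scores
```

With two small lists of positive and negative article texts, `select_features(pos_docs, neg_docs, k=100)` would return the 100 highest-scoring terms together with the full score table. A frequency-based baseline would instead rank the same terms by document or term counts.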

Funder

National Natural Science Foundation of China

Publisher

Hindawi Limited

Subject

General Immunology and Microbiology; General Biochemistry, Genetics and Molecular Biology; General Medicine

Link

http://downloads.hindawi.com/journals/bmri/2015/751646.pdf

Cited by 1 article.

1. Boosting biomedical document classification through the use of domain entity recognizers and semantic ontologies for document representation: The case of gluten bibliome; Neurocomputing; 2021-11