A new efficient probabilistic model for mining labeled ordered trees applied to glycobiology

Published: 2008-03
Volume: 2, Issue: 1, Pages: 1-30
ISSN: 1556-4681
Journal: ACM Transactions on Knowledge Discovery from Data
Journal abbreviation: ACM Trans. Knowl. Discov. Data
Language: English

Authors:

Hashimoto Kosuke¹, Aoki-Kinoshita Kiyoko Flora¹, Ueda Nobuhisa¹, Kanehisa Minoru¹, Mamitsuka Hiroshi¹

Affiliation:

1. Institute for Chemical Research, Kyoto University, Japan

Abstract

Mining frequent patterns from large datasets is an important issue in data mining. Recently, complex and unstructured (or semi-structured) datasets have become targets of major data mining applications, including text mining, web mining and bioinformatics. Our work focuses on labeled ordered trees, which are typical semi-structured data. In bioinformatics, carbohydrate sugar chains, or glycans, can be modeled as labeled ordered trees. Glycans are the third major class of biomolecules and play important roles in signaling and recognition. For mining labeled ordered trees, we propose a new probabilistic model and an efficient learning scheme for it, which significantly reduce the time and space complexity relative to an existing probabilistic model for labeled ordered trees. We evaluated the performance of the proposed model against other probabilistic models, using synthetic datasets as well as real datasets from glycobiology. Experimental results showed that the proposed model drastically reduced computation time compared with the competing model, while maintaining predictive power and avoiding overfitting to the training data. Finally, we assessed our results on real data from a variety of biological viewpoints, verifying known facts in glycobiology.
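To make the idea of a probabilistic model over labeled ordered trees more concrete, here is a minimal sketch of a generic hidden tree Markov model. It is not the model proposed in the paper, and everything in it is an illustrative assumption: each glycan is reduced to a tree of monosaccharide labels (linkage information dropped), every node carries a hidden state, a child's state depends only on its parent's state, and the names `Node`, `inside` and `likelihood` as well as all parameter values are hypothetical. A bottom-up (inside) recursion gives the probability of a tree's labels in O(N·K²) time for N nodes and K hidden states.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of a labeled ordered tree; children are stored left to right."""
    label: str
    children: list[Node] = field(default_factory=list)


def inside(node: Node, trans: list[list[float]],
           emit: list[dict[str, float]]) -> list[float]:
    """Return beta[s] = P(labels in the subtree of `node` | node in hidden state s).

    Generic hidden tree Markov model (an assumption for illustration):
    a child's hidden state depends only on its parent's hidden state.
    """
    k = len(trans)
    # Probability that a node in state s emits its own label.
    beta = [emit[s].get(node.label, 0.0) for s in range(k)]
    for child in node.children:
        child_beta = inside(child, trans, emit)
        for s in range(k):
            # Sum out the child's hidden state t, given the parent's state s.
            beta[s] *= sum(trans[s][t] * child_beta[t] for t in range(k))
    return beta


def likelihood(root: Node, init: list[float], trans: list[list[float]],
               emit: list[dict[str, float]]) -> float:
    """P(all labels of the tree), summing over the root's hidden state."""
    beta = inside(root, trans, emit)
    return sum(p * b for p, b in zip(init, beta))


if __name__ == "__main__":
    # Hypothetical 2-state parameters, chosen only for illustration.
    init = [0.6, 0.4]
    trans = [[0.7, 0.3], [0.2, 0.8]]
    emit = [{"GlcNAc": 0.8, "Man": 0.2}, {"GlcNAc": 0.1, "Man": 0.9}]
    # GlcNAc-GlcNAc-Man with two Man branches (linkages omitted).
    core = Node("GlcNAc", [Node("GlcNAc", [Node("Man", [Node("Man"), Node("Man")])])])
    print(likelihood(core, init, trans, emit))
```

Because siblings are conditionally independent given their parent in this sketch, swapping two children leaves the likelihood unchanged, so it ignores the sibling order that distinguishes ordered trees. In practice the parameters would be estimated from data (for example with EM); that step is omitted here.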

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/1342320.1342326

Cited by 7 articles.

1. Development of a novel monosaccharide substitution matrix for improved comparison of glycan structures; Carbohydrate Research; 2022-01

2. GlyNet: a multi-task neural network for predicting protein–glycan interactions; Chemical Science; 2022

3. Analyzing Glycan-Binding Patterns with the ProfilePSTMM Tool; Methods in Molecular Biology; 2015

4. Informatics for Glycobiology and Glycomics; Carbohydrate Recognition; 2011-07-07

5. A clique-based method for the edit distance between unordered trees and its application to analysis of glycan structures; BMC Bioinformatics; 2011-02-15