Parsers, Data Structures and Algorithms for Macromolecular Analysis Toolkit (MAT): Design and Implementation

Author:

Kalyan Gazal, Junghare Vivek, S S John, Chattopadhyay Anupam, Mitra Pralay, Hazra Saugata

Abstract

The structural information of biological macromolecules is stored in .pdb, .mmcif and, more recently, .mmtf files, so working with it requires accurate and efficient software tools. Here, we describe the Macromolecular Analysis Toolkit (MAT), which parses .pdb, .mmcif and .mmtf files and builds data structures from the input. The program is written in C++ for efficiency and organizes structural information in a consistent, integrated way. Its novelty lies in the addition of new structure-based biological algorithms and applications. The package also stands out from similar libraries by being (1) faster and (2) accurate. We also provide a detailed comparison of the available parsers on the whole PDB database. The MAT parser is designed for quick extraction and organized loading of the core data structure. The same data structure is extended to accommodate information from the .mmcif and .mmtf file parsers. Tokenization of the data allows information to be extracted from disordered text, which supports accurate identification of the entities present in the .pdb file. Additionally, we introduce a performance optimization based on a few derived data structures, namely a kD-tree, an octree and graphs, for applications that need spatial coordinate calculations. MAT provides an advanced, time-efficient data structure designed for reusability and consistency within a systematic framework. The MAT parser can be accessed online through Bitbucket at https://bitbucket.org/gazalk/pdb_parser/.
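
As a rough illustration of the parsing step described above, the following minimal C++ sketch reads a single ATOM/HETATM line of a .pdb file into a small record. It is not MAT's code: the names AtomRecord, field and parseAtomLine are hypothetical, and it uses plain fixed-column slicing according to the published PDB file format rather than the tokenization approach the abstract mentions.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <string>

// Hypothetical record type for illustration only; not MAT's data structure.
struct AtomRecord {
    int serial;           // columns 7-11
    std::string name;     // columns 13-16
    std::string resName;  // columns 18-20
    char chainID;         // column 22
    int resSeq;           // columns 23-26
    double x, y, z;       // columns 31-38, 39-46, 47-54
};

// Return the text in 1-based columns [first, last] with surrounding spaces removed.
static std::string field(const std::string& line, std::size_t first, std::size_t last) {
    assert(first >= 1 && first <= last);
    if (line.size() < first) return "";
    std::string s = line.substr(first - 1, last - first + 1);
    const std::size_t b = s.find_first_not_of(' ');
    if (b == std::string::npos) return "";
    const std::size_t e = s.find_last_not_of(' ');
    return s.substr(b, e - b + 1);
}

// Parse one ATOM/HETATM line. Returns false for other record types,
// for lines too short to hold the coordinates, or for non-numeric fields.
bool parseAtomLine(const std::string& line, AtomRecord& out) {
    if (line.size() < 54) return false;
    const std::string tag = line.substr(0, 6);
    if (tag != "ATOM  " && tag != "HETATM") return false;
    try {
        AtomRecord a;
        a.serial  = std::stoi(field(line, 7, 11));
        a.name    = field(line, 13, 16);
        a.resName = field(line, 18, 20);
        a.chainID = line[21];
        a.resSeq  = std::stoi(field(line, 23, 26));
        a.x = std::stod(field(line, 31, 38));
        a.y = std::stod(field(line, 39, 46));
        a.z = std::stod(field(line, 47, 54));
        out = a;
        return true;
    } catch (const std::exception&) {
        return false;  // std::stoi / std::stod reject empty or malformed fields
    }
}
```

A caller would feed each line of the file to parseAtomLine and collect the records that return true. Coordinates gathered this way are the kind of input that a spatial index such as a kD-tree or octree would then be built from.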

Publisher

Cold Spring Harbor Laboratory
