Design of a Flexible, User Friendly Feature Matrix Generation System and its Application on Biomedical Datasets

Published: 2020-04-27
Journal: Journal of Grid Computing (J Grid Computing)
Volume: 18, Issue: 3, Pages: 507–527
ISSN: 1570-7873
Language: en

Authors

Ghorbani M., Swift S., Taylor S. J. E., Payne A. M.

Abstract

The generation of a feature matrix is the first step in conducting machine learning analyses on complex datasets such as those containing DNA, RNA or protein sequences. These matrices contain, for each object, features that have to be identified using complex algorithms to interrogate the data. They are normally generated by combining the results of running such algorithms across various datasets from different, distributed data sources. For non-computing experts, therefore, generating such matrices is a barrier to employing machine learning techniques. Furthermore, since datasets are becoming larger, this barrier is compounded by the limitations of the single personal computer most often used by investigators to carry out such analyses. Here we propose a user-friendly system to generate feature matrices in a way that is flexible, scalable and extendable. Additionally, by making use of the Berkeley Open Infrastructure for Network Computing (BOINC) software, the process can be sped up using the distributed volunteer computing available in most institutions. The system combines the Grid and Cloud User Support Environment (gUSE) with the Web Services Parallel Grid Runtime and Developer Environment Portal (WS-PGRADE) to create workflow-based science gateways that allow users to submit work to distributed computing resources. This report demonstrates the use of our proposed WS-PGRADE/gUSE BOINC system to identify features to populate matrices from very large DNA sequence data repositories; however, we propose that this system could be used to analyse a wide variety of feature sets, including image, numerical and text data.
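The abstract describes feature-matrix generation as running feature-identification algorithms over many data sources and combining their results, with BOINC distributing the work. The Python sketch below illustrates that general split-process-merge pattern under stated assumptions; it is not the authors' WS-PGRADE/gUSE/BOINC implementation. The motif list, the motif-count features and all function names (`split_into_units`, `process_work_unit`, `build_feature_matrix`) are hypothetical, and a local thread pool merely stands in for the volunteer clients that would each receive one work unit.

```python
"""Split-process-merge sketch of feature-matrix generation.

Hypothetical assumptions (not from the paper): features are counts of a
fixed motif list, and a local thread pool stands in for BOINC volunteer
clients that would each receive one work unit.
"""
from concurrent.futures import ThreadPoolExecutor

# Hypothetical feature set: short DNA motifs counted in every sequence.
MOTIFS = ["TATA", "CG", "GATC"]


def count_overlapping(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def split_into_units(records, unit_size):
    """Split (id, sequence) records into independent work units."""
    return [records[i:i + unit_size] for i in range(0, len(records), unit_size)]


def process_work_unit(unit):
    """Compute one feature row per object in a single work unit."""
    return {seq_id: [count_overlapping(seq, m) for m in MOTIFS]
            for seq_id, seq in unit}


def build_feature_matrix(records, unit_size=2, workers=2):
    """Dispatch work units to workers, then merge the partial rows."""
    units = split_into_units(records, unit_size)
    rows = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(process_work_unit, units):
            rows.update(partial)  # merge step: combine per-unit results
    # One row per object, in input order: [id, feature_1, ..., feature_n]
    return [[seq_id] + rows[seq_id] for seq_id, _ in records]


if __name__ == "__main__":
    records = [
        ("seq1", "TATAAACGCG"),
        ("seq2", "GATCGATC"),
        ("seq3", "ACGTTATATA"),
    ]
    print(["id"] + MOTIFS)
    for row in build_feature_matrix(records):
        print(row)
```

Because each work unit is independent, units can be processed in any order and on any node; only the final merge needs all partial results, which is what makes this kind of job a reasonable fit for volunteer computing.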

Funder

Brunel University

Publisher

Springer Science and Business Media LLC

Subject

Computer Networks and Communications, Hardware and Architecture, Information Systems, Software

Link

https://link.springer.com/content/pdf/10.1007/s10723-020-09518-y.pdf

References (46 articles)

1. Jordan, M.I., Mitchell, T.M.: Machine learning: trends, perspectives, and prospects. Science. 349(6245), 255–260 (2015)

2. Zou, Q., Chen, L., Huang, T., Zhang, Z., Xu, Y.: Machine learning and graph analytics in computational biomedicine. Artificial Intelligence in Medicine. 83, 1 (2017), and papers therein

3. Witten, I.H., Frank, E., Hall, M.A., Pal, C.J.: Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann (2016)

4. Cheng, W., Kasneci, G., Graepel, T., Stern, D., Herbrich, R.: Automated feature generation from structured knowledge. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 1395–1404. ACM (2011)

5. Paulheim, H., Fürnkranz, J.: Unsupervised generation of data mining features from linked open data. In: Proceedings of the 2nd International Conference on Web Intelligence, Mining and Semantics, p. 31. ACM (2012)

Cited by 1 article

1. Optimization of the Workflow in a BOINC-Based Desktop Grid for Virtual Drug Screening. Lecture Notes in Computer Science (2022)