Machine learning and big scientific data

Author:

Tony Hey¹, Keith Butler¹, Sam Jackson¹, Jeyarajan Thiyagalingam¹

Affiliation:

1. Scientific Computing Department, Rutherford Appleton Laboratory, Science and Technology Facilities Council, Didcot OX11 0QX, UK

Abstract

This paper reviews some of the challenges posed by the huge growth of experimental data generated by the new generation of large-scale experiments at UK national facilities on the Rutherford Appleton Laboratory (RAL) site at Harwell, near Oxford. Such ‘Big Scientific Data’ comes from the Diamond Light Source and Electron Microscopy Facilities, the ISIS Neutron and Muon Facility and the UK's Central Laser Facility. Scientists are increasingly required to use advanced machine learning and other AI technologies, both to automate parts of the data pipeline and to help make new scientific discoveries when analysing their data. In commercially important applications such as object recognition, natural language processing and automatic translation, deep learning has made dramatic breakthroughs. Google's DeepMind has now used deep learning to develop its AlphaFold tool for predicting protein folding, and remarkably it has achieved some spectacular results on this specific scientific problem. Can deep learning be similarly transformative for other scientific problems? After a brief review of some initial applications of machine learning at RAL, we focus on the challenges and opportunities for AI in advancing materials science. Finally, we discuss the importance of developing realistic machine learning benchmarks using Big Scientific Data from several different scientific domains. We conclude with some initial examples from our ‘scientific machine learning’ benchmark suite and the research challenges these benchmarks will enable. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.

Funder

Engineering and Physical Sciences Research Council

Publisher

The Royal Society

Subject

General Physics and Astronomy, General Engineering, General Mathematics

