Identification of Species by Combining Molecular and Morphological Data Using Convolutional Neural Networks

Author:

Yang Bing (1), Zhang Zhenxin (2, 3, 4), Yang Cai-Qing (1), Wang Ying (1), Orr Michael C (5), Wang Hongbin (6), Zhang Ai-Bing (1)

Affiliation:

1. College of Life Sciences, Capital Normal University, Beijing 100048, People’s Republic of China

2. The Key Laboratory of 3D Information Acquisition and Application, MOE, Capital Normal University, Beijing 100048, People’s Republic of China

3. Beijing Laboratory of Water Resources Security, Capital Normal University, Beijing 100048, People’s Republic of China

4. Base of the State Key Laboratory of Urban Environmental Process and Digital, Capital Normal University, Beijing 100048, People’s Republic of China

5. Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, People’s Republic of China

6. Museum of Forest Biodiversity, Research Institute of Forest Ecology, Environment and Protection, Chinese Academy of Forestry, Beijing 100091, People’s Republic of China

Abstract

Integrative taxonomy, which draws on behavior, niche preference, distribution, morphological analysis, and DNA barcoding, is central to modern taxonomy and systematic biology. However, decades of use demonstrate that these methods can face challenges when used in isolation: for instance, potential misidentifications due to phenotypic plasticity for morphological methods, and incorrect identifications because of introgression, incomplete lineage sorting, and horizontal gene transfer for DNA barcoding. Although researchers have advocated the use of integrative taxonomy, few detailed algorithms have been proposed. Here, we develop a convolutional neural network method (morphology-molecule network [MMNet]) that integrates morphological and molecular data for species identification. The newly proposed method (MMNet) worked better than four currently available alternative methods when tested with 10 independent data sets representing varying genetic diversity from different taxa. High accuracies were achieved for all groups, including beetles (98.1% of 123 species), butterflies (98.8% of 24 species), fishes (96.3% of 214 species), and moths (96.4% of 150 total species). Further, MMNet demonstrated a high degree of accuracy (>98%) in four data sets including closely related species from the same genus. The average accuracy for two modest subgenomic (single nucleotide polymorphism) data sets, each comprising eight putative subspecies, is 90%. Additional tests show that the success rate of species identification under this method most strongly depends on the amount of training data, and is robust to sequence length and image size. Analyses of the contribution of different data types (image vs. gene) indicate that both morphological and genetic data are important to the model, and that genetic data contribute slightly more.
The approaches developed here serve as a foundation for the future integration of multimodal information for integrative taxonomy, such as image, audio, video, 3D scanning, and biosensor data, to characterize organisms more comprehensively as a basis for improved investigation, monitoring, and conservation of biodiversity. [Convolutional neural network; deep learning; integrative taxonomy; single nucleotide polymorphism; species identification.]

Funder

Natural Science Foundation of China

China National Funds for Distinguished Young Scientists

Publisher

Oxford University Press (OUP)

Subject

Genetics; Ecology, Evolution, Behavior and Systematics
