Automated Taxonomic Identification of Insects with Expert-Level Accuracy Using Effective Feature Transfer from Convolutional Networks

Authors:

Valan Miroslav (1, 2, 3), Makonyi Karoly (1, 4), Maki Atsuto (5), Vondráček Dominik (6, 7), Ronquist Fredrik (2)

Affiliation:

1. Savantic AB, Rosenlundsgatan 52, 118 63 Stockholm, Sweden

2. Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Frescativägen 40, 114 18 Stockholm, Sweden

3. Department of Zoology, Stockholm University, Universitetsvägen 10, 114 18 Stockholm, Sweden

4. Disciplinary Domain of Science and Technology, Physics, Department of Physics and Astronomy, Nuclear Physics, Uppsala University, 751 20 Uppsala, Sweden

5. School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, SE-10044 Sweden

6. Department of Zoology, Faculty of Science, Charles University in Prague, Viničná 7, CZ-128 43 Praha 2, Czech Republic

7. Department of Entomology, National Museum, Cirkusová 1740, CZ-193 00 Praha 9 - Horní Počernice, Czech Republic

Abstract

Rapid and reliable identification of insects is important in many contexts, from the detection of disease vectors and invasive species to the sorting of material from biodiversity inventories. Because of the shortage of adequate expertise, there has long been an interest in developing automated systems for this task. Previous attempts have been based on laborious and complex handcrafted extraction of image features, but in recent years it has been shown that sophisticated convolutional neural networks (CNNs) can learn to extract relevant features automatically, without human intervention. Unfortunately, reaching expert-level accuracy in CNN identifications requires substantial computational power and huge training data sets, which are often not available for taxonomic tasks. This can be addressed using feature transfer: a CNN that has been pretrained on a generic image classification task is exposed to the taxonomic images of interest, and information about its perception of those images is used in training a simpler, dedicated identification system. Here, we develop an effective method of CNN feature transfer, which achieves expert-level accuracy in taxonomic identification of insects with training sets of 100 or fewer images per category, depending on the nature of the data set. Specifically, we extract rich representations of intermediate to high-level image features from the CNN architecture VGG16 pretrained on the ImageNet data set. This information is submitted to a linear support vector machine classifier, which is trained on the target problem. We tested the performance of our approach on two types of challenging taxonomic tasks: 1) identifying insects to higher groups when they are likely to belong to subgroups that have not been seen previously and 2) identifying visually similar species that are difficult to separate even for experts.
For the first task, our approach reached >92% accuracy on one data set (884 face images of 11 families of Diptera, all specimens representing unique species), and >96% accuracy on another (2936 dorsal habitus images of 14 families of Coleoptera, over 90% of specimens belonging to unique species). For the second task, our approach outperformed a leading taxonomic expert on one data set (339 images of three species of the Coleoptera genus Oxythyrea; 97% accuracy), and both humans and traditional automated identification systems on another data set (3845 images of nine species of Plecoptera larvae; 98.6% accuracy). Reanalyzing several biological image identification tasks studied in the recent literature, we show that our approach is broadly applicable and provides significant improvements over previous methods, whether based on dedicated CNNs, CNN feature transfer, or more traditional techniques. Thus, our method, which is easy to apply, can be highly successful in developing automated taxonomic identification systems even when training data sets are small and computational budgets limited. We conclude by briefly discussing some promising CNN-based research directions in morphological systematics opened up by the success of these techniques in providing accurate diagnostic tools.
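The feature-transfer recipe summarized in the abstract (pool activations from several intermediate to high-level CNN layers, concatenate them into one descriptor per image, and train a linear support vector machine on the target taxa) can be sketched in a few lines of Python. This is a minimal, hedged illustration, not the authors' implementation. Assumptions: each layer's activation map is reduced by global max and average pooling followed by L2 normalization; the SVM is a one-vs-rest linear model trained with Pegasos-style subgradient descent rather than a library solver; and the helper `fake_feature_maps` is hypothetical, producing synthetic arrays in place of real VGG16 activations so the example runs with NumPy alone. In a real pipeline the maps would come from a VGG16 network pretrained on ImageNet (for example Keras' `tf.keras.applications.VGG16(weights="imagenet", include_top=False)`, reading outputs of layers such as `block3_pool` through `block5_pool`), and a tested solver such as scikit-learn's `LinearSVC` would replace the hand-written trainer.

```python
import numpy as np


def pool_layer(fmap: np.ndarray) -> np.ndarray:
    """Global max- and average-pool one (H, W, C) activation map into a
    2C-vector, then L2-normalize it so no single layer dominates (assumed scheme)."""
    pooled = np.concatenate([fmap.max(axis=(0, 1)), fmap.mean(axis=(0, 1))])
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled


def image_descriptor(fmaps: list[np.ndarray]) -> np.ndarray:
    """Concatenate pooled descriptors from several intermediate-to-high layers."""
    return np.concatenate([pool_layer(f) for f in fmaps])


class LinearSVM:
    """One-vs-rest linear SVM trained by Pegasos-style stochastic subgradient
    descent on the L2-regularized hinge loss (a stand-in for a library solver)."""

    def __init__(self, lam: float = 1e-3, epochs: int = 50, seed: int = 0):
        self.lam, self.epochs, self.seed = lam, epochs, seed

    @staticmethod
    def _with_bias(X: np.ndarray) -> np.ndarray:
        # Append a constant feature so the bias is learned (and, as a
        # simplification, regularized) together with the weights.
        return np.hstack([X, np.ones((X.shape[0], 1))])

    def fit(self, X: np.ndarray, y: np.ndarray) -> "LinearSVM":
        rng = np.random.default_rng(self.seed)
        Xb = self._with_bias(X)
        self.classes_ = np.unique(y)
        self.W = np.zeros((len(self.classes_), Xb.shape[1]))
        for k, c in enumerate(self.classes_):
            t = np.where(y == c, 1.0, -1.0)  # +1 for class c, -1 for the rest
            w = np.zeros(Xb.shape[1])
            step = 0
            for _ in range(self.epochs):
                for i in rng.permutation(len(Xb)):
                    step += 1
                    eta = 1.0 / (self.lam * step)  # Pegasos step size
                    violates = t[i] * (Xb[i] @ w) < 1.0  # margin before update
                    w *= 1.0 - eta * self.lam  # gradient of the L2 penalty
                    if violates:  # hinge loss is active for this sample
                        w += eta * t[i] * Xb[i]
            self.W[k] = w
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        # Assign each sample to the class whose one-vs-rest score is highest.
        return self.classes_[np.argmax(self._with_bias(X) @ self.W.T, axis=1)]


def fake_feature_maps(label: int, rng: np.random.Generator) -> list[np.ndarray]:
    """Hypothetical stand-in for activations from three CNN layers; a class
    signal is injected into every third channel starting at `label`."""
    maps = []
    for h, w, ch in [(8, 8, 16), (4, 4, 32), (2, 2, 64)]:
        f = rng.random((h, w, ch))  # non-negative, like post-ReLU activations
        f[..., label::3] += 1.0
        maps.append(f)
    return maps


# Synthetic demo: 3 "taxa", 30 training and 10 held-out images each.
rng = np.random.default_rng(42)
y_train = np.repeat([0, 1, 2], 30)
y_test = np.repeat([0, 1, 2], 10)
X_train = np.stack([image_descriptor(fake_feature_maps(c, rng)) for c in y_train])
X_test = np.stack([image_descriptor(fake_feature_maps(c, rng)) for c in y_test])

clf = LinearSVM().fit(X_train, y_train)
test_accuracy = float(np.mean(clf.predict(X_test) == y_test))
print(f"descriptor length: {X_train.shape[1]}, held-out accuracy: {test_accuracy:.2f}")
```

Normalizing each layer separately before concatenation keeps layers with many channels or large activations from swamping the others, and the linear classifier on top is cheap to train, which matches the abstract's point about small training sets and limited computational budgets. The synthetic data are deliberately easy, so the printed accuracy says nothing about performance on real insect images.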

Funder

European Union’s Horizon 2020

Ministry of Culture of the Czech Republic

Publisher

Oxford University Press (OUP)

Subject

Genetics; Ecology, Evolution, Behavior and Systematics

