Data science competition for cross-site individual tree species identification from airborne remote sensing data

Author:

Graves Sarah J. (1), Marconi Sergio (2), Stewart Dylan (3), Harmon Ira (4), Weinstein Ben (2), Kanazawa Yuzi (5), Scholl Victoria M. (6, 7), Joseph Maxwell B. (6), McGlinchy Joseph (6), Browne Luke (8), Sullivan Megan K. (8), Estrada-Villegas Sergio (8), Wang Daisy Zhe (4), Singh Aditya (9), Bohlman Stephanie (10), Zare Alina (3, 11, 12), White Ethan P. (2, 11, 12)

Affiliation:

1. Nelson Institute for Environmental Studies, University of Wisconsin-Madison, Madison, Wisconsin, United States

2. Department of Wildlife Ecology and Conservation, University of Florida, Gainesville, Florida, United States

3. Department of Electrical and Computer Engineering, University of Florida, Gainesville, Florida, United States

4. Department of Computer and Information Sciences and Engineering, University of Florida, Gainesville, Florida, United States

5. Artificial Intelligence Laboratory, Fujitsu Laboratories Ltd., Kawasaki, Kanagawa, Japan

6. Earth Lab, Cooperative Institute for Research in Environmental Sciences (CIRES), University of Colorado at Boulder, Boulder, Colorado, United States

7. Department of Geography, University of Colorado at Boulder, Boulder, Colorado, United States

8. Yale School of the Environment, Yale University, New Haven, Connecticut, United States

9. Department of Agricultural & Biological Engineering, University of Florida, Gainesville, Florida, United States

10. School of Forest, Fisheries, and Geomatics Sciences, University of Florida, Gainesville, Florida, United States

11. Informatics Institute, University of Florida, Gainesville, Florida, United States

12. Biodiversity Institute, University of Florida, Gainesville, Florida, United States

Abstract

Data on individual tree crowns from remote sensing have the potential to advance forest ecology by providing information about forest composition and structure with continuous spatial coverage over large spatial extents. Classifying individual trees to their taxonomic species over large regions from remote sensing data is challenging. Methods to classify individual species are often accurate for common species, but perform poorly for less common species and when applied to new sites. We ran a data science competition to help identify effective methods for the task of classification of individual crowns to species identity. The competition included data from three sites to assess each method's ability to generalize patterns across two sites simultaneously and apply methods to an untrained site. Three different metrics were used to assess and compare model performance. Six teams participated, representing four countries and nine individuals. The highest performing method from a previous competition in 2017 was applied and used as a baseline to understand advancements and changes in successful methods. The best species classification method was based on a two-stage fully connected neural network that significantly outperformed the baseline random forest and gradient boosting ensemble methods. All methods generalized well by showing relatively strong performance on the trained sites (accuracy = 0.46–0.55, macro F1 = 0.09–0.32, cross entropy loss = 2.4–9.2), but generally failed to transfer effectively to the untrained site (accuracy = 0.07–0.32, macro F1 = 0.02–0.18, cross entropy loss = 2.8–16.3). Classification performance was influenced by the number of samples with species labels available for training, with most methods predicting common species at the training sites well (maximum F1 score of 0.86) relative to the uncommon species, for which none were predicted. Classification errors were most common between species in the same genus and different species that occur in the same habitat. Most methods performed better than the baseline in detecting whether a species was not in the training data by predicting an untrained mixed-species class, especially in the untrained site. This work has highlighted that data science competitions can encourage advancement of methods, particularly by bringing in new people from outside the focal discipline, and by providing an open dataset and evaluation criteria from which participants can learn.
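
The three metrics reported above (accuracy, macro F1, and cross entropy loss) can be illustrated with a minimal sketch. This is not the competition's scoring code: the species codes, the labels, and the predicted probabilities below are hypothetical, and scoring a class with no correct predictions as F1 = 0 is an assumption made here for illustration.

```python
import math

def accuracy(y_true, y_pred):
    """Fraction of crowns whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # Assumption: a class with no true or predicted samples scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def cross_entropy(y_true, probs, eps=1e-15):
    """Mean negative log probability assigned to the true class."""
    total = 0.0
    for t, row in zip(y_true, probs):
        p = min(max(row.get(t, 0.0), eps), 1.0)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(y_true)

# Hypothetical labels: one common species, one rare species, one mixed class.
labels = ["PIPA", "QULA", "OTHER"]
y_true = ["PIPA", "PIPA", "PIPA", "QULA", "OTHER"]
y_pred = ["PIPA", "PIPA", "PIPA", "PIPA", "OTHER"]
probs = [
    {"PIPA": 0.8, "QULA": 0.1, "OTHER": 0.1},
    {"PIPA": 0.8, "QULA": 0.1, "OTHER": 0.1},
    {"PIPA": 0.8, "QULA": 0.1, "OTHER": 0.1},
    {"PIPA": 0.6, "QULA": 0.3, "OTHER": 0.1},
    {"PIPA": 0.2, "QULA": 0.1, "OTHER": 0.7},
]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred, labels)
ce = cross_entropy(y_true, probs)
print(acc, f1, ce)
```

In this toy case accuracy is 0.8 while macro F1 is only about 0.62, because the single rare-species crown is misclassified and that class contributes an F1 of zero. The same mechanism can produce the gap between accuracy and macro F1 seen in the reported ranges.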

Funder

National Science Foundation

Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative

NSF Dimensions of Biodiversity Program Grant

USDA/NIFA McIntire-Stennis Program

University of Florida Biodiversity Institute

Informatics Institute (UFII) Graduate Fellowship

Publisher

PeerJ

Subject

General Agricultural and Biological Sciences; General Biochemistry, Genetics and Molecular Biology; General Medicine; General Neuroscience
