Abstract
AbstractThe use of machine learning for predicting ecotoxicological outcomes is promising, but underutilized. The curation of data with informative features requires both expertise in machine learning as well as a strong biological and ecotoxicological background, which we consider a barrier of entry for this kind of research. Additionally, model performances can only be compared across studies when the same dataset, cleaning, and splittings were used. Therefore, we provide ADORE, an extensive and well-described dataset on acute aquatic toxicity in three relevant taxonomic groups (fish, crustaceans, and algae). The core dataset describes ecotoxicological experiments and is expanded with phylogenetic and species-specific data on the species as well as chemical properties and molecular representations. Apart from challenging other researchers to try and achieve the best model performances across the whole dataset, we propose specific relevant challenges on subsets of the data and include datasets and splittings corresponding to each of these challenge as well as in-depth characterization and discussion of train-test splitting approaches.
Publisher
Springer Science and Business Media LLC
Subject
Library and Information Sciences,Statistics, Probability and Uncertainty,Computer Science Applications,Education,Information Systems,Statistics and Probability
Reference75 articles.
1. EC – European Commission. Regulation (EC) No 1907/2006 of the European Parliament and of the Council of 18 December 2006 concerning the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH), establishing a European Chemicals Agency, amending Directive 1999/45/EC and repealing Council Regulation (EEC) No 793/93 and Commission Regulation (EC) No 1488/94 as well as Council Directive 76/769/EEC and Commission Directives 91/155/EEC, 93/67/EEC, 93/105/EC and 2000/21/EC (2006).
2. Mittal, K. et al. Resource requirements for ecotoxicity testing: A comparison of traditional and new approach methods. Preprint, Pharmacology and Toxicology https://doi.org/10.1101/2022.02.24.481630 (2022).
3. Wang, Z., Walker, G. W., Muir, D. C. G. & Nagatani-Yoshida, K. Toward a Global Understanding of Chemical Pollution: A First Comprehensive Analysis of National and Regional Chemical Inventories. Environmental Science & Technology 54, 2575–2584, https://doi.org/10.1021/acs.est.9b06379 (2020).
4. Muratov, E. N. et al. QSAR without borders. Chemical Society Reviews 49, 3525–3564, https://doi.org/10.1039/D0CS00098A (2020).
5. Wu, J. et al. Predicting chemical hazard across taxa through machine learning. Environment International 163, 107184, https://doi.org/10.1016/j.envint.2022.107184 (2022).
Cited by
8 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献