Abstract
Scientific advances due to conceptual or technological innovations can be revealed by examining how research topics have evolved. But such topical evolution is difficult to uncover and quantify because of the large body of literature and the need for expert knowledge spanning a wide range of areas within any field. Here we used machine learning and language models to classify plant science citations into topics representing interconnected, evolving subfields. The changes in prevalence of topical records over the last 50 years reflect major research paradigm shifts and a recent radiation of new topics, as well as turnover of model species and vastly different plant science research trajectories among countries. Our approaches readily summarize the topical diversity and evolution of a scientific field with hundreds of thousands of relevant papers, and they can be applied broadly to other fields.
Significance statement
Changes in scientific paradigms are foundational for the advancement of science, but such changes are difficult to summarize, quantify, and illustrate. These challenges are exacerbated by the rapid, exponential growth of literature. Applying a combination of machine learning and language modeling to hundreds of thousands of published abstracts, we demonstrate that a scientific field (i.e., plant science) can be summarized as interconnected subfields evolving from one another. We also reveal insights into major research trends and the rise and decline in the use of model organisms in different countries. Our study demonstrates how artificial intelligence and language models can be broadly applied to understand scientific advances that inform science policy and funding decisions.
Publisher
Cold Spring Harbor Laboratory