Abstract
Novelty is a double-edged sword for agents and animals alike: they might benefit from untapped resources or face unexpected costs or dangers such as predation. The conventional exploration/exploitation tradeoff is thus coloured by risk-sensitivity. A wealth of experiments has shown how animals solve this dilemma, for example using intermittent approach. However, there are large individual differences in the nature of approach, and modeling has yet to elucidate how this might be based on animals’ differing prior expectations about reward and threat and degrees of risk aversion. To capture these factors, we built a Bayes adaptive Markov decision process model with three key components: an adaptive hazard function capturing potential predation, an intrinsic reward function providing the urge to explore, and a conditional value at risk (CVaR) objective, which is a contemporary measure of trait risk-sensitivity. We fit this model to a coarse-grain abstraction of the behaviour of 26 animals who freely explored a novel object in an open-field arena (Akiti et al., Neuron 110, 2022). We show that the model captures both quantitative (frequency, duration of exploratory bouts) and qualitative (stereotyped tail-behind) features of behavior, including the substantial idiosyncrasies that were observed. We find that “brave” animals, though varied in their behavior, generally are more risk neutral, and enjoy a flexible hazard prior. They begin with cautious exploration, and quickly transition to confident approach to maximize exploration for reward. On the other hand, “timid” animals, characterized by risk aversion and high and inflexible hazard priors, display self-censoring that leads to the sort of asymptotic maladaptive behavior that is often associated with psychiatric illnesses such as anxiety and depression.
Explaining risk-sensitive exploration using factorized parameters of reinforcement learning models could aid in the understanding, diagnosis, and treatment of psychiatric abnormalities in humans and other animals.
Author summary
Animals face a dilemma when they encounter novel objects in their environment. Approaching and investigating an object could lead to reward in the form of food, play, etc. but it also exposes the animal to dangers such as predation. Experiments have shown that animals solve this exploration dilemma by using intermittent strategies (alternately approaching the object and then retreating to a safe location) that gradually increase their level of risk. We built an abstract model of these exploration strategies and fit the model to the behavior of 26 mice freely exploring a novel object in an arena. Our model accounts for the high-level physical and mental states of the mice, the actions the mice can take, and beliefs about the uncertain consequences of those actions. Our model provides a rational explanation for individual differences seen in experiments: individuals maximize their utility given different prior beliefs about the dangers and the rewards in the environment, and different tendencies to overestimate the probability of bad outcomes. Modeling individual differences in risk-sensitivity during exploration could aid in the understanding, diagnosis, and treatment of psychiatric diseases such as anxiety and depression in humans and animals.
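The conditional value at risk (CVaR) objective mentioned above can be illustrated with a minimal sketch. This is not the fitted model from the paper: it only computes lower-tail CVaR for a toy discrete payoff distribution, and the function name `cvar`, the payoff values, and the probabilities are all assumptions made for illustration. Averaging over only the worst fraction `alpha` of probability mass is equivalent to overweighting bad outcomes, which is one way to read the "tendency to overestimate the probability of bad outcomes".

```python
def cvar(outcomes, probs, alpha):
    """Lower-tail CVaR of a discrete distribution (illustrative sketch).

    Returns the mean payoff over the worst `alpha` fraction of probability
    mass. alpha = 1 gives the ordinary expectation (risk neutral); smaller
    alpha concentrates on the worst outcomes (more risk averse).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    remaining = alpha  # probability mass still to be collected from the tail
    total = 0.0
    # Walk through outcomes from worst to best payoff.
    for value, p in sorted(zip(outcomes, probs)):
        take = min(p, remaining)
        total += take * value
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / alpha


# Hypothetical payoffs for one approach bout: predation, nothing, reward.
toy_outcomes = [-10.0, 0.0, 5.0]
toy_probs = [0.1, 0.4, 0.5]

risk_neutral_value = cvar(toy_outcomes, toy_probs, alpha=1.0)
risk_averse_value = cvar(toy_outcomes, toy_probs, alpha=0.2)
```

Under these toy numbers, the same approach bout looks worthwhile to a risk-neutral agent and costly to a risk-averse one, which loosely mirrors the contrast between "brave" and "timid" animals described above.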
Publisher
Cold Spring Harbor Laboratory