Abstract
In the field of motor learning, few studies have addressed the learning of non-instructed movement sequences, because such studies require long periods of training and data acquisition and are complex to interpret. In contrast, such problems are readily addressed in machine learning, using artificial agents in simulated environments. To understand the mechanisms that drive the learning behavior of two macaque monkeys in a free-moving multi-target reaching task, we created two Reinforcement Learning (RL) models with different penalty criteria: “Time”, reflecting the time spent to perform a trial, and “Power”, integrating the energy cost. The initial phase of the learning process is characterized by a rapid improvement in motor performance for both monkeys and both models, with hand trajectories becoming shorter and smoother while velocity gradually increases across trials and sessions. This improvement in motor performance with training is associated with a simplification of the trajectories performed to achieve the task goal. The monkeys and the models show a convergent evolution towards an optimal circular motor path, almost exclusively in the counter-clockwise direction, together with a persistent inter-trial variability. All these elements support interpreting the monkeys' learning in terms of a progressive updating of action-selection patterns, following a classic value iteration scheme as in reinforcement learning. However, in contrast with our models, the monkeys also show a specific variability in the choice of the motor sequences carried out across trials. This variability reflects a form of “path selection” that is absent in the models. Furthermore, comparing the models with the behavioral data also reveals sub-optimality in the way the monkeys manage the trade-off between optimizing movement duration (“Time”) and minimizing metabolic cost (“Power”), with a tendency to overemphasize one criterion to the detriment of the other.
Overall, this study reveals the subtle interplay between cognitive factors, biomechanical constraints, task achievement and motor efficacy management in motor learning, and highlights the relevance of modeling approaches in revealing the respective contributions of the different elements at play.
Author summary
The way in which animals and humans learn new motor skills through free exploratory movement sequences governed solely by success or failure outcomes is not yet fully understood. Recent advances in machine learning techniques for continuous action spaces led us to construct a motor learning model to investigate how animals progressively enhance the efficiency of their behaviors through numerous trials and errors. This study conducts a comprehensive comparison between deep learning models and experimental data from monkey behavior. Notably, we show that the progressive refinement of motor sequences, as observed in the animals, does not require the implementation of a complete model of their environment. Rather, it merely requires the capacity to anticipate both movement costs and final reward a few steps ahead in the future, following a value iteration principle. Furthermore, the systematic deviations exhibited by the monkeys with respect to the computational model inform us about the presence of individual preferences for minimizing either the duration or the energy consumption, and also about the involvement of alternative “cognitive” strategies.
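To make the value iteration principle and the two penalty criteria more concrete, the following is a minimal toy sketch and not the authors' model, which uses deep RL in continuous action spaces. It assumes a hypothetical one-dimensional reach toward a target, three discrete speeds, a "Time" penalty of one unit per step and a "Power" penalty proportional to the squared speed. All names, weights and the reward value are illustrative assumptions.

```python
import numpy as np

def value_iteration(n_states=10, speeds=(1, 2, 3), time_w=1.0, power_w=0.0,
                    reward=10.0, gamma=0.95, n_sweeps=100):
    """Tabular value iteration on a toy 1-D reach; the target is the last state.

    time_w weights a constant per-step cost (assumed stand-in for "Time");
    power_w weights a speed-squared cost (assumed stand-in for "Power").
    """
    target = n_states - 1
    values = np.zeros(n_states)            # V(target) stays 0: terminal state
    policy = np.zeros(n_states, dtype=int)  # chosen speed in each state
    for _ in range(n_sweeps):
        for s in range(target):
            best_q, best_speed = -np.inf, speeds[0]
            for v in speeds:
                nxt = min(s + v, target)            # the hand cannot overshoot
                cost = time_w + power_w * v ** 2    # per-step penalty
                bonus = reward if nxt == target else 0.0
                # Bellman backup: immediate outcome plus discounted future value
                q = -cost + bonus + gamma * values[nxt]
                if q > best_q:
                    best_q, best_speed = q, v
            values[s], policy[s] = best_q, best_speed
    return values, policy

# Same task, two penalty criteria
_, time_policy = value_iteration(time_w=1.0, power_w=0.0)
_, power_policy = value_iteration(time_w=0.0, power_w=1.0)
print("Time penalty, chosen speeds: ", time_policy[:-1])
print("Power penalty, chosen speeds:", power_policy[:-1])
```

In this toy setting the agent never builds a full model of the task beyond the one-step transition. It only backs up costs and reward from neighboring states, and the preferred speed at the start of the reach shifts from fast under the time-only penalty to slow under the power-only penalty.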
Publisher
Cold Spring Harbor Laboratory