Affiliation:
1. Institut de Robòtica i Informàtica Industrial, CSIC-UPC
Abstract
Reinforcement Learning (RL) on trajectory data has been used in several fields and is relevant to robot motion learning, in which sampled trajectories are executed and their outcomes are evaluated with a reward value. Responsibility for task performance can be attributed to the trajectory as a whole or distributed across its points (timesteps). In this work, we present a novel method that attributes responsibility for the rewards to each timestep separately, using Mutual Information (MI) to bias the model fitting of a trajectory.
Cited by
1 article.
1. AI Based Optimization of Fault Tolerance in Multipath Routing for Medical Application;2023 4th International Conference on Smart Electronics and Communication (ICOSEC);2023-09-20