Off-policy and on-policy reinforcement learning with the Tsetlin machine

Published:2023-02-03 Issue:8 Volume:53 Page:8596-8613
ISSN:0924-669X
Container-title:Applied Intelligence
language:en
Short-container-title:Appl Intell

Author:

Rahimi Gorji Saeed, Granmo Ole-Christoffer

Abstract

The Tsetlin Machine is a recent supervised learning algorithm that has obtained competitive accuracy- and resource usage results across several benchmarks. It has been used for convolution, classification, and regression, producing interpretable rules in propositional logic. In this paper, we introduce the first framework for reinforcement learning based on the Tsetlin Machine. Our framework integrates the value iteration algorithm with the regression Tsetlin Machine as the value function approximator. To obtain accurate off-policy state-value estimation, we propose a modified Tsetlin Machine feedback mechanism that adapts to the dynamic nature of value iteration. In particular, we show that the Tsetlin Machine is able to unlearn and recover from the misleading experiences that often occur at the beginning of training. A key challenge that we address is mapping the intrinsically continuous nature of state-value learning to the propositional Tsetlin Machine architecture, leveraging probabilistic updates. While accurate off-policy, this mechanism learns significantly slower than neural networks on-policy. However, by introducing multi-step temporal-difference learning in combination with high-frequency propositional logic patterns, we are able to close the performance gap. Several gridworld instances document that our framework can outperform comparable neural network models, despite being based on simple one-level AND-rules in propositional logic. Finally, we propose how the class of models learnt by our Tsetlin Machine for the gridworld problem can be translated into a more understandable graph structure. The graph structure captures the state-value function approximation and the corresponding policy found by the Tsetlin Machine.
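
To illustrate the multi-step temporal-difference learning mentioned in the abstract, the following is a minimal sketch of how n-step state-value targets are commonly computed. It is not the authors' implementation. The function name `n_step_targets` and its parameters are hypothetical, and the state-value estimates are passed in as plain floats standing in for the output of whatever approximator is used (a regression Tsetlin Machine in the paper).

```python
from typing import List


def n_step_targets(
    rewards: List[float],
    next_values: List[float],
    gamma: float,
    n: int,
) -> List[float]:
    """Compute n-step TD targets for one finished episode.

    rewards[t]     : reward received after the action taken at step t
    next_values[t] : current estimate V(s_{t+1}); use 0.0 for a terminal state
    gamma          : discount factor in [0, 1]
    n              : number of real rewards summed before bootstrapping

    All names here are illustrative assumptions, not taken from the paper.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(rewards) != len(next_values):
        raise ValueError("rewards and next_values must have equal length")

    horizon = len(rewards)
    targets = []
    for t in range(horizon):
        # Truncate the look-ahead window at the end of the episode.
        end = min(t + n, horizon)
        g = 0.0
        for k in range(t, end):
            g += gamma ** (k - t) * rewards[k]
        # Bootstrap from the estimate of the state reached after the
        # last summed reward (0.0 if that state is terminal).
        g += gamma ** (end - t) * next_values[end - 1]
        targets.append(g)
    return targets


if __name__ == "__main__":
    # Three-step episode ending in a terminal state with reward 1.
    example = n_step_targets(
        rewards=[0.0, 0.0, 1.0],
        next_values=[0.5, 0.5, 0.0],
        gamma=0.9,
        n=2,
    )
    print(example)
```

With `n=1` this reduces to the ordinary one-step TD target; larger `n` lets the final reward reach earlier states in fewer updates, which is the usual motivation for multi-step methods.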

Funder

Norges Forskningsråd

University of Agder

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence

Link

https://link.springer.com/content/pdf/10.1007/s10489-022-04297-3.pdf

Cited by 2 articles.

1. Exploring the Potential of Model-Free Reinforcement Learning using Tsetlin Machines;2023 International Symposium on the Tsetlin Machine (ISTM);2023-08-29

2. Contracting Tsetlin Machine with Absorbing Automata;2023 International Symposium on the Tsetlin Machine (ISTM);2023-08-29