Authors:
Tolic Antonio, Boshkoska Biljana Mileva, Skansi Sandro
Abstract
Recurrent neural networks (RNNs), along with long short-term memory (LSTM) networks, have been used successfully on a wide range of sequential-data problems and are widely regarded as extraordinarily powerful tools for learning and processing such data. However, the search for new or derived architectures that model very long-term dependencies remains an active area of research. In this paper, a relatively psychologically plausible architecture named event-buffering JANET (EB-JANET) is proposed. The architecture is derived from the forget-gate-only version of the LSTM, also called Just Another NETwork (JANET). The new architecture implements a working memory mechanism that operates on information represented as dynamic events. The event buffer, a container of events, references the relevant pre-activation values from which historical candidate values were generated relative to the current timestep. The buffer is emptied as needed, depending on the informational context. The proposed architecture achieves world-class results and outperforms JANET on multiple benchmark datasets. Moreover, the new architecture is applicable to a wider class of problems and shows superior resilience when processing longer sequences, whereas JANET experienced catastrophic failures on certain tasks.
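For context, below is a minimal NumPy sketch of the forget-gate-only LSTM cell (JANET) that EB-JANET derives from. It follows the standard coupled-gate update h_t = sigma(s_t) * h_{t-1} + (1 - sigma(s_t)) * tanh(...), with the hidden state equal to the cell state; it omits the chrono initialization used in the original JANET paper and does not implement the event-buffering mechanism itself, whose equations the abstract does not specify. The function and parameter names are illustrative, not the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def janet_step(x_t, h_prev, params):
    """One step of a simplified forget-gate-only LSTM (JANET-style) cell.

    The input gate is coupled to the forget gate: the cell keeps a
    fraction f_t of its old state and writes the complementary fraction
    (1 - f_t) of the new candidate. There is no output gate, so the
    hidden state h_t is the cell state itself.
    """
    Wf, Uf, bf, Wc, Uc, bc = params
    s_t = Wf @ x_t + Uf @ h_prev + bf                # forget-gate pre-activation
    f_t = sigmoid(s_t)                               # forget gate
    c_tilde = np.tanh(Wc @ x_t + Uc @ h_prev + bc)   # candidate values
    h_t = f_t * h_prev + (1.0 - f_t) * c_tilde       # coupled state update
    return h_t

# Toy usage: run a short random sequence through a 4-unit cell.
rng = np.random.default_rng(0)
d_in, d_h = 3, 4
params = (
    rng.standard_normal((d_h, d_in)), rng.standard_normal((d_h, d_h)),
    np.ones(d_h),  # a positive forget bias nudges the gate toward remembering
    rng.standard_normal((d_h, d_in)), rng.standard_normal((d_h, d_h)),
    np.zeros(d_h),
)
h = np.zeros(d_h)
for t in range(5):
    h = janet_step(rng.standard_normal(d_in), h, params)
print(h)
```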
Publisher
Czech Technical University in Prague - Central Library