Abstract
An important aspect of using entropy-based models and proposed “synthetic languages” is the seemingly simple task of knowing how to identify the probabilistic symbols. If the system has discrete features, this task may be trivial; however, for observed analog behaviors described by continuous values, the question arises of how such symbols should be determined. This task of symbolization extends the concept of scalar and vector quantization to consider explicit linguistic properties. Unlike previous quantization algorithms, where the aim is primarily data compression and fidelity, the goal in this case is to produce a symbolic output sequence which incorporates some linguistic properties and is therefore useful in forming language-based models. Hence, in this paper, we present methods for symbolization which take such properties into account in the form of probabilistic constraints. In particular, we propose new symbolization algorithms that constrain the symbols to follow a Zipf–Mandelbrot–Li distribution, which approximates the behavior of language elements. We introduce a novel constrained EM algorithm which is shown to learn symbols that effectively approximate a Zipfian distribution. We demonstrate the efficacy of the proposed approaches on examples using real-world data in different tasks, including the translation of animal behavior into a possible human-understandable language equivalent.
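The abstract names the Zipf–Mandelbrot–Li distribution as the target rank-frequency law for the learned symbols. As a rough illustration only (not the paper's constrained EM algorithm), the Python sketch below quantizes a continuous signal so that the resulting symbol frequencies approximate a Zipf–Mandelbrot target; the exponent `s`, offset `q`, and the quantile-based binning are illustrative assumptions.

```python
import numpy as np

def zipf_mandelbrot_pmf(n_symbols, s=1.0, q=2.7):
    """Target rank-frequency law p(k) proportional to 1/(k + q)^s, k = 1..n_symbols."""
    ranks = np.arange(1, n_symbols + 1)
    weights = 1.0 / (ranks + q) ** s
    return weights / weights.sum()

def zipf_constrained_quantize(x, n_symbols, s=1.0, q=2.7):
    """Map a 1-D continuous signal to symbols whose empirical frequencies
    approximate the Zipf-Mandelbrot target, by cutting the data at quantiles
    matching the target's cumulative mass (a simple stand-in for a
    distribution-constrained symbolizer, not the paper's method)."""
    target = zipf_mandelbrot_pmf(n_symbols, s, q)
    edges = np.quantile(x, np.cumsum(target)[:-1])  # n_symbols - 1 bin boundaries
    return np.digitize(x, edges)                    # symbol indices 0..n_symbols-1

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=10_000)                     # toy "analog behavior" stream
    symbols = zipf_constrained_quantize(x, n_symbols=8)
    counts = np.bincount(symbols, minlength=8)
    print("empirical freqs:", counts / counts.sum())
    print("target freqs:   ", zipf_mandelbrot_pmf(8))
```

By construction, each symbol's empirical frequency matches the corresponding target mass up to sampling error, so the rank-frequency plot of the output sequence follows the assumed Zipf–Mandelbrot curve.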
Funder
Trusted Autonomous Systems Defence Cooperative Research Centre
University of Queensland
Subject
General Physics and Astronomy
Cited by
2 articles.