Affiliation:
1. University of Cambridge
Abstract
Linguistic steganography is concerned with hiding information in natural language text. One of the major transformations used in linguistic steganography is synonym substitution. However, few existing studies have studied the practical application of this approach. In this article we propose two improvements to the use of synonym substitution for encoding hidden bits of information. First, we use the Google n-gram corpus for checking the applicability of a synonym in context, and we evaluate this method using data from the SemEval lexical substitution task and human annotated data. Second, we address the problem that arises from words with more than one sense, which creates a potential ambiguity in terms of which bits are represented by a particular word. We develop a novel method in which words are the vertices in a graph, synonyms are linked by edges, and the bits assigned to a word are determined by a vertex coding algorithm. This method ensures that each word represents a unique sequence of bits, without cutting out large numbers of synonyms, and thus maintains a reasonable embedding capacity.
Subject
Artificial Intelligence,Computer Science Applications,Linguistics and Language,Language and Linguistics
Cited by
47 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献