Affiliation:
1. Center on Frontiers of Computing Studies, Peking University, Beijing, P.R. China
2. Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
3. Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, USA
Abstract
We study two basic problems regarding edit errors: document exchange and error correcting codes. Here, two parties try to exchange two strings with length roughly n and edit distance at most k, or one party tries to send a string of length n to another party through a channel that can introduce at most k edit errors. The goal is to use the least amount of communication or redundancy possible. Both problems have been extensively studied for decades, and in this article, we focus on deterministic document exchange protocols and binary codes for insertions and deletions (insdel codes). It is known that for small k (e.g., k ≤ n/4), in both problems the optimal communication or redundancy size is Θ(k log(n/k)). In particular, this implies the existence of binary codes that can correct ε fraction of edit errors with rate 1-Θ(ε log(1/ε)). However, known constructions are far from achieving these bounds.
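As a concrete illustration of the distance measure behind both problems, the following minimal sketch computes the number of insertions and deletions needed to turn one string into another. It assumes the insertion/deletion-only convention (a substitution counts as one deletion plus one insertion); the function name `insdel_distance` is hypothetical and is not taken from the article.

```python
def insdel_distance(x: str, y: str) -> int:
    """Minimum number of insertions and deletions that turn x into y.

    Assumption: only insertions and deletions are allowed, so the
    distance equals len(x) + len(y) - 2 * LCS(x, y).
    """
    # prev[j] is the LCS length of the processed prefix of x and y[:j].
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    lcs = prev[len(y)]
    return len(x) + len(y) - 2 * lcs
```

For example, `insdel_distance("10110", "1010")` is 1, since deleting a single symbol suffices. This quadratic-time dynamic program only illustrates the definition of k; it is not the protocol or code construction described in the article.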
In this article, we significantly improve previous results on both problems. For document exchange, we give an efficient deterministic protocol with communication complexity O(k log^2(n/k)). This significantly improves the previous best-known deterministic protocol, which has communication complexity O(k^2 + k log^2 n) [4]. For binary insdel codes, we obtain the following results:
(1) An explicit binary insdel code with redundancy O(k log^2(n/k)). In particular, this implies an explicit family of binary insdel codes that can correct ε fraction of insertions and deletions with rate 1-O(ε log^2(1/ε)) = 1-Õ(ε). This significantly improves the previous best-known result, which only achieves rate 1-Õ(√ε) [14], [15], and is optimal up to a log(1/ε) factor.
(2) An explicit binary insdel code with redundancy O(k log n). This significantly improves the previous best-known result of Reference [6], which only works for constant k and has redundancy O(k^2 log k log n); and that of Reference [4], which has redundancy O(k^2 + k log^2 n). Our code has optimal redundancy for k ≤ n^(1-α), any constant 0 < α < 1. This is the first explicit construction of binary insdel codes that has optimal redundancy for a wide range of error parameters k.
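The rate claim in (1) and the optimality claim in (2) each follow from a short calculation. The sketch below assumes k = εn, treats n as the codeword length up to constant factors, and leaves the constants inside O(·) unspecified; it is an illustration of the arithmetic, not the article's proof.

```latex
% (1) Redundancy O(k log^2(n/k)) with k = \varepsilon n:
\[
  r = O\!\left(k \log^2 \frac{n}{k}\right)
    = O\!\left(\varepsilon n \log^2 \frac{1}{\varepsilon}\right)
  \quad\Longrightarrow\quad
  \text{rate} = 1 - \frac{r}{n}
    = 1 - O\!\left(\varepsilon \log^2 \frac{1}{\varepsilon}\right).
\]
% (2) Redundancy O(k log n) is optimal when k \le n^{1-\alpha}:
\[
  k \le n^{1-\alpha}
  \;\Longrightarrow\;
  \frac{n}{k} \ge n^{\alpha}
  \;\Longrightarrow\;
  \log \frac{n}{k} \ge \alpha \log n
  \;\Longrightarrow\;
  k \log n \le \frac{1}{\alpha}\, k \log \frac{n}{k}
    = O\!\left(k \log \frac{n}{k}\right).
\]
```

The last bound matches the Θ(k log(n/k)) optimum stated above, which is why O(k log n) redundancy is optimal in this range of k.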
In obtaining our results, we introduce several new techniques. Most notably, we introduce the notion of ε-self-matching hash functions and ε-synchronization hash functions. We believe our techniques can have further applications in the literature.
Funder
NSF
59th Annual IEEE Symposium on Foundations of Computer Science
Publisher
Association for Computing Machinery (ACM)
Subject
Artificial Intelligence, Hardware and Architecture, Information Systems, Control and Systems Engineering, Software
Cited by
1 article.
1. Computationally Relaxed Locally Decodable Codes, Revisited;2023 IEEE International Symposium on Information Theory (ISIT);2023-06-25