CrossTalk: Making Low-Latency Fault Tolerance Cheap by Exploiting Redundant Networks

Author:

Loveless Andrew (1), Phan Linh Thi Xuan (2), Erickson Lisa (3), Dreslinski Ronald (4), Kasikci Baris (5)

Affiliation:

1. NASA Johnson Space Center, USA

2. University of Pennsylvania & Roblox, USA

3. Georgia Institute of Technology, USA

4. University of Michigan, USA

5. University of Washington, USA

Abstract

Real-time embedded systems perform many important functions in the modern world. A standard way to tolerate faults in these systems is with Byzantine fault-tolerant (BFT) state machine replication (SMR), in which multiple replicas execute the same software and their outputs are compared by the actuators. Unfortunately, traditional BFT SMR protocols are slow, requiring replicas to exchange sensor data back and forth over multiple rounds in order to reach agreement before each execution. The state of the art in reducing the latency of BFT SMR is eager execution, in which replicas execute on data from different sensors simultaneously on different processor cores. However, this technique results in 3–5× higher computation overheads compared to traditional BFT SMR systems, significantly limiting schedulability. We present CrossTalk, a new BFT SMR protocol that leverages the prevalence of redundant switched networks in embedded systems to reduce latency without added computation. The key idea is to use specific algorithms to move messages between redundant network planes (which many systems already possess) as the messages travel from the sensors to the replicas. As a result, CrossTalk can ensure agreement automatically in the network, avoiding the need for any communication between replicas. Our evaluation shows that CrossTalk improves schedulability by 2.13–4.24× over the state of the art. Moreover, in a NASA simulation of a real spaceflight mission, CrossTalk tolerates more faults than the state of the art while using nearly 3× less processor time.
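
As a rough illustration of the output comparison mentioned in the abstract, the sketch below shows one common way an actuator might vote over replica outputs. It is not taken from the paper and does not represent the CrossTalk protocol itself; the function name `vote`, the fault bound `f`, and the rule of accepting a value once at least f + 1 replicas report it are all assumptions made for this example.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def vote(outputs: Sequence[Hashable], f: int) -> Optional[Hashable]:
    """Return the value reported by at least f + 1 replicas, or None.

    Assumption: at most f replicas are faulty, so any value backed by
    f + 1 matching outputs was produced by at least one correct replica.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    if not outputs:
        return None
    # Find the most frequently reported output and how often it appears.
    value, count = Counter(outputs).most_common(1)[0]
    return value if count >= f + 1 else None


# Hypothetical usage: three replicas, one of which may be faulty (f = 1).
command = vote(["open_valve", "open_valve", "close_valve"], f=1)
```

With three replicas and f = 1, two matching outputs are enough for the actuator to act; if no value reaches f + 1 matches, the sketch returns None and leaves the fallback behavior to the caller.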

Funder

NSF Graduate Research Fellowship

NSF

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Software
