Affiliation:
1. University of Washington, Seattle, WA
Abstract
Despite decades of research in extensible operating system technology, extensions such as device drivers remain a significant cause of system failures. In Windows XP, for example, drivers account for 85% of recently reported failures. This paper describes Nooks, a reliability subsystem that seeks to greatly enhance OS reliability by isolating the OS from driver failures. The Nooks approach is practical: rather than guaranteeing complete fault tolerance through a new (and incompatible) OS or driver architecture, our goal is to prevent the vast majority of driver-caused crashes with little or no change to existing driver and system code. To achieve this, Nooks isolates drivers within lightweight protection domains inside the kernel address space, where hardware and software prevent them from corrupting the kernel. Nooks also tracks a driver's use of kernel resources to hasten automatic clean-up during recovery.
To prove the viability of our approach, we implemented Nooks in the Linux operating system and used it to fault-isolate several device drivers. Our results show that Nooks offers a substantial increase in the reliability of operating systems, catching and quickly recovering from many faults that would otherwise crash the system. In a series of 2000 fault-injection tests, Nooks recovered automatically from 99% of the faults that caused Linux to crash.
While Nooks was designed for drivers, our techniques generalize to other kernel extensions, as well. We demonstrate this by isolating a kernel-mode file system and an in-kernel Internet service. Overall, because Nooks supports existing C-language extensions, runs on a commodity operating system and hardware, and enables automated recovery, it represents a substantial step beyond the specialized architectures and type-safe languages required by previous efforts directed at safe extensibility.
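The resource-tracking idea mentioned in the abstract can be illustrated with a toy model. The sketch below is not Nooks code: the names `ResourceTracker`, `call_driver`, and `faulty_send` are hypothetical, and a Python exception stands in for a driver fault that Nooks would catch inside a protection domain within the kernel. It only shows why recording what a driver holds makes clean-up after a failure straightforward.

```python
class ResourceTracker:
    """Toy model: records the resources each driver currently holds."""

    def __init__(self):
        self._held = {}  # driver name -> list of resource names, oldest first

    def acquire(self, driver, resource):
        """Note that `driver` now holds `resource`."""
        self._held.setdefault(driver, []).append(resource)

    def recover(self, driver):
        """Forget the failed driver and return what it held, newest first."""
        return list(reversed(self._held.pop(driver, [])))


def call_driver(tracker, driver, entry_point):
    """Run a driver entry point; on a fault, clean up instead of crashing."""
    try:
        return ("ok", entry_point())
    except Exception:
        # Assumption: any exception models a driver fault caught by isolation.
        return ("recovered", tracker.recover(driver))


tracker = ResourceTracker()


def faulty_send():
    # Hypothetical driver routine that takes two resources and then fails.
    tracker.acquire("net0", "dma_buffer")
    tracker.acquire("net0", "irq_line")
    raise RuntimeError("simulated driver fault")


status, released = call_driver(tracker, "net0", faulty_send)
print(status, released)
```

Because every acquisition goes through the tracker, the recovery path does not need to inspect the failed driver's own state to know what to release, which is the point the abstract makes about hastening automatic clean-up.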
Publisher
Association for Computing Machinery (ACM)
Cited by
57 articles.
1. Sfitag: Efficient Software Fault Isolation with Memory Tagging for ARM Kernel Extensions;Proceedings of the ACM Asia Conference on Computer and Communications Security;2023-07-10
2. Rewind & Discard: Improving Software Resilience using Isolated Domains;2023 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN);2023-06
3. Comparative Study on Fuchsia and Linux Device Driver Architecture;Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing;2023-03-27
4. Graceful ECC-uncorrectable Error Handling in the Operating System Kernel;2022 IEEE 33rd International Symposium on Software Reliability Engineering (ISSRE);2022-10
5. FIRestarter: Practical Software Crash Recovery with Targeted Library-level Fault Injection;2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN);2021-06