This technical overview by veteran real-time instructor David Kalinsky examines a number of design patterns used to architect high-availability embedded systems. Such systems combine redundant hardware components with software that manages fault detection and correction, in order to achieve "five-nines" (99.999%) or greater availability, equivalent to less than 1 second of downtime per day.
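As a quick check of the five-nines figure, the arithmetic can be worked out as follows. The symbols $A$ (availability) and $D$ (downtime) are introduced here for illustration only and are not taken from the paper; the calculation assumes a 24-hour day and a 365-day year.

```latex
\begin{align*}
A &= 0.99999 \\
D_{\text{day}}  &= (1 - A) \times 86\,400\ \text{s} = 0.864\ \text{s} \\
D_{\text{year}} &= (1 - A) \times 365 \times 86\,400\ \text{s} = 315.36\ \text{s} \approx 5.3\ \text{min}
\end{align*}
```

Under these assumptions, five-nines availability allows roughly 0.86 seconds of downtime per day, or a little over five minutes per year.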
After a brief presentation of definitions relevant to high availability and fault management, the paper discusses basic hardware N-plexing and voting issues. This is followed by an in-depth discussion of software and system fault tolerance techniques appropriate for embedded systems, starting with the static method of N-version programming. A number of dynamic software fault tolerance techniques are then surveyed, including Checkpoint-Rollback, Process Pairs and Recovery Blocks. The discussion continues with a forward error recovery technique called Alternative Processing. Many real-world examples are presented.