In a distributed real-time system, tolerance of faults in processing nodes is achieved by means of redundant nodes and a fault-tolerant scheduling algorithm. Because redundant nodes increase the total failure rate of the system, the number of such nodes should be kept small. This paper proposes a fault-tolerant scheduling procedure that achieves fault tolerance with a small number of redundant nodes. The procedure is based on a technique that allows multiple copies of a task to be executed concurrently. It uses nodes efficiently by forcing all copies still executing to terminate as soon as the first of them has produced a result. A basic scheduling algorithm into which the procedure is incorporated is defined, and simulation results for it are presented.
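The early-termination idea can be illustrated with a minimal discrete-time sketch. It assumes that each copy runs on its own node with a known start time and execution time, and that a faulty node simply never delivers a result. The names `TaskCopy`, `run_copies`, and `busy_without_termination` are hypothetical and do not come from the paper; this is not the authors' scheduling algorithm, only the "first result wins, stop the rest" rule it builds on.

```python
from dataclasses import dataclass


@dataclass
class TaskCopy:
    node: str             # hypothetical node identifier (one copy per node assumed)
    start: float          # time at which this copy starts executing
    exec_time: float      # time this copy needs to produce its result
    faulty: bool = False  # assumed fault model: a faulty node never delivers a result


def run_copies(copies):
    """Run copies of one task concurrently and stop them all at the first result.

    Returns (completion_time, winning_node, busy_time_per_node).
    """
    results = [(c.start + c.exec_time, c.node) for c in copies if not c.faulty]
    if not results:
        raise RuntimeError("no copy produced a result: the task fails")
    completion, winner = min(results)
    # Every other copy is terminated at `completion`, so a node is busy only
    # from its copy's start until then (zero if the copy had not yet started).
    busy = {c.node: max(0.0, completion - c.start) for c in copies}
    return completion, winner, busy


def busy_without_termination(copies):
    """Baseline for comparison: every copy, faulty or not, runs its full exec_time."""
    return {c.node: c.exec_time for c in copies}


copies = [
    TaskCopy("n1", start=0.0, exec_time=5.0, faulty=True),
    TaskCopy("n2", start=0.0, exec_time=4.0),
    TaskCopy("n3", start=1.0, exec_time=6.0),
]
completion, winner, busy = run_copies(copies)
print(completion, winner)                              # 4.0 n2
print(sum(busy.values()))                              # 11.0
print(sum(busy_without_termination(copies).values()))  # 15.0
```

In this toy run the faulty copy on `n1` is masked by the healthy copy on `n2`, and stopping the remaining copies at time 4.0 cuts total node occupancy from 15.0 to 11.0 time units, capacity that a scheduler could then assign to other tasks.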