Scalable RDMA performance in PGAS languages
Montse Farreras, George Almási, et al.
IPDPS 2009
Given a set of n different deterministic finite state machines (DFSMs) modeling a distributed system, we examine the problem of tolerating f crash or Byzantine faults in such a system. The traditional approach to this problem involves replication and requires n·f backup DFSMs for crash faults and 2·n·f backup DFSMs for Byzantine faults. For example, to tolerate two crash faults in three DFSMs, a replication-based technique needs two copies of each of the given DFSMs, resulting in a system with six backup DFSMs. In this paper, we question the optimality of such an approach and present an approach called (f, m)-fusion that permits fewer backups than the replication-based approaches. Given n different DFSMs, we examine the problem of tolerating f faults using just m additional DFSMs. We introduce the theory of fusion machines and provide an algorithm to generate backup DFSMs for both crash and Byzantine faults. We have implemented our algorithms in Java and have used them to automatically generate backup DFSMs for several examples. © 2009 IEEE.
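The replication counts quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration of those two formulas (n·f for crash faults, 2·n·f for Byzantine faults); the function name `replication_backups` is hypothetical and is not taken from the paper's Java implementation, and the fusion algorithm itself is not reproduced here.

```python
def replication_backups(n: int, f: int, byzantine: bool = False) -> int:
    """Number of backup DFSMs plain replication needs, per the abstract.

    n: number of original DFSMs
    f: number of faults to tolerate
    byzantine: if True, use the 2*n*f count; otherwise the n*f crash count
    """
    if n < 0 or f < 0:
        raise ValueError("n and f must be non-negative")
    factor = 2 if byzantine else 1
    return factor * n * f


# The abstract's example: three DFSMs, two crash faults.
print(replication_backups(3, 2))                  # 6
# Same system under Byzantine faults.
print(replication_backups(3, 2, byzantine=True))  # 12
```

An (f, m)-fusion with m smaller than these counts would use fewer backups than replication; how small m can be depends on the machines themselves and is what the paper's algorithm determines.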
Rahul Garg, Vijay K. Garg, et al.
IEEE TPDS
Madhukar Korupolu, Aameek Singh, et al.
IPDPS 2009
G. Cong, S.R. Seelam, et al.
IPDPS 2009