Reference Books
- Fault-Tolerant Computer System Design by D. K. Pradhan, Prentice Hall.
- Reliable Computer Systems: Design and Evaluation (second edition) by D. P. Siewiorek and R. S.
Swarz, Digital Press.
- Design and Analysis of Fault Tolerant Digital Systems by B. W. Johnson, Addison Wesley, 1989.
- Fault Tolerance in Distributed Systems by Pankaj Jalote, PTR Prentice Hall, 1994.
Fundamental Concepts
|
A Conceptual Framework for System Fault Tolerance |
|
Definition
of terms used in Fault Tolerance |
|
Reliability
Concepts - Part 1 |
|
Reliability
Concepts - Part 2 |
|
Notes
on Byzantine Generals Problem |
|
A. Avizienis and J.
Laprie, ``Dependable Computing: From Concepts to Design Diversity,'' Proc.
IEEE, vol.74, no.5, pp.629-638, May 1986. |
|
A.K. Somani and N.H.
Vaidya, ``Understanding fault-tolerance and reliability,'' IEEE Computer,
vol.30, no.4, pp.45-50, Apr. 1997. |
|
M. Pease, R. Shostak,
and L. Lamport, ``Reaching Agreement in the Presence of Faults,''
Journal of the ACM, vol.27 (1980), pp.228-234. |
|
L. Lamport, R. Shostak, and M. Pease, ``The Byzantine
Generals Problem,'' ACM Trans. Prog. Languages and Systems, vol.4 (1982), pp.
382-401. |
Software Fault Tolerance
|
N. Leveson, J.
Knight, and T. Shimeall, ``The use of self check and voting in software
error detection: An empirical study,'' IEEE Transactions on Software
Engineering, April 1990. |
|
A. Avizienis and J.
Kelly, ``Fault Tolerance by Design Diversity: Concepts and Experiments,''
IEEE Computer, August 1984, pp. 67-80. |
|
J.H. Purtilo and P.
Jalote, ``An environment for developing fault-tolerant software,'' IEEE
Trans. Software Engg., vol.17, no.2, pp.153-159, Feb. 1991. |
Fault Detection and Location in Multiprocessor Systems
|
A
review of system-level diagnosis |
|
S. Tridandapani, A.
K. Somani, and U. Reddy, ``Low Overhead Multiprocessor Allocation Strategies
Exploiting System Spare Capacity for Fault Detection and Location,'' in IEEE
Transactions on Computers, Vol. 44, No. 7, July 1995, pp. 865-877. |
|
K. Mahesh, G.
Manimaran, C. S. R. Murthy, and A. K. Somani, ``Scheduling Algorithms
Exploiting Spare Capacity and Tasks' Laxities for Fault Detection and
Location in Real-time Multiprocessor Systems,'' Journal of Parallel and
Distributed Computing, vol.51, no.2, pp.136-150, June 1998. |
|
Using
spare capacity in SMT processors |
Fault Tolerance in Real-time Systems
|
Fundamentals
of Real-time Systems - For a report, send mail to Dr. Manimaran |
|
J.W.S. Liu, K.J.
Lin, W.K. Shih, A.C. Yu, J.Y. Chung, and W. Zhao, ``Algorithms for scheduling
imprecise computations,'' IEEE Computer, vol.24, no.5, pp.58-68, May 1991. |
|
P. Ramanathan,
``Graceful degradation in real-time control applications using (m,k)-firm
guarantee,'' In Proc. Fault-Tolerant Computing Symp., pp.132-141, 1997. |
|
S. Ghosh, R. Melhem,
and D. Mosse, ``Fault-tolerance through scheduling of aperiodic tasks in
hard real-time multiprocessor systems,'' IEEE Trans. Parallel and
Distributed Systems, vol.8, no.3, pp.272-284, Mar. 1997. |
|
G. Manimaran and C.
Siva Ram Murthy, ``A fault-tolerant dynamic scheduling algorithm for
real-time multiprocessor systems and its analysis,'' IEEE Trans. Parallel
and Distributed Systems, vol.9, no.11, pp.1137-1152, Nov. 1998. |
|
J.H. Lala and R.E.
Harper, ``Architectural principles for safety-critical real-time
applications,'' Proc. of IEEE, vol.82, no.1, pp.25-40, Jan. 1994. |
|
A.L. Liestman and
R.H. Campbell, ``A fault-tolerant scheduling problem,'' IEEE Trans. Software
Engg., vol.12, no. 11, pp. 1089-1095, Nov. 1986. |
Dependable Communication
Checkpointing
Other Relevant Papers
|
A General
Constructive Approach to Fault Tolerant Design Using Redundancy, by Barbour
and Wojcik, IEEE Transactions on Computers, Jan 1989, pp. 15. |
|
S. B. Choi and A. K.
Somani, ``Design and Performance Analysis of Load-distributing
Fault-tolerant Network,'' in IEEE Transactions on Computers, Vol. 45, No. 5,
May 1996, pp. 540-551. |
|
P. M. Chen, E. K.
Lee, G. A. Gibson, R. H. Katz, and D. A. Patterson, ``RAID:
High-Performance, Reliable Secondary Storage,'' ACM Computing Surveys, Vol.
26, No. 2, Jun 1994, pp. 145-164. |
|
A
paper on DIRSMIN |
|
Figure
1 for the DIRSMIN paper |
|
A
paper on embedding binary tree in faulty hypercube |
Relevant Journals
Relevant Conference Proceedings
|
Proc. of Fault
Tolerant Computing Symposium. |
|
Proc. of Intl. Conf.
Parallel and Distributed Systems. |
|
Proc. of Intl.
Parallel Processing Symposium. |
|
Other related
journals and conference proceedings. |