A reliability model for dependent and distributed MDS disk array units
AuthorArslan, Şuayb Şefik
MetadataShow full item record
CitationArslan, S. S. (November 29, 2018). A Reliability Model for Dependent and Distributed MDS Disk Array Units. Ieee Transactions on Reliability, 1-16.
Archiving and systematic backup of large digital data generates a quick demand for multi-petabyte scale storage systems. As drive capacities continue to grow beyond the few terabytes range to address the demands of today’s cloud, the likelihood of having multiple/simultaneous disk failures became a reality. Among the main factors causing catastrophic system failures, correlated disk failures and the network bandwidth are reported to be the two common source of performance degradation. The emerging trend is to use efficient/sophisticated erasure codes (EC) equipped with multiple parities and efficient repairs in order to meet the reliability/bandwidth requirements. It is known that mean time to failure and repair rates reported by the disk manufacturers cannot capture life-cycle patterns of distributed storage systems. In this study, we develop failure models based on generalized Markov chains that can accurately capture correlated performance degradations with multiparity protection schemes based on modern maximum distance separable EC. Furthermore, we use the proposed model in a distributed storage scenario to quantify two example use cases: Primarily, the common sense that adding more parity disks are only meaningful if we have a decent decorrelation between the failure domains of storage systems and the reliability of generic multiple single-dimensional EC protected storage systems.
SourceTransactions on Reliability
WoS Q CategoryQ1 - Q2
The following license files are associated with this item: