Distributed matrix multiplication with MDS array BP-XOR codes for scaling clusters
Citation
Arslan, S. S. (July 7-12, 2019). IEEE International Symposium on Information Theory (ISIT). (July 01, 2019). Distributed Matrix Multiplication with MDS Array BP-XOR Codes for Scaling Clusters. 1792-1796. Paris, France.
Abstract
This study presents a novel coded computation technique for distributed matrix-matrix product computation at massive scale that outperforms well-known previous strategies in terms of total execution time. Our method achieves this performance by distributing the encoding operation over the cluster (slave) nodes at the expense of increased master-slave communication. The product computation is performed using MDS array Belief Propagation (BP)-decodable codes based on pure XOR operations. In addition, our scheme is configurable and suited to modern compute node architectures equipped with multiple processing units organized in a hierarchical manner. Assuming that the number of backup nodes is sublinear in the size of the product, we demonstrate that the proposed scheme achieves order-optimal computation from an end-to-end latency perspective while keeping communication requirements at a level that today's high-speed network links can handle.
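The idea of letting worker nodes perform the XOR encoding on their own partial products can be illustrated with a toy example. The sketch below is not the construction from the paper: it assumes a single-parity (3, 2) XOR code, integer matrices, an even number of rows in A, and hypothetical helper names (`matmul`, `xor_blocks`, `worker_outputs`, `decode`). Two workers each compute one row-block of the product, and a third backup worker computes both blocks and returns only their bitwise XOR, so the result from any one straggler can be recovered by a single peeling step.

```python
def matmul(a, b):
    """Plain integer matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def xor_blocks(p, q):
    """Element-wise bitwise XOR of two equally shaped integer blocks."""
    return [[x ^ y for x, y in zip(row_p, row_q)]
            for row_p, row_q in zip(p, q)]


def worker_outputs(a, b):
    """Simulate three workers (assumed layout, for illustration only).

    Worker 0 returns A1*B, worker 1 returns A2*B, and the backup
    worker 2 computes both products locally and returns their XOR,
    so the encoding work is done on the worker, not on the master.
    """
    half = len(a) // 2  # assumes an even number of rows in A
    a1, a2 = a[:half], a[half:]
    c1 = matmul(a1, b)
    c2 = matmul(a2, b)
    parity = xor_blocks(matmul(a1, b), matmul(a2, b))
    return {0: c1, 1: c2, 2: parity}


def decode(received):
    """Recover the full product from any two of the three outputs.

    With a single parity block, peeling reduces to one step:
    XOR the parity with the block that did arrive.
    """
    known = dict(received)
    if 0 not in known:
        known[0] = xor_blocks(known[2], known[1])
    if 1 not in known:
        known[1] = xor_blocks(known[2], known[0])
    return known[0] + known[1]  # stack the two row-blocks


if __name__ == "__main__":
    A = [[1, 2], [3, 4], [5, 6], [7, 8]]
    B = [[1, 0], [2, -1]]
    outputs = worker_outputs(A, B)
    for straggler in range(3):
        arrived = {k: v for k, v in outputs.items() if k != straggler}
        assert decode(arrived) == matmul(A, B)
```

In this toy layout the backup worker does twice the multiplication work of the others, which mirrors the trade-off described in the abstract only loosely: encoding cost moves to the cluster, while the master only performs cheap XOR decoding.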