TY - CONF
T1 - Autoencoder-Based Gradient Compression for Distributed Training
AU - Abrahamyan, Lusine
AU - Deligiannis, Nikos
AU - Bekoulis, Ioannis
AU - Chen, Yiming
PY - 2021
Y1 - 2021
AB - Large-scale distributed training has recently been proposed as a solution to speed up the training of deep neural networks on huge datasets. Distributed training, however, entails high communication rates for gradient exchange among computing nodes and requires expensive high-bandwidth network infrastructure. Various gradient compression methods have been proposed to overcome this limitation, including sparsification, quantization, and entropy encoding of the gradients. However, most existing methods leverage only the intra-node information redundancy, that is, they compress the gradients at each node independently. In contrast, we advocate that the gradients across the nodes are correlated and propose a method that leverages this inter-node redundancy to obtain higher compression rates. In this work, we propose the Learned Gradient Compression (LGC) framework to reduce the communication rate of distributed training under the parameter server communication protocol. Our framework leverages an autoencoder to capture the information common to the gradients of the distributed nodes and to eliminate the transmission of redundant information. Our experiments show that the proposed approach achieves significantly higher gradient compression ratios than state-of-the-art approaches such as DGC and ScaleCom.
UR - https://ieeexplore.ieee.org/document/9616078
UR - http://www.scopus.com/inward/record.url?scp=85123208207&partnerID=8YFLogxK
U2 - 10.23919/EUSIPCO54536.2021.9616078
DO - 10.23919/EUSIPCO54536.2021.9616078
M3 - Conference paper
VL - 29
SP - 2179
EP - 2183
JO - Proceedings of EUSIPCO
JF - Proceedings of EUSIPCO
SN - 2219-5491
ER -