Abstract: As the scale of distributed training grows, communication overhead in clusters becomes substantial. Some works attempt to reduce this communication cost through gradient compression or ...
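As a point of reference for the gradient-compression approach mentioned above, here is a minimal sketch of top-k gradient sparsification, one common form of gradient compression. The function names and the 1% keep-ratio are illustrative assumptions, not details taken from the paper.

```python
import torch

def topk_compress(grad: torch.Tensor, ratio: float = 0.01):
    """Keep only the largest-magnitude `ratio` fraction of gradient entries
    and return the sparse payload that would actually be communicated."""
    flat = grad.flatten()
    k = max(1, int(flat.numel() * ratio))
    # Select indices of the k entries with the largest absolute value.
    _, indices = torch.topk(flat.abs(), k)
    return flat[indices], indices, grad.shape

def topk_decompress(values, indices, shape):
    """Rebuild a dense gradient tensor from the transmitted sparse entries."""
    flat = torch.zeros(int(torch.tensor(shape).prod()), dtype=values.dtype)
    flat[indices] = values
    return flat.reshape(shape)

# Example: compress a toy gradient, then reconstruct it on the receiving side.
g = torch.randn(1024)
vals, idx, shape = topk_compress(g, ratio=0.01)
g_hat = topk_decompress(vals, idx, shape)
```

Only the selected values and their indices need to be sent between workers, which is how such schemes trade a small approximation error for a large reduction in communicated bytes.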