Source: WIPO Wine
A data processing system comprising a plurality of processing nodes arranged to update a model in parallel. Each processing node starts with a different set of updates to the model parameters. Each node is configured to perform one or more reduce-scatter collectives to exchange and reduce these updates, after which each node holds its own share of the reduced updates. Each node then applies that reduced share to obtain an updated portion of the model parameters. Finally, the nodes exchange their updated parameters in an all-gather, so that every node ends up with the same full set of model parameters at the end of the process.
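The sequence above (reduce-scatter, local update, all-gather) can be illustrated with a small single-process simulation. This is a minimal sketch under several assumptions that the abstract does not state. Both collectives use a ring schedule. The "update" is a plain SGD step with the summed updates averaged over the node count. Every node starts from identical parameters. The function names (`chunk_bounds`, `reduce_scatter`, `apply_updates`, `all_gather`, `data_parallel_step`) are hypothetical and are not taken from the patent.

```python
from typing import List

Vector = List[float]


def chunk_bounds(length: int, n_nodes: int, idx: int) -> slice:
    """Slice of the parameter vector owned by node `idx` (near-equal split)."""
    base, extra = divmod(length, n_nodes)
    start = idx * base + min(idx, extra)
    stop = start + base + (1 if idx < extra else 0)
    return slice(start, stop)


def reduce_scatter(updates: List[Vector]) -> List[Vector]:
    """Ring reduce-scatter (assumed schedule): node i ends with the sum of chunk i over all nodes."""
    n, length = len(updates), len(updates[0])
    buf = [list(u) for u in updates]  # each node's working copy of its own updates
    for step in range(n - 1):
        # All nodes send at once, so snapshot every outgoing chunk before applying any.
        outgoing = []
        for i in range(n):
            c = (i - step - 1) % n
            outgoing.append((c, buf[i][chunk_bounds(length, n, c)]))
        for i, (c, data) in enumerate(outgoing):
            dst, sl = (i + 1) % n, chunk_bounds(length, n, c)
            # The receiver adds the incoming partial sum to its own copy of chunk c.
            buf[dst][sl] = [a + b for a, b in zip(buf[dst][sl], data)]
    return [buf[i][chunk_bounds(length, n, i)] for i in range(n)]


def apply_updates(params: Vector, reduced: Vector, idx: int, n: int, lr: float) -> Vector:
    """Update only this node's shard; assumes SGD with updates averaged over n nodes."""
    sl = chunk_bounds(len(params), n, idx)
    return [p - lr * g / n for p, g in zip(params[sl], reduced)]


def all_gather(shards: List[Vector], length: int) -> List[Vector]:
    """Ring all-gather (assumed schedule): every node ends with every updated shard."""
    n = len(shards)
    full = [[0.0] * length for _ in range(n)]
    for i in range(n):
        full[i][chunk_bounds(length, n, i)] = shards[i]
    for step in range(n - 1):
        # Each node forwards the chunk it received in the previous step.
        outgoing = []
        for i in range(n):
            c = (i - step) % n
            outgoing.append((c, full[i][chunk_bounds(length, n, c)]))
        for i, (c, data) in enumerate(outgoing):
            full[(i + 1) % n][chunk_bounds(length, n, c)] = data
    return full


def data_parallel_step(params: Vector, updates: List[Vector], lr: float) -> List[Vector]:
    """One round: reduce-scatter the updates, apply them per shard, then all-gather."""
    n = len(updates)
    reduced = reduce_scatter(updates)
    new_shards = [apply_updates(params, reduced[i], i, n, lr) for i in range(n)]
    return all_gather(new_shards, len(params))


if __name__ == "__main__":
    params = [1.0, 2.0, 3.0, 4.0, 5.0]
    # Three nodes, each starting with a different set of updates (e.g. local gradients).
    updates = [
        [0.5, 0.0, 1.0, 0.5, 0.0],
        [0.0, 0.5, 0.0, 0.5, 1.0],
        [0.5, 0.5, 0.5, 0.5, 0.5],
    ]
    for node, p in enumerate(data_parallel_step(params, updates, lr=0.1)):
        print(node, p)
```

Each node only ever reduces and updates its own slice. No node needs the full summed update vector, and the all-gather is what restores a complete, identical parameter copy on every node. A real system would run these exchanges over links between devices, not over Python lists.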