
Blockchain and BFT: Fast Byzantine SGD (Best Student Paper Award)

Formal Metadata

Title
Blockchain and BFT: Fast Byzantine SGD (Best Student Paper Award)
Series Title
Number of Parts
30
Author
License
CC Attribution 4.0 International:
You may use, adapt and copy, distribute and make the work or its content publicly available in unchanged or adapted form for any legal purpose, provided that you credit the author/rights holder in the manner specified by them.
Identifiers
Publisher
Publication Year
Language

Content Metadata

Subject Area
Genre
Abstract
Modern machine learning architectures distinguish servers and workers. Typically, a d-dimensional model is hosted by a server and trained by n workers, using a distributed \textit{stochastic gradient descent} (SGD) optimization scheme. At each SGD step, the goal is to estimate the gradient of a cost function. The simplest way to do this is to \emph{average} the gradients estimated by the workers. However, averaging is not resilient to even a single Byzantine failure of a worker. Many alternative \emph{gradient aggregation rules} (GARs) have recently been proposed to tolerate a maximum number f of Byzantine workers. These GARs differ according to (1) the complexity of their computation time, (2) the maximal number of Byzantine workers despite which convergence can still be ensured (breakdown point), and (3) their accuracy, which can be captured by (3.1) their angular error, namely the angle with the true gradient, as well as (3.2) their ability to aggregate full gradients. In particular, many are not full gradients for they operate on each dimension separately, which results in a coordinate-wise blended gradient, leading to low accuracy in practical situations where the number (s) of workers that are actually Byzantine in an execution is small ($s<f$).

We propose \textsc{Aksel}, a new scalable median-based GAR with an optimal time complexity (O(n)), an optimal breakdown point (n>2f) and the lowest upper bound on the \textit{expected angular error} (O(1)) among \textit{full gradient} approaches. We also study the \textit{actual angular error} of \textsc{Aksel} when the gradient distribution is normal and show that it only grows in O(log n), which is the first logarithmic upper bound ever proven assuming an optimal breakdown point. We also report on an empirical evaluation of \textsc{Aksel} on various classification tasks, which we compare to alternative GARs against state-of-the-art attacks.
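The two phenomena the abstract describes — averaging tolerating zero Byzantine workers, and coordinate-wise rules producing a "blended" gradient — can be illustrated with a small sketch. This is purely illustrative (it is not the paper's \textsc{Aksel} algorithm); the honest gradient values and the Byzantine report are made up for the demonstration.

```python
# Illustrative sketch only; NOT the paper's Aksel algorithm.
# (1) Plain averaging is hijacked by a single Byzantine worker.
# (2) The coordinate-wise median is robust but "blended": each coordinate
#     may come from a different worker, so the result need not equal any
#     worker's full gradient.
from statistics import mean, median

# Honest estimates of a 2-d gradient whose true value is (2.0, -1.0).
honest = [(1.9, -1.0), (2.0, -1.1), (2.05, -0.9), (2.1, -1.05)]

# One Byzantine worker reports an arbitrary vector.
reports = honest + [(1e6, -1e6)]

# Coordinate-wise average: a single bad report moves it arbitrarily far.
avg = tuple(mean(g[i] for g in reports) for i in range(2))

# Coordinate-wise median: close to the true gradient, but note that its
# two coordinates come from two different workers.
med = tuple(median(g[i] for g in reports) for i in range(2))

print(avg)  # huge -- hijacked by the single Byzantine vector
print(med)  # (2.05, -1.05), which is no single worker's gradient
```

Here `med` is not in `reports`: the aggregate mixes coordinates from different workers, which is exactly what the abstract calls a coordinate-wise blended (non-full) gradient.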
\textsc{Aksel} is the only GAR reaching top accuracy when there are actually no or few Byzantine workers, while maintaining a good defense even in the extreme case (s=f). For simplicity of presentation, we consider a scheme with a single server. However, as we explain in the paper, \textsc{Aksel} can also easily be adapted to multi-server architectures.
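The single-server scheme described above can be sketched as a server-side SGD step with a pluggable GAR. This is a minimal, hypothetical sketch — the function names and the coordinate-wise median GAR used here are stand-ins, not the paper's actual Aksel rule or implementation.

```python
# Hypothetical sketch of one server-side step in the server/worker SGD
# scheme, with the gradient aggregation rule (GAR) as a plug-in.
from statistics import median
from typing import Callable, List

Gradient = List[float]
GAR = Callable[[List[Gradient]], Gradient]

def coordinate_wise_median(grads: List[Gradient]) -> Gradient:
    # A simple Byzantine-tolerant GAR (stand-in; not Aksel).
    d = len(grads[0])
    return [median(g[i] for g in grads) for i in range(d)]

def sgd_step(model: Gradient, grads: List[Gradient],
             gar: GAR, lr: float = 0.1) -> Gradient:
    # The server aggregates the n workers' gradient estimates with the
    # chosen GAR, then applies one descent step to the d-dimensional model.
    agg = gar(grads)
    return [w - lr * g for w, g in zip(model, agg)]

# Model w = [1.0], cost f(w) = w^2, so the true gradient is 2w = 2.0.
# Four workers estimate it honestly; one Byzantine worker lies.
model = [1.0]
grads = [[1.9], [2.0], [2.05], [2.1], [1e6]]
model = sgd_step(model, grads, coordinate_wise_median)
print(model)  # close to [0.795]: the step ignored the Byzantine report
```

Swapping `coordinate_wise_median` for another GAR changes only the aggregation, which is how the different rules compared in the abstract slot into the same SGD loop.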