
Asynchronous Multiprocess Large Model Training on PyTorch

Formal Metadata

Title
Asynchronous Multiprocess Large Model Training on PyTorch
Series Title
Number of Parts
8
Author
Contributors
License
CC Attribution 4.0 International:
You may use, modify, reproduce, distribute, and make the work or its content publicly available in unchanged or modified form for any legal purpose, provided that you credit the author/rights holder in the manner they have specified.
Identifiers
Publisher
Publication Year
Language

Content Metadata

Subject Area
Genre
Abstract
"With the increasing popularity of large machine learning models capable of solving complicated tasks in the sphere of natural language processing, computer vision, etc., the need for distributed computation has rocketed significantly. We would like to provide the "surgery" of parallelization methods from one of the most popular deep learning frameworks - PyTorch. Particularly, we would like to demonstrate two main approaches: data parallelization (when the single module is trained asynchronically in streams) and model parallelization (both horizontal – with several models trained simultaneously, and vertical – when the model parameters are split into groups). Moreover, we will guide through the cases of different resources availability, i.e. what could be done when having only CPUs, a single GPU, or multiple GPUs. Our showing is to be done on an example of urban planning problem solution, where we are creating synthetic cities with deep convolutional generative adversarial neural networks. These models have complicated architecture and billions of parameters when generating images starting from mid-resolution like 256x256, which makes them perfect instances for distributed computation demonstration.