Horovod comes with several adjustable "knobs" that can affect runtime performance, including --fusion-threshold-mb and --cycle-time-ms (tensor fusion), --cache-capacity (response cache), and the hierarchical collective algorithms --hierarchical-allreduce and --hierarchical-allgather.
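Each of these knobs can be set explicitly when launching a job; the values below are purely illustrative, not recommendations:
$ horovodrun -np 4 --fusion-threshold-mb 32 --cycle-time-ms 3.5 --cache-capacity 2048 --hierarchical-allreduce python train.py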
Determining the best combination of these values to maximize performance (minimize time to convergence) can be a matter of trial and error, as many factors, including model complexity, network bandwidth, and GPU memory, can all affect training throughput (inputs per second).
Horovod provides a mechanism called autotuning to automate the process of selecting the best values for these "knobs". The Horovod autotuning system uses Bayesian optimization to intelligently search the space of parameter combinations during training. This feature can be enabled by passing the --autotune flag to horovodrun:
$ horovodrun -np 4 --autotune python train.py
When autotuning is enabled, Horovod will spend the first steps / epochs of training experimenting with different parameter values and collecting metrics on performance (measured in bytes allreduced / allgathered per unit of time). Once the experiment reaches convergence, or a set number of samples have been collected, the system will record the best combination of parameters discovered and continue to use them for the duration of training.
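If your job is launched by some means other than horovodrun (for example, directly through Open MPI's mpirun), autotuning can typically be enabled through an environment variable instead. The variable name below follows Horovod's usual HOROVOD_* naming convention but is stated here as an assumption and should be verified against your installed version:
$ mpirun -np 4 -x HOROVOD_AUTOTUNE=1 python train.py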
A log of all parameter combinations explored (and the best values selected) can be recorded by providing the --autotune-log-file option to horovodrun:
$ horovodrun -np 4 --autotune --autotune-log-file /tmp/autotune_log.csv python train.py
By logging the best parameters to a file, you can set the discovered values explicitly on the command line instead of re-running autotuning if training is paused and later resumed.
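For example, if the log showed that a larger fusion buffer and hierarchical allreduce gave the best throughput, a resumed run could pin those values directly (the values below are hypothetical):
$ horovodrun -np 4 --fusion-threshold-mb 64 --cycle-time-ms 5 --cache-capacity 2048 --hierarchical-allreduce python train.py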
Note that some configurable parameters, like tensor compression, are not included as part of the autotuning process because they can affect model convergence. The purpose of autotuning at this time is entirely to improve scaling efficiency without making any tradeoffs on model performance.
Sometimes you may wish to hold certain values constant and only tune the unspecified parameters. This can be accomplished by explicitly setting those values on the command line or in the config file provided by --config-file:
$ horovodrun -np 4 --autotune --cache-capacity 1024 --no-hierarchical-allgather python train.py
In the above example, the parameters cache-capacity and hierarchical-allgather will not be adjusted by autotuning.
Enabling autotuning imposes a tradeoff: degraded performance during the early phases of training in exchange for better performance later on. As such, it's generally recommended to use autotuning when training is expected to take a long time (many epochs on very large datasets) and scaling efficiency has been found lacking with the default settings.
You can tune the autotuning system itself to change the number of warmup samples (discarded samples at the beginning), steps per sample, and maximum samples:
$ horovodrun -np 4 --autotune \
--autotune-warmup-samples 5 --autotune-steps-per-sample 20 --autotune-bayes-opt-max-samples 40 \
python train.py
Increasing these values will generally improve the accuracy of the autotuning process, at the cost of spending more time in the autotuning phase, during which performance is degraded.
Finally, for those familiar with the underlying theory of Bayesian optimization and Gaussian processes, you can tune the noise regularization term (alpha) to account for variance in your network or other system resources:
$ horovodrun -np 4 --autotune --autotune-gaussian-process-noise 0.75 python train.py