diff --git a/docs/mindspore/programming_guide/source_en/enable_dataset_autotune.md b/docs/mindspore/programming_guide/source_en/enable_dataset_autotune.md
index de21ec397006ff16a3b27b445f32a23ce65e64eb..85258b420187726c43b38400f84aa5b4bf1277b0
--- a/docs/mindspore/programming_guide/source_en/enable_dataset_autotune.md
+++ b/docs/mindspore/programming_guide/source_en/enable_dataset_autotune.md
@@ -4,11 +4,15 @@
 
-- [Enabling AutoTune for Dataset Pipeline](#enabling-dataset-autotune)
-    - [Overview](#autotune-overview)
-    - [Enable AutoTune](#autotune-dataset-enable)
-    - [Time Interval for AutoTune](#autotune-dataset-interval)
-    - [Constraints](#autotune-dataset-constraints)
+- [Enabling AutoTune for Dataset Pipeline](#enabling-autotune-for-dataset-pipeline)
+    - [Overview](#overview)
+    - [Enable AutoTune](#enable-autotune)
+    - [Time Interval for AutoTune](#time-interval-for-autotune)
+    - [Constraints](#constraints)
+    - [Example](#example)
+        - [AutoTune Config](#autotune-config)
+        - [Start training](#start-training)
+        - [Before next training](#before-next-training)
 
@@ -16,9 +20,19 @@
 
 ## Overview
 
-MindSpore provides AutoTune support to automatically tune Dataset pipelines to improve performance. This feature can automatically detect a bottleneck operator in the dataset pipeline and respond by automatically adjusting tunable parameters for dataset ops, like increasing the number of parallel workers or updating the prefetch size of dataset ops. With dataset AutoTune enabled, MindSpore will sample dataset statistics at a given interval, which is tuneable by the user.
+MindSpore provides AutoTune support to automatically tune Dataset pipelines to improve performance.
 
-AutoTune for Dataset is disabled by default.
+This feature automatically detects a bottleneck operator in the dataset pipeline and responds by adjusting the tunable parameters of dataset ops, such as increasing the number of parallel workers or the prefetch size of a dataset op.
+
+![autotune](images/autotune.png)
+
+With dataset AutoTune enabled, MindSpore samples dataset statistics at a given interval, which is configurable by the user.
+
+Once AutoTune has collected enough information, it analyzes whether the performance bottleneck is on the dataset side or not.
+If so, it adjusts the parallelism of the dataset ops to speed up the dataset pipeline.
+If not, AutoTune instead tries to reduce the memory usage of the dataset pipeline to free memory for the CPU.
+
+> AutoTune for Dataset is disabled by default.
 
 ## Enable AutoTune
 
@@ -42,11 +56,72 @@ To query the time interval (in milliseconds) for dataset pipeline autotuning:
 
 ```python
 import mindspore.dataset as ds
-ds.config.get_autotune_interval()
+print("time interval:", ds.config.get_autotune_interval())
 ```
 
 ## Constraints
 
-AutoTune for Dataset is currently available for sink pipelines only.
+AutoTune for Dataset is currently available for sink mode only (dataset_sink_mode=True).
 
-Both dataset profiling and dataset Autotune may not be enabled concurrently.
+Dataset Profiling and Dataset AutoTune cannot be enabled concurrently.
+
+## Example
+
+Take ResNet training as an example.
+
+### AutoTune Config
+
+To enable Dataset AutoTune, only one statement needs to be added.
+
+```python
+# dataset.py of ResNet in ModelZoo
+# models/official/cv/resnet/src/dataset.py
+def create_dataset1(...):
+    '''create a train or evaluate cifar10 dataset for resnet50'''
+    # enable Dataset AutoTune here
+    ds.config.set_enable_autotune(True)
+    # the rest of the pipeline definition stays unchanged
+    data_set = ds.Cifar10Dataset(...)
+    ...
+```
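+
+Since Dataset AutoTune only takes effect in sink mode, make sure the training entry keeps `dataset_sink_mode=True`. The following sketch shows one way the tuned pipeline could be consumed; the network, loss, optimizer and paths are illustrative placeholders rather than the exact ModelZoo training script.
+
+```python
+# train.py (illustrative sketch only; the imports and arguments below are placeholders)
+import mindspore.nn as nn
+from mindspore import Model
+from src.resnet import resnet50            # placeholder: ModelZoo ResNet-50 definition
+from src.dataset import create_dataset1    # the pipeline with Dataset AutoTune enabled above
+
+# argument names are illustrative; use whatever signature the dataset script defines
+data_set = create_dataset1(dataset_path="./cifar-10-batches-bin", do_train=True, batch_size=32)
+net = resnet50(class_num=10)
+loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean")
+opt = nn.Momentum(net.trainable_params(), learning_rate=0.01, momentum=0.9)
+
+model = Model(net, loss_fn=loss, optimizer=opt)
+# Dataset AutoTune works with the sink pipeline only, so keep dataset_sink_mode=True
+model.train(90, data_set, dataset_sink_mode=True)
+```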
+
+### Start training
+
+Start the training process as described in [resnet/README.md](https://gitee.com/mindspore/models/blob/master/official/cv/resnet/README.md). While training runs, AutoTune reports its analysis through log messages: when autotuning starts, the step time it observes, each adjustment it makes to the output queue size and the number of parallel workers of the dataset ops, whether the dataset pipeline is healthy after the adjustment, and the final recommended configuration at the end of autotuning.
+
+The log messages show the overall progress of AutoTune. With a poor initial dataset configuration, the usage of the operator output queues is low. AutoTune therefore increases the queue size and the number of parallel workers of the dataset ops to speed up the dataset pipeline. After the configuration of the dataset pipeline has been tuned, the step time is reduced significantly.
+
+At the end of tuning, AutoTune produces an optimal configuration and prints it in the log messages, so users can read the recommended values directly from the log.
+
+### Before next training
+
+Before starting the next training process, users can apply the recommended configuration to the dataset Python scripts, so that the dataset pipeline runs at high speed from the very beginning of the training process.
+
+MindSpore also provides APIs to set the global values of num_parallel_workers and prefetch_size (see the sketch after the list below).
+Please refer to [mindspore.dataset.config](https://www.mindspore.cn/docs/api/en/master/api_python/mindspore.dataset.config.html):
+
+- [mindspore.dataset.config.set_num_parallel_workers](https://www.mindspore.cn/docs/api/en/master/api_python/mindspore.dataset.config.html#mindspore.dataset.config.set_num_parallel_workers)
+- [mindspore.dataset.config.set_prefetch_size](https://www.mindspore.cn/docs/api/en/master/api_python/mindspore.dataset.config.html#mindspore.dataset.config.set_prefetch_size)
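+
+For example, assuming AutoTune recommended 8 parallel workers and a prefetch size of 16 (illustrative values; use the numbers reported in your own log), the global defaults could be applied once before the pipeline is built:
+
+```python
+import mindspore.dataset as ds
+
+# illustrative values: replace them with the configuration recommended by AutoTune
+ds.config.set_num_parallel_workers(8)   # default number of workers for each dataset op
+ds.config.set_prefetch_size(16)         # default size of each op's prefetch queue
+
+# any pipeline created afterwards starts with these tuned defaults
+data_set = ds.Cifar10Dataset("./cifar-10-batches-bin")
+```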
diff --git a/docs/mindspore/programming_guide/source_en/images/autotune.png b/docs/mindspore/programming_guide/source_en/images/autotune.png
new file mode 100644
index 0000000000000000000000000000000000000000..cbe795a79ef0629588044e7c8e7e42ab9a691517
Binary files /dev/null and b/docs/mindspore/programming_guide/source_en/images/autotune.png differ
diff --git a/docs/mindspore/programming_guide/source_en/index.rst b/docs/mindspore/programming_guide/source_en/index.rst
index b8e9c3996b9f3024267b824f15909e83a907e1b0..ef62050096bd3905ad85b2a140a393e5c7180132 100644
--- a/docs/mindspore/programming_guide/source_en/index.rst
+++ b/docs/mindspore/programming_guide/source_en/index.rst
@@ -157,6 +157,7 @@ MindSpore Programming Guide
    enable_mixed_precision
    enable_graph_kernel_fusion
    enable_auto_tune
+   enable_dataset_autotune
    apply_gradient_accumulation
 
    Debugging performance with Profiler↗
diff --git a/docs/mindspore/programming_guide/source_zh_cn/enable_dataset_autotune.md b/docs/mindspore/programming_guide/source_zh_cn/enable_dataset_autotune.md
new file mode 100644
index 0000000000000000000000000000000000000000..7effad0c1281c5cebfbf32131c2b095872c60298
--- /dev/null
+++ b/docs/mindspore/programming_guide/source_zh_cn/enable_dataset_autotune.md
@@ -0,0 +1,123 @@
+# Enabling Dataset AutoTune
+
+`Ascend` `GPU` `CPU` `Data Processing` `Performance Optimization`
+
+- [Enabling Dataset AutoTune](#enabling-dataset-autotune)
+    - [Overview](#overview)
+    - [How to Enable Dataset AutoTune](#how-to-enable-dataset-autotune)
+    - [How to Adjust the Sampling Interval of Dataset AutoTune](#how-to-adjust-the-sampling-interval-of-dataset-autotune)
+    - [Constraints](#constraints)
+    - [Example](#example)
+        - [Dataset AutoTune Configuration](#dataset-autotune-configuration)
+        - [Start Training](#start-training)
+        - [Before the Next Training](#before-the-next-training)
+
+## Overview
+
+MindSpore provides an automatic data tuning tool, AutoTune, which adjusts the parallelism of the data processing pipeline during training according to the available environment resources, so that system resources are used as fully as possible to speed up the data processing pipeline.
+
+Throughout training, the AutoTune module keeps checking whether the current performance bottleneck is on the data side or on the network side. If the bottleneck is detected on the data side, AutoTune further adjusts the parameters of the operators in the data processing pipeline (dataset operators such as GeneratorDataset, map and batch). The parameters that can currently be adjusted are the number of worker threads of an operator (num_parallel_workers) and the depth of its internal queue (prefetch_size).
+
+![autotune](../source_en/images/autotune.png)
+
+With AutoTune enabled, MindSpore samples resource statistics of the data processing pipeline at a configurable time interval.
+
+Once AutoTune has collected enough information, it analyzes whether the current performance bottleneck is on the data side.
+If so, AutoTune adjusts the parallelism of the data processing pipeline to speed it up.
+If not, AutoTune instead tries to reduce the memory usage of the data pipeline to free some memory for the CPU.
+
+> Dataset AutoTune is disabled by default.
+
+## How to Enable Dataset AutoTune
+
+To enable Dataset AutoTune:
+
+```python
+import mindspore.dataset as ds
+ds.config.set_enable_autotune(True)
+```
+
+## How to Adjust the Sampling Interval of Dataset AutoTune
+
+To set the sampling time interval (in milliseconds) for Dataset AutoTune:
+
+```python
+import mindspore.dataset as ds
+ds.config.set_autotune_interval(100)
+```
+
+To query the currently configured sampling time interval (in milliseconds):
+
+```python
+import mindspore.dataset as ds
+print("time interval:", ds.config.get_autotune_interval())
+```
+
+## Constraints
+
+Dataset AutoTune is currently available for sink mode only (dataset_sink_mode=True).
+
+Dataset Profiling and Dataset AutoTune cannot be enabled concurrently.
+
+## Example
+
+Take ResNet training as an example.
+
+### Dataset AutoTune Configuration
+
+To enable Dataset AutoTune, only one statement needs to be added.
+
+```python
+# dataset.py of ResNet in ModelZoo
+# models/official/cv/resnet/src/dataset.py
+def create_dataset1(...):
+    '''create a train or evaluate cifar10 dataset for resnet50'''
+    # enable Dataset AutoTune here
+    ds.config.set_enable_autotune(True)
+    # the rest of the dataset code stays unchanged
+    data_set = ds.Cifar10Dataset(...)
+    ...
+```
+
+### Start Training
+
+Start the training process as described in [resnet/README.md](https://gitee.com/mindspore/models/blob/master/official/cv/resnet/README_CN.md). Dataset AutoTune then reports its analysis of the performance bottleneck through log messages: when tuning starts, the observed step time, each adjustment of the operators' queue depth and number of parallel workers, and the final recommended configuration at the end of tuning.
+
+The log messages reflect the overall workflow of Dataset AutoTune. The initial pipeline configuration is poor, which leads to low pipeline performance, visible as low queue utilization. Dataset AutoTune therefore increases the number of worker threads and the depth of the operators' internal queues to raise the parallelism of the data processing pipeline. After the improved pipeline configuration has been applied, the overall step time is reduced considerably.
+
+### Before the Next Training
+
+Before starting the next training process, users can adjust the dataset scripts according to the configuration recommended by Dataset AutoTune, so that the data processing pipeline already runs at a good performance level at the beginning of the next training.
+
+In addition, MindSpore provides APIs for globally adjusting the parallelism and internal queue depth of dataset pipeline operators. Please refer to [mindspore.dataset.config](https://www.mindspore.cn/docs/api/zh-CN/master/api_python/mindspore.dataset.config.html):
+
+- [mindspore.dataset.config.set_num_parallel_workers](https://www.mindspore.cn/docs/api/zh-CN/master/api_python/mindspore.dataset.config.html#mindspore.dataset.config.set_num_parallel_workers)
+- [mindspore.dataset.config.set_prefetch_size](https://www.mindspore.cn/docs/api/zh-CN/master/api_python/mindspore.dataset.config.html#mindspore.dataset.config.set_prefetch_size)
diff --git a/docs/mindspore/programming_guide/source_zh_cn/index.rst b/docs/mindspore/programming_guide/source_zh_cn/index.rst
index 8047a08551765682fa1e76e5e6ada996e4694e8b..3cd9ade993093496c3cfccb623b73244d730ad77 100644
--- a/docs/mindspore/programming_guide/source_zh_cn/index.rst
+++ b/docs/mindspore/programming_guide/source_zh_cn/index.rst
@@ -174,6 +174,7 @@ MindSpore编程指南
    enable_mixed_precision
    enable_graph_kernel_fusion
    enable_auto_tune
+   enable_dataset_autotune
    apply_gradient_accumulation
 
    使用Profiler调试性能↗