diff --git a/README.md b/README.md
index d1d36a1c72a08128f4c841ce7be1243b9587681c..87a90718860afac148bd566c6fc2d8701572ff55 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,193 @@
-plugsched is a linux kernel hotpluggable scheduler SDK developed by Alibaba
-and licensed under the GPLv3+ License or BSD-3-Clause License. This product
-contains various third-party components under other open source licenses.
-See the NOTICE file for more information.
+## Plugsched: live update Linux kernel scheduler
+Plugsched is an SDK that enables live updates of the Linux kernel scheduler. It can dynamically replace the scheduler subsystem without rebooting the system or restarting applications, with only milliseconds of downtime. Plugsched helps developers add, delete, and modify kernel scheduling features in a production environment, so the scheduler can be customized for specific scenarios. The live-update capability also enables rollback.
+
+## Motivation
+* **Different policies fit different scenarios:** In cloud computing, optimizing scheduling policies is complex, and no one-size-fits-all strategy exists. It is therefore necessary to let users customize the scheduler for different scenarios.
+* **The scheduler evolves slowly:** The Linux kernel has evolved over many years and has a heavy code base. The scheduler is one of the kernel's core subsystems; its structure is complex and tightly coupled with other OS subsystems, which makes development and debugging even harder. Linux rarely merges new scheduling classes, and is especially unlikely to accept a scenario-specific or non-generic scheduler. Plugsched decouples the scheduler from the kernel, so developers can focus solely on iterating on the scheduler.
+* **Updating the kernel is hard:** The scheduler is built into the kernel, so applying changes to the scheduler requires updating the kernel. The kernel release cycle is usually several months, so changes cannot be deployed quickly. Furthermore, updating the kernel across a cluster is even more expensive, because it involves application migration and machine downtime.
+* **Unable to update a subsystem:** kpatch and livepatch are live-update techniques at function granularity; they have weak expressive power and cannot implement complex code changes. eBPF does not support the scheduler well yet, and even if it did, it would only allow small modifications to scheduling policies.
+
+## How it works
+The scheduler subsystem is built into the kernel rather than being an independent module, and it is highly coupled to other parts of the kernel. Plugsched applies the idea of modularization: it provides a boundary analyzer that determines the boundary of the scheduler subsystem and extracts the scheduler from the kernel code into a separate directory. Developers can modify the extracted scheduler code, compile it into a new scheduler module, and dynamically replace the old scheduler in the running system.
+
+For functions, the scheduler module exports some **interface** functions. By replacing these functions in the kernel, the kernel bypasses the original execution logic and enters the new scheduler module, which completes the function update. Functions compiled into the scheduler module are either interface functions or **insiders**. All other functions are called **outsiders**.
+
+For data, the scheduler module re-initializes **private data** and inherits **shared data** from the previous scheduler. Re-initialization goes beyond resetting memory: it is performed by the Scheduler State Rebuild technique. Most of the important data (runqueue state and sched class state) is handled by Scheduler State Rebuild, which makes it private automatically. For flexibility, plugsched also lets users manually define some of the remaining data as private data. However, manually defined private data only has its memory reset. So by default, for simplicity, the rest of the data is shared data.
+
+Also for data, users want to know not only how the data is initialized, but also whether they can modify the data itself or its semantics. No strict rules are set on global variables and stack variables yet, so users can modify either the data or its semantics. Data structures are different. First, plugsched classifies struct fields that are accessed only by the scheduler as **inner-fields**, and all others as **non-inner-fields**. The scheduler module allows modifying the semantics of inner-fields and forbids modifying the semantics of non-inner-fields. The scheduler module even allows modifying the size of the whole data structure if all of its fields are inner-fields. Last but most important, we recommend using the reserved fields of data structures rather than modifying existing ones.
+
+For example, modifying the state of `rq->lock` only changes its data, while using `rq->lock` to store something else changes its semantics, and reducing the size of `struct rq` is equivalent to modifying many members of `struct rq`. Because `rq->lock` is accessed by many subsystems, it is a non-inner-field. Users are therefore forbidden to modify `rq->lock` or to shrink `struct rq`.
+
+### Boundary Extraction
+The scheduler itself is not a module, so its boundary must be determined before it can be modularized. The boundary analyzer extracts the scheduler code from the kernel source code according to the boundary configuration. The configuration mainly includes the source code files, the interface functions, and so on. Finally, the code within the boundary is extracted into a separate directory. The process is divided into the following steps.
+
+* Gather Information
+  Compile the Linux kernel and use gcc-python-plugin to collect the information needed for boundary analysis, such as symbol names, location information, symbol attributes, and the function call graph.
+
+* Boundary Analysis
+  Analyze the gathered information, calculate the code and data boundaries of the scheduler according to the boundary configuration, and determine which functions and data are within the scheduler boundary.
+
+* Code Extraction
+  Use gcc-python-plugin again to extract the code within the boundary into the kernel/sched/mod directory as the code base for the new scheduler module.
+
+### Develop the scheduler
+After the extraction, the scheduler's code is in a separate directory. Developers can modify the code and customize the scheduler for different scenarios. See [Limitations](#limitations) for precautions during development.
+
+### Compile and install the scheduler
+After development, the scheduler, together with the loading/unloading and other related code, is compiled into a kernel module and packaged as an RPM. After installation, the original scheduler built into the kernel is replaced. Installation goes through the following key steps.
+* **Symbol Relocation:** Relocate the undefined symbols in the scheduler module.
+* **Stack Safety Check:** As with kpatch, stack inspection must be performed before function redirection; otherwise the system may crash. Plugsched parallelizes stack inspection, which improves efficiency and reduces downtime.
+* **Redirections:** Dynamically replace interface functions in the kernel with the corresponding functions in the module.
+* **Scheduler State Rebuild:** Synchronize state between the new and old schedulers automatically, which greatly simplifies maintaining data-state consistency.
+
+![Architecture](https://user-images.githubusercontent.com/33253760/156824976-c15684be-467b-45ac-abd6-976a9a5d542f.jpg)
+
+## Use Cases
+1. Quickly develop, verify, and release new features, and merge them into the kernel mainline once they are stable.
+2. Customize and optimize for specific business scenarios, and publish and maintain non-generic scheduler features as RPM packages.
+3. Manage scheduler hotfixes in a unified way to avoid conflicts caused by multiple hotfixes.
+
+## Quick Start
+Plugsched currently supports Anolis OS 7.9 ANCK by default; other OSes need to adjust the [boundary configurations](./docs/Support-various-Linux-distros.md). To reduce the complexity of setting up a running environment, we provide container images and Dockerfiles, so developers do not need to build a development environment themselves. For this demonstration, we purchased an Alibaba Cloud ECS instance (64 CPUs + 128 GB) and installed Anolis OS 7.9 ANCK. We will live update the kernel scheduler.
+
+1. Log into the cloud server and install the necessary basic software packages.
+```shell
+# yum install anolis-repos -y
+# yum install podman kernel-debuginfo-$(uname -r) kernel-devel-$(uname -r) --enablerepo=Plus-debuginfo --enablerepo=Plus -y
+```
+
+2. Create a temporary working directory and download the kernel source code.
+```shell
+# mkdir /tmp/work
+# uname -r
+4.19.91-25.2.an7.x86_64
+# cd /tmp/work
+# wget https://mirrors.openanolis.cn/anolis/7.9/Plus/source/Packages/kernel-4.19.91-25.2.an7.src.rpm
+```
+
+3. Start the container and spawn a shell.
+```shell
+# podman run -itd --name=plugsched -v /tmp/work:/tmp/work -v /usr/src/kernels:/usr/src/kernels -v /usr/lib/debug/lib/modules:/usr/lib/debug/lib/modules docker.io/plugsched/plugsched-sdk
+# podman exec -it plugsched bash
+# cd /tmp/work
+```
+
+4. Extract the kernel source code.
+```shell
+# plugsched-cli extract_src kernel-4.19.91-25.2.an7.src.rpm ./kernel
+```
+
+5. Run boundary analysis and extraction.
+```shell
+# plugsched-cli init 4.19.91-25.2.an7.x86_64 ./kernel ./scheduler
+```
+
+6. The extracted scheduler code is now in ./scheduler/kernel/sched/mod. Make a simple modification to the `__schedule` function, then compile and package it into a scheduler RPM package.
+```diff
+diff --git a/kernel/sched/mod/core.c b/kernel/sched/mod/core.c
+index f337607..88fe861 100644
+--- a/kernel/sched/mod/core.c
++++ b/kernel/sched/mod/core.c
+@@ -3235,6 +3235,8 @@ static void __sched notrace __schedule(bool preempt)
+ 	struct rq *rq;
+ 	int cpu;
+ 
++	printk_once("scheduler: Hi, I am the new scheduler!\n");
++
+ 	cpu = smp_processor_id();
+ 	rq = cpu_rq(cpu);
+ 	prev = rq->curr;
+```
+```shell
+# plugsched-cli build /tmp/work/scheduler
+```
+
+7. Copy the scheduler RPM to the host, exit the container, and then install the scheduler.
+```text
+# cp /usr/local/lib/plugsched/rpmbuild/RPMS/x86_64/scheduler-xxx-4.19.91-25.2.an7.yyy.x86_64.rpm /tmp/work
+# exit
+exit
+# rpm -ivh /tmp/work/scheduler-xxx-4.19.91-25.2.an7.yyy.x86_64.rpm
+# dmesg | tail -n 10
+[ 878.915006] scheduler: total initialization time is 5780743 ns
+[ 878.915006] scheduler module is loading
+[ 878.915232] scheduler: Hi, I am the new scheduler!
+[ 878.915232] scheduler: Hi, I am the new scheduler!
+[ 878.915990] scheduler load: current cpu number is 64
+[ 878.915990] scheduler load: current thread number is 626
+[ 878.915991] scheduler load: stop machine time is 243138 ns
+[ 878.915991] scheduler load: stop handler time is 148542 ns
+[ 878.915992] scheduler load: stack check time is 86532 ns
+[ 878.915992] scheduler load: all the time is 982076 ns
+```
+
+## FAQ
+**Q: Under the default boundary configuration, what does the scheduler contain after boundary extraction?**
+
+It contains the checked items below:
+
+- [ ] autogroup
+- [ ] cpuacct
+- [ ] cputime
+- [X] sched debug
+- [X] sched stats
+- [X] cfs rt deadline idle stop sched class
+- [X] sched domain topology
+- [X] sched tick
+- [X] scheduler core
+
+**Q: Which functions can I modify?**
+
+After boundary extraction, all functions defined in the files in the kernel/sched/mod directory can be modified. For example, in the Quick Start example, 1K+ scheduler functions can be modified. However, there are some precautions; please refer to [Limitations](#limitations).
+
+**Q: Can I modify the scheduler boundary?**
+
+Yes. The scheduler boundary can be modified by editing the boundary configuration, for example by changing the source code files or the interface functions. Please refer to [here](./docs/Support-various-Linux-distros.md). Note that if the scheduler boundary is adjusted, strict testing is required before installing the scheduler into a production environment.
+
+**Q: What kernel versions does plugsched support?**
+
+Theoretically, plugsched is decoupled from the kernel version. The kernel versions we have tested are 3.10 and 4.19. Other versions need to be adapted and tested by developers.
+
+**Q: Can I modify functions defined in header files?**
+
+Yes. The boundary analyzer also works for header files. Functions in kernel/sched/mod/\*.h can be modified, except those followed by the comment "DON'T MODIFY FUNCTION ******, IT'S NOT PART OF SCHEDMOD".
+
+**Q: Can structures be modified?**
+
+It depends. If a structure has any non-inner-fields, it can't be modified. Conversely, if a structure has no non-inner-fields, it can be modified.
+
+When modifying a structure, the most recommended approach is to use the structure's reserved fields; the second choice is to reuse its inner-fields.
+
+**Q: Will there be a performance regression when the kernel scheduler is replaced?**
+
+The overhead incurred by plugsched is negligible, and any performance regression mainly depends on the code modified by developers. In our benchmark tests, the new scheduler has no performance impact if no modification is applied.
+
+**Q: Is there any downtime when loading scheduler modules? How much?**
+
+It depends on the current system load and the number of threads. In our tests, with 10k+ processes running on a machine with 104 logical CPUs, the downtime was less than 10ms.
+
+**Q: What's the difference between plugsched and kpatch? Can we achieve the same goal by optimizing kpatch?**
+
+kpatch performs live updates at function granularity, while plugsched works subsystem-wide. Some capabilities cannot be achieved by optimizing kpatch. For example, kpatch cannot modify the `__schedule` function, and cannot modify thousands of functions at the same time.
+
+**Q: Does plugsched conflict with kpatch hotfixes?**
+
+Yes. The parts where plugsched and kpatch overlap will be overwritten by plugsched. However, we have designed conflict-detection mechanisms that can be used in the production environment.
+
+**Q: Can I modify a function outside the scheduler boundary?**
+
+Yes. We provide the [sidecar](./docs/Advanced-Features.md) mechanism to modify functions outside the boundary. For example, if we want to modify both the scheduler and cpuacct, we can use the sidecar to modify cpuacct.
+
+## Supported Architectures
+- [X] x86-64
+- [ ] aarch64: planned
+
+## Limitations
+* The init functions cannot be modified, because their memory is freed once the system has booted. If you need initialization logic, do it in module initialization.
+* The signature of an interface function cannot be modified, and an interface function cannot be deleted; however, you can turn it into an empty function.
+* Functions marked with the "DON'T MODIFY FUNCTION ******, IT'S NOT PART OF SCHEDMOD" comment cannot be modified.
+* We don't recommend modifying structures or the semantics of their members at will. If you really need to, please refer to the working/sched_boundary_doc.yaml documentation.
+* After the scheduler module is loaded, you cannot directly hook a kernel function within the scheduler boundary with tools such as perf or ftrace. If you need to, specify the scheduler.ko module in the command.
+
+## License
+plugsched is a Linux kernel hotpluggable scheduler SDK developed by Alibaba and licensed under the GPLv3+ License or BSD-3-Clause License. This product contains various third-party components under other open source licenses. See the NOTICE file for more information.
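diff --git a/README_zh.md b/README_zh.md
index 9c476009b9dabc22b100880d1e950c4c7b765ee8..f7da72acd8a65362adb02d20c651617d5b6a22bb 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -4,30 +4,34 @@ plugsched is an SDK for live upgrading the Linux kernel scheduler subsystem. It can
 ## Motivation
 * **Different scenarios call for different scheduling policies:** In cloud scenarios, optimizing scheduling policies is complex, and there is no once-and-for-all policy. It is therefore necessary to let users customize the scheduler for different scenarios.
 * **The scheduler iterates slowly:** After many years of updates and iteration, the Linux kernel's code has grown heavier and heavier. The scheduler is one of the kernel's most central subsystems; its structure is complex and tightly coupled with other subsystems, which makes development and debugging increasingly difficult. Linux rarely adds new scheduling classes, and is especially unlikely to accept non-generic or scenario-specific schedulers. plugsched decouples the scheduler from the kernel, so developers can focus solely on iterating on the scheduler.
-* **Upgrading the kernel is hard:** The scheduler is embedded in the kernel, so applying scheduler changes requires updating the kernel across the cluster. The kernel release cycle is usually several months, so a new scheduler cannot be applied to systems promptly. Moreover, upgrading to a new kernel across a cluster involves migrating workloads and taking machines down, which is costly for the business.
+* **Upgrading the kernel is hard:** The scheduler is embedded in the kernel, so applying scheduler changes requires updating the kernel. The kernel release cycle is usually several months, so a new scheduler cannot be applied to systems promptly. Moreover, upgrading to a new kernel across a cluster involves migrating workloads and taking machines down, which is costly for the business.
 * **Unable to upgrade a subsystem:** kpatch and livepatch are function-granularity live-upgrade solutions with weak expressive power, and cannot implement complex code changes. As for eBPF, the scheduler does not currently support eBPF hooks, and even if it does in the future, only local policy changes would be possible.
-## How it works 
+## How it works
 The scheduler subsystem is not an independent module in the kernel; it is embedded in the kernel and tightly connected to the other parts of it. plugsched adopts the idea of "modularization": it provides a boundary-partitioning program that determines the boundary of the scheduler subsystem and extracts the scheduler from the kernel code into a separate directory. Developers can modify the extracted scheduler code, compile it into a new scheduler kernel module, and dynamically replace the old scheduler in the kernel.
-
- After boundary partitioning, the scheduler is a closed module. In terms of functions, it exposes some key functions (interface functions) to the outside; entering through these functions leads into the scheduler module, where the code inside the module is executed. Therefore, by replacing these functions in the kernel, the kernel can bypass the original execution logic and execute in the new scheduler module, which completes the function upgrade.
-
- Scheduler data falls into two broad categories: shared data and private data. Shared data is data shared between the scheduler and other parts of the kernel, and between different scheduler modules; private data is data used only inside the scheduler module. Struct members follow the same classification, as shared members and private members. Put simply, the attributes and semantics of shared data must not be modified, while those of private data may be. For struct data, shared members must not be modified, and the semantics of private members may be modified; if all members of a struct are private, the whole struct is private and its definition may be modified. For data modifications, plugsched provides the scheduler state rebuild feature, which helps developers simplify data maintenance and upgrades.
-
+
+In terms of functions, the scheduler module exposes some interface functions. By replacing these functions in the kernel, the kernel can bypass the original execution logic and execute in the new scheduler module, which completes the function upgrade. Functions in the module are either interface functions or internal functions; all other functions are external functions.
+
+For data, the scheduler module re-initializes private data and inherits shared data from the previous scheduler. Most of the important data (runqueue state and scheduling class state) is re-initialized through the state rebuild technique rather than by merely resetting memory, so it automatically becomes private data. For flexibility, plugsched allows users to manually define some of the remaining data as private data. However, manually defined private data is only zeroed. So, for simplicity, the rest of the data is shared data by default.
+
+Also regarding data, users want to know not only how data is initialized, but also whether they can modify the data itself or its semantics. Plugsched sets no strict rules on global variables and local variables, so users can modify the data itself or its semantics. Structs are different. First, plugsched classifies struct members accessed only by the scheduler as internal members, and all others as non-internal members. The scheduler module allows modifying the semantics of internal members and forbids modifying the semantics of non-internal members. If all members of a struct are internal members, the scheduler module allows modifying the whole struct. However, we recommend using the reserved fields in a struct rather than modifying existing members.
+
+For example, modifying the state of rq->lock modifies the data itself, while using rq->lock to store other data modifies its semantics. Shrinking the rq struct is equivalent to modifying some of the members of rq. But because rq->lock is used by other subsystems, it is a non-internal member, so modifying rq->lock or the rq struct is not allowed.
+
 ### Boundary extraction
-The scheduler itself is not a module, so its boundary must be made explicit before it can be modularized. The boundary-partitioning program extracts the scheduler module's code from the kernel source according to the boundary configuration. The boundary configuration mainly contains the range of code files, the interfaces exposed to the outside (called interface functions), and other information. Finally, the code within the boundary is extracted into a separate directory, in the following stages:
+The scheduler itself is not a module, so its boundary must be made explicit before it can be modularized. The boundary-partitioning program extracts the scheduler module's code from the kernel source according to the boundary configuration. The boundary configuration mainly contains the code files, the interface functions, and other information. Finally, the code within the boundary is extracted into a separate directory, in the following stages:
 
 * Information gathering
- 
+
   During Linux kernel compilation, gcc-python-plugin is used to collect information related to boundary partitioning, such as symbol names, location information, symbol attributes, and function call relationships;
 
 * Boundary analysis
   The collected information is analyzed and, according to the boundary configuration file, the code and data boundaries of the scheduler module are computed, establishing which functions and data are inside the scheduler boundary;
 
-* Boundary extraction
+* Code extraction
   gcc-python-plugin is used again to extract the code within the boundary into the kernel/sched/mod directory as the code base of the scheduler module.
 
 ### Scheduler module development
-After boundary extraction, the scheduler module's code is placed in a separate directory. Developers can modify the scheduler code in that directory and customize the scheduler for their scenario. For precautions during development, see the Limitations section.
+After boundary extraction, the scheduler module's code is placed in a separate directory. Developers can modify the scheduler code in that directory and customize the scheduler for their scenario. For precautions during development, see the [Limitations](#limitations) section.
 
 ### Compiling and installing the scheduler
 After development, the scheduler module code is compiled, together with the load/unload and other related code, into a kernel module, and a scheduler rpm package is generated. Installing it replaces the original scheduler in the kernel. The installation goes through the following key stages:
@@ -38,45 +42,44 @@ plugsched is an SDK for live upgrading the Linux kernel scheduler subsystem. It can
 
 ![20220225173717](https://user-images.githubusercontent.com/33253760/155691850-20817e95-afec-4544-a35f-a284896c973c.jpg)
 
-## User Cases
+## Use Cases
 1. Quickly develop, verify, and ship new features, and merge them into the kernel mainline once they are stable;
-2. Customize and optimize for vertical business scenarios, and publish and maintain non-generic scheduler features as RPM packages;
+2. Customize and optimize for different business scenarios, and publish and maintain non-generic scheduler features as RPM packages;
 3. Manage scheduler hot patches in a unified way, to avoid failures caused by conflicts among multiple hot patches;
 
 ## Quick Start
-Plugsched can run on any system, but to reduce the complexity of setting up a running environment, we provide container images and a Dockerfile, so developers do not need to set up a development environment themselves. For demonstration, we purchased an Alibaba Cloud ECS instance (64 CPUs + 64 GB) and installed the Alibaba Cloud Linux2 distribution. We will live upgrade the kernel scheduler.
+Currently, plugsched supports the Anolis OS 7.9 ANCK distribution by default; other systems need to [adjust the boundary configuration](./docs/Support-various-Linux-distros.md). To reduce the complexity of setting up a running environment, we provide container images and a Dockerfile, so developers do not need to set up a development environment themselves. For demonstration, we purchased an Alibaba Cloud ECS instance (64 CPUs + 128 GB) and installed the Anolis OS 7.9 ANCK distribution. We will live upgrade the kernel scheduler.
 
 1. After logging into the cloud server, first install the necessary basic software packages:
 ```shell
-# yum install docker kernel-debuginfo kernel-devel -y
-# systemctl start docker
-# systemctl enable docker
+# yum install anolis-repos -y
+# yum install podman kernel-debuginfo-$(uname -r) kernel-devel-$(uname -r) --enablerepo=Plus-debuginfo --enablerepo=Plus -y
 ```
 
 2. Create a temporary working directory and download the SRPM package of the system kernel:
 ```shell
 # mkdir /tmp/work
 # uname -r
-4.19.91-25.1.al7.x86_64
+4.19.91-25.2.an7.x86_64
 # cd /tmp/work
-# wget https://mirrors.aliyun.com/alinux/2.1903/plus/source/SRPMS/kernel-4.19.91-25.1.al7.src.rpm
+# wget https://mirrors.openanolis.cn/anolis/7.9/Plus/source/Packages/kernel-4.19.91-25.2.an7.src.rpm
 ```
 
 3. Start and enter the container:
 ```shell
-# docker run -itd --name=plugsched -v /tmp/work:/tmp/work -v /usr/src/kernels:/usr/src/kernels -v /usr/lib/debug/lib/modules:/usr/lib/debug/lib/modules ghcr.io/aliyun/plugsched/plugsched-sdk
-# docker exec -it plugsched bash
+# podman run -itd --name=plugsched -v /tmp/work:/tmp/work -v /usr/src/kernels:/usr/src/kernels -v /usr/lib/debug/lib/modules:/usr/lib/debug/lib/modules docker.io/plugsched/plugsched-sdk
+# podman exec -it plugsched bash
 # cd /tmp/work
 ```
 
 4. Extract the 4.19.91-25.1.al7.x86_64 kernel source:
 ```shell
-# plugsched-cli extract_src kernel-4.19.91-25.1.al7.src.rpm ./kernel
+# plugsched-cli extract_src kernel-4.19.91-25.2.an7.src.rpm ./kernel
 ```
 
 5. Run boundary partitioning and extraction:
 ```shell
-# plugsched-cli init 4.19.91-25.1.al7.x86_64 ./kernel ./scheduler
+# plugsched-cli init 4.19.91-25.2.an7.x86_64 ./kernel ./scheduler
 ```
 
 6. The extracted scheduler module code is in ./scheduler/kernel/sched/mod. Make a simple modification to the __schedule function, then compile and package it into a scheduler rpm package:
@@ -85,19 +88,15 @@ diff --git a/kernel/sched/mod/core.c b/kernel/sched/mod/core.c
 index f337607..88fe861 100644
 --- a/kernel/sched/mod/core.c
 +++ b/kernel/sched/mod/core.c
-@@ -3234,6 +3234,12 @@ static void __sched notrace __schedule(bool preempt)
- 	struct rq_flags rf;
- 	struct rq *rq;
- 	int cpu;
-+	static int print_flag = 0;
-+
-+	if (!print_flag) {
-+		printk("scheduler: Hi, I'm the new scheduler!\n");
-+		print_flag = 1;
-+	}
+@@ -3235,6 +3235,8 @@ static void __sched notrace __schedule(bool preempt)
+ 	struct rq *rq;
+ 	int cpu;
  
- 	cpu = smp_processor_id();
- 	rq = cpu_rq(cpu);
++	printk_once("scheduler: Hi, I am the new scheduler!\n");
++
+ 	cpu = smp_processor_id();
+ 	rq = cpu_rq(cpu);
+ 	prev = rq->curr;
 ```
 ```shell
 # plugsched-cli build /tmp/work/scheduler
@@ -105,21 +104,21 @@ index f337607..88fe861 100644
 
 7. Copy the generated rpm package to the host, exit the container, and install the scheduler package:
 ```text
-# cp /usr/local/lib/plugsched/rpmbuild/RPMS/x86_64/scheduler-xxx-4.19.91-25.1.al7.yyy.x86_64.rpm /tmp/work
+# cp /usr/local/lib/plugsched/rpmbuild/RPMS/x86_64/scheduler-xxx-4.19.91-25.2.an7.yyy.x86_64.rpm /tmp/work
 # exit
 exit
-# rpm -ivh /tmp/work/scheduler-xxx-4.19.91-25.1.al7.yyy.x86_64.rpm
+# rpm -ivh /tmp/work/scheduler-xxx-4.19.91-25.2.an7.yyy.x86_64.rpm
 # dmesg | tail -n 10
-[ 1177.064016] scheduler: Hi, I'm the new scheduler!
-[ 1177.064017] scheduler: Hi, I'm the new scheduler!
-[ 1177.064018] scheduler: Hi, I'm the new scheduler!
-[ 1177.064018] scheduler: Hi, I'm the new scheduler!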
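-[ 1177.064734] scheduler load: current cpu number is 64
-[ 1177.064735] scheduler load: current thread number is 755
-[ 1177.064735] scheduler load: stop machine time is 274280 ns
-[ 1177.064736] scheduler load: stop handler time is 171234 ns
-[ 1177.064736] scheduler load: stack check time is 89575 ns
-[ 1177.064736] scheduler load: all the time is 991809 ns
+[ 878.915006] scheduler: total initialization time is 5780743 ns
+[ 878.915006] scheduler module is loading
+[ 878.915232] scheduler: Hi, I am the new scheduler!
+[ 878.915232] scheduler: Hi, I am the new scheduler!
+[ 878.915990] scheduler load: current cpu number is 64
+[ 878.915990] scheduler load: current thread number is 626
+[ 878.915991] scheduler load: stop machine time is 243138 ns
+[ 878.915991] scheduler load: stop handler time is 148542 ns
+[ 878.915992] scheduler load: stack check time is 86532 ns
+[ 878.915992] scheduler load: all the time is 982076 ns
 ```
 
 ## FAQ
@@ -132,18 +131,18 @@ exit
 - [ ] cputime
 - [X] sched debug
 - [X] sched stats
-- [X] cfs rt deadline idle stop sched class 
+- [X] cfs rt deadline idle stop sched class
 - [X] sched domain topology
 - [X] sched tick
 - [X] scheduler core
 
 **Q: Which functions can be modified by a scheduler live upgrade?**
 
-After boundary extraction, all functions defined in the files under the kernel/sched/mod directory can be modified. For example, in the quick start example, the modifiable scope of the scheduler module contains 1k+ functions. There are some caveats, however; see the Limitations chapter.
+After boundary extraction, all functions defined in the files under the kernel/sched/mod directory can be modified. For example, in the quick start example, the modifiable scope of the scheduler module contains 1k+ functions. There are some caveats, however; see the [Limitations](#limitations) chapter.
 
 **Q: Can the boundary of the scheduler module be modified?**
 
-Yes. The scheduler boundary can be modified by editing the boundary configuration file, for example by changing the code files or the interface functions; please refer to here (link). Note that if the scheduler boundary is adjusted, strict testing is required before going to production.
+Yes. The scheduler boundary can be modified by editing the boundary configuration file, for example by changing the code files or the interface functions; please refer to [here](./docs/Support-various-Linux-distros.md). Note that if the scheduler boundary is adjusted, strict testing is required before going to production.
 
 **Q: Which kernel versions does plugsched support?**
 
@@ -159,7 +158,7 @@ exit
 
 **Q: Will there be a performance regression after the kernel scheduler is replaced?**
 
-The overhead of the scheduler module itself is small. Beyond that, it depends on the developer's changes to the scheduler. Benchmark tests show no performance impact if no changes are made;
+The overhead of the scheduler module itself is small and can be ignored. Beyond that, it depends on the developer's changes to the scheduler. Benchmark tests show no performance impact if no changes are made;
 
 **Q: Is the downtime long when loading the module? How long is it?**
 
@@ -175,15 +174,15 @@ kpatch is a function-granularity live upgrade, plugsched is a subsystem-wide live upgrade,
 
 **Q: Can functions outside the scheduler boundary be modified?**
 
-Yes. We provide the sidecar mechanism, which can modify functions outside the boundary at the same time. For example, when a hotfix changes both the scheduler and cpuacct, the sidecar mechanism can be used to upgrade the cpuacct part.
+Yes. We provide the [sidecar](./docs/Advanced-Features.md) mechanism, which can modify functions outside the boundary at the same time. For example, when a hotfix changes both the scheduler and cpuacct, the sidecar mechanism can be used to upgrade the cpuacct part.
 
 ## Supported Architectures
 - [X] x86-64
 - [ ] aarch64: plan to do
 
 ## Limitations
-* init functions cannot be modified; init functions have already been deleted. Perform any required initialization when the module is loaded;
-* The attributes of interface functions cannot be modified, and interface functions cannot be deleted; to delete one, turn it into an empty function instead;
+* init functions cannot be modified, because init functions are freed after the system boots. Perform any required initialization when the module is loaded;
+* The signatures of interface functions cannot be modified, and interface functions cannot be deleted; to delete one, turn it into an empty function instead;
 * Functions carrying the "DON'T MODIFY FUNCTION ******, IT'S NOT PART OF SCHEDMOD" comment cannot be modified;
 * The semantics of structs and their members must not be modified at will; when changes are needed, follow the working/sched_boundary_doc.yaml document;
 * After the scheduler module is loaded, functions in the kernel that fall within the scheduler module's scope cannot be hooked directly, for example by tools such as perf or ftrace; when needed, specify the scheduler.ko module;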
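
Reviewer note: the `dmesg` output in step 7 of the Quick Start reports the load-time phases in nanoseconds, while the README states its downtime claims in milliseconds. The sketch below is one possible way to convert those lines for comparison. It assumes the log format is exactly `scheduler load: <label> is <N> ns`, as in the sample output; the function name `load_times_ms` is hypothetical and not part of plugsched or `plugsched-cli`.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with plugsched): read dmesg-style text on
# stdin and print every "scheduler load: <label> is <N> ns" line in milliseconds.
load_times_ms() {
    awk '
    /scheduler load:/ && / time is [0-9]+ ns[ \t]*$/ {
        # The value is the second-to-last field; the last field is the unit.
        ns = $(NF - 1)
        # Locate "load:" so the label does not depend on the timestamp layout.
        for (i = 1; i <= NF; i++)
            if ($i == "load:") start = i + 1
        # Rebuild the label from the words between "load:" and "is".
        label = ""
        for (i = start; i <= NF - 3; i++)
            label = label (label == "" ? "" : " ") $i
        printf "%s: %.3f ms\n", label, ns / 1000000
    }'
}
```

On a host where the scheduler module has just been installed, `dmesg | load_times_ms` would print one line per timed phase; lines such as `current cpu number is 64` are skipped because they carry no `ns` value.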