
Completely Fair Scheduler

Article snapshot taken from Wikipedia, licensed under the Creative Commons Attribution-ShareAlike license.

The Completely Fair Scheduler (CFS) was a process scheduler that was merged into the 2.6.23 (October 2007) release of the Linux kernel. It was the default scheduler for tasks of the SCHED_NORMAL class (i.e., tasks that have no real-time execution constraints) and handled CPU resource allocation for executing processes, aiming to maximize overall CPU utilization while also maximizing interactive performance.


In contrast to the previous O(1) scheduler used in older Linux 2.6 kernels, which maintained and switched run queues of active and expired tasks, the CFS implementation is based on per-CPU run queues whose nodes are time-ordered schedulable entities kept sorted in red–black trees. CFS does away with the old notion of fixed per-priority time slices and instead aims at giving

A run queue, or runqueue. The run queue may contain priority values for each process, which the scheduler uses to determine which process to run next. To ensure each program gets a fair share of resources, each one is run for some time period (quantum) before it is paused and placed back into the run queue. When a program is stopped to let another run, the program with the highest priority in
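Below is a minimal user-space sketch in C of the quantum-based rotation described above. The structure and task names are hypothetical and far simpler than any real kernel's run queue; the sketch only illustrates how each runnable program receives a bounded slice of time before being put back in line.

```c
/*
 * Minimal model of a run queue with a fixed quantum.
 * Illustrative only: the names (struct task, rq) are hypothetical
 * and much simpler than any real kernel's data structures.
 */
#include <stdio.h>

#define NTASKS   3
#define QUANTUM  4   /* abstract time units per turn */

struct task {
    const char *name;
    int remaining;   /* work left, in the same abstract units */
};

int main(void)
{
    /* The "run queue": tasks waiting for their turn on the CPU. */
    struct task rq[NTASKS] = {
        { "editor",   6 },
        { "compiler", 10 },
        { "player",   3 },
    };
    int runnable = NTASKS;

    /* Round-robin rotation: run each task for at most one quantum,
     * then put it back at the end of the line until it finishes. */
    while (runnable > 0) {
        for (int i = 0; i < NTASKS; i++) {
            if (rq[i].remaining <= 0)
                continue;
            int slice = rq[i].remaining < QUANTUM ? rq[i].remaining : QUANTUM;
            rq[i].remaining -= slice;
            printf("ran %-8s for %d units, %d left\n",
                   rq[i].name, slice, rq[i].remaining);
            if (rq[i].remaining == 0)
                runnable--;      /* task leaves the run queue */
        }
    }
    return 0;
}
```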

A shell pipeline, the output of the first process needs to pass to the second one, and so on. Another example is a task that has been decomposed into cooperating but partially independent processes which can run simultaneously (i.e., using concurrency, or true parallelism – the latter model is a particular case of concurrent execution and is feasible whenever multiple CPU cores are available for
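The pipeline case described above can be sketched in C with the classic pipe()/fork()/exec() pattern, roughly what a shell does for a command such as ls | wc -l. This is an illustrative sketch with minimal error handling, not the code any particular shell actually uses.

```c
/* Two processes connected by a pipe: the first one's output becomes
 * the second one's input, as in a shell pipeline. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int fd[2];
    if (pipe(fd) == -1) { perror("pipe"); exit(1); }

    pid_t left = fork();
    if (left == 0) {                 /* first process: writes to the pipe */
        dup2(fd[1], STDOUT_FILENO);
        close(fd[0]); close(fd[1]);
        execlp("ls", "ls", (char *)NULL);
        perror("execlp"); _exit(1);
    }

    pid_t right = fork();
    if (right == 0) {                /* second process: reads from the pipe */
        dup2(fd[0], STDIN_FILENO);
        close(fd[0]); close(fd[1]);
        execlp("wc", "wc", "-l", (char *)NULL);
        perror("execlp"); _exit(1);
    }

    close(fd[0]); close(fd[1]);      /* parent keeps no pipe ends open */
    waitpid(left, NULL, 0);
    waitpid(right, NULL, 0);
    return 0;
}
```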

A virtual memory system, where regions of a process's memory may actually be on disk and not in main memory at any time. Even portions of active processes/tasks (executing programs) are eligible for swapping to disk if those portions have not been used recently. Not all parts of an executing program and its data have to be in physical memory for the associated process to be active. An operating system kernel that allows multitasking needs processes to have certain states. Names for these states are not standardised, but they have similar functionality. When processes need to communicate with each other they must share parts of their address spaces or use other forms of inter-process communication (IPC). For instance in
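As a rough illustration of the states mentioned above, the snippet below uses the usual textbook names; real kernels use their own identifiers (Linux, for instance, has TASK_RUNNING, TASK_INTERRUPTIBLE, and so on), so this is a hedged sketch rather than any kernel's actual definition.

```c
#include <stdio.h>

/* Textbook-style process states; names are illustrative only. */
enum proc_state {
    PROC_NEW,         /* being created */
    PROC_READY,       /* on a run queue, waiting for a CPU */
    PROC_RUNNING,     /* currently executing on a CPU */
    PROC_BLOCKED,     /* waiting for I/O or another event; swappable */
    PROC_TERMINATED   /* finished; waiting to be reaped */
};

static const char *state_name[] = {
    "new", "ready", "running", "blocked", "terminated"
};

int main(void)
{
    enum proc_state s = PROC_BLOCKED;
    /* The event the process was waiting for occurs: blocked -> ready. */
    s = PROC_READY;
    printf("process is now %s\n", state_name[s]);
    return 0;
}
```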

A fair share of CPU time to tasks (or, better, schedulable entities). Starting from version 6.6 of the Linux kernel, it was replaced by the EEVDF scheduler. A task (i.e., a synonym for thread) is the minimal entity that Linux can schedule. However, it can also manage groups of threads, whole multi-threaded processes, and even all the processes of a given user. This design leads to the concept of schedulable entities, where tasks are grouped and managed by

A single process at a time. However, multitasking allows each processor to switch between tasks that are being executed without having to wait for each task to finish (preemption). Depending on the operating system implementation, switches could be performed when tasks initiate and wait for completion of input/output operations, when a task voluntarily yields the CPU, on hardware interrupts, and when

A time: it is impossible to run more than one program at the same time. A program might need some resource, such as an input device, which has a large delay, or a program might start some slow operation, such as sending output to a printer. This would leave the processor "idle" (unused). To keep the processor busy at all times, the execution of such a program is halted and the operating system switches

Is "something that takes up time", as opposed to "memory", which is "something that takes up space". The above description applies both to processes managed by an operating system and to processes as defined by process calculi. If a process requests something for which it must wait, it will be blocked. When the process is in the blocked state, it is eligible for swapping to disk, but this is transparent in

Is an implementation of a well-studied, classic scheduling algorithm called weighted fair queuing. Originally invented for packet networks, fair queuing had previously been applied to CPU scheduling under the name stride scheduling. CFS is the first implementation of a fair queuing process scheduler widely used in a general-purpose operating system. The Linux kernel received a patch for CFS in November 2010 for
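Stride scheduling itself is simple enough to sketch in a few lines of C: each client holds a number of tickets, its stride is inversely proportional to that number, and the client with the smallest accumulated pass value runs next. The names and constants below are illustrative, not taken from any real implementation.

```c
#include <stdio.h>

#define STRIDE1  10000     /* large constant; stride = STRIDE1 / tickets */
#define NCLIENTS 3

struct client {
    const char *name;
    int tickets;           /* share of the CPU this client is entitled to */
    int stride;
    int pass;              /* virtual time accumulated so far */
};

int main(void)
{
    struct client c[NCLIENTS] = {
        { "A", 100 }, { "B", 50 }, { "C", 250 },
    };
    for (int i = 0; i < NCLIENTS; i++) {
        c[i].stride = STRIDE1 / c[i].tickets;
        c[i].pass = 0;
    }

    /* On each decision, run the client with the smallest pass value;
     * advancing pass by the stride keeps long-run CPU time proportional
     * to the number of tickets held. */
    for (int step = 0; step < 10; step++) {
        int min = 0;
        for (int i = 1; i < NCLIENTS; i++)
            if (c[i].pass < c[min].pass)
                min = i;
        printf("step %2d: run %s (pass=%d)\n", step, c[min].name, c[min].pass);
        c[min].pass += c[min].stride;
    }
    return 0;
}
```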

Is called concurrency. For security and reliability, most modern operating systems prevent direct communication between independent processes, providing strictly mediated and controlled inter-process communication. In general, a computer system process consists of (or is said to own) the following resources: The operating system holds most of this information about active processes in data structures called process control blocks. Any subset of
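The kind of information kept in a process control block can be sketched as a C structure like the one below. The field names are hypothetical and heavily abridged; Linux's task_struct, for example, carries far more state.

```c
#include <stdint.h>

/* Saved register state needed to resume the process after a context switch. */
struct cpu_context {
    uint64_t pc, sp;
    uint64_t regs[16];
};

/* A rough, illustrative process control block. */
struct process_control_block {
    int                 pid;            /* process identifier */
    int                 state;          /* ready, running, blocked, ... */
    int                 priority;       /* scheduling priority or weight */
    struct cpu_context  context;        /* processor state when not running */
    void               *page_table;     /* address-space / memory mapping info */
    int                 open_files[16]; /* descriptors for files and devices */
    int                 uid;            /* owner, for access permissions */
    struct process_control_block *parent;
};
```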

Is said to own resources, of which an image of its program (in memory) is one such resource. However, in multiprocessing systems many processes may run off of, or share, the same reentrant program at the same location in memory, but each process is said to own its own image of the program. Processes are often called "tasks" in embedded operating systems. The sense of "process" (or task)


Is the execution of those instructions after being loaded from the disk into memory. Several processes may be associated with the same program; for example, opening up several instances of the same program often results in more than one process being executed. Multitasking is a method to allow multiple processes to share processors (CPUs) and other system resources. Each CPU (core) executes

Is the total number of entities. Choosing the next entity to run is made in constant time because the leftmost node is always cached. Con Kolivas's work on scheduling, most significantly his implementation of "fair scheduling" named Rotating Staircase Deadline, inspired Ingo Molnár to develop his CFS as a replacement for the earlier O(1) scheduler, crediting Kolivas in his announcement. CFS

The setsid() system call.) This solved the problem of slow interactive response times on multi-core and multi-CPU (SMP) systems when they were performing other tasks that use many CPU-intensive threads. A simple explanation is that, with this patch applied, one is still able to watch a video, read email and perform other typical desktop activities without glitches or choppiness while, say, compiling
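The connection to setsid() described above can be illustrated with a short C sketch: starting a CPU-heavy job in its own session places it (when autogrouping is enabled via /proc/sys/kernel/sched_autogroup_enabled) in its own task group, so its many threads compete with the desktop as a single entity rather than individually. The command run here, make -j8, is only an example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: become leader of a new session (and, with autogrouping
         * enabled, a member of a new task group), then start the heavy work. */
        if (setsid() == -1) {
            perror("setsid");
            _exit(1);
        }
        execlp("make", "make", "-j8", (char *)NULL);
        perror("execlp");
        _exit(1);
    }
    printf("heavy job running in its own session, pid %d\n", (int)pid);
    return 0;
}
```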

The Linux kernel or encoding video. In 2016, the Linux scheduler was patched for better multicore performance, based on the suggestions outlined in the paper "The Linux Scheduler: A Decade of Wasted Cores". In 2023, a new scheduler based on earliest eligible virtual deadline first scheduling (EEVDF) was being readied to replace CFS. The motivation was to remove the need for CFS "latency nice" patches.

Run queue

In modern computers many processes run at once. Active processes are placed in an array called

The 2.6.38 kernel that made the scheduler "fairer" for use on desktops and workstations. Developed by Mike Galbraith using ideas suggested by Linus Torvalds, the patch implements a feature called auto-grouping that significantly boosts interactive desktop performance. The algorithm puts parent processes in the same task group as child processes. (Task groups are tied to sessions created via

The appearance of many processes executing simultaneously (that is, in parallel), though in fact only one process can be executing at any one time on a single CPU (unless the CPU has multiple cores, in which case multithreading or other similar technologies can be used). It is usual to associate a single process with a main program, and child processes with any spin-off, parallel processes, which behave like asynchronous subroutines. A process

The given priority. The scheduler selects the next process from the active array with the highest priority. When a process's quantum expires, it is placed into the expired array with some priority. When the active array contains no more processes, the scheduler swaps the active and expired arrays, hence the name O(1) scheduler. In UNIX or Linux, the sar command is used to check the run queue. The vmstat UNIX or Linux command can also be used to determine
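A stripped-down model of the active/expired arrangement described above is sketched below. The real implementation kept doubly linked lists of tasks plus a priority bitmap so the highest-priority non-empty list could be found quickly; the plain counters here are a simplified, hypothetical stand-in that still shows why selection and the array swap take a bounded number of steps.

```c
#include <stdio.h>

#define NPRIO 140

struct prio_array {
    int nr_active;            /* how many tasks are queued in total */
    int count[NPRIO];         /* tasks queued at each priority level */
};

struct runqueue {
    struct prio_array arrays[2];
    struct prio_array *active, *expired;
};

/* Pick the highest-priority level that has queued tasks. The work does not
 * grow with the number of tasks, hence "O(1)". */
static int pick_next_prio(struct runqueue *rq)
{
    if (rq->active->nr_active == 0) {          /* active array drained: swap */
        struct prio_array *tmp = rq->active;
        rq->active = rq->expired;
        rq->expired = tmp;
    }
    for (int prio = 0; prio < NPRIO; prio++)   /* the kernel searched a bitmap here */
        if (rq->active->count[prio] > 0)
            return prio;
    return -1;                                  /* nothing runnable */
}

int main(void)
{
    struct runqueue rq = { 0 };
    rq.active  = &rq.arrays[0];
    rq.expired = &rq.arrays[1];
    rq.active->count[120] = 2;                  /* two tasks at default priority */
    rq.active->nr_active  = 2;
    printf("next priority level: %d\n", pick_next_prio(&rq));
    return 0;
}
```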

The least slice of execution time (which is saved in the vruntime field of the entity). The nodes are indexed by processor "execution time" in nanoseconds. A "maximum execution time" is also calculated for each process to represent the time the process would have expected to run on an "ideal processor". This is the time the process has been waiting to run, divided by the total number of processes. When
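The "ideal processor" rule stated above reduces to one line of arithmetic, shown below as a hedged sketch; the real CFS calculation also weights each task by its nice level, which is omitted here.

```c
#include <stdio.h>

/* Expected run time on an "ideal processor", in nanoseconds:
 * the time spent waiting divided by the number of runnable tasks. */
static unsigned long long expected_runtime_ns(unsigned long long wait_ns,
                                              unsigned int nr_running)
{
    return nr_running ? wait_ns / nr_running : wait_ns;
}

int main(void)
{
    /* A task that waited 6 ms while 3 tasks were runnable "deserves" 2 ms. */
    printf("%llu ns\n", expected_runtime_ns(6000000ULL, 3));
    return 0;
}
```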

The number of processes that are queued to run or waiting to run; these appear in the 'r' column. There are two models for run queues: one assigns a run queue to each physical processor, and the other has only one run queue in the system.

Process (computing)

In computing, a process is the instance of a computer program that is being executed by one or many threads. There are many different process models, some of which are lightweight, but almost all processes (even entire virtual machines) are rooted in an operating system (OS) process, which comprises

The operating system scheduler decides that a process has expired its fair share of CPU time (e.g., by the Completely Fair Scheduler of the Linux kernel). A common form of multitasking is provided by CPU time-sharing, which is a method for interleaving the execution of users' processes and threads, and even of independent kernel tasks – although the latter feature is feasible only in preemptive kernels such as Linux. Preemption has an important side effect for interactive processes that are given higher priority with respect to CPU-bound processes, so users are immediately assigned computing resources at


The processes that are ready to run). It is even possible for two or more processes to be running on different machines that may run different operating systems (OS), so some mechanisms for communication and synchronization (called communication protocols for distributed computing) are needed (e.g., the Message Passing Interface, MPI). By the early 1960s, computer control software had evolved from monitor control software, for example IBSYS, to executive control software. Over time, computers got faster while computer time
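A minimal MPI example in C of two such processes exchanging a message is shown below; it is illustrative only (built with mpicc and run with something like mpirun -np 2).

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* to rank 1 */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                           /* from rank 0 */
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```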

The processor to run another program. To the user, it will appear that the programs run at the same time (hence the term "parallel"). Shortly thereafter, the notion of a "program" was expanded to the notion of an "executing program and its context". The concept of a process was born, which also became necessary with the invention of re-entrant code. Threads came somewhat later. However, with

The program code, assigned system resources, physical and logical access permissions, and data structures to initiate, control and coordinate execution activity. Depending on the OS, a process may be made up of multiple threads of execution that execute instructions concurrently. While a computer program is a passive collection of instructions typically stored in a file on disk, a process
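A small POSIX-threads example in C shows several threads executing concurrently inside one process; the work done is trivial and purely illustrative.

```c
#include <stdio.h>
#include <pthread.h>

static void *worker(void *arg)
{
    long id = (long)arg;
    printf("thread %ld running in the same process\n", id);
    return NULL;
}

int main(void)
{
    pthread_t tids[4];

    /* All four threads share the process's address space and resources. */
    for (long i = 0; i < 4; i++)
        pthread_create(&tids[i], NULL, worker, (void *)i);
    for (int i = 0; i < 4; i++)
        pthread_join(tids[i], NULL);
    return 0;
}
```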

The resources, typically at least the processor state, may be associated with each of the process's threads in operating systems that support threads or child processes. The operating system keeps its processes separate and allocates the resources they need, so that they are less likely to interfere with each other and cause system failures (e.g., deadlock or thrashing). The operating system may also provide mechanisms for inter-process communication to enable processes to interact in safe and predictable ways. A multitasking operating system may just switch between processes to give

The run queue is then allowed to execute. Processes are also removed from the run queue when they ask to sleep, are waiting on a resource to become available, or have been terminated. In the Linux operating system (prior to kernel 2.6.23), each CPU in the system is given a run queue, which maintains both an active and an expired array of processes. Each array contains 140 pointers (one for each priority level) to doubly linked lists, which in turn reference all processes with

The scheduler as a whole. For this design to work, each task_struct task descriptor embeds a field of type sched_entity that represents the set of entities the task belongs to. Each per-CPU run queue of type cfs_rq sorts sched_entity structures in a time-ordered fashion into a red–black tree (or "rbtree" in Linux lingo), where the leftmost node is occupied by the entity that has received
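The idea can be sketched in C as below, with a plain (unbalanced) binary search tree standing in for the kernel's red–black tree and struct names that only mimic the real sched_entity and cfs_rq: entities are ordered by vruntime and the leftmost one is picked next.

```c
#include <stdio.h>

struct entity {
    const char *name;
    unsigned long long vruntime;     /* virtual runtime in nanoseconds */
    struct entity *left, *right;
};

/* Insert keeps entities ordered by vruntime (O(log N) when balanced). */
static struct entity *enqueue(struct entity *root, struct entity *se)
{
    if (!root)
        return se;
    if (se->vruntime < root->vruntime)
        root->left = enqueue(root->left, se);
    else
        root->right = enqueue(root->right, se);
    return root;
}

/* The leftmost node is the entity that has run the least so far;
 * the kernel caches a pointer to it so picking it is constant time. */
static struct entity *pick_next(struct entity *root)
{
    while (root && root->left)
        root = root->left;
    return root;
}

int main(void)
{
    struct entity a = { "editor",   3000000ULL, NULL, NULL };
    struct entity b = { "compiler", 9000000ULL, NULL, NULL };
    struct entity c = { "player",   1000000ULL, NULL, NULL };
    struct entity *root = NULL;

    root = enqueue(root, &a);
    root = enqueue(root, &b);
    root = enqueue(root, &c);
    printf("next: %s\n", pick_next(root)->name);   /* smallest vruntime */
    return 0;
}
```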

The scheduler is invoked to run a new process: if the process spends a lot of its time sleeping, then its spent-time value is low and it automatically gets a priority boost when it finally needs it. Hence such tasks do not get less processor time than the tasks that are constantly running. The complexity of the algorithm that inserts nodes into the cfs_rq runqueue of the CFS scheduler is O(log N), where N

The simple pressing of a key or when moving a mouse. Furthermore, applications like video and music playback are given some kind of real-time priority, preempting any other lower-priority process. In time-sharing systems, context switches are performed rapidly, which makes it seem like multiple processes are being executed simultaneously on the same processor. This seemingly simultaneous execution of multiple processes
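As a sketch of how such real-time priority can be requested on Linux, the POSIX scheduling API can move a process into the SCHED_FIFO class, as below. The priority value is arbitrary, and the call needs root or CAP_SYS_NICE; this is an illustration, not how any particular media player is actually configured.

```c
#include <stdio.h>
#include <sched.h>

int main(void)
{
    struct sched_param sp = { .sched_priority = 50 };

    /* Switch the calling process (pid 0) to the SCHED_FIFO real-time class,
     * so it preempts any SCHED_OTHER (CFS/EEVDF) task when runnable. */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {
        perror("sched_setscheduler");
        return 1;
    }
    printf("now running with real-time priority\n");
    return 0;
}
```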

Was still neither cheap nor fully utilized; such an environment made multiprogramming possible and necessary. Multiprogramming means that several programs run concurrently. At first, more than one program ran on a single processor as a result of the underlying uniprocessor computer architecture, and they shared scarce and limited hardware resources; consequently, the concurrency was of a serial nature. On later systems with multiple processors, multiple programs may run concurrently in parallel. Programs consist of sequences of instructions for processors. A single processor can run only one instruction at
