Kernel 2.4 - Major Features

Kernel 2.6 - Major Features

Concrete view of Linux Kernel Scheduler

Linux Source Code Information

Project Description / Challenges

How to Configure and Build a Linux Kernel

Linux Scheduler Implementation

Summary of the Project

Conclusion and future work







The main purpose of the project Scheduling in Linux is to add a scheduling policy to the Linux Kernel 2.4. It also aims at providing a clear yet concrete overview of the scheduling process in Linux. We will start our presentation with a detailed introduction, followed by some basic yet important questions regarding how the scheduler works, the code implementation, and the important algorithms. We will also present an overview of how a Linux kernel can be configured and compiled.


The Linux scheduler is a priority-based scheduler that schedules tasks based upon their static and dynamic priorities. When these priorities are combined, they form a task's goodness. Each time the Linux scheduler runs, every task on the run queue is examined and its goodness value is computed. The task with the highest goodness is chosen to run next.

When there are CPU-bound tasks running in the system, the Linux scheduler may not be called for intervals of up to 0.40 seconds. This means that the currently running task has the CPU to itself for periods of up to 0.40 seconds (how long depends upon the task's priority and whether it blocks or not). This is good for throughput because there are few computationally unnecessary context switches. However, it can kill interactivity because Linux only reschedules when a task blocks or when the task's dynamic priority (counter) reaches zero. Thus, under Linux's default priority-based scheduling method, long scheduling latencies can occur.

Looking at the scheduling latency in finer detail, the Linux scheduler makes use of a timer that interrupts every 10 ms. This timer erodes the currently running task's dynamic priority (decrements its counter). A task's counter starts out at the same value its priority contains. Once its dynamic priority (counter) has eroded to 0, it is reset to that of its static priority (priority). It is only after the counter reaches 0 that a call to schedule() is made. Thus a task with the default priority of 20 may run for 0.200 seconds (200 ms) before any other task in the system gets a chance to run. A task at priority 40 (the highest priority allowed) can run for 0.400 seconds without any scheduling occurring, as long as it doesn't block or yield.

The Linux scheduler has gone through some big improvements since kernel version 2.4. There were a lot of complaints about the interactivity of the scheduler in kernel 2.4. In this version, the scheduler was implemented with one run queue shared by all available processors. At every scheduling decision, this queue was locked and every task on it got its timeslice updated. This implementation caused poor performance in all aspects. The scheduler algorithm and supporting code went through a large rewrite early in the 2.5 kernel development series. The new scheduler was designed to achieve O(1) run time regardless of the number of runnable tasks in the system. To achieve this, each processor was given its own run queue, which helps a lot in reducing lock contention. The priority array was introduced, which uses an active array and an expired array to keep track of the runnable tasks in the system. The O(1) running time is primarily drawn from this new data structure. The scheduler puts all expired processes into the expired array. When no active process is left in the active array, it swaps the two arrays, so that the active array becomes the expired array and the expired array becomes the active array. There were some twists made to this scheduler to optimize it further, such as putting an expired task back into the active array instead of the expired array in some cases. The O(1) scheduler uses a heuristic calculation to update the dynamic priority of tasks based on their interactivity (I/O bound versus CPU bound). The industry was happy with this new scheduler until Con Kolivas introduced his new scheduler named Rotating Staircase Deadline (RSDL), and later Staircase Deadline (SD). His new schedulers proved that fair scheduling among processes can be achieved without any complex computation. His scheduler was designed to run in O(n), but its performance exceeded that of the current O(1) scheduler.

The results achieved by the SD scheduler surprised kernel developers and designers. The fair scheduling approach in the SD scheduler encouraged Ingo Molnar to re-implement the new Linux scheduler named the Completely Fair Scheduler (CFS). The CFS scheduler was a big improvement over the existing scheduler, not only in its performance and interactivity but also in simplifying the scheduling logic and putting more modularized code into the scheduler. The CFS scheduler was merged into the mainline in version 2.6.23. Since then, there have been some minor improvements made to the CFS scheduler in areas such as optimization, load balancing, and the group scheduling feature.


Kernel 2.4 Major Features


  • An O(n) scheduler - Goes through the entire “global runqueue” to determine the next task to be run. This is an O(n) algorithm where 'n' is the number of processes: the time taken was proportional to the number of active processes in the system


  • A global runqueue - One runqueue shared by all processors in a symmetric multiprocessing (SMP) system, so each CPU had to wait for the others to finish with it. This meant a task could be scheduled on any processor -- which can be good for load balancing but bad for memory caches. For example, suppose a task executed on CPU-1, and its data was in that processor's cache. If the task got rescheduled to CPU-2, its data would need to be invalidated in CPU-1 and brought into CPU-2.


  • This led to large performance hits during heavy workloads


Kernel 2.4 Scheduler Policies:

  • SCHED_FIFO - A First-In, First-Out real-time process. When the scheduler assigns the CPU to the process, it leaves the process descriptor in its current position in the runqueue list. If no other higher-priority real-time process is runnable, the process will continue to use the CPU as long as it wishes, even if other real-time processes having the same priority are runnable


  • SCHED_RR - A Round Robin real-time process. When the scheduler assigns the CPU to the process, it puts the process descriptor at the end of the runqueue list. This policy ensures a fair assignment of CPU time to all SCHED_RR real-time processes that have the same priority


  • SCHED_OTHER - A conventional, time-shared process. The policy field also encodes a SCHED_YIELD binary flag. This flag is set when the process invokes the sched_yield() system call (a way of voluntarily relinquishing the processor without the need to start an I/O operation or go to sleep). The scheduler puts the process descriptor at the bottom of the runqueue list.


O(1) Algorithm (Constant-time algorithm)

  • Choose the task on the highest priority list to execute

  • To make this process more efficient, a bitmap is used to define when tasks are on a given priority list

  • On most architectures, a find-first-bit-set instruction is used to find the highest-priority bit set in one of five 32-bit words (for the 140 priorities)

  • The time it takes to find a task to execute depends not on the number of active tasks but instead on the number of priorities

  • This makes the 2.6 scheduler an O(1) process because the time to schedule is both fixed and deterministic regardless of the number of active tasks


Kernel 2.6 - Major Features




Kernel 2.6 Scheduler Policies:

SCHED_NORMAL (SCHED_OTHER):

  1. Each task assigned a “Nice” value
  2. PRIO = MAX_RT_PRIO + NICE + 20
  3. Assigned a time slice
  4. Tasks at the same prio(rity) are round-robined
  5. Ensures Priority + Fairness


SCHED_FIFO:

  1. Run until they relinquish the CPU voluntarily
  2. Priority levels maintained
  3. Not pre-empted !!


SCHED_RR:

  1. Assigned a timeslice and run till the timeslice is exhausted
  2. Once all RR tasks of a given prio(rity) level exhaust their timeslices, their timeslices are refilled and they continue running
  3. Prio(rity) levels are maintained


SCHED_BATCH:

  1. For computing-intensive tasks
  2. Timeslices are long and processes are round robin scheduled
  3. lowest priority tasks are batch-processed (nice +19)


SCHED_IDLE:

  1. nice value has no influence for this policy
  2. extremely low priority (lower than +19 nice)


Completely Fair Scheduler (CFS)



Concrete view of Linux Kernel Scheduler


Overview (typical Kernel 2.6 Scheduler)



The Linux scheduler contains:

  • A Running Queue : A running queue (rq) is created for each processor (CPU). It is defined in kernel/sched.c as struct runqueue. Each rq contains a list of runnable processes on a given processor. The struct runqueue is defined in sched.c, not sched.h, to abstract the internal data structure of the scheduler.


  • Schedule Class : The schedule class was introduced in 2.6.23. It is an extensible hierarchy of scheduler modules. These modules encapsulate scheduling policy details and are called from the scheduler core without the core code assuming too much about them. Scheduling classes are implemented through the sched_class structure, which contains hooks to functions that must be called whenever an interesting event occurs. Tasks refer to their schedule policy through struct task_struct and sched_class. There are two schedule classes implemented in 2.6.32:


  1. Completely Fair Schedule class: schedules tasks following the Completely Fair Scheduler (CFS) algorithm. Tasks whose policy is set to SCHED_NORMAL (SCHED_OTHER), SCHED_BATCH, or SCHED_IDLE are scheduled by this schedule class. The implementation of this class is in kernel/sched_fair.c
  2. RT schedule class: schedules tasks following the real-time mechanism defined in the POSIX standard. Tasks whose policy is set to SCHED_FIFO or SCHED_RR are scheduled using this schedule class. The implementation of this class is in kernel/sched_rt.c


  • Load balancer: In an SMP environment, each CPU has its own rq. These queues might become unbalanced from time to time. A run queue with no tasks leaves its associated CPU idle, which does not take full advantage of symmetric multiprocessor systems. The load balancer addresses this issue. It is called every time the system requires scheduling tasks. If the run queues are unbalanced, the load balancer will try to pull tasks from the busiest processor to the idle processor.


Interactivity is an important goal for the Linux scheduler, especially given the growing effort to optimize Linux for desktop environments. Interactivity often flies in the face of efficiency, but it is very important nonetheless. An example of interactivity might be a keystroke or mouse click. Such events usually require a quick response (i.e. the thread handling them should be allowed to execute very soon) because users will probably notice and be annoyed if they do not see some result from their action almost immediately. Users don’t expect a quick response when, for example, they are compiling programs or rendering high-resolution images. They are unlikely to notice if something like compiling the Linux kernel takes an extra twenty seconds. Schedulers used for interactive computing should be designed in such a way that they respond to user interaction within a certain time period. Ideally, this should be a time period that is imperceptible to users and thus gives the impression of an immediate response.

Interactivity estimator

  • Dynamically scales a task's priority based on its interactivity

  • Interactive tasks receive a prio bonus

  • Hence a larger timeslice

  • CPU bound tasks receive a prio penalty

  • Interactivity estimated using a running sleep average.

  • Interactive tasks are I/O bound. They wait for events to occur.

  • Sleeping tasks are I/O bound or interactive !!

  • Actual bonus/penalty is determined by comparing the sleep average against a constant maximum sleep average.

  • Does not apply to RT tasks

When a task finishes its timeslice:

  • Its interactivity is estimated

  • Interactive tasks can be inserted into the 'Active' array again

  • Else, priority is recalculated

  • Inserted into the NEW priority level in the 'Expired' array

Re-inserting interactive tasks

  • To avoid delays, interactive tasks may be re-inserted into the 'active' array after their timeslice has expired

  • Done only if tasks in the 'expired' array have run recently

  • Done to prevent starvation of tasks

  • Decision to re-insert depends on the task's priority level

Timeslice distribution:

  • Priority is recalculated only after expiring a timeslice

  • Interactive tasks may become non-interactive during their LARGE timeslices, thus starving other processes

  • To prevent this, time-slices are divided into chunks of 20ms

  • A task of equal priority may preempt the running task every 20ms

  • The preempted task is requeued and is round-robined in its priority level.

  • Also, priority recalculation happens every 20ms


Benchmark - Hackbench Performance Runs

Performance tests were conducted using a test program called hackbench. The tests were run to compare Linux 2.4.18 against the 2.6.0-test9 kernel.

What is hackbench?

The hackbench test is a benchmark for measuring the performance, overhead, and scalability of the Linux scheduler. Created by Rusty Russell, it uses client and server processes grouped to send and receive data in order to simulate the connections established for a chat room. Each client sends a message to each server in the group.

The test creates groups of processes in multiples set by the user (for this report, they were set in multiples of 25). Each client/server pair listens on a socket; the writer writes 100 messages to each socket and the receiver listens on the socket. Therefore, in this case, the total number of messages sent is 100 times the number of processes specified. The source for the test is hackbench.c, which is called from a wrapper Perl script executed on OSDL's Scalable Test Platform.

Results of the Runs

Each individual test runs a multiple of 25 processes, increments to the next multiple and reruns the benchmark. This continues until a max level, set by the tester, is achieved.




Linux Source Code Information


Source code for almost all versions of Linux is available from ->


Project Description / Challenges




How to Configure and Build a Linux Kernel


Step 1 : Get the necessary requirements for Building and Using the Kernel

Only three packages are absolutely necessary for successfully building a kernel:

Compiler : To build the kernel, the gcc C compiler must be used. Most Linux distributions have a package entitled gcc that should be installed. If you wish to download the compiler and build it yourself, you can find it at ->. As of the 2.6.18 kernel release, gcc 3.2 is the oldest version that can properly build a working kernel.
$ gcc --version

Linker : The C compiler, gcc, does not do all of the compiling on its own. It needs an additional set of tools known as binutils to do the linking and assembling of source files. The binutils package also contains utilities that can manipulate object files in many useful ways, for example to view the contents of a library. As of the 2.6.18 kernel release, the 2.12 release of binutils is the oldest that can successfully link the kernel.
$ ld -v

Make : make is a tool that walks the kernel source tree to determine which files need to be compiled, and then calls the compiler and other build tools to do the work of building the kernel. As of the 2.6.18 kernel release, the 3.79.1 release of make is the oldest that can properly build the kernel. It is recommended that you install the latest stable version of make.
$ make --version

Other packages :
util-linux; filesystem-specific tool packages (JFS, ReiserFS, XFS, quota, NFS); udev; process tools (the procps package); PCMCIA tools


Step 2 : Retrieving the Kernel Source

Visit and download the latest source code. It is also possible to download the kernel source from the command line, using the wget or curl utilities, both of which should come with your Linux distribution.

$ wget


$ curl \ -o linux-


Step 3 : Configuring and Building

The most basic method of configuring a kernel is to use the make config method:

$ cd linux-

$ make config

Default configuration option:

$ make defconfig

Modifying the configuration : There are three interactive ways of doing so - make menuconfig (console-based), make gconfig (GTK+-based), and make xconfig (Qt-based). The menuconfig way of configuring a kernel is a console-based program that offers a way to move around the kernel configuration using the arrow keys on the keyboard:

$ make menuconfig


Step 4: Compiling/Building The Kernel

$ make

Start compiling the kernel modules:

$ make modules

Install the kernel modules (this requires root privileges, via su or sudo):

$ sudo make modules_install

After the modules have been successfully installed, the main kernel image must be installed:

$ make install

It will install three files into the /boot directory, as well as modify your kernel's GRUB configuration file:

Create an initrd image

Type the following command at a shell prompt:

$ cd /boot

$ mkinitrd -o initrd.image-2.6.17.8

Step 5: Installing and Booting from a Kernel

Modifying the Bootloader for the New kernel

There are two common Linux kernel bootloaders: GRUB and LILO. To determine which bootloader your system uses, look in the /boot/ directory. If there is a grub subdirectory:
$ ls -F /boot | grep grub

will give

If this directory is not present, look for the presence of the /etc/lilo.conf file:
$ ls /etc/lilo.conf

should give


If this is present, you are using the LILO program to boot with.

To let GRUB know that a new kernel is present, all you need to do is modify the /boot/grub/menu.lst file. For full details on the structure of this file, and all of the different options available, please see the GRUB info pages:

$ info grub


For LILO, see its man page:

$ man lilo

Step 6 : Modify grub configuration files - /boot/grub/menu.lst

$ update-grub

The last step is to reboot the computer and boot into your new kernel.

Enjoy !


Linux Scheduler Implementation


Kernel files modified:

sched.c, sched.h (see the attached files here ->)

Important Modifications :

Probable Evaluation Technique for the new scheduler

A CPU-intensive program called counter can be used, which generates and counts a large quantity of random numbers. counter can be used to check how a process performs under the new scheduling policy, and the result can be compared with processes scheduled under the previous policy.


Summary of the Project

At first we give an in-depth overview of the scheduler in general. Then we discuss in detail the recently used scheduling policies. We also document the kernel features for both the 2.4 and 2.6 versions, so that readers can distinguish between the old and new versions of the kernel and gauge the development/updates in this area. We then give a general overview of the scheduling process. We also introduce readers to a benchmark called Hackbench, and the results of the commonly performed tests of this benchmark are discussed.

Then we go on to discuss one of the most important pieces of information for the readers - how to configure and compile a kernel. We first provide a step-by-step procedure that can be followed even by a novice reader. Then we discuss how the scheduler module can be changed/modified for implementing a new scheduling policy. We also discuss a probable evaluation technique for the implementation of the new scheduling policy. At last we give the summary of the project and discuss the conclusion and future work of this project.

References are also provided for the benefit of the readers.


Conclusion and future work

This project covered most of the important aspects of the Linux 2.4 and 2.6 schedulers. The kernel scheduler is one of the most frequently executed components in a Linux system. Hence, it has gained a lot of attention from kernel developers, who have striven to put the most optimized algorithms and code into the scheduler. The different algorithms used in the kernel scheduler were discussed in the project. The CFS scheduler achieves good performance and responsiveness while being relatively simple compared with previous algorithms like O(1). CFS exceeds performance expectations in some workloads, but it still shows some weakness in others: there have been some complaints about unresponsiveness of the CFS scheduler in the 3D gaming area.

The future work includes delving deeper into the scheduling and process code so that we can implement a new scheduling algorithm in the kernel. Though this project gives a vivid overview of the basic steps of configuring and compiling a kernel and of implementing a scheduling policy like SCHED_IDLE (with a lower priority), there were some challenges associated with it. One of the challenges was interpreting the change in the scheduling policy through the process runtime. The goal for the future is to overcome such challenges and develop efficient techniques for kernel scheduling.




[1] A Study on Linux Kernel Scheduler Version 2.6.32, by Thang Minh Le

[3] Multiprocessing with the Completely Fair Scheduler: Introducing the CFS for Linux, by Avinesh Kumar

[4] Linux Scheduler Latency, by Clark Williams, Red Hat, Inc.

[6] Enhancing Linux Scheduler Scalability, by Mike Kravetz

[7] Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism, by Thomas E. Anderson, Brian N. Bershad, Edward D. Lazowska, and Henry M. Levy

[9] Linux Kernel in a Nutshell, by Greg Kroah-Hartman, published by O'Reilly