Homework
E6118 Fall 2025
DUE: Friday 9/26/2025 at 11:59pm EDT
Overview:
In this assignment, you will use a large language model (LLM) of your choice, such as ChatGPT, Claude, or DeepSeek, to name but a few, to build a Linux scheduler.
You will guide the LLM with prompts to create the scheduler, but you are not allowed to provide the LLM directly with any code that will go in your scheduler implementation. All of the code for the scheduler implementation must be generated by the LLM. For example, if the LLM produces buggy code, you may not provide the LLM with the correct code that it should implement instead. However, as part of your prompts, you are free to provide the LLM with code that already exists in the Linux kernel or that the LLM previously generated. You are also free to write your own test code to test the correctness and functionality of what the LLM generates.
You may work individually or in a group of two of your choosing, and you should select your LLM in advance.
Sign up for your chosen LLM here.
To ensure variety, no more than three groups should use the same LLM.
You should submit the following via email to e6118@lists.cs.columbia.edu:
- Implementation patches for your scheduler, formatted in the style of Linux kernel mailing list submissions
- A PDF report describing the generated code, your prompting strategy, evaluation results, and experiences, including your evaluation of the effectiveness of the LLM
- The complete LLM chat log showing how the code was generated
Your scheduler must be implemented on the Linux 6.14 kernel. Run it on Ubuntu 25.04 using VMware. The implementation should work on either x86 or Arm64. Follow the VM setup instructions here, and refer to this guide if you need a refresher on compiling and booting a new kernel.
Programming Problems:
The goal of this assignment is to create a new Linux scheduler: a Weighted Fair Queuing scheduler.
- A Multicore Scheduler
Add a new scheduling policy to the Linux kernel to support weighted fair queuing scheduling. Call this policy WFQ. The algorithm should work as follows:
- The new scheduling policy should serve as the default scheduling policy for all tasks.
- Multicore systems must be fully supported.
- Every task has a 1 tick time slice (quantum).
- Every task has an assigned weight (task_weight), with the default weight being 10.
- When deciding which CPU a task should be assigned to, it should be assigned to the CPU with the least total weight. This means you have to keep track of the total weight of all the tasks running on each CPU (cpu_total_weight). If multiple CPUs have the same total weight, assign the task to the lowest-numbered CPU among them.
- Each CPU has a virtual time which starts at zero. When a task runs on that CPU for some time t, the CPU's virtual time increases by t / cpu_total_weight.
- Each task has a virtual time which is initialized to the CPU's virtual time when it is assigned to a CPU. When a task runs for some time t, the task's virtual time increases by t / task_weight.
- Each task has a virtual finishing time (VFT), which is equal to the task's virtual time plus (duration of 1 tick) / task_weight. The VFT is essentially what the value of a task's virtual time will be after it runs for an additional time slice.
- Each CPU will select the task from its run queue that has the smallest VFT to run next.
- The kernel does not support floating point arithmetic, so all calculations, particularly those involving virtual time, should be done using 64-bit integers, scaled by some scale factor to avoid rounding error.
- The Linux scheduler implements individual scheduling classes corresponding to different scheduling policies. For this assignment, you need to create a new scheduling class, sched_wfq_class, for the WFQ policy, and implement the necessary functions in kernel/sched/wfq.c. You can find some good examples of how to create a scheduling class in kernel/sched/rt.c and kernel/sched/fair.c. Other interesting files that will help you understand how the Linux scheduler works are kernel/sched/core.c, include/linux/sched.h, and include/asm-generic/vmlinux.lds.h. While there is a fair amount of code in these files, a key goal of this assignment is for you to understand how to abstract the scheduler code so that you learn in detail the parts of the scheduler that are crucial for this assignment and ignore the parts that are not.
- Your scheduler should operate alongside the existing Linux scheduler. Therefore, you should add a new scheduling policy, SCHED_WFQ. The value of SCHED_WFQ should be 8. SCHED_WFQ should be made the default scheduler class of init_task.
- Only tasks whose policy is set to SCHED_WFQ should be considered for selection by your new scheduler.
- Tasks using the SCHED_WFQ policy should take priority over tasks using the SCHED_NORMAL policy, but not over tasks using the SCHED_RR or SCHED_FIFO policies.
- Your scheduler must be capable of working on both uniprocessor systems and multicore/multiprocessor systems; you can change the number of cores on your VM for testing. All cores should be utilized on multiprocessor systems.
- Proper synchronization and locking is crucial for a multicore scheduler, but not easy. Pay close attention to the kind of locking used in existing kernel schedulers. This is extremely critical for the next part of the assignment; it is very easy to overlook a key synchronization aspect and cause a deadlock!
- You will want to refrain from immediately making your scheduler the default scheduler for init_task; instead, test by manually configuring tasks to use your policy with sched_setscheduler().
- For a more responsive system, you may want to set the scheduler of kernel threads to be SCHED_WFQ as well (otherwise, SCHED_WFQ tasks can starve the SCHED_NORMAL tasks to a degree). To do this, you can modify kernel/kthread.c and replace SCHED_NORMAL with SCHED_WFQ. It is strongly suggested that you do this to ensure that your VM is responsive enough for the test cases.
- Check out the debugging tips provided below.
- Getting WFQ info
To make it easier to monitor the status of your scheduler, you will implement the following system calls:
#define MAX_CPUS 8 /* We will be testing only on the VMs */
struct wfq_info {
        int num_cpus;
        int nr_running[MAX_CPUS];
        int total_weight[MAX_CPUS];
};
/* This system call will fill the buffer with the required values
* i.e. the total number of CPUs, the number of WFQ processes on each
* CPU and the total weight of WFQ processes on each CPU.
* Return value will also be the total number of CPUs.
* System call number 467.
*/
int get_wfq_info(struct wfq_info *wfq_info);
/* This system call will change the weight for the calling process.
* Only a root user should be able to increase the weight beyond
* the default value of 10. The system call should return an error
* for any weight less than 1.
* System call number 468.
*/
int set_wfq_weight(int weight);
The two system calls should be implemented in kernel/sched/core.c.
- Add load-balancing features to WFQ
You should have a working scheduler by now; make sure you test it before you move on to this part. So far, your scheduler will run a task only on the CPU to which it was originally assigned. Let's change that now! For this part, you will implement two balancing features: periodic load balancing, in which you try to move a task from the heaviest CPU to the lightest one, and idle balancing, in which a CPU attempts to steal a task from another CPU when it has no tasks to run, i.e. when its run queue is empty.
Load balancing is a key feature of the default Linux scheduler (the Completely Fair Scheduler). While CFS follows a rather complicated policy to balance tasks and can move more than one task in a single go, your scheduler will have a simple policy in this regard. Note that while this spec and the Linux kernel refer to the default scheduler as CFS, as of kernel version 6.6 CFS was merged with the Earliest Eligible Virtual Deadline First (EEVDF) scheduler.
Periodic load balancing works as follows:
- In periodic load balancing, a single job from the run queue with the highest total weight should be moved to the run queue with the lowest total weight.
- The job that should be moved is the first eligible job in the run queue that can be moved without causing the imbalance to reverse.
- Jobs that are currently running are not eligible to be moved, and some jobs may have restrictions on which CPUs they can run on.
- Periodic load balancing should be attempted every 500 ms. For example, if CPU 2 performs periodic load balancing (e.g. moves a task from CPU 4 to CPU 1), no CPU should perform periodic load balancing for the next 500 ms.
- You may assume for load balancing that the CPUs receive periodic timer interrupts.
- While there is quite a bit of code in the CFS implementation, look in particular for the functions that determine whether a task is eligible to be moved to another core. Don't expect to move a task every periodic load balancing cycle.
Idle balancing works as follows:
- Idle balancing is when an idle CPU (i.e. a CPU with an empty run queue) pulls one task from another CPU.
- Again, take a look at the CFS implementation to figure out where this takes place and how it is called from kernel/sched/core.c.
- The CPU that you pull a task from should have at least two tasks on its run queue and have the greatest total weight, although the measure of greatest total weight can be an estimate and can reflect measurements taken at somewhat different times on different CPUs.
- You should again ensure that the task you are stealing is eligible to be moved and allowed to run on the idle CPU.
- Investigate and Demo
Demonstrate that your scheduler effectively balances weights across all CPUs. The total weight of the processes running on each CPU should be roughly the same. Start with a CPU-bound test program, such as a while loop, then try running more complex programs that do I/O to show that the scheduler continues to operate correctly. You should include a test program test/test.c that calls the system calls, as well as a test/Makefile that builds the program in the same directory, resulting in an executable named test. Experiment with different weights and see what impact they have.
You should provide a complete set of results that describe your experiments. Your results and any explanations should be put into the PDF report. The writeup should be of sufficient detail for someone else to be able to repeat your experiments.
Debugging Tips
The following are some debugging tips you may find useful as you create your new scheduling class.
- Before testing, take a snapshot of your VM if you have not done so already. That way, if your kernel crashes or becomes unresponsive because of a scheduler malfunction, you can restore the state of the VM prior to the problem.
- It is possible that your kernel gets stuck at boot time, preventing you from reading debug messages via dmesg. In this scenario, you may find it helpful to redirect your debug messages to a file on your host machine:
  - Right click on your VM and click "Settings".
  - Under "Hardware", click "Add" and create a Serial Port.
  - Under "Device", click on your new Serial Port.
  - Click on "Use output file", and specify the file on your host machine to which you would like to dump the kernel log.
  - Turn on your VM, move to your kernel in the GRUB menu, and press e.
  - Move toward the bottom until you find a line that looks like (linux /boot/vmlinuz-6.14.0-...).
  - At the end of this line, replace quiet with console=ttyS0 (try console=ttyS1 if this doesn't work).
  - Hit F10 to boot your kernel. The kernel log should be written to the file on your host machine that you specified earlier.
- Although Linux in theory allows you to define new scheduling classes, Linux makes assumptions in various places that the only scheduling classes are the ones that have already been defined. You will need to change those places in the kernel code to allow your kernel to boot with your scheduling class, to allow you to assign tasks to your scheduling class, and to allow your scheduling class to be used to pick the next task to run.
- You may find it helpful to use ps, top, or htop to view information such as the CPU usage and scheduling policies of your tasks. These commands rely on execution time statistics reported by certain scheduler functions in the kernel; as a result, bugs in your scheduling class could cause these commands to report inaccurate information. This cuts two ways. First, it is possible that your scheduler is working properly in terms of selecting the right tasks to execute for the right amount of time, but your calculation of execution time statistics in the kernel is wrong, so these commands report that your tasks are not running at all when in fact they are. Second, it is possible that your scheduler is not working properly, yet these tools report that tasks are using your scheduling class when in fact they are not. As a result, you should not rely exclusively on these tools to determine whether your scheduling class is working properly. For example, when you make your scheduling class the default, the fact that user-level tools claim that all tasks are using it does not necessarily mean that this is the case. Instead, to ensure that your scheduling class functions are actually being used, you might add a printk to a function like your class's enqueue_task() and verify that it appears in the dmesg output. Make sure that you do not submit your code with such debugging printk statements, as they can cause issues if invoked too frequently.