Homework
    E6118 Fall 2025
    
    DUE: Friday 9/26/2025 at 11:59pm EDT
    Overview:
    
      In this assignment, you will use a large language model (LLM) of your choice, such as ChatGPT, Claude, or DeepSeek, to build a Linux scheduler.
      You will guide the LLM with prompts to create the scheduler, but you are not allowed to provide the LLM directly with any code that will go in your scheduler implementation.  All of the code for the scheduler implementation must be generated by the LLM.  For example, if the LLM produces buggy code, you may not provide the LLM with the correct code that it should implement instead. However, as part of your prompts, you are free to provide the LLM with code that already exists in the Linux kernel or that the LLM previously generated.  You are also free to write your own test code to test the correctness and functionality of what the LLM generates.
    
      You may work individually or in a group of two of your choosing, and should select your LLM in advance.
      Sign up for your chosen LLM here.
      To ensure variety, no more than three groups should use the same LLM.
    
    
      You should submit the following via email to e6118@lists.cs.columbia.edu:
      
        - 
          Implementation patches for your scheduler, formatted in the style of Linux kernel mailing list submissions
        
 
        - 
          A PDF report describing the generated code, your prompting strategy, evaluation results, and experiences, including your evaluation of the effectiveness of the LLM
        
 
        - 
          The complete LLM chat log showing how the code was generated
        
 
      
    
    
      Your scheduler must be implemented on the 
      Linux 6.14 kernel. 
      Run it on Ubuntu 25.04 using VMware. The implementation should work on either x86 or Arm64.
      Follow the VM setup instructions here, 
      and refer to this guide if you need a refresher on compiling and booting a new kernel.
    
    Programming Problems:
    
      
        The goal of this assignment is to create a new Linux scheduler, a Weighted
        Fair Queuing scheduler.
      
    
    
      - 
         
        A Multicore Scheduler
        
          Add a new scheduling policy to the Linux kernel to support
          weighted fair queuing
          scheduling. Call this policy
          WFQ. The algorithm should work as follows:
          
            - 
              The new scheduling policy should serve as the default scheduling
              policy for all tasks.
            
 
            - Multicore systems must be fully supported.
 
            - Every task has a 1 tick time slice (quantum).
 
            - 
              Every task has an assigned weight (task_weight), with the default
              weight being 10.
            
 
            - 
              When deciding which CPU a task should be assigned to, it should be
              assigned to the CPU with the least total weight. This means you
              have to keep track of the total weight of all the tasks running on
              each CPU (cpu_total_weight). If multiple CPUs are tied for the
              least total weight, assign the task to the lowest-numbered CPU
              among them.
            
 
            - 
              Each CPU has a virtual time which starts at zero. When a task runs
              on that CPU for some time t, the CPU's virtual time increases by t
              / cpu_total_weight.
            
 
            - 
              Each task has a virtual time which is initialized to the CPU
              virtual time when it is assigned to a CPU. When a task runs for
              some time t, the task's virtual time increases by t / task_weight.
            
 
            - 
              Each task has a virtual finishing time (VFT) which is equal to the
              task's virtual time plus (duration of 1 tick) / task_weight. The VFT is essentially
              what the value of a task's virtual time will be after it runs for
              an additional time slice.
            
 
            - 
              Each CPU will select the task to run from its run queue that has
              the smallest VFT.
            
 
            - 
              The kernel does not support floating point arithmetic, so all
              calculations, particularly those involving virtual time, should be
              done using 64-bit integers and scaled by some scale factor to
              avoid rounding error.
            
 
          
          - 
            The Linux scheduler implements individual scheduling classes
            corresponding to different scheduling policies. For this assignment,
            you need to create a new scheduling class,
            sched_wfq_class, for the
            WFQ policy, and implement the necessary
            functions in kernel/sched/wfq.c. You can
            find some good examples of how to create a scheduling class in
            kernel/sched/rt.c and
            kernel/sched/fair.c. Other interesting
            files that will help you understand how the Linux scheduler works
            are kernel/sched/core.c,
            include/linux/sched.h, and
            include/asm-generic/vmlinux.lds.h. While
            there is a fair amount of code in these files, a key goal of this
            assignment is for you to understand how to abstract the scheduler
            code so that you learn in detail the parts of the scheduler that are
            crucial for this assignment and ignore the parts that are not.
          
 
          - 
            Your scheduler should operate alongside the existing Linux
            scheduler. Therefore, you should add a new scheduling policy,
            SCHED_WFQ. The value of
            SCHED_WFQ should be 8.
            SCHED_WFQ should be made the default
            scheduler class of init_task.
          
 
          - 
            Only tasks whose policy is set to
            SCHED_WFQ should be considered for
            selection by your new scheduler.
          
 
          - 
            Tasks using the SCHED_WFQ policy should
            take priority over tasks using the
            SCHED_NORMAL policy, but not over
            tasks using the SCHED_RR or
            SCHED_FIFO policies.
          
 
          - 
            Your scheduler must be capable of working on both uniprocessor
            systems and multicore/multiprocessor systems; you can change the
            number of cores on your VM for testing. All cores should be utilized
            on multiprocessor systems.
          
 
          - 
            Proper synchronization and locking are crucial for a multicore
            scheduler, and not easy to get right. Pay close attention to the
            kind of locking used in existing kernel schedulers. This is
            extremely critical for the next part of the assignment. It is very
            easy to overlook a key synchronization aspect and cause a deadlock!
          
 
          - 
            Refrain from immediately making your scheduler the default
            scheduler for init_task; instead, test by
            manually configuring tasks to use your policy with
            sched_setscheduler().
          
 
          - 
            For a more responsive system, you may want to set the scheduler of
            kernel threads to be SCHED_WFQ as well
            (otherwise, SCHED_WFQ tasks can starve the
            SCHED_NORMAL tasks to a degree). To do
            this, you can modify kernel/kthread.c and
            replace SCHED_NORMAL with
            SCHED_WFQ. It is strongly suggested that
            you do this to ensure that your VM is responsive enough for the test
            cases.
          
 
          - 
            Check out the debugging tips provided
            below.
          
 
        
       
      - 
          Getting WFQ info
        To make it easier to monitor the status of your scheduler, you will
        implement the following system calls:
       
      #define MAX_CPUS 8 /* We will be testing only on the VMs */
struct wfq_info {
	int num_cpus;
	int nr_running[MAX_CPUS];
	int total_weight[MAX_CPUS];
};
/* This system call will fill the buffer with the required values 
 * i.e. the total number of CPUs, the number of WFQ processes on each 
 * CPU and the total weight of WFQ processes on each CPU. 
 * Return value will also be the total number of CPUs.
 * System call number 467.
 */ 
int get_wfq_info(struct wfq_info *wfq_info);
/* This system call will change the weight for the calling process.
 * Only a root user should be able to increase the weight beyond
 * the default value of 10.  The system call should return an error
 * for any weight less than 1. 
 * System call number 468.
 */ 
int set_wfq_weight(int weight);
      The two system calls should be implemented in
      kernel/sched/core.c.
      
      - 
         
        Add load-balancing features to WFQ
        You should have a working scheduler by now. Make sure you test it before
        you move on to this part. So far, your scheduler will run a task only
        on the CPU to which it was originally assigned. Let's change that now!
        For this part you will implement two balancing features: periodic load
        balancing, where you try to move a task from the heaviest CPU to the
        lightest one, and idle balancing, where a CPU attempts to steal a task
        from another CPU when it has no tasks to run, i.e., when its runqueue
        is empty.
        
          Load balancing is a key feature of the default Linux scheduler
          (Completely Fair Scheduler). While CFS follows a rather complicated
          policy to balance tasks and works to move more than one task in a
          single go, your scheduler will have a simple policy in this regard.
          Note that while this spec and the Linux kernel refer to the default
          scheduler as CFS, as of kernel version 6.6 CFS was replaced by the
          Earliest Eligible Virtual Deadline First (EEVDF) scheduler.
        
        
        
        Periodic load balancing works as follows:
        
          - 
            In periodic load balancing, a single job from the run queue with the
            highest total weight should be moved to the run queue with the
            lowest total weight.
          
 
          - 
            The job that should be moved is the first eligible job in the run
            queue which can be moved without causing the imbalance to reverse.
          
 
          - 
            Jobs that are currently running are not eligible to be moved and
            some jobs may have restrictions on which CPU they can be run on.
          
 
          - 
            Periodic load balancing should be attempted every 500ms. For
            example, if CPU 2 performs periodic load balancing (e.g. moves a
            task from CPU 4 to CPU 1), no CPU should perform periodic load
            balancing for the next 500ms.
          
 
          - 
            You may assume for load balancing that the CPUs receive periodic
            timer interrupts.
          
 
          - 
            While there is quite a bit of code in the CFS implementation, look
            in particular for the functions that determine whether a task is
            eligible to be moved to another core. Don't expect to move a task
            every periodic load balancing cycle.
          
 
        
        
        Idle balancing works as follows:
        
          - 
            Idle balance is when an idle CPU (i.e. a CPU with an empty runqueue)
            pulls one task from another CPU.
          
 
          - 
            Again, take a look at the CFS implementation to figure out where
            this is taking place and how it is called from
            kernel/sched/core.c.
          
 
          - 
            The CPU that you pull a task from should have at least two tasks on
            its runqueue and have the greatest total weight, although the
            measure of greatest total weight can be an estimate and can reflect
            measurements at somewhat different times on different CPUs.
          
 
          - 
            You should again ensure that the task you are stealing is eligible
            to be moved and allowed to run on the idle CPU.
          
 
        
        
       
      - 
         
        Investigate and Demo
        Demonstrate that your scheduler effectively balances weights across all
        CPUs. The total weight of the processes running on each CPU should be
        roughly the same. You should start with a CPU-bound test program, such
        as a while loop, then try running more complex programs that do I/O to show
        that the scheduler continues to operate correctly. You should include a
        test program test/test.c that calls the system
        calls. You should also include a
        test/Makefile that builds the program in the same
        directory resulting in an executable named test.
	Experiment with different weights and see what impact they
        have.
        
          You should provide a complete set of results that describe your
          experiments. Your results and any explanations should be put into the
          PDF report. The writeup should be of
          sufficient detail for someone else to be able to repeat your
          experiments.
        
       
    
    
      Debugging Tips
    
    
      The following are some debugging tips you may find useful as you create
      your new scheduling class.
    
    
      - 
        Before testing, take a snapshot of your VM if you have not done so
        already. That way, if your kernel crashes or is unresponsive because of
        scheduler malfunction, you can restore the state of the VM prior to the
        problem.
      
 
      - 
        It is possible that your kernel gets stuck at boot time, preventing you
        from reading debug messages via dmesg. In this
        scenario, you may find it helpful to redirect your debug messages to a
        file on your host machine.
        
          - Right click on your VM and click "Settings"
 
          - Under "Hardware", click "Add" and create a Serial Port.
 
          - Under "Device", click on your new Serial Port.
 
          - 
            Click on "Use output file", and specify the file on your host
            machine to which you would like to dump the kernel log.
          
 
          - 
            Turn on your VM, move to your kernel in the GRUB menu, and press
            e.
          
 
          - 
            Move toward the bottom until you find a line that looks like "linux /boot/vmlinuz-6.14.0-...".
          
 
          - 
            At the end of this line, replace
            quiet with
            console=ttyS0 (try
            console=ttyS1 if this doesn't work).
          
 
          - 
            Hit F10 to boot your kernel. The kernel
            log should be written to the file on your host machine you specified
            earlier.
          
 
        
       
      - 
	Although Linux in theory allows you to
	define new scheduling classes, Linux makes
	assumptions in various places that the only scheduling classes
	are the ones that have already been defined. You will need to
	change those places in the kernel code to allow your kernel to
	boot with your scheduling class, allow you to assign tasks
	to your scheduling class, and allow your scheduling class
	to be used to pick the next task to run.
	
 
      - 
        You may find it helpful to use ps,
        top, or
        htop to view information such as the
        CPU usage and scheduling policies of your tasks. 
        These commands rely on execution time statistics reported by
        certain scheduler functions in the kernel.  As a result,
	bugs in your scheduling class could cause these commands to
        report inaccurate information.  This cuts two ways.  First,
	it is possible that your scheduler is working properly in
        terms of selecting the right tasks to execute for the right
        amount of time, but your calculation of execution time
        statistics in the kernel is wrong, so these commands appear to
        report that your tasks are not running at all when in fact
        they are.  Second, it is possible that your scheduler is not
        working properly such that these tools report that tasks are
        using your scheduling class when in fact they are not.
	As a result,
	you should not exclusively rely on these tools to determine
	if your scheduling class is working properly.  For example, when you
	make your scheduling class the default scheduling class, the
	fact that user-level tools claim that all tasks are using your
	scheduling class may not necessarily mean that this is the case.
	Instead, to ensure that your scheduling class functions
	are actually being used, you might add a printk to a function
	like your class's enqueue_task() and
	verify that it appears in dmesg
	output.  Make sure that you do not submit your code
	with such debugging printk statements, as they can cause issues if
	invoked too frequently.