Homework 5
Homework 5 is due Monday 5/6 at 12:01 AM ET.
All written problems in this assignment are to be done alone, while the programming assignment is a group assignment. Submit the written and programming assignments separately using the instructions on the homeworks page. Periodically check back on the course page for any errata or updates to submission instructions.
Written Assignment (40 pts)
Exercise numbers X.Y refer to the exercise problem Y in Chapter X in the course textbook, Operating Systems Concepts 9e. Each problem is worth 5 points unless stated otherwise. Make your answers concise. You'll lose points for verbosity. Your written assignment directory should contain a single ASCII file called hw5.txt with your solutions and a completed teammate evaluation form for the programming assignment.
10.12, 10.13, 10.14 modified (10 points). In this problem, we're going to begin to see how disk geometry impacts I/O scheduling. Suppose that a disk drive has 5,000 cylinders, numbered 0 to 4999.
The drive is currently serving a request at cylinder 2150, and the previous request was at cylinder 1805. The queue of pending requests, in FIFO order, is: 2069, 1212, 2296, 2800, 544, 1618, 356, 1523, 4965, 3681. Starting from the current head position, what is the total distance (in cylinders) that the disk arm moves to satisfy all the pending requests for each of the following disk-scheduling algorithms? a) FCFS, b) SSTF, c) LOOK (Elevator), and d) C-LOOK (Elevator).
During a seek, the disk accelerates the disk arm at a constant rate for half the seek, and then decelerates it at the same rate for the second half. Given that the relation between distance d travelled by a body undergoing constant acceleration a in time t is , compute the relation for seek time as a function of seek distance in cylinders. Assume that the disk can perform a seek to an adjacent cylinder in 1ms and a full-stroke seek over all 5000 cylinders in 18 ms. The relation should be in the form where t is the time in ms and L is the seek distance in cylinders.
Calculate the total seek time for each of the schedules in Part 1 of this problem. Determine which schedule has the smallest total seek time.
Assume that the disk rotates at 7200 RPM. What seek distance (in cylinders) can be covered in the time of a single average rotation (half rotation)? This number will give you a sense of the cylinder range over which it makes more sense for the disk controller to do the scheduling as opposed to the OS's more coarse-grained scheduling policies.
11.19. Read section 11.5.3 of the textbook to understand what consistency semantics are. What are the implications of supporting UNIX consistency semantics for shared access for those files that are stored on remote file systems?
12.12 (modified). Consider a system where free space is kept in a free-space list.
- Suppose that the pointer to the free-space list is lost. Can the system recover by reconstructing the free-space list? Explain your answer.
- Suggest a scheme to ensure that the pointer is never lost as a result of power, memory, or disk block failure.
12.13. Some file systems allow disk storage to be allocated at different levels of granularity. For instance, a file system could allocate 4 KB of disk space as a single 4-KB block or as eight 512-byte blocks. How could we take advantage of this flexibility to improve performance? What modifications would have to be made to the free-space management scheme in order to support this feature?
Consider a file system on a disk that has both logical and physical block sizes of 512 bytes. The current file pointer is at offset 4096 (the last block read). For each of the allocation strategies (contiguous, linked, FAT, and indexed), how many physical blocks must be read from the disk if we want to access data at a) offset 2048 and b) offset 8192 next? Assume that all inode information and systemwide file allocation tables are already in memory, and that each inode can store at-most 10 direct block pointers.
10.18 (modified). Consider a RAID Level 5 organization comprising five disks, with the parity for sets of four blocks on four disks stored on the fifth disk. How many blocks are accessed in order to perform the following? What is the write amplification for each scenario? (Write amplification is the ratio of total data and parity writes to the number of data writes generated by the OS).
- A write of one block of data.
- A write of seven continuous blocks of data (starting at a 4 block boundary).
- A write of eight continuous blocks of data (starting at a 4 block boundary).
- A write of eight non-contiguous blocks of data.
- Based on these calculations, what can you say about the suitability of using RAID-5 for SSD drives?
Consider a guest virtual machine (VM) running on a host OS on a laptop. The VM's virtual disk image is stored as a file on the host OS's filesystem.
- If the laptop has a hard disk, which I/O scheduler (FCFS/Noop, SSTF, Elevator) would you pick for the guest OS? host OS? Why?
- For the remaining questions, assume the laptop has an SSD drive. Which I/O scheduler (FCFS/Noop, SSTF, Elevator) would you pick for the guest OS? host OS? Why?
- Would you pick a log-structured FS for the guest OS? host OS? Why or why not?
- Is it useful to have TRIM support in the guest OS? host OS? Why or why not?
Group Programming Assignment: Cloud File System (75 pts)
In this programming assignment, we'll be working with the Android
kernel again using the same emulator environment you constructed for
Homework 2 using these Android installation instructions. In your
checked out git repository, make sure you checkout the
hw5
branch before starting work on this assignment using
git checkout hw5
.
As the deliverable, you will be required to submit a kernel patch
and diff against the hw5_clean
tag in your git repo. The
patch shows us all commits you have made to the kernel during
development, while the diff reflects only your latest code. To produce
the patch and diff files, make sure you've committed all your changes,
and then go to the root of the kernel tree and type git
format-patch hw5_clean.. --stdout > groupN_hw5.patch
where
N
is your group number. This command should create a
single groupN_hw5.patch
file to
submit. Similarly, use the command git diff hw5_clean.. >
groupN_hw5.diff
to create a single diff file to submit. In
addition to these files, you should also include all user mode test
programs: .c, .h, Makefile
files, the executable files
for your test programs, and a README file documenting your files and
code that explains any way in which your solution differs from what
was assigned, and any assumptions you made. In addition, you are
required to also fill out and include this cover sheet to indicate whether your code has
any problems or not, and what attempts you have made to fix them.
The usual coding instructions apply. All code must be written in C
and a Makefile
must be provided with a default target that compiles all user-mode
programs. When using gcc, the -Wall switch must be
used to display warnings. In addition, you must check your kernel
diff against the Linux coding guidelines using the
scripts/checkpatch.pl groupN_hw5.diff
command in the
kernel tree. You can find a tutorial on how to use checkpatch and
general kernel hacking tips here. Before
submitting, make sure you remove any files produced during
compilation (.o files, executable).
Description
Phones and tablets today often ship multiple models with flash-based storage of varying sizes such as 8GB, 16GB, 32GB, etc. However, because flash is expensive compared to disks, one often pays a hefty premium for buying a model that has more than the minimum amount. Nevertheless, consumers buy higher capacity devices anyway because they have lots of media (e.g., pictures, videos, music, etc.), but often end up actively using only a small fraction of it on a regular basis.
In this assignment, we are going to remedy this situation by modifying an Android filesystem such that it can store a device's data partly on the device, and partly on cheaper cloud-based storage to give the user the illusion that their device has "infinite storage" irrespective of the amount of flash storage available. The idea is to use the local filesystem as a cache - when its utilization increases beyond a watermark W, old files that haven't been used in a while will be evicted and stored in a cloud store. If a file that is stored on the cloud store is accessed by the user, it will transparently be fetched (evicting other files if necessary). Metadata, such as directory files and inodes usually take very little space compared to data and allow the user to quickly browse the contents of the filesystem, so they should never be evicted. In a sense, we'll be implementing a swapping subsystem that works between flash and remote disks, rather than between RAM and disk.
To keep things simple, we will modify the Ext2 filesystem rather than Android's more complex default YAFFS2 flash-optimized log-structured filesystem. Also, rather than working at the block level, we will work with files. A file's inode should always remain on the mobile device; when the file is evicted, only its data blocks are stored on the cloud server and deleted from the local filesystem. For simplicity, we will rely on the Clock (second chance) algorithm to determine which files to evict.
Deliverables
Part I. Cloud Server (20 points): Write a cloud server
clfs_server.c
that runs on your host machine (outside the Android emulator) and stores evicted files from your filesystem. Your server should only use the standard libC API (no external libraries). On execution, it listens for TCP connections from the Cloud FS client on port 8888. On receiving a new connection, it should spawn a new thread, and service the request. Requests are of the following format:struct clfs_req { enum { CLFS_PUT, CLFS_GET, CLFS_RM } type; int inode; int size; };
After receiving aCLFS_PUT
request, the server should then expect aunsigned char data[size];
sized data block next. The server should always send anint
status code back to the client on the same socket indicating whether it was able to successfully execute the operation. If the request is of typeCLFS_GET
and there is no error, the server should subsequently send back the variable sizedunsigned char data[size];
data block that was requested. The following status codes are allowed:enum clfs_status { CLFS_OK = 0, /* Success */ CLFS_INVAL = EINVAL, /* Invalid address */ CLFS_ACCESS = EACCES, /* Could not read/write file */ CLFS_ERROR /* Other errors */ }
Run the cloud server on the host machine on which you are running the emulator. The host can always be reached at IP address 10.0.2.2 from within the emulator. The cloud server should store all blocks as soon as they are received in a directory namedclfs_store
in the same directory as the cloud server executable. Each file should be namedinodenum.dat
. If the cloud receives aCLFS_RM
command, it should delete the appropriate file from the store.Part II. Initializing an FS (5 points): Before modifying the Ext2 filesystem, you must first enable support for it by reconfiguring the kernel. Following the instructions for kernel configuration in step 2.7 of the Android installation tutorial, enable Second extended fs support and Ext2 extended attributes from the File systems menu of the kernel configuration tool. You should leave any other Ext2 options besides these unselected. Save your new configuration as
arch/x86/configs/goldfish_ext2_defconfig
and use it to configure your kernel. Next, you must create and mount an instance of the FS in the emulator. The instructions to do this are very similar to the procedure for creating swap space in homework 4.The tools to work with disks and Ext2 filesystems aren't included in the default Android distribution. To get them, we will use
busybox
a minimal and self-contained UNIX-like distribution. First, get a precompiled version of busybox from here. Download the busybox-i686 binary because our emulator runs x86. This will be a self contained binary.Before starting the emulator, edit the avd settings (
android avd
) and enable the "SDCard" option to create a sdcard of size 10MB. This image will be located in~/.android/avd/<name_of_your_avd>.avd/sdcard.img
. Alternatively, you may also create (additional) images yourself using themksdcard
tool in thetools/
subdirectory of the Android SDK. Use the following options:mksdcard -l HW5 10M hw5.img
. If you create your own image, you must ask the emulator to load it by passing a-sdcard <img_name.img>
command line parameter to the emulator. In either case, the image will appear in the Android device as the/dev/block/mmcblk0
block device.Start the emulator and push the
busybox-i686
binary to the emulator:adb push busybox-i686 /data/busybox
andadb shell chmod 777 /data/busybox
From the adb shell, initialize a partition on the sdcard:
- First, if the sdcard gets automounted (check via the
mount
command), unmount it by going to the Settings app, going to the Storage menu, and selecting Unmount SD card. - Run
adb shell
cd /data
./busybox fdisk /dev/block/mmcblk0
- In fdisk, use m to see help. You need to do the
following:
- o (create new partition table)
- n (add a new primary partition 1 starting at cylinder 1 and ending at cylinder 320 - the partition should have 10232 blocks of 512bytes each)
- t (make sure the partition is set to Linux: type 83)
- v (verify the partition table)
- w (write partition table to disk)
- First, if the sdcard gets automounted (check via the
At this point, you should see a
/dev/block/mmcblk0p1
partition. Create an Ext2 filesystem on it with 1k blocks and HW5 label using the command:./busybox mke2fs -b 1024 -L HW5 /dev/block/mmcblk0p1
Mount the filesystem to/mnt/sdcard
using the mount command. Once the filesystem is mounted, print the mount table by calling themount
command with no options.Your deliverables for this part of the homework are the output of the
mke2fs
command and the mount command showing the mount table included in your cover sheet.
Part III. Eviction Mechanism (30 points): Extend the Ext2 filesystem to evict files to the cloud server and to fetch them when needed. Once Ext2 is enabled, implement the following two calls in the file
fs/ext2/ext2_evict.c
./* Evict the file identified by i_node to the cloud server, * freeing its disk blocks and removing any page cache pages. * The call should return when the file is evicted. Besides * the file data pointers, no other metadata, e.g., access time, * size, etc. should be changed. Appropriate errors should * be returned. In particular, the operation should fail if the * inode currently maps to an open file. Lock the inode * appropriately to prevent a file open operation on it while * it is being evicted. */ int ext2_evict(struct inode *i_node); /* Fetch the file specified by i_node from the cloud server. * The function should allocate space for the file on the local * filesystem. No other metadata of the file should be changed. * Lock the inode appropriately to prevent concurrent fetch * operations on the same inode, and return appropriate errors. */ int ext2_fetch(struct inode *i_node);
For the assignment, we do not want to change the physical layout of inodes or the superblock to avoid having to use modified tools to create the Ext2 filesystem. So, you should store data about whether a file is evicted or not using the extended attributes (xattr) API. Add a new xattr namedevicted
whose value is 1 if the file is evicted, and 0 otherwise. If the xattr for a file is missing, then assume its value is 0, and add the xattr to the file. Any space allocated to an evicted file including its direct and indirect blocks must be freed, and any page-cache memory pages that refer to that file must be removed. Modify the Ext2 implementation of the VFSfile_operations.open
function to check whether a file that is being open is evicted or not. If the file is evicted when opened by a program, fetch it from the cloud server by callingext2_fetch
before allowing the open call to proceed.Register and mount your filesystem: Because we are creating a new filesystem, normally we would change the FS
magic number
(found asEXT2_SUPER_MAGIC
ininclude/linux/magic.h
). However, we do not want to change any on-disk structures, so we will leave the magic number unmodified. However, by modifying the option parsing code infs/ext2/super.c
, your FS should accept the following new mount options.srv=server_name_or_ip:port
: the cloud server location. Return from the mount operation with an error (without mounting) if no value for this option is specified.wh=num
: the high watermark. Should be a number from 0 to 100. It's default value is 95. When the utilization of the FS exceeds wh percent, any new write call should result in running of an eviction operation (see next section) before the write is allowed to proceed.wl=num
: the low watermark. Should be a number between 0 to wh. It's default value is 85. When the utilization of the FS exceeds wl percent, the evictor daemon (next section) should asynchronously run an eviction operation on the filesystem.evict=num
: the eviction target. Should be a number between 0 to wl. It's default value is 70. Any eviction operation should leave the disk evict percent utilized.
Part IV. Eviction Policy (20 points): Implement an Ext2 specific eviction policy as a function
ext2_evict_fs(struct super_block *super)
in the filefs/ext2/ext2_evict.c
. This function should implement the Clock (second chance) algorithm on the supplied Ext2 FS. If the utilization of the filesystem is greater than the low watermark (wl
) mount option, it should scan inodes starting from the current clock hand in sequence, skipping directory inodes and storing the last time it scanned that inode until it identifies an inode whose access time is older than its last scan time. The evictor should then evict this file using theevict
call, and repeat the process until the disk usage drops to below the evict thresholdevict
mount option (see Options in Part III above). Store the last scan time for the file as an xattr namedscantime
. If the xattr is missing, create it with an initial value of 0. The position of your clock hand must be persistent across reboots, i.e., you must store the inode number the hand is pointing to on disk as an extended attribute namedclockhand
associated with the root directory (default value of inode 0 if the xattr is missing).Add the following VFS superblock operation to
super_operations in linux/fs.h
.int evict_fs(struct super_block *super);
. Your modified Ext2 filesystem must implement this operation by invoking theext2_evict_fs
call. Theevict_fs
VFS call must be invoked from two places. Once from the Ext2 filesystem specific function that allocates a new disk block whenever the FS utilization after the allocation becomes higher than thewh
parameter. It must also be invoked from a new kernel thread you will create calledkfs_evictd
implemented in thefs/evictd.c
file. This evictor thread must be registered to start when the system starts using thefs_initcall(fn)
function inkernel/kthread.c
(seemigration_init
inkernel/sched.c
for examples). Once started, the evictor should run once every minute and scan each mounted filesystem, invoking theevict_fs
VFS call if the FS defines it.- Part V. Testing your cloud filesystem: For this last
assignment of the course, we ask you to test your filesystem
yourselves. During the last week before the submission deadline,
we will provide you with a grading workload in the form of two
scripts
client_test.sh
andhost_test.sh
that you have to run on the host and on the emulator on a freshly created image to test your filesystem implementation. In addition to your kernel diffs, userspace cloud server source code, and cover sheet, you must also submit the following:- The output of the grading program
client_test.sh
run in the emulator. - The kernel log after the grading program completes (output of
adb dmesg
). - The final disk image file
hw5.img
after running the test workload. - The output of the
ls -all
command executed in theclfs_store
directory on your cloud server. - The output of the cloud FS host grading program
host_test.sh
run on the cloud server host.
- The output of the grading program
Important Hints
We are going to implement the network client code to evict files to the cloud server directly in the kernel to avoid having this become a semester long project. You can find a tutorial on how to write socket code in the kernel here. But normally, you should think many times about why you might want to put networking code in the kernel, especially when talking to untrusted servers. Imagine what could happen if a hacker got a direct backdoor through your server by replacing the contents of a root owned evicted file with a malicious script - at the very least, we would want some encryption and authentication.
Check the return values of all functions utilizing system resources. Your code should handle errors properly. Many failed function calls should not be fatal to a program. Typically, a system call will return -1 in the case of an error (malloc will return NULL). If a function call sets the errno variable (see the function's man page to find out if it does), you should use the result of strerror(errno) as part of your error message. As far as system calls are concerned, you will want to use one of the mechanisms described in Reporting Errors or Error Reporting.