CS1003 Homework #6
Due by Wednesday, May 5, at 5:00pm

There are two parts to this homework assignment: a theory portion, worth 10 points, and a programming portion, worth 15 points. Please be sure to review the submission instructions in advance, and make sure to include a README.

Written questions

(2 points) Explain, in a few sentences and in the context of the operating system's memory manager and the memory hierarchy in a computer, why having low amounts of RAM in your computer leads to a degradation in performance.
(6 points) The "distance" between an Internet client and server is often an important metric in determining how responsive a particular host will be. Often, that distance is measured in "hops" -- and the traceroute program (which you can run from the Windows NT/2000/XP command line as tracert, or /usr/sbin/traceroute on CUNIX) helps you to measure this. traceroute takes one important command-line parameter: the name (or IP) of the remote host to trace a route to. (You can also type in man traceroute on CUNIX for a detailed description of the tool.)
1. (2 points) Try to find a computer "as near as possible" to yours without actually being your computer. On CUNIX, there are a number of machines that are exactly 1 hop away. Show, using your traceroute results, that it is as near as possible, and suggest why that machine is "so near".
2. (2 points) Conversely, try to find a machine "as far away as possible" from yours, and include your traceroute results to demonstrate this. Note that, at times, firewalls will make it difficult to obtain a complete path: if the nth hop only shows stars and the traceroute never finishes, you'll have to try another host that's not firewalled. Can you find something that is at least 15 hops away? What "part" of the Internet, topologically, might be "far away" from an institution like Columbia?
3. (2 points) Based on the characterizations of the two subproblems above, what can you conclude about Internet topology?
(2 points) Brookshear Chapter Review Problem 10.16 (page 448).

Programming assignment: text formatter

In this assignment, you're going to build a program that reads a text file via file I/O functions, stores each word in a unique array cell, and then prints out the text formatted nicely for a screen. The key is that the program will be able to handle files of a non-predetermined size, so we will have to use malloc to dynamically size the array.

I define "nice formatting" as a form of plain text word-wrapping. In other words, if you have something like:

the    quick brown
fox   jumped over     the
lazy dog!

... nicely formatted would suggest:

the quick brown fox jumped over the lazy dog!

Obviously, we can't always fit all the text on one line. So, we'll use the old typewriter convention for word-wrapping: 80 columns, which is standard for a letter-size sheet of paper if you're using a fixed-width font.

(4 points) Write a function called countWords that takes a FILE pointer as an argument, and returns an int -- the number of words found in this file. A UNIX tool called wc -- short for word count -- provides similar functionality, although it goes a bit further and tells you the number of lines, words and characters encountered; we're only concerned with the number of words. (Nevertheless, you may choose to use wc as a sanity check to make sure your countWords function is working.)

There are several strategies to do this. The easiest way is to process the file, one character (char) at a time, and see if it's whitespace or not. If it's whitespace, it's generally separating two words. If it's not, it's part of a word. In theory, all we'd have to do is to count every character that happens to be whitespace, and we'd be done. Unfortunately, you may have multiple spaces between words, and counting the exact number of whitespace characters will give you a skewed answer. So here's the strategy:
- Read a character using fgetc. If it's equal to EOF (end-of-file), you're finished -- return the number of words counted so far (the counter is updated in the next step). Otherwise...
- If it's whitespace, increment a local counter variable that serves as the number of words by one. Then, continue reading whitespace until you either reach EOF or a non-whitespace character, at which point you're done with this step. (If you encounter EOF, return just like you did in the previous step.)
- Go back to the first step.
The strategy here is that if there are multiple spaces between words, they get "slurped up" when they're encountered, without repeatedly increasing the word count. "Slurping" works because that's how FILE streams are structured -- they repeatedly feed you characters as you grab them.

Once you've finished counting the number of words, simply return the result.
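As a rough illustration, here is a minimal sketch of the counting loop. Note that it is a slight variant of the steps above, not a literal transcription: instead of counting each run of whitespace, it counts the first character of each word, so a file with leading whitespace or without a final newline still gets the same answer as wc. The inWord flag is an illustrative choice, not something the assignment requires.

```c
#include <ctype.h>
#include <stdio.h>

/* Count whitespace-separated words in an open stream. */
int countWords(FILE *fp)
{
    int c;            /* int, not char, so EOF can be told apart from data */
    int count = 0;
    int inWord = 0;   /* 1 while we are inside a word (illustrative flag) */

    while ((c = fgetc(fp)) != EOF) {
        if (isspace(c)) {
            inWord = 0;       /* a run of whitespace only ends the word */
        } else if (!inWord) {
            inWord = 1;       /* first character of a new word */
            count++;
        }
    }
    return count;
}
```

Because the count goes up only when a word begins, multiple spaces between words are "slurped up" without inflating the result.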
(3 points) Write a function called readWord that takes a FILE pointer as an argument, and returns a pointer to char (i.e., a string) containing a single word. The strategy for this is similar to countWords, except that here you'll need a temporary string (100 characters should be sufficient) that you'll copy non-whitespace characters into. Since this function only needs to read a single word, you can follow a simplified process:
- Read a character. If it's whitespace, keep on reading until you reach the first non-whitespace character.
- As long as you haven't reached EOF, start copying the word, one character at a time, into the temporary array.
- As soon as you reach whitespace or EOF, stop. Put a \0 as the last character in your temporary array (to terminate it properly), and then return a malloc'ed duplicate of this word. Making a duplicate is necessary because the temporary array is a local variable which will disappear as soon as you leave this function. You can either malloc a string big enough for the word in the temporary array (i.e., strlen plus one byte for the terminating \0) and then use strcpy, or you can use the function specially designed for this: strdup. (The latter is much easier to use!)
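The three steps above can be sketched as follows. This is one possible shape, under a few assumptions that the assignment does not spell out: the function returns NULL when there are no more words, and a word longer than 99 characters is silently truncated to fit the temporary array. MAX_WORD is a hypothetical name for the suggested buffer size.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORD 100  /* temporary buffer size suggested in the assignment */

/* Read the next word; returns a malloc'ed copy, or NULL at end of file
   (returning NULL here is an assumption of this sketch). */
char *readWord(FILE *fp)
{
    char temp[MAX_WORD];
    size_t len = 0;
    char *copy;
    int c;

    /* Step 1: skip leading whitespace. */
    do {
        c = fgetc(fp);
    } while (c != EOF && isspace(c));

    if (c == EOF)
        return NULL;

    /* Step 2: copy characters, always leaving room for the '\0'. */
    while (c != EOF && !isspace(c)) {
        if (len < MAX_WORD - 1)
            temp[len++] = (char)c;
        c = fgetc(fp);
    }

    /* Step 3: terminate, then duplicate onto the heap. */
    temp[len] = '\0';
    copy = malloc(len + 1);          /* +1 for the terminator */
    if (copy != NULL)
        strcpy(copy, temp);
    return copy;
}
```

Where strdup is available, the last four lines collapse to return strdup(temp); the caller owns the returned string either way and should eventually free it.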
(3 points) Write a function called loadWords that takes three parameters: a FILE pointer, a pointer to a string array (i.e., pointer to a pointer to char array, or a "double pointer"), and an int suggesting the number of words. This function is very simple: it just calls readWord repeatedly, storing the results in the string array, one word at a time, and repeats this for the number of words provided as a parameter. No return value is needed. (Note that when I say "storing the results", I don't mean a strcpy operation; rather, we've already allocated new memory for this string, and all that you need to do is to make a literal pointer assignment to the appropriate array cell in the string array.)
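A minimal sketch of the loading loop might look like the following. So that the block compiles on its own, it includes a compact stand-in for readWord built on fscanf; that stand-in is only an assumption for illustration and is not the character-by-character version the assignment asks you to write.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the assignment's readWord (illustration only):
   fscanf's %99s skips whitespace and reads one word of up to 99 chars. */
static char *readWord(FILE *fp)
{
    char temp[100];
    char *copy;

    if (fscanf(fp, "%99s", temp) != 1)
        return NULL;
    copy = malloc(strlen(temp) + 1);
    if (copy != NULL)
        strcpy(copy, temp);
    return copy;
}

/* Fill words[0..n-1] with the pointers returned by readWord. */
void loadWords(FILE *fp, char **words, int n)
{
    int i;

    for (i = 0; i < n; i++)
        words[i] = readWord(fp);   /* pointer assignment, not strcpy */
}
```

The array cell simply takes over the pointer that readWord allocated, so no second copy of the characters is made.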
(3 points) Write a function called printWords that takes two parameters: a pointer to a string array and the number of words in that string array, and prints them out properly formatted to standard output. It'll walk through the words array, one word at a time, and do some simple string length calculations to see if the word will fit on the current line being printed out (in other words, you'll need to keep a line length counter tracking where "your character is" on the line, and check to see if the next word, plus the space in front of it, would still fit within the 80-column limit). If there's enough space, it should just print the word without a newline at the end, and update the line length counter by using strlen on the string being printed out (adding one for the space). Otherwise, it should print the word immediately after a newline, and should reset the counter to the length of that word (i.e., no hyphenating needed!). Make sure to include a space between each word.
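One way to sketch the wrapping logic is shown below. The helper fprintWords and the constant LINE_WIDTH are hypothetical names introduced here; the helper writes to any stream so the logic can be checked against a temporary file, and printWords just forwards to it with stdout. Ending the output with a final newline is also an assumption of this sketch.

```c
#include <stdio.h>
#include <string.h>

#define LINE_WIDTH 80   /* column limit from the assignment */

/* Hypothetical helper: the printWords logic, but to any stream. */
void fprintWords(FILE *out, char **words, int n)
{
    int col = 0;   /* characters already printed on the current line */
    int i;

    for (i = 0; i < n; i++) {
        int len = (int)strlen(words[i]);

        if (col == 0) {                           /* first word on a line */
            fputs(words[i], out);
            col = len;
        } else if (col + 1 + len <= LINE_WIDTH) { /* space + word still fits */
            fprintf(out, " %s", words[i]);
            col += 1 + len;
        } else {                                  /* wrap, then restart count */
            fprintf(out, "\n%s", words[i]);
            col = len;
        }
    }
    if (col > 0)
        fputc('\n', out);   /* finish the last line */
}

void printWords(char **words, int n)
{
    fprintWords(stdout, words, n);
}
```

Printing the space before each word (rather than after it) avoids a trailing space at the end of every line.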
(2 points) Write the main function. The main function will take one command-line argument -- the name of the file -- and will attempt to open it. (If the user neglects to supply a command-line filename, or if the file can't be opened, the program should print an error and exit.) Once the file is open, the main function should compute the number of words and print it out on the screen. It should then malloc an array of strings (i.e., a double char-pointer) so that it has enough memory space for n char pointers (i.e., single char-pointers) where n is the number of words. Finally, it should call functions to load the words into this array and print out the nicely-formatted result.
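The two setup steps in main (checking the command line and sizing the array) can be sketched as small helpers. Both openInput and allocWords are hypothetical names used only for this illustration; you may well prefer to write the same logic inline in main.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: open the file named on the command line, or
   print an error and return NULL so main can exit. */
FILE *openInput(int argc, char *argv[])
{
    FILE *fp;

    if (argc != 2) {
        fprintf(stderr, "usage: %s filename\n",
                argc > 0 ? argv[0] : "format");
        return NULL;
    }
    fp = fopen(argv[1], "r");
    if (fp == NULL)
        perror(argv[1]);      /* e.g. "missing.txt: No such file or directory" */
    return fp;
}

/* Hypothetical helper: room for n char pointers, one per word. */
char **allocWords(int n)
{
    if (n <= 0)
        return NULL;
    return malloc((size_t)n * sizeof(char *));   /* pointers, not chars */
}
```

With these in place, main would open the file, exit with an error status if that fails, call countWords and print the result, return to the start of the file, allocate the array, and then call loadWords and printWords.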

Here are some hints you may find helpful.

I strongly suggest you test each part separately. For example, once you've written the countWords function, write a simple main that does an fopen immediately followed by countWords, and compare the results to an execution of the UNIX wc command on a simple text file.
To determine if a character is whitespace, use the isspace() function as declared in ctype.h (which you'll have to include). It takes a single character as input and returns 0 if the character is not whitespace, and a nonzero value if it is.
There is a function called rewind in C that takes one parameter -- a FILE pointer -- and resets it back to the beginning of the file. You may find this useful as a way to return to the beginning of the file after the words have been counted so that they can then be rescanned for loading purposes.
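To illustrate the last two hints together, here is a small sketch that scans a stream twice. The helper names countWhitespace and sameOnSecondPass are made up for this example, and the second function assumes the stream is positioned at the beginning of the file when it is called.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: count whitespace characters from the current
   position to end of file. */
long countWhitespace(FILE *fp)
{
    long n = 0;
    int c;

    while ((c = fgetc(fp)) != EOF) {
        if (isspace(c))       /* nonzero result means whitespace */
            n++;
    }
    return n;
}

/* Scan the stream twice; rewind makes the second pass start over.
   Returns 1 if both passes agree (assumes fp starts at the beginning). */
int sameOnSecondPass(FILE *fp)
{
    long first = countWhitespace(fp);
    long second;

    rewind(fp);               /* back to the start, EOF condition cleared */
    second = countWhitespace(fp);
    return first == second;
}
```

Without the rewind call, the second pass would begin at end-of-file and see nothing, which is exactly what would happen to loadWords if it ran straight after countWords.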

(5 points extra credit) Modify the logic of readWord and printWords so that the formatter keeps paragraphs separate (that is, it rejustifies each paragraph, but prints them out separately). In order for this to work, you'll have to encode a newline inside individual word strings, and then printWords must check to see if such a newline occurs as it's printing data out. (Hint: if, while reading, you encounter two newlines in a row, you've found a paragraph break; store one in the word you're reading.)
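For the extra credit, the paragraph-break test from the hint could be sketched roughly as follows. The helper skipWhitespace is a hypothetical name, and the sketch makes two assumptions: lines end in a plain \n (UNIX style), and only two newlines directly in a row count as a break, so a "blank" line containing spaces would not. How the break is then stored in the word is left to you.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: skip a run of whitespace and return 1 if the run
   contained two newlines in a row (a paragraph break), else 0.
   The first non-whitespace character is pushed back with ungetc. */
int skipWhitespace(FILE *fp)
{
    int prevNewline = 0;      /* was the previous character a newline? */
    int paragraphBreak = 0;
    int c;

    while ((c = fgetc(fp)) != EOF && isspace(c)) {
        if (c == '\n') {
            if (prevNewline)
                paragraphBreak = 1;
            prevNewline = 1;
        } else {
            prevNewline = 0;
        }
    }
    if (c != EOF)
        ungetc(c, fp);        /* leave the next word's first char unread */
    return paragraphBreak;
}
```

A modified readWord could call a helper like this in place of its whitespace-skipping loop and record the result alongside the word it reads next.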