Project 3: Modules

There have been several recent efforts to provide online learning platforms at a very large scale across the web. Examples include the Khan Academy, recent courses taught to hundreds of thousands of students by Stanford CS professors, and offerings from startups like Udacity, edX, and Coursera. In a seminar at Columbia in March 2012, Daphne Koller (one of those Stanford professors) suggested that there might be a natural unit of teaching (which we'll call a "module") that is larger than a lecture but smaller than a course. Characteristics of a module are:

These properties are somewhat subjective, and you'll need to use some judgment in determining the appropriate scope for a module.

The goal of this project is to automatically identify modules from a large number of course descriptions on the web. We'll narrow down to a single discipline (e.g., "college computer science," "high school algebra," or "graduate law") and consider the entire syllabus from many comparable departments across many institutions. Your program will serve as a tool to help educators efficiently design curriculum components.

The output of your program will be a set of modules including dependencies between modules. (Module A depends on module B if B must be completed in order to be able to grasp the concepts in A. Presumably the dependency graph is acyclic.) One way to infer dependencies is to look at the course prerequisites, although such prerequisites are at a coarser granularity than we really want. What else might be relevant for inferring module dependencies?
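As a starting point for the prerequisite idea above, here is a minimal sketch in Python. It is not a prescribed design. It assumes you already have a mapping from courses to candidate modules and a mapping from courses to their prerequisite courses; the function names `lift_prerequisites` and `study_order` and the sample data are hypothetical. Every module of a course is made to depend on every module of each prerequisite course, which deliberately over-approximates the true dependencies, and the result is then checked for cycles with the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter, CycleError


def lift_prerequisites(course_modules, course_prereqs):
    """Turn course-level prerequisites into candidate module-level dependencies.

    course_modules: dict mapping course name -> list of module names
    course_prereqs: dict mapping course name -> list of prerequisite course names
    Returns a dict mapping each module -> set of modules it depends on.
    """
    deps = {m: set() for mods in course_modules.values() for m in mods}
    for course, prereqs in course_prereqs.items():
        for prereq in prereqs:
            for module in course_modules.get(course, []):
                # Coarse assumption: depend on everything in the prerequisite course.
                deps[module].update(course_modules.get(prereq, []))
    for module in deps:
        deps[module].discard(module)  # a module never depends on itself
    return deps


def study_order(deps):
    """Return modules in an order that respects dependencies, or None if cyclic."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    modules = {
        "Data Structures": ["lists", "trees"],
        "Algorithms": ["graph search"],
    }
    prereqs = {"Algorithms": ["Data Structures"]}
    print(study_order(lift_prerequisites(modules, prereqs)))
```

Because the lifted dependencies are as coarse as the course prerequisites themselves, a real solution would need additional evidence to prune them down to the module level.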

Take a look at Columbia's CS bulletin course descriptions. Many of them enumerate topics that are covered during the course. Are these good candidates for modules? What if two items are tightly connected, such as "memory management" and "virtual memory" in Operating Systems? What might you learn from similarities and differences in grouping among different institutions?
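One way to start exploring these questions is to count how often pairs of enumerated topics appear together across descriptions from different institutions. The Python sketch below is illustrative only: it assumes descriptions list topics separated by commas or semicolons, which real bulletins often violate, and the names `split_topics` and `cooccurrence` are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def split_topics(description):
    """Split an enumerated topic list on commas and semicolons (a naive heuristic)."""
    parts = re.split(r"[;,]", description.lower())
    return [p.strip(" .") for p in parts if p.strip(" .")]


def cooccurrence(descriptions):
    """Count in how many descriptions each unordered pair of topics appears together."""
    counts = Counter()
    for text in descriptions:
        topics = sorted(set(split_topics(text)))  # sorted so each pair has one canonical order
        counts.update(combinations(topics, 2))
    return counts


if __name__ == "__main__":
    # Hypothetical descriptions, for illustration only.
    samples = [
        "Memory management, virtual memory, scheduling.",
        "Scheduling; virtual memory; memory management",
        "File systems, scheduling",
    ]
    for pair, n in cooccurrence(samples).most_common(3):
        print(pair, n)
```

Pairs that co-occur in most descriptions may belong in the same module, while pairs that some institutions group together and others separate are candidates for module boundaries.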

At the end of the project, you'll be asked to run your program for several disciplines (we'll discuss which in class; at least one will be previously unseen) using a specified set of course web pages. Your output should be provided as a PDF document that can be reviewed by an expert in the field. The instructor will try to identify willing experts in several disciplines to judge the results.

Some issues to think about:

Ken Ross 2012-09-17