Principles and Practice of Parallel Programming
COMS 4130, Fall 2013

(Administrative) (Syllabus) (Mini-Projects) (Infrastructure) (Other Resources) (Piazza Q&A)

Course Administration


Lectures will be held on MW 2:40-3:55 pm in 233 MUDD.

Course Staff

Office Hours

Consult calendar (All OH in either CSB 468 or 469).

Course Overview

Learning how to program parallel computers (multi-core, clusters) productively and efficiently is a critical skill in this era of concurrency. The course will provide an introduction to modern parallel systems and their performance characteristics. It will cover the fundamentals of data-structure design, analysis, and implementation for efficient parallel execution; programming abstractions for concurrency; and techniques for reasoning about the behavior and performance of parallel programs. Particular topics to be covered include: data parallelism, fine-grained concurrency, locality, load balancing, overlapping computation with communication, reasoning about deadlock freedom, determinacy, safe parallelization, implementing frameworks for concurrency (such as Hadoop Map/Reduce), and debugging for correctness and performance. Students will study many parallel programs drawn from a variety of application domains (including high-performance computing, large-scale graph analyses, machine learning, and game playing). Students will be expected to complete a series of parallel programming projects with good performance on a cluster of multi-cores, using a modern parallel language, X10.
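As a taste of the APGAS style used throughout the course, the sketch below shows X10's finish/async constructs expressing recursive parallelism on a Fibonacci computation. This is an illustrative sketch only, not course-provided code; the class name Fib and the use of a Rail to collect results are our own choices.

```x10
// Fib.x10 -- illustrative sketch of finish/async (not course code).
public class Fib {
    public static def fib(n:Long):Long {
        if (n < 2) return n;
        val f = new Rail[Long](2);
        finish {
            async { f(0) = fib(n-1); }  // computed in a new activity
            f(1) = fib(n-2);            // computed in the current activity
        }
        // finish guarantees both activities have completed before this line
        return f(0) + f(1);
    }

    public static def main(args:Rail[String]) {
        Console.OUT.println(fib(20));  // prints 6765
    }
}
```

The finish block joins all activities spawned (transitively) inside it, which is what makes the read of f(0) after the block safe.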


Prerequisites

Experience in Java and a basic understanding of analysis of algorithms. COMS W1004 and COMS W3137 (or equivalent).


There is no required textbook, though several optional recommendations will be provided. The only requirement for this class is a Columbia CS account, which can be set up here.


Students are required to attend all classes, and attendance will be taken. If you need to miss a class, you must email Martha Kim at least 48 hours in advance of lecture.

Academic Honesty

We take academic honesty extremely seriously, and expect the same of you. The mini-projects are governed by the collaboration policy described below; no collaboration is allowed on the in-class quizzes. Outside of these two policies, the Computer Science Department's policies on academic honesty are in effect, and any violations will be reported to the Dean's office.

Grading Formula

Mini-Projects: 80%
Quizzes: 10%
Participation: 10%
Individual grades will be posted to the CourseWorks gradebook.


Syllabus

Date   | Unit          | Topic(s)                                             | Instructor | Handouts                                      | Quiz / Mini-Project
Sep 4  | Background    | Overview; Introduction                               | MK         |                                               |
Sep 9  | Determinate   | X10 Intro; APGAS                                     | VS         | A Brief Introduction To X10; X10 2.4 tutorial |
Sep 11 | Determinate   | finish and async                                     | VS         |                                               |
Sep 16 | Background    | Multicore; hardware platforms; CUCS infrastructure   | MK         | Hardware Background Notes                     |
Sep 18 | Background    | Scaling theory; performance modeling and measurement | MK         | Scaling Theory Notes                          |
Sep 23 | Determinate   | Determinacy                                          | VS         | Definition of determinacy; Definition of determinacy v0.2 |
Sep 25 | Determinate   | Idioms: prefix sum, recursive parallelism            | VS         | Definition of determinacy v0.3 (use this)     |
Sep 30 | Determinate   | Clocks and barriers                                  | VS         |                                               |
Oct 2  | Determinate   | Mini-Project #1 discussion                           | MK         |                                               | Queens with Pawns; due Monday 9/30 by 11:55 pm
Oct 7  | Indeterminate | Need for synchronization; correctness and progress conditions; lock-free queue | MK | Indeterminate Computation Notes |
Oct 9  | Indeterminate | Lock-free queue (cont.); Peterson's lock             | MK         | Indeterminate Computation Notes               |
Oct 14 | Indeterminate | Bakery lock; flat combining                          | MK         | Flat Combining and the Synchronization-Parallelism Tradeoff |
Oct 16 | Indeterminate | Quiz                                                 | MK         |                                               | Quiz #1
Oct 21 | Blocking      | Blocking synchronization                             | MK         | Semaphore.x10                                 |
Oct 23 | Indeterminate | Mini-Project #2 discussion                           | MK         |                                               | Parallel HashMap; due Monday 10/21 by 11:55 pm
Oct 28 | Blocking      | Buffer                                               | MK         |                                               |
Oct 30 | Scale out     | TBA                                                  | VS         |                                               |
Nov 4  |               | Election Day                                         |            |                                               |
Nov 6  | Scale out     | TBA                                                  | VS         |                                               |
Nov 11 | Scale out     | TBA                                                  | VS         | Multi-place X10 Programs                      |
Nov 13 | Scale out     | TBA                                                  | VS         |                                               | Mini-Project #4 proposals due
Nov 18 | Scale out     | Mini-Project #3 discussion                           | MK         |                                               | Due Sunday 11/17 by 11:55 pm (NOTE: this is the night before the discussion class, so there will be no 24h grace period)
Nov 20 | Scale out     |                                                      | VS         |                                               | Quiz #2
Nov 25 | Scale out     | Streaming; MapReduce                                 | VS         |                                               |
Nov 27 |               | MapReduce (cont.); Pregel                            |            |                                               |
Dec 2  |               | Mini-Project #4 presentations (see Piazza for deadline and logistics) | |                                     |
Dec 4  |               | Mini-Project #4 presentations (see Piazza for deadline and logistics) | |                                     |
Dec 9  |               | Mini-Project #4 presentations (see Piazza for deadline and logistics) | |                                     |


Mini-Projects

Throughout the semester you will complete four mini-projects. For each one you will work in pairs to implement a performant parallel computation. You will be expected to demonstrate good parallel speedups, as well as a rationale for your design decisions and an analysis of your program's performance. Three of the projects will be pre-set by course staff, with the fourth designated "students' choice".

Discussion Classes

At the completion of each project, we will have a discussion class, where approximately five randomly chosen groups will be called to give "chalk talks" providing an overview of their design, a description of what brought them to that design, an analysis of what aspects were/were not successful, and a description of their speedups.


Projects will be structured, with course staff providing a test harness, Makefile, and, if appropriate, a reference serial implementation. All submissions are due via CourseWorks by 11:55 pm two nights prior to the discussion class. You have the option of submitting up until 11:55 pm the night before the discussion class for a 20% deduction in your score. After that point, we will no longer accept submissions.

Collaboration Policy

Groups are free to exchange ideas and approaches to the challenge problem freely. However, each group must implement and understand its own design, and be ready to present it during the discussion class.

Forming Pairs

You may work with the same or a different partner for each project. You may declare your partnership or request that a partner be assigned using this declaration form.


Infrastructure

Sample X10 Programs

The programs discussed in class are available from SVN in the X10 repository on SourceForge. See here. Use an SVN client to check out the code, e.g.:
      svn co svn:// x10-code

Running X10 Programs

For this course you have the following three options for running X10 programs.
  1. On your own installation: use the X10 2.4 release available here. (Do not use the older download link.)
  2. Columbia CS's shared CLIC Lab
    Using your CUCS account, you may log in to clic-lab. This cluster has 44 nodes, each of which is:
    • Dell Precision T5500 Workstation (Dual Quad Core Processor X5550 @ 2.66GHz + 8M cache)
    • 24GB DDR3 ECC SDRAM Memory, 1333MHz, 6 x 4GB
    • 1TB SATA 3.0Gb/s, 7200 RPM HardDrive
    Note: This is a shared cluster across the department. While it is quite large, machines will be running other loads. It is therefore best for development and rough timing measurements.
    Also note: If you are at home and encounter problems ssh'ing into clic-lab, it is likely a TCP/UDP mismatch between CLIC and (usually) Time Warner. There are two workarounds:
    1. Use Google's DNS servers: Instructions
    2. Connect directly to one of the clic-lab nodes (e.g., london, moscow, bern, or cairo).
  3. Private spicerack cluster
    Unlike clic-lab, spicerack is a dedicated mini-cluster for this course. To run a job on spicerack, you must use the job queuing utility Condor. More info to come on this point, but queuing will ensure your job runs in isolation and thus gets clean timing measurements.
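Whichever option you use, compiling and launching an X10 program looks roughly like the following. This is a sketch assuming an X10 2.4 installation with its bin directory on your PATH; MyProgram.x10 is a placeholder for your own source file.

```shell
# Native (C++) backend: compile to a standalone executable.
x10c++ -O -o MyProgram MyProgram.x10

# Run it with 4 places on the local host.
X10_NPLACES=4 ./MyProgram

# Alternatively, the managed (Java) backend:
x10c MyProgram.x10
X10_NPLACES=4 x10 MyProgram
```

The X10_NPLACES environment variable controls how many places the runtime starts; on a cluster, launch details depend on the job system (e.g., Condor on spicerack).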

Other Resources