Principles and Practice of Parallel Programming
COMS 4130, Fall 2013

(Administrative) (Syllabus) (Mini-Projects) (Infrastructure) (Other Resources) (Piazza Q&A)

Course Administration


Lectures will be held on MW 2:40-3:55 pm in 233 MUDD.

Course Staff

Office Hours

Consult calendar (All OH in either CSB 468 or 469).

Course Overview

Learning how to program parallel computers (multi-core, clusters) productively and efficiently is a critical skill in this era of concurrency. The course will provide an introduction to modern parallel systems and their performance characteristics. It will cover the fundamentals of data-structure design, analysis, and implementation for efficient parallel execution; programming abstractions for concurrency; and techniques for reasoning about the behavior and performance of parallel programs. Particular topics to be covered include: data parallelism, fine-grained concurrency, locality, load balancing, overlapping computation with communication, reasoning about deadlock freedom, determinacy, safe parallelization, implementing frameworks for concurrency (such as Hadoop Map/Reduce), and debugging for correctness and performance. Students will study many parallel programs drawn from a variety of application domains (including high-performance computing, large-scale graph analyses, machine learning, and game playing). Students will be expected to complete a series of parallel programming projects with good performance on a cluster of multi-cores, using a modern parallel language, X10.
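As a taste of the APGAS style used throughout the course, the sketch below shows X10's finish/async constructs expressing recursive parallelism on a Fibonacci computation. This is an illustrative sketch only, not course-provided code; the class name Fib and the use of a Rail to collect results are our own choices.

```x10
// Fib.x10 -- illustrative sketch of finish/async (not course code).
public class Fib {
    public static def fib(n:Long):Long {
        if (n < 2) return n;
        val f = new Rail[Long](2);
        finish {
            async { f(0) = fib(n-1); }  // computed in a new activity
            f(1) = fib(n-2);            // computed in the current activity
        }
        // finish guarantees both activities have completed before this line
        return f(0) + f(1);
    }

    public static def main(args:Rail[String]) {
        Console.OUT.println(fib(20));  // prints 6765
    }
}
```

The finish block joins all activities spawned (transitively) inside it, which is what makes the read of f(0) after the block safe.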


Prerequisites

Experience in Java and a basic understanding of analysis of algorithms. COMS W1004 and COMS W3137 (or equivalent).


There is no required textbook, though several optional recommendations will be provided. The only requirement for this class is a Columbia CS account, which can be set up here.


Students are required to attend all classes, and attendance will be taken. If you need to miss a class, you must email Martha Kim at least 48 hours in advance of lecture.

Academic Honesty

We take academic honesty extremely seriously, and expect the same of you. The mini-projects are governed by the collaboration policy described below; no collaboration is allowed on the in-class quizzes. Outside of these two policies, the Computer Science Department's policies on academic honesty are in effect, and any violations will be reported to the Dean's office.

Grading Formula

Mini-Projects: 80%
Quizzes: 10%
Participation: 10%
Individual grades will be posted to the CourseWorks gradebook.


Syllabus

Date   | Unit          | Topic(s)                                             | Instructor | Handouts                                      | Quiz / Mini-Project
Sep 4  | Background    | Overview; Introduction                               | MK         |                                               |
Sep 9  | Determinate   | X10 Intro; APGAS                                     | VS         | A Brief Introduction To X10; X10 2.4 tutorial |
Sep 11 | Determinate   | finish and async                                     | VS         |                                               |
Sep 16 | Background    | Multicore; hardware platforms; CUCS infrastructure   | MK         | Hardware Background Notes                     |
Sep 18 | Background    | Scaling theory; performance modeling and measurement | MK         | Scaling Theory Notes                          |
Sep 23 | Determinate   | Determinacy                                          | VS         | Definition of determinacy; Definition of determinacy v0.2 |
Sep 25 | Determinate   | Idioms: prefix sum, recursive parallelism            | VS         | Definition of determinacy v0.3 (use this)     |
Sep 30 | Determinate   | Clocks and barriers                                  | VS         |                                               |
Oct 2  | Determinate   | Mini-Project #1 discussion                           | MK         |                                               | Queens with Pawns; due Monday 9/30 by 11:55 pm
Oct 7  | Indeterminate | Need for synchronization; correctness and progress conditions; lock-free queue | MK | Indeterminate Computation Notes |
Oct 9  | Indeterminate | Lock-free queue (cont.); Peterson's lock             | MK         | Indeterminate Computation Notes               |
Oct 14 | Indeterminate | Bakery lock; flat combining                          | MK         | Flat Combining and the Synchronization-Parallelism Tradeoff |
Oct 16 | Indeterminate | Quiz                                                 | MK         |                                               | Quiz #1
Oct 21 | Blocking      | Blocking synchronization                             | MK         | Semaphore.x10                                 |
Oct 23 | Indeterminate | Mini-Project #2 discussion                           | MK         |                                               | Parallel HashMap; due Monday 10/21 by 11:55 pm
Oct 28 | Blocking      | Buffer                                               | MK         |                                               |
Oct 30 | Scale out     | TBA                                                  | VS         |                                               |
Nov 4  |               | Election Day                                         |            |                                               |
Nov 6  | Scale out     | TBA                                                  | VS         |                                               |
Nov 11 | Scale out     | TBA                                                  | VS         | Multi-place X10 Programs                      |
Nov 13 | Scale out     | TBA                                                  | VS         |                                               | Mini-Project #4 proposals due
Nov 18 | Scale out     | Mini-Project #3 discussion                           | MK         |                                               | Due Sunday 11/17 by 11:55 pm (NOTE: this is the night before the discussion class, so there will be no 24h grace period)
Nov 20 | Scale out     |                                                      | VS         |                                               | Quiz #2
Nov 25 | Scale out     | Streaming; MapReduce                                 | VS         |                                               |
Nov 27 |               | MapReduce (cont.); Pregel                            |            |                                               |
Dec 2  |               | Mini-Project #4 presentations (see Piazza for deadline and logistics) | |                                     |
Dec 4  |               | Mini-Project #4 presentations (see Piazza for deadline and logistics) | |                                     |
Dec 9  |               | Mini-Project #4 presentations (see Piazza for deadline and logistics) | |                                     |


Mini-Projects

Throughout the semester you will complete four mini-projects. For each one you will work in pairs to implement a performant parallel computation. You will be expected to demonstrate good parallel speedups, as well as a rationale for your design decisions and an analysis of your program's performance. Three of the projects will be pre-set by course staff, with the fourth designated "students' choice".

Discussion Classes

At the completion of each project, we will have a discussion class, where approximately five randomly chosen groups will be called to give "chalk talks" providing an overview of their design, a description of what brought them to that design, an analysis of what aspects were/were not successful, and a description of their speedups.


Projects will be structured, with course staff providing a test harness, Makefile, and, if appropriate, a reference serial implementation. All submissions are due via CourseWorks by 11:55 pm two nights prior to the discussion class. You have the option of submitting up until 11:55 pm the night before the discussion class for a 20% deduction in your score. After that point, we will no longer accept submissions.

Collaboration Policy

Groups are free to exchange ideas and approaches to the challenge problem freely. However, each group must implement and understand its own design, and be ready to present it during the discussion class.

Forming Pairs

You may work with the same or a different partner for each project. You may declare your partnership or request that a partner be assigned using this declaration form.


Infrastructure

Sample X10 Programs

The programs discussed in class are available from SVN in the X10 repository on SourceForge. See here. Use an SVN client to check out the code, e.g.:
      svn co svn:// x10-code

Running X10 Programs

For this course you have the following three options for running X10 programs.
  1. On your own installation: use the X10 2.4 release available here. (Do not use the older download link.)
  2. Columbia CS's shared CLIC Lab
    Using your CUCS account, you may log in to clic-lab. This cluster has 44 nodes, each of which is:
    • Dell Precision T5500 Workstation (Dual Quad Core Processor X5550 @ 2.66GHz + 8M cache)
    • 24GB DDR3 ECC SDRAM Memory, 1333MHz, 6 x 4GB
    • 1TB SATA 3.0Gb/s, 7200 RPM HardDrive
    Note: This is a shared cluster across the department. While it is quite large, machines will be running other loads. It is therefore best for development and rough timing measurements.
    Also note: If you are at home and encounter problems ssh'ing into clic-lab, it is likely a TCP/UDP mismatch between CLIC and (usually) Time Warner. There are two workarounds:
    1. Use Google's DNS servers: Instructions
    2. Connect directly to one of the clic-lab nodes (e.g., london, moscow, bern, or cairo).
  3. Private spicerack cluster
    Unlike clic-lab, spicerack is a dedicated mini-cluster for this course. To run a job on spicerack, you must use the job queuing utility Condor. More info to come on this point, but queuing will ensure your job runs in isolation and thus gets clean timing measurements.
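Whichever option you use, compiling and launching an X10 program looks roughly like the following. This is a sketch assuming an X10 2.4 installation with its bin directory on your PATH; MyProgram.x10 is a placeholder for your own source file.

```shell
# Native (C++) backend: compile to a standalone executable.
x10c++ -O -o MyProgram MyProgram.x10

# Run it with 4 places on the local host.
X10_NPLACES=4 ./MyProgram

# Alternatively, the managed (Java) backend:
x10c MyProgram.x10
X10_NPLACES=4 x10 MyProgram
```

The X10_NPLACES environment variable controls how many places the runtime starts; on a cluster, launch details depend on the job system (e.g., Condor on spicerack).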

Other Resources