COMS W4111.001-Introduction to Databases
Fall 2023

Project 1, Part 1

(worth 25% of overall Project 1 grade)

FAQs

Summary of Deadlines

Teams

You will carry out this project in teams of two. If you can't find a teammate, please follow these steps:

  1. Post a message on the Ed Discussion board asking for a teammate - the best way.
  2. Send an email to Vicky right away (and definitely before Tue Sep 12 at 5 p.m. ET) asking Vicky to pair you up with another student without a teammate.

You do not need to notify us of your team composition. Instead, you and your teammate will indicate your team composition when you submit your project on Gradescope (click on "Add Group Member" after one of you has submitted your project). You will upload your final electronic submission on Gradescope exactly once per team, rather than once per student.

Important notes:

Overview of Project 1

In Project 1, you will build a substantial real-world database application of your choice. This project is split into three parts:

Overall, Parts 2 and 3 will be based on your description and design of Part 1.

Recall that we have a project contest, and that the best projects (as decided by the class staff) will get a 10% boost in the Project 1 grade.

Overview of Part 1 of Project 1

Part 1 of this project consists of multiple steps. At a high level, you will:

Pick an application that you will enjoy, since you will be working on it for a substantial part of the semester! A suggestion is that you build a database about something that you are interested in --a hobby, a favorite website, material from another course, a research project, etc. It's especially nice if you pick an application where you can populate your database using real, as opposed to fabricated, data. If you have an application where you can get a large amount of real data to populate your database, all the better, but it's not necessary.

If you're having trouble thinking of an application, take a look at almost any popular website. These sites tend to have similar themes and many could be reduced to an appropriate scope for the project. For example, social networks (e.g., Instagram, Reddit, Twitter), shopping sites (e.g., Etsy, Amazon), or content sites (e.g., The New York Times) can all be appropriate models. In particular, and as an example, shopping sites all have a similar theme: products, customers, orders, shopping baskets, reviews, etc., and typically could be reduced to make for an interesting and appropriately sized application. Try to make your application interesting, including a variety of different kinds of attribute domains (e.g., strings, integers, etc.) and relationships (i.e., with different key and participation constraints).

What you need to do for Part 1

  1. Find a teammate.
  2. Decide on an application for your project and write a document that you should email the member of the class staff (i.e., an IA or the instructor) with whom you will be meeting during the week of Mon Sep 18 through Fri Sep 22. You should email the document before the meeting and the document should include the following items:
    • A relatively informal, one-paragraph description of the application, not to exceed 20 lines or so, highlighting interesting and challenging parts. You can ignore all "security"-related issues (e.g., user authentication, encryption) in your application. The more concrete your written description, the more efficient and useful the meeting with the class staff will be. This paragraph should include:
      1. A high-level description of the general domain of the application.
      2. An idea of what entities and relationship sets you will have, including attributes and constraints. You don't need to have your design completely finalized, though, but of course it will help if you bring to the meeting at least a preliminary entity/relationship diagram, so we encourage you to do so. You should pick an application with a schema that is relatively substantial, but not too large. As general guidelines, your E/R design should be expected to have around 5 to 10 entity sets and around 5 to 10 relationship sets. You will get a sense if your design is too simple or too complex. Please talk with an IA during office hours if you are in doubt about this.
      3. A specification of whether you will follow the Web Front-End Option or the Expanded-Design Option for Part 3.
    • A brief "data plan," providing just two or three high-level sentences explaining what data you will use to populate your database later on.
    • (Web Front-End Option) A description of your user interaction plans: If you will follow the Web Front-End Option for Part 3, you should provide specific details as to how users will interact with your application. For example, if your application is somehow inspired by the Internet Movie Database, your description should describe the general "entities" that are involved, plus explain that your application might ask users for a movie title and return as a result the actors in the movie; you might also let users store in the database the fact that they liked certain movies and disliked others; finally, given a userid, your application might give recommendations on the movies that the user might like, given the user's previously recorded preferences, according to some simple "recommendation" algorithm.
    • (Expanded-Design Option) A description of your design expansion plans: If you will follow the Expanded-Design Option for Part 3, you should write a relatively informal, one-paragraph description of how you will expand your design in Part 3, not to exceed 20 lines or so. The expansion of your project in Part 3 should augment your project --in terms of the number of entity sets, relationship sets, and overall "complexity" of the design-- roughly by 50%. This expansion should be substantial: rather than just adding a few entity sets and relationship sets that are overly similar to those in Part 1 of the project, you are expected to add a truly novel and significant component to your database (following the above "50% increase in complexity" guidelines). For example, if your Part 1 database follows some variant of the Amazon shopping site, a substantial expansion for Part 3 could be the addition of a sophisticated "subsystem" for product reviews and ratings, as well as for allowing users to vote on the usefulness of the reviews from other customers, etc.
    • A short description of your contingency plan. Since students do occasionally drop classes, and to prevent last-minute surprises, we suggest that you include in the document for the meeting a "contingency plan" for the unfortunate case in which one of the teammates drops the class later in the semester. This contingency plan should indicate how you will "downgrade" the project to a simpler one in such a case --including in Part 3, whether you follow the Web Front-End Option or the Expanded-Design Option--, so that it is appropriate for a single person to complete. As general guidelines, your E/R design for a one-person project should have around 3 to 7 entity sets and around 3 to 7 relationship sets. If your teammate drops the class, rather than finding a new teammate to complete the project, which is problematic for a number of reasons, you will complete the "downgraded" version of your original project. Including such a contingency plan is optional, but if you choose not to have it in written form in the document for your meeting, for approval, and your teammate drops the class later, you will have to complete the original project as planned, and no exceptions will be made at that point.
  3. Meet with an IA or the instructor during the week of Mon Sep 18 through Fri Sep 22 to discuss your design and plans, and make sure that they are appropriate (i.e., challenging enough, but not unrealistically so). This meeting is required and should last up to 10 minutes. Your grade for Part 1 will be decreased substantially if you don't meet with any of the class staff. We will have expanded office hours during that week and you must sign up for a meeting slot ahead of time in this Google Calendar, for which you should login with your @columbia.edu or @barnard.edu account. Both team members must attend the meeting, at the same time. Before this meeting, email the document described in the previous point (i.e., in item (2)) to the member of the class staff that you are meeting with. It is OK for you to email the document just a few minutes before the meeting, as long as the member of the class staff who you are meeting with has already received the document when you start the meeting. During the meeting, you should note the feedback that you receive, because you will have to incorporate this information for your final submission, described next.
  4. After an IA or the instructor has OKed your general application, your data plan, your plans for Part 3, and your contingency plan, submit the following three items on Gradescope just once per team (not once per teammate) by Fri Oct 6 at 5 p.m. ET:
    • A revised version of the document that you prepared and emailed for your meeting with the instructor or an IA (including the description of your application, your data plan, your plans for Part 3, and optionally your contingency plan); this revised version should incorporate all of the feedback that you received during the meeting.
    • The E/R diagram of your database, following the syntax that we saw in class. Use only "simple" attribute domains (e.g., string, integer, date, etc.) like the ones we covered in class (and not, say, arrays or other structured domains). You should specify as many of the real-world constraints for your application as possible. Write in plain English at the bottom of the diagram any constraints that cannot be captured with the E/R syntax that we use in our course.
      Note for Expanded-Design Option: in Part 1, you do not need to write or submit the E/R diagram for your Part 3 expanded design yet (you need to submit the complete E/R diagram for Part 1, but not for the Part 3 expansion, which you will submit later, with Part 3); instead, you should describe your expansion plans for Part 3 with a one-paragraph description of your expansion plans for Part 3, which you should have gotten approved during the meeting. You will do the E/R diagram, etc. for Part 3 later in the semester.
    • The SQL schema of your database, which you should derive by mapping your E/R diagram into SQL using the method that we will cover in class. Use only "simple" attribute domains like the ones we covered in class (and not, say, arrays or other structured domains). You should capture as many of the E/R constraints (e.g., key and participation constraints) as possible. Briefly discuss in plain English any constraints that you cannot yet reflect in your SQL schema.

    Important submission instructions: Please do not upload .zip, .tar, etc. files, but rather you can submit multiple (uncompressed) files if necessary. (Grading .zip, .tar, etc. files is really inconvenient for the IAs.) You should submit your project exactly once per team, rather than once per student. For this, as discussed above, click on "Add Group Member" after one of you has submitted your project.

Note: If you observe Jewish or other religious holidays that overlap with this part of the project, please email the instructor to arrange for alternative deadlines.

Grading for Part 1

Your grade for Part 1 of Project 1 will be split as follows:

  1. Emailing document and meeting with class staff: 7 points. If you email the member of the class staff that you are meeting with your document, as specified above, ahead of the meeting, and both you and your teammate attend the required meeting, you can expect to get all points, even if you are asked to make changes or revisions to your proposal.
  2. Quality of revised document for your application: 6 points. We will evaluate the overall quality of your final description of your application, especially in terms of how thoroughly you incorporated any revisions suggested during your meeting with the class staff.
  3. Quality of E/R diagram: 6 points. We will evaluate how well your E/R diagram models your proposed application, including how well you modeled any relevant real-world constraints.
  4. Quality of SQL schema: 6 points. We will evaluate how well you mapped your E/R diagram, including constraints, into a SQL schema using the technique that we covered in class.

Frequently-Asked Questions