COMS W4111.001-Introduction to Databases
Fall 2023

Project 1, Part 3

(worth 50% of overall Project 1 grade)

Your team

You will do Part 3 of Project 1 with the same team as for Parts 1 and 2. If your team partner dropped the class and you did not submit a contingency plan for this with your Part 1 submission, then unfortunately you will still have to complete the whole project by yourself. If you team partner dropped the class and you did submit a contingency plan for this with your Part 1 submission, then you are welcome to switch to this reduced version of your project.

Overview of Part 3 of Project 1

As you recall from Part 1, you had two options for Part 3 of the project: you could either follow the Web Front-End Option or the Expanded-Design Option. You will follow for Part 3 the option that you stated in Part 1.

Important notes


Web Front-End Option

If you are following the Web Front-End option, you will finish building the application that you proposed in Part 1, on top of the database that you created in Part 2. For the final evaluation of Project 1, you will need to submit your code for your application and a README file on Gradescope by Monday November 20 at 5 p.m. ET (see below for further instructions). Also, both team members will meet with your project mentor on Tuesday November 21. Your project mentor will contact you shortly to schedule a 15-minute meeting for that day. Your implementation will be on Python 3 using Flask, and should satisfy these requirements:

The following resources may be helpful for learning both Python and Flask:

Getting started

Your job is to implement your proposed web application. To help you out, we have provided a bare-bones Flask web application, server.py, available here. It provides code that connects to a database URL, and a default index page. Take a look at the comments in server.py to see how to use or modify the server. In particular, note that you will need to modify the value of DATABASEURI inside server.py, to refer to your PostgreSQL username and password. You will need to connect to your database from Part 2. Please read all these directions. Once you get it running, you should start working on your custom logic.

Important: Please run python --version to figure out what version of Python is the default in your VM/virtual environment. If the version is 2.7.* or you receive a "Command 'python' not found" error, you will need to run Python as python3 to make sure you are using Python 3.

A short explanation of SQLAlchemy

We use a Python package called SQLAlchemy to simplify our work for connecting to the database. For example, server.py contains the following code to load useful functions from the package:

from sqlalchemy import *

SQLAlchemy is able to connect to many different types of DBMSs, including SQLite, PostgreSQL, MySQL, and Oracle. Each such DBMS is called an "engine." The create_engine() function sets up the configuration to the specific DBMS we want to connect to:

engine = create_engine(DATABASEURI)

Given an engine, we can then connect to it (this is similar to how psql connects to our class's PostgreSQL database server):

conn = engine.connect()

At this point, the conn connection object can be used to execute queries to the database, which must be "wrapped" with text() as below, followed by conn.commit() to make the queries actually run. This is basically what psql is doing under the covers:

cursor = conn.execute(text("select 1"))
conn.commit()

The execute function takes a SQL query string as input and returns a cursor object. You can think of this as an iterator over the result relation. This means you can run SELECT * on, say, a million-row table and not run out of memory. Instead of receiving the entire result at once, this object lets you treat the result as an iterator and call next() on it, or loop through it. See the SQLAlchemy documentation for a detailed description.

# this fetches the first row if called right after
# the execute function above. It also moves the
# iterator to the next result row.
record = cursor.fetchone()
# this will fetch the next record, or None if
# there are no more results.
second_record = cursor.fetchone()
# this loops through the results of the cursor one by one
for row in cursor:
  print (list(row))

The above description is a way to directly write and run SQL queries as strings, and directly manipulate the result relations. SQLAlchemy also includes an Object Relational Mapper that provides an interface that hides SQL query strings and result sets from you. In this project, you will directly write and run SQL queries, and cannot use any ORM functionality.

IMPORTANT NOTE: Your code should avoid the SQL injection vulnerability that we discussed in class. Specifically, your code should never build "raw" query strings for the database using user input, but rather you should use parameters, etc. as discussed in class.

(Optional) Using a toolkit for application front-end

You are welcome to use the Bootstrap framework to style the front-end of your application. Note that this is by no means necessary, and styling your application will not impact grading in any way; see "Grading for Web Front-End Option" below. You are also welcome to use AJAX to create more complex interactions with your site. However, if you do so, you must use JQuery and you must add a section to your README file describing where each call occurs, what triggers the call, what endpoint it hits, and the call's purpose. Furthermore, any JavaScript you write must be well commented, and cannot rely on frameworks or libraries not mentioned here. Particularly, you cannot use any tools that alter the way you interact with the database (e.g., GraphQL). If you have any questions regarding specific frameworks, please contact your project mentor.

Working with a version control system

Since you are working collaboratively with your teammate, we strongly encourage you to use a version control system for your code such as git on GitHub. You should use a private repository on GitHub for this project.

We cover below the basics of git and GitHub. With this setup, you can code on your own desktop, commit and push your changes to the GitHub repository, and then pull the updated changes on your Google Cloud Compute Engine. You can also code without your teammate being in the same room, and your teammate will be aware of the changes after "pulling" the code from GitHub. Conversely, your teammate will also be able to contribute to the same code repository, so you can both work collaboratively. Please follow these steps to get started with git and GitHub:

  1. Register an account on GitHub. You will need to provide/verify your email address and provide a username. There is no requirement on what email or username you use here. Your teammate should also register for a different account, so that you both have access to the code repository.
  2. Once your account is created, log into GitHub. Create a repository by clicking on Create a new repository.
  3. Give your repository a name, such as "w4111-proj1". From here on we will use <projectname> to denote the name that you chose in this step for your project. Make sure that you make this repository Private by choosing the correct radio button; also do not check the README box.
  4. Give your teammate access to the repository: If your repository is at URL <REPO_URL>, go to the webpage <REPO_URL>/settings/access by manually adding "/settings/access" to the URL (e.g., if your repository is at https://github.com/abc234/cs4111, then go to https://github.com/abc234/cs4111/settings/access). Click "Add people" and then enter your teammate's GitHub username or email. Then click "Add <EMAIL/USERNAME> to this repositoryā€¯. Your teammate will receive an invitation by email, which your teammate should accept. After this, you are done with the GitHub web interface.
  5. Now start your Google Cloud Compute Engine (recall our instructions for Part 2), ssh into it, and run the following commands (replacing all points that are surrounded by "< >" with your information (for example, "<your full name>" should be changed to "Luis Gravano" for the instructor):
    # download and extract our skeleton Flask web application
    cd ~
    wget http://www.cs.columbia.edu/~gravano/cs4111/Proj1-3/webserver.tar
    tar xf webserver.tar
    mv webserver <projectname>
    cd <projectname>
    chmod -R 777 .
    # configure your git environment
    git config --global user.name "<your full name>"
    git config --global user.email "<email you used to register for GitHub>"
    # initialize git repository, and push the skeleton files up to the server
    # here <username> denotes your GitHub username
    git init
    git remote add origin <REPO_URL>.git
    git add *
    git commit -m "initial commit"
    git push -u origin master # you will be prompted for your username and personal access token here
    
    If you don't have a personal access token, please follow the instructions that are available here to generate one. Make sure to write down your token, because it will no longer be viewable after you create it. Note that Github is no longer accepting account passwords when authenticating Git operations.

    You have now created a local git repository, made your first commit, and pushed the files contained in webserver.tar onto GitHub. Your usual workflow after these initial steps will be a bit different than this, as described in the next step.
  6. Your workflow with git will typically comprise the following steps:
    1. Pull your changes from the GitHub repository so that your code is up-to-date: git pull. Note that this step is necessary because, when working on a project with a teammate, your teammate may have updated and pushed the code while you were not working on the project, so you should always pull the changes before beginning to work on a new revision.
    2. Work on the project, make some changes locally, and test them.
    3. See what git thinks you have changed: git status
    4. Add any modified files to the next "commit." For example, if you modified files server.py and README, then run: git add server.py README
    5. Commit your changes (locally) with an appropriate commit message: git commit -m "added feature X"
    6. Push your changes to the GitHub server: git push
  7. Step 5 is more or less all you need to get started if you are working on this project alone. However, if you are working as a two-person team, you might even be using different computers to do your coding. (It's cumbersome to develop your code directly on Google Cloud, and it's much more convenient to use your desktop or laptop for writing the code and then "pull" the code into your Google Cloud account using git, as discussed in the "Running your application..." section below.) So if you carried out Step 5 above for your team, then your teammate will need to perform the following steps to get a copy of the code on their desktop or laptop. Similarly, you will have to perform the steps below for any new computer that you want to use to continue developing your code:
    # configure your git environment, as in Step 5
    git config --global user.name "<your full name>"
    git config --global user.email "<email you used to register for GitHub>"
    # clone (i.e., download) the repository onto your local machine
    # use the username of the owner of the repository here
    git clone <REPO_URL>.git
    # you're done setting up; you can now continue as in Step 5

git provides many other powerful functions that we haven't talked about. For more information, you can refer to this tutorial, and documentation on that website. If you encounter a problem with git, you can typically do a quick search with the error message and get a lot of helpful information online. Feel free also to come to office hours with any of your questions about git and GitHub.

Running your application on your Google Cloud compute engine

Once you have developed (a preliminary version of) your web application, you will deploy it to your Google Cloud Compute Engine, as follows

  1. (One time setup) Follow our directions to make a port accessible to the internet so anyone can access your application.
  2. Write down the IP of your virtual machine.
  3. ssh to your virtual machine and enter the virtual environment you created in Part 2 of the project.
  4. Make sure you have committed and pushed all the latest changes to your code to your GitHub repository. (See Step 6 above.)
  5. Copy the latest version of your code to the Google Cloud Compute Engine by running git pull on your virtual machine.
  6. Run the Python server with the defaults parameters, which will listen for requests on port 8111. Run with --help if you need help:
    # <projectname> is the name of your repository and directory that you created in Step 5 above
    cd <projectname>
    python3 server.py
  7. Go to http://<IP ADDRESS>:8111/ in your browser to check that it worked.

You will need this URL when presenting the project to your mentor. Please do not turn off your virtual machine after you are done modifying your code and when you are ready to submit, so that your IP address does not change and the URL that you include with your project submission works.

Keeping your application running for your meeting with your mentor: To keep your application running "in the background" (so that it is available when you meet with your mentor), you can use the screen command. To install screen, run on your VM: sudo apt-get install screen. Run screen in the terminal (and press space or return if prompted), then switch to the correct environment (recall our instructions for doing so), and finally execute your server application normally (i.e., by running python3 server.py). After your application finishes starting up, press CTRL + a, and then d. The application will be running in the background now and it is safe for you to log out of the ssh session. Run screen -r to bring back the detached screen (and your process) to the foreground, and to stop your application (after we are done grading Part 3). (For more information and details on screen, please refer to https://www.mattcutts.com/blog/a-quick-tutorial-on-screen/.)

What to submit and when for Web Front-End Option

If you are following the Web Front-End Option, you will need to submit your code for your application and a README file on Gradescope by Monday November 20 at 5 p.m. ET. Just as for Parts 1 and 2, you should submit your project exactly once per team, rather than once per student. (Click on "Add Group Member" after one of you has submitted your project.) Here are the rest of the instructions for your electronic submission:

In summary, you need to submit on Gradescope exactly two files: (1) your proj1-3.tar.gz file with your code and (2) your uncompressed README file. You need to submit these two files by Monday November 20 at 5 p.m. ET.

IMPORTANT NOTE: You can use grace late days as usual, but you need to make sure that you schedule your meeting with your project mentor so that you have submitted your files on Gradescope before the meeting. The grace late days will be computed with respect to the time of your submission on Gradescope.

Additionally, both teammates in each team will meet together with their project mentor on Tuesday November 21. Your project mentor will email you shortly to schedule a 15-minute meeting for that day. (If you haven't received an email from your project mentor by Monday November 6, please contact your mentor.) During the meeting with your project mentor, you will show your mentor your application using a regular web browser, by sharing your screen during the online meeting:

Grading for Web Front-End Option

Your grade for Part 3 of Project 1 will be a function of how well your application (which should be up and running) matches your specification that you submitted as Part 1, of how well you have incorporated any feedback that your project mentor has given you, and of how well you have followed the guidelines above. Part of your grade will be a function of how well you avoid the SQL injection vulnerability that we discussed in class (see also above). Your grade will not be influenced by how fancy the web-based user interface to your application is. It is sufficient and perfectly fine for this interface to be plain and simple as long as it supports the functionality that you indicated earlier, following the guidelines above about not having to type SQL commands, not "locking up" on unexpected input, etc.


Expanded-Design Option

If you are following the Expanded-Design Option, you need to follow the expansion plans that you outlined in Part 1, and:

  1. Extend your E/R diagram from Part 1 to include the entity sets and relationship sets—and all associated real-world constraints—for your expanded design.
  2. Extend your SQL schema from Part 2 of your database on our PostgreSQL server to include the mapping of all new entity sets and relationship sets, following the directions in Part 2 of the project on how to specify constraints in your expanded SQL design.
  3. Add tuples to your new tables on our PostgreSQL server, following the guidelines on the number of tuples from Part 2 of the project.

What to submit and when for Expanded-Design Option

You will submit this part of the project electronically on Gradescope directly, along the lines of what you did for Part 2. The deadline is Monday November 20 at 5 p.m. ET. You can use grace late days as usual for this part of the project. Just as for Parts 1 and 2, you should submit your project exactly once per team, rather than once per student. (Click on "Add Group Member" after one of you has submitted your project.) You should submit one or more (uncompressed) files containing:

  1. The name and UNI of both teammates.
  2. The PostgreSQL account name for your database on our server (i.e., specify which teammate's UNI we should use to identify the database for your team.) This will normally be the same database that you used for Part 2, but we need you to confirm that we should check that database.
  3. A textual description of your extensions on the database design, explaining which entity sets and relationship sets are new, and how you mapped them to SQL statements.
  4. Your new, complete E/R diagram, including all of your entity sets and relationship sets, both from Part 1 and the new ones from Part 3.
  5. The CREATE TABLE statements and any other elements of the full database that you created on your PostgreSQL database. (We will of course also check the schema directly on the database server, but we need as well the statements as part of your submission file.) You should include all of your tables, not just the new ones for Part 3.
  6. Three "interesting" SQL queries over your expanded database, with a sentence or two per query explaining what the query is supposed to compute. Each of these queries should involve at least one of the new tables that you added for Part 3. The goal of these queries is to help us better understand your application and your additions for Part 3. You will not be graded on these queries, but we strongly suggest that you submit well formed queries that run without problems, so please make sure that you have tested your queries by running them on your database exactly as submitted (use copy and paste).

Grading for Expanded-Design Option

Your grade for Part 3 of Project 1 will be a function of how well you have incorporated any feedback that your project mentor has given you, and the following factors:

  1. Quality of your expanded E/R diagram: We will evaluate how well your expanded E/R diagram implements your plans for the Expanded-Design Option from Part 1, and how well your expanded E/R diagram models your application, including how well you modeled any relevant real-world constraints.
  2. Quality of your expanded SQL schema and implementation on PostgreSQL: We will evaluate how well you mapped your expanded E/R diagram, including constraints, into a SQL schema on PostgreSQL, using the techniques that we covered in class.
  3. Quality of your expanded constraint handling: We will evaluate how well you managed to capture real-world constraints of your expanded design through primary key, foreign key, unique, and good-style attribute- and tuple-based CHECK constraints.
  4. Quality of the expanded real-world (or at least realistic) data that you loaded into the expanded database on PostgreSQL.