Email Notification System (ENS)

Source code distribution

Aleksandr Voskoboynik
Columbia University
New York, NY 10027
USA
av69@columbia.edu

Yuliya Averbukh
Columbia University
New York, NY 10027
USA
ya39@columbia.edu

Abstract

Email Notification System (ENS) is a convenient way for a Web user to keep up with changes to the Web sites he is interested in without having to check them regularly. It is also a good way for a Web site owner to make sure that his updates do not go unnoticed. Any Web site owner who wants to participate in the service can put a link to the ENS CGI script on his page. After following this link, a user subscribes to the service by completing a simple form with his name and email address. A user can subscribe to an unlimited number of Web sites. Once subscribed, he is notified regularly (daily or weekly) about changes to all of the sites he is interested in. To unsubscribe from the service fully or partially, a user can click the link included in each notification report he receives.

Introduction

Our system is easy to install and use; it is scalable, efficient and reliable. Below we outline the steps that Web site owners and users follow in order to use ENS.

A Web site owner who wants to publicize the changes made to his directory puts a link on his HTML page to the CGI script subscribe.cgi. The script takes the URL of that directory as its argument. Note that all subdirectories of the specified directory are checked for changes as well.
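
For illustration only, a site owner's link might be generated as below. This is a Python sketch (the ENS modules themselves are Perl); the script location http://www.example.edu/cgi-bin/subscribe.cgi and the query parameter name `url` are hypothetical, since the document does not say how the argument is named.

```python
import html
from urllib.parse import urlencode

# Hypothetical install location of subscribe.cgi; replace with the real one.
SUBSCRIBE_CGI = "http://www.example.edu/cgi-bin/subscribe.cgi"


def subscribe_link(directory_url: str, text: str = "Notify me of changes") -> str:
    """Return an HTML anchor that passes directory_url to subscribe.cgi via GET.

    The parameter name "url" is an assumption made for this sketch.
    """
    query = urlencode({"url": directory_url})  # percent-encodes ':' and '/'
    return f'<a href="{SUBSCRIBE_CGI}?{query}">{html.escape(text)}</a>'


if __name__ == "__main__":
    print(subscribe_link("http://www.example.org/docs/"))
```

Percent-encoding the directory URL keeps its own ':' and '/' characters from being confused with the structure of the subscribe.cgi URL.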

A user clicks the link on the site he is interested in. He is then asked to fill out a form with the following fields: URL, user email address and user name. The form is not processed unless all of the fields are completed and valid. The CGI script fills in the URL field automatically with the owner-specified URL; however, the user can change it. The user also chooses how often he would like to receive modification reports for this site. Currently he can choose to be informed on a daily or weekly basis. The same user can choose the "weekly" option for some Web sites and the "daily" option for others.

The notification system is started by a "daily" cron job for daily updates and by a "weekly" cron job for weekly ones. A notification message is sent only if at least one of the sites the user is interested in has been modified. A file modification is announced if the fractional change in the file's size is greater than or equal to the threshold defined by the page owner.
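
The document does not define the fractional change in size precisely. The Python sketch below assumes it means |new size - old size| / old size, that a newly created file is always announced, and that the owner's threshold is a number such as 0.1 for 10%; all three are assumptions, and the function name `should_announce` is hypothetical.

```python
from typing import Optional


def should_announce(old_size: Optional[int], new_size: int, threshold: float) -> bool:
    """Decide whether a changed file is reported.

    old_size is None for a file absent from the previous state (newly created).
    threshold is the owner's minimum fractional size change, e.g. 0.1 for 10%.
    """
    if old_size is None:
        return True  # assumption: creations are always announced
    if old_size == 0:
        # The fraction is undefined for an empty file; treat any growth as a
        # full change and an unchanged empty file as a change of zero.
        return new_size > 0 or threshold <= 0
    return abs(new_size - old_size) / old_size >= threshold
```

With a threshold of 0.1, a 1000-byte page that grows to 1100 bytes is reported, while one that grows to 1050 bytes is not.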

The system sends the user a formatted email notification message containing an individual report for each modified site the subscriber is interested in. The report for a site contains links to the newly created or modified pages, each identified by its HTML page title. The message footer contains the link to unsubscribe from the service.

"Daily" subscribers get update reports every morning if at least one of the Web sites of their interest is changed. "Weekly" subscribers get notified on Monday mornings. The reports conform to the following format for each of the changed sites.

<main URL>

<title> - <URL>

<action/modification time>

..................

<title> - <URL>

<action/modification time>

where <action> tells the user whether the file was created or modified.
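
One way to render a site's section in this layout is sketched below in Python. The `PageChange` record and the exact wording of the action line ("Modified: <time>") are hypothetical; the document only fixes the order of main URL, page title, page URL and action/modification time.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageChange:
    title: str   # HTML page title, or "Untitled"
    url: str     # URL of the created or modified page
    action: str  # "created" or "modified"
    mtime: str   # modification time, already formatted for display


def format_site_report(main_url: str, changes: List[PageChange]) -> str:
    """Render one changed site's section of an update report."""
    lines = [main_url, ""]
    for change in changes:
        lines.append(f"{change.title} - {change.url}")
        lines.append("")
        lines.append(f"{change.action.capitalize()}: {change.mtime}")
        lines.append("")
    # Drop the trailing blank line but end the section with a newline.
    return "\n".join(lines).rstrip("\n") + "\n"
```

A full message would concatenate one such section per changed site and append the unsubscribe footer.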

In addition, there is an unsubscribe link at the end of each report. Since most email clients recognize URLs and make them clickable, the subscriber can visit the updated Web pages by clicking the URLs in the message, and can unsubscribe from the service by following the link at the bottom of the message.

After a user clicks the unsubscribe link at the end of his Update Report, a CGI script generates a form listing all of the URLs to which this user is subscribed. Each URL has a corresponding checkbox, and initially all of the boxes are checked. To unsubscribe from a site, the user unchecks its box. If all the boxes are left checked, the user remains subscribed to all of the URLs in the list. If none of the boxes remain checked, the user unsubscribes from the service altogether.

Architecture

  • Databases and State Files

    ENS maintains two databases, one for daily and one for weekly subscriptions. Daily information is stored in the "daily" subdirectory, and weekly information in the "weekly" subdirectory. Each line of a database contains the following information:

    <user email>::<name>::<URL> :: … :: <URL>

    ENS also automatically creates a state file for each URL requested by users. Depending on the requested frequency, the file goes to either the "daily" or the "weekly" subdirectory. For each URL, the state file stores the current state of the corresponding directory (and all of its subdirectories): the size and modification time of each file. To see whether a site has been modified, it is enough to compare the newly obtained state with the one stored in the corresponding file.
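
Comparing two states can be sketched in Python as a dictionary difference. Here a state is assumed to map each file path to a (size, modification time) pair; the names `State` and `diff_states` are hypothetical, and deleted files are deliberately ignored because the reports only mention created or modified pages.

```python
from typing import Dict, List, Tuple

# A state maps each file path to (size in bytes, modification time string).
State = Dict[str, Tuple[int, str]]


def diff_states(old: State, new: State) -> List[Tuple[str, str]]:
    """Return (path, action) pairs for files that appeared or changed.

    action is "created" for paths missing from the old state and "modified"
    for paths whose size or modification time differs.
    """
    changes = []
    for path in sorted(new):
        if path not in old:
            changes.append((path, "created"))
        elif old[path] != new[path]:
            changes.append((path, "modified"))
    return changes
```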

    Note: All of the following modules have been written in Perl.

  • Notification module (email)

    This module sends the user a formatted email notification message containing an individual report for each modified site the subscriber is interested in. The report for a site contains links to the newly created or modified pages, each identified by its HTML page title. The message footer contains the link to unsubscribe from the service. A file modification is announced if the fractional change in the file's size is greater than or equal to the threshold defined by the page owner.

    Modifications are detected by comparing the old state file, which holds the previous state of the directory (and all of its subdirectories), against the new one. State files are produced by the find utility with the -ls and -follow options. Once a site has been processed, the old state file is replaced by the new one. Because several users may be subscribed to the same URL, the system checks whether the site has already been processed for another user before it compares the state files.

    The title of a page is the string enclosed in the <TITLE> and </TITLE> tags (the tags are matched regardless of letter case). If the page has no title, the default "Untitled" is used.
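
The title rule translates into a case-insensitive regular expression, sketched here in Python. This sketch also collapses whitespace inside the title and treats an empty <TITLE></TITLE> pair as untitled; both are assumptions beyond what the document states, and `page_title` is a hypothetical name.

```python
import re

# Text between <TITLE> and </TITLE>, in any letter case, across line breaks.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def page_title(html: str) -> str:
    """Return the page title, or "Untitled" if there is none."""
    match = TITLE_RE.search(html)
    if match:
        title = " ".join(match.group(1).split())  # collapse runs of whitespace
        if title:
            return title
    return "Untitled"
```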

    An X-Email_Notification_To header is added to each notification message so that bounced email can be filtered.
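
A minimal sketch of adding the header with Python's standard email package. The sender address and subject line are hypothetical; only the X-Email_Notification_To header comes from the document.

```python
from email.message import EmailMessage


def build_notification(recipient: str, body: str,
                       sender: str = "ens@example.edu") -> EmailMessage:
    """Build a notification message tagged with the ENS bounce-tracking header."""
    msg = EmailMessage()
    msg["From"] = sender  # hypothetical sender address
    msg["To"] = recipient
    msg["Subject"] = "Update Report"  # hypothetical subject line
    # Records the intended recipient so that a bounce can be traced back to the
    # subscriber even if the bouncing server rewrites the To: line.
    msg["X-Email_Notification_To"] = recipient
    msg.set_content(body)
    return msg
```

The resulting message could be handed to `smtplib.SMTP.send_message`; how ENS actually sends mail is not described here.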

  • Subscription Module

    This module adds the necessary information about a particular user to one of the two databases we maintain. Two CGI scripts, "subscribe.cgi" and "add.cgi", work together to accomplish this task.

    The "subscribe.cgi" script uses the GET method to obtain the URL to which the user wants to subscribe; this lets the Web site owner specify that URL. The script then displays a form to be completed by the user and automatically puts the argument it received in the URL field, although the user can change it to any other URL. User email and name are required, and the user chooses a notification frequency (daily or weekly). JavaScript functions from the "FormChek.js" library check the validity of the fields, and the form is submitted only when all of its fields are completed and valid. The submitted form is passed to another CGI script, "add.cgi".
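
The GET-argument prefill could look like the following Python CGI sketch. The parameter name `url`, the form field names, and the default frequency are hypothetical, and the FormChek.js validation hooks are omitted. Escaping the URL before placing it in the `value` attribute matters because the argument comes straight from the query string.

```python
import html
import os
from urllib.parse import parse_qs


def subscription_form(query_string: str) -> str:
    """Build the subscription form, pre-filling URL from the GET argument."""
    params = parse_qs(query_string)
    url = params.get("url", [""])[0]  # parameter name "url" is hypothetical
    return f"""<form method="POST" action="add.cgi">
URL: <input type="text" name="url" value="{html.escape(url, quote=True)}">
Email: <input type="text" name="email">
Name: <input type="text" name="name">
<input type="radio" name="frequency" value="daily" checked> Daily
<input type="radio" name="frequency" value="weekly"> Weekly
<input type="submit" value="Subscribe">
</form>"""


if __name__ == "__main__":
    # Under CGI, the GET argument arrives in the QUERY_STRING environment variable.
    print("Content-Type: text/html\n")
    print(subscription_form(os.environ.get("QUERY_STRING", "")))
```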

    The "add.cgi" script performs the actual addition of the user-specified information to the database. It uses the POST method to obtain the form data and, depending on the requested frequency, adds the user to either the daily or the weekly database. If the user is already in that database, only the new URLs are added to the existing record. For each URL added, the script stores the current state of the corresponding directory (and all of its subdirectories): file names, their sizes and modification times.
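
The "add only new URLs" rule can be sketched in Python over the <user email>::<name>::<URL>::... record format. This assumes one record per line, "::" as the separator with surrounding whitespace ignored, and exact (case-sensitive) email matching; the function names are hypothetical, and capturing the directory state for each new URL is not shown.

```python
from typing import List, Tuple

SEP = "::"


def parse_record(line: str) -> Tuple[str, str, List[str]]:
    """Split '<email>::<name>::<URL>::...' into (email, name, [URLs])."""
    fields = [field.strip() for field in line.rstrip("\n").split(SEP)]
    return fields[0], fields[1], fields[2:]


def format_record(email: str, name: str, urls: List[str]) -> str:
    return SEP.join([email, name, *urls])


def add_subscription(lines: List[str], email: str, name: str,
                     new_urls: List[str]) -> List[str]:
    """Return the database lines with new_urls added for email.

    An existing record keeps its stored name and only gains URLs it lacks;
    an unknown user gets a new record appended at the end.
    """
    updated: List[str] = []
    found = False
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        rec_email, rec_name, urls = parse_record(line)
        if rec_email == email:
            found = True
            for url in new_urls:
                if url not in urls:
                    urls.append(url)
            updated.append(format_record(rec_email, rec_name, urls))
        else:
            updated.append(line.rstrip("\n"))
    if not found:
        unique = list(dict.fromkeys(new_urls))  # drop duplicates, keep order
        updated.append(format_record(email, name, unique))
    return updated
```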

  • Unsubscribe Module

    This module contains the scripts that unsubscribe a user from the service. A user can unsubscribe partially (if he still wishes to be notified about some sites but not others) or fully. If the user is no longer subscribed to any site, his entry is deleted from the database altogether. This is accomplished by two CGI scripts, "unsubscribe.cgi" and "delete.cgi".

    The "unsubscribe.cgi" script is called when the user clicks the link at the bottom of a notification message. It receives the user's email address as its argument (using the GET method). It then searches both the "daily" and "weekly" databases and displays a form listing all of the URLs to which this user is subscribed. Each URL has a corresponding checkbox, checked by default. To unsubscribe from a URL, the user unchecks its box. When the user submits the form, the "delete.cgi" script is called.
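
A Python sketch of the lookup and the generated form, assuming each database is a list of <user email>::<name>::<URL>::... lines. The function names and the form field names `email` and `keep` are hypothetical. Because browsers submit only checked boxes, delete.cgi would receive exactly the URLs the user wants to keep.

```python
import html
from typing import List


def subscribed_urls(email: str, databases: List[List[str]]) -> List[str]:
    """Collect the user's URLs from every database (e.g. daily, then weekly)."""
    found: List[str] = []
    for lines in databases:
        for line in lines:
            fields = [field.strip() for field in line.split("::")]
            if fields[0] == email:
                for url in fields[2:]:
                    if url not in found:
                        found.append(url)
    return found


def unsubscribe_form(email: str, urls: List[str]) -> str:
    """List every subscribed URL with a checkbox that is checked by default."""
    rows = [
        f'<input type="checkbox" name="keep" value="{html.escape(u, quote=True)}" checked> '
        f"{html.escape(u)}<br>"
        for u in urls
    ]
    hidden = f'<input type="hidden" name="email" value="{html.escape(email, quote=True)}">'
    return "\n".join(
        ['<form method="POST" action="delete.cgi">', hidden, *rows,
         '<input type="submit" value="Update subscriptions">', "</form>"]
    )
```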

    The "delete.cgi" script receives the form (using the POST method) and removes from the database all of the URLs in which the user is no longer interested. If no URLs remain, the user's entry is deleted from the database. For each URL deleted from the database, the script also checks whether any other user is still interested in that site; if not, the state file for that URL is deleted as part of the cleanup routine.
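
The removal and cleanup logic can be sketched in Python as a pure function over one database's lines, assuming the <user email>::<name>::<URL>::... format and that the form delivers the set of URLs the user kept. Because state files live in the same "daily" or "weekly" subdirectory as their database, the sketch looks for orphaned URLs within that one database; the function name is hypothetical, and deleting the files themselves is left to the caller.

```python
from typing import List, Set, Tuple

SEP = "::"


def _fields(line: str) -> List[str]:
    return [field.strip() for field in line.split(SEP)]


def apply_unsubscribe(lines: List[str], email: str,
                      keep: Set[str]) -> Tuple[List[str], Set[str]]:
    """Remove the URLs the user unchecked and report orphaned URLs.

    Returns (new database lines, removed URLs that no remaining record
    mentions). The state files of those orphaned URLs are safe to delete.
    """
    new_lines: List[str] = []
    removed: Set[str] = set()
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        rec_email, name, *urls = _fields(line)
        if rec_email == email:
            removed.update(url for url in urls if url not in keep)
            urls = [url for url in urls if url in keep]
            if not urls:
                continue  # nothing left: drop the user's entry entirely
            line = SEP.join([rec_email, name, *urls])
        new_lines.append(line)
    still_wanted = {url for line in new_lines for url in _fields(line)[2:]}
    return new_lines, removed - still_wanted
```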

  • Bounced email module (bounce)

    This module handles bounced email messages. It works together with a procmail filter: procmail calls it on bounced messages that were originally produced by the notification system. Procmail identifies these messages by looking for the special "X-Email_Notification_To" header in the copy of the original message contained in the body of the bounce. The module reads the bounced message from STDIN (fed by procmail), extracts the undeliverable address from the X-Email_Notification_To: header, and then removes all records for that address from both the "daily" and "weekly" databases.
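
The header extraction and record removal might be sketched in Python as follows. It assumes the quoted original message keeps the header at the start of a line (bounces that prefix quoted lines with "> " would need a looser pattern) and that records use the "::" separator; the function names are hypothetical. In use, the procmail-invoked script would pass `sys.stdin.read()` to `undeliverable_address` and apply `drop_user` to both databases.

```python
import re
from typing import List, Optional

# Matches the ENS header wherever it appears at the start of a line,
# including inside the copy of the original message in the bounce body.
HEADER_RE = re.compile(r"^X-Email_Notification_To:[ \t]*(\S+)",
                       re.IGNORECASE | re.MULTILINE)


def undeliverable_address(bounce_text: str) -> Optional[str]:
    """Return the address recorded in the X-Email_Notification_To header, if any."""
    match = HEADER_RE.search(bounce_text)
    return match.group(1) if match else None


def drop_user(lines: List[str], email: str) -> List[str]:
    """Remove every record whose first "::" field is the given email address."""
    return [line for line in lines if line.split("::")[0].strip() != email]
```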

  • Common module (common.pl)

    This module contains a number of routines which are shared by various modules of the system. The module includes routines which:

  • Synchronization (common.pl)

    Synchronization routines coordinate simultaneous transactions so that the database and state files remain in a consistent state and the transactions follow a serializable schedule. If a process tries to obtain a lock while another process already holds it, the process blocks until the other process releases the lock. Once a process exits the critical section, it releases the lock so that another process can proceed.
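
The blocking behavior described above can be obtained with an exclusive advisory lock. This Python sketch uses `fcntl.flock` (Unix only) on a hypothetical lock file; the document does not say which locking primitive common.pl uses, and the name `locked` is hypothetical.

```python
import fcntl
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def locked(lock_path: str) -> Iterator[None]:
    """Hold an exclusive lock on lock_path while the with-block runs.

    flock(LOCK_EX) blocks until no other process holds the lock, so critical
    sections guarded by the same lock file run one at a time.
    """
    with open(lock_path, "a") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)
```

Every script that reads or rewrites a database or state file would wrap that work in `with locked(...):`, so concurrent CGI requests and cron runs take turns.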

Program Documentation

Task List

Aleksandr Voskoboynik:
Implemented Notification, Bounced email, and Common modules, as well as synchronization. Developed database layout. Created the installation script.

Yuliya Averbukh:
Implemented Subscribe and Unsubscribe modules.

Copied from DevEdge Online - JavaScript:
FormChek.js library

References

1. Larry Wall and Randal Schwartz, Programming Perl, O'Reilly & Associates, Inc., Sebastopol, CA 95472.
2. DevEdge Online - JavaScript, http://developer.netscape.com/tech/javascript/index.html