THE JAM PROJECT


JAM Configuration File

JAM Version 2.11


Information listed in the Configuration File

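Each JAM site requires a configuration file called "DataSite.config" to be present in the directory in which the program is invoked. The configuration file contains information necessary for the correct operation of the JAM site, such as the site's nickname, the address of the Configuration File Manager, the name of the local dataset, and the learning algorithms to use.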

Properties of a Configuration File

Step by Step Instructions for Creating a Configuration File

  1. Create a file called "DataSite.config"
    Easy enough, right? Just make sure that this file exists in the directory containing the data files and that this directory is where you will invoke JAM.
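
    As a quick sanity check before starting JAM, the requirement above can be sketched in Python. This is an illustrative helper, not part of JAM; the function name has_config is an assumption made for this example.

```python
from pathlib import Path

CONFIG_NAME = "DataSite.config"  # file name taken from the text above


def has_config(directory="."):
    """Return True if DataSite.config exists as a regular file in `directory`.

    Hypothetical helper for illustration; JAM itself does not ship it.
    """
    return (Path(directory) / CONFIG_NAME).is_file()


if __name__ == "__main__":
    # Checks the current working directory, i.e. where JAM would be invoked.
    print("config present:", has_config())
```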

  2. Set the NICKNAME variable
    Each JAM site is identifiable by a unique nickname. Most examples found in the documentation use names of fruit when referring to JAM sites, e.g., "blueberry", "marmalade", "mango".
    # A unique nickname for this data site
    ##############################################
    NICKNAME=Mango
    

  3. Set the CONFIGFILE_HOST and CONFIGFILE_PORT variables
    CONFIGFILE_HOST is the hostname (or IP address) of the machine running the Configuration File Manager (the process which the JAM site contacts to learn the identities and addresses of its peers). CONFIGFILE_PORT is the port for the Configuration File Manager.
    # The host for Configuration File Manager
    ##############################################
    CONFIGFILE_HOST=dynamo.cs.columbia.edu
    
    # The port for Configuration File Manager
    ##############################################
    CONFIGFILE_PORT=8175
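
    To illustrate how such settings could be read back, here is a minimal Python sketch. It assumes the file syntax is what the excerpts suggest (KEY=VALUE lines, with lines starting with "#" treated as comments); JAM's own parser may behave differently, and the function names are hypothetical.

```python
def parse_config(text):
    """Parse KEY=VALUE lines; skip blank lines and lines starting with '#'.

    Assumed syntax, inferred from the excerpts in this document.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=VALUE line: {raw!r}")
        settings[key.strip()] = value.strip()
    return settings


def manager_address(settings):
    """Return (host, port) of the Configuration File Manager."""
    return settings["CONFIGFILE_HOST"], int(settings["CONFIGFILE_PORT"])


sample = """\
# The host for Configuration File Manager
##############################################
CONFIGFILE_HOST=dynamo.cs.columbia.edu

# The port for Configuration File Manager
##############################################
CONFIGFILE_PORT=8175
"""

print(manager_address(parse_config(sample)))  # ('dynamo.cs.columbia.edu', 8175)
```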
    

  4. Set the DATANAME variable
    All local files begin with the same DATANAME stem and, depending upon the file's contents, carry different filename extensions. For example, the file containing building data for the second fold will be named something like sj.2.bld.
    # name of the dataset
    DATANAME=sj
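
    The naming pattern can be sketched as follows. The stem.fold.extension layout is an assumption generalized from the single example "sj.2.bld" above; the function name is hypothetical.

```python
def data_filename(dataname, fold, extension):
    """Build a file name as stem.fold.extension (assumed layout)."""
    return f"{dataname}.{fold}.{extension}"


print(data_filename("sj", 2, "bld"))  # sj.2.bld
```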
    

  5. Set all the filename extension variables

  6. Set the CLASSNAME_LEARNER variable
    The CLASSNAME_LEARNER variable tells JAM which learning algorithm to use for its base learner. The value of this variable must be a full classname (i.e., including all packages). In addition, you must ensure that your CLASSPATH environment variable includes the path to the indicated class.
    # The name of the class which is this datasite's local learning algorithm
    ##############################################
    CLASSNAME_LEARNER=jam.algs.id3.ID3Learner
    
  7. Set the CLASSNAME_METALEARNER variable
    The CLASSNAME_METALEARNER variable tells JAM which learning algorithm to use during meta-learning. The value of this variable must be a full classname (i.e., including all packages). In addition, you must ensure that your CLASSPATH environment variable includes the path to the indicated class.
    # The name of the class which is this datasite's local meta learning algorithm
    ##############################################
    CLASSNAME_METALEARNER=jam.algs.bayes.BayesLearner
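
    Both classname variables must hold package-qualified names that resolve on the CLASSPATH. The Python sketch below shows one way to sanity-check a value and to list where the corresponding .class file would be expected. It is an approximation: it accepts only ASCII identifiers, considers only directory entries on the CLASSPATH (not jar files), and the helper names are hypothetical.

```python
import os
import re

_IDENT = r"[A-Za-z_$][A-Za-z0-9_$]*"
# At least one package segment followed by a class name.
_FQCN = re.compile(rf"{_IDENT}(\.{_IDENT})+")


def is_full_classname(name):
    """True if `name` looks like a package-qualified Java class name."""
    return _FQCN.fullmatch(name) is not None


def class_file_candidates(name, classpath):
    """Paths where the .class file would sit under each CLASSPATH directory."""
    relative = os.path.join(*name.split(".")) + ".class"
    return [
        os.path.join(entry, relative)
        for entry in classpath.split(os.pathsep)
        if entry
    ]


print(is_full_classname("jam.algs.id3.ID3Learner"))  # True
print(is_full_classname("ID3Learner"))               # False
```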
    
  8. Set the CVFOLD variable
    # default value for cross-validation
    CVFOLD=2
    
  9. Set the MLFOLD variable
    # default value for number of metalearning folds
    MLFOLD=2
    
  10. Set the MLLEVEL variable
    # default value for meta-learning level
    MLLEVEL=1
    
  11. Set the TRAIN_SPLIT_PERC variable
    # default value for splitting the original data set into training and
    # test files. It makes sense only when CVFOLD is set to 1
    TRAIN_SPLIT_PERC=1
    
  12. Set the LOCAL_SPLIT_PERC variable
    # default value for splitting the training data set into training (for
    # training the local base learner) and validation files (for generating the
    # meta level data, to be used by the meta learning algorithm). It makes
    # sense only when MLFOLD is set to 1
    LOCAL_SPLIT_PERC=1
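
    The comments above say that each split percentage is meaningful only for a particular fold setting. A small Python sketch of that rule follows; the function name is hypothetical and the settings are passed in as a plain dictionary of strings.

```python
def active_split_settings(settings):
    """Return the split-percentage variables that apply to these settings.

    TRAIN_SPLIT_PERC applies only when CVFOLD is 1;
    LOCAL_SPLIT_PERC applies only when MLFOLD is 1.
    """
    active = []
    if int(settings["CVFOLD"]) == 1:
        active.append("TRAIN_SPLIT_PERC")
    if int(settings["MLFOLD"]) == 1:
        active.append("LOCAL_SPLIT_PERC")
    return active


# With the values shown in this document, neither split variable applies.
print(active_split_settings({"CVFOLD": "2", "MLFOLD": "2"}))  # []
print(active_split_settings({"CVFOLD": "1", "MLFOLD": "2"}))  # ['TRAIN_SPLIT_PERC']
```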
    

  13. Set the IMAGE_DIR_URL and NUM_DATA_IMAGES variables
    The graphical JAM interface uses images when representing local and remote agents. IMAGE_DIR_URL is the base URL of the directory containing all the expected images. JAM will complain, but will not exit, if one of the images is missing from the directory.

    Within the image directory, JAM expects to see a directory named data/ containing a number of image files which represent various attributes for the local dataset. Set the value of the NUM_DATA_IMAGES variable to be the number of image files within the data/ directory.

    # The directory containing images for
    # standard parts of the animation.
    #
    #  Note: Expected gifs in this directory include:
    #    data.gif     ...representation of data
    #    local.gif    ...local representation of this JAM site
    #    engine.gif   ...representation of learning engine
    #    default.gif  
    #    metac.gif    ...meta-classifier image
    ##############################################
    IMAGE_DIR_URL=http://www.cs.columbia.edu/~andreas/JAMimages/mango/
    NUM_DATA_IMAGES=6
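
    To see which URLs the image settings imply, the following Python sketch joins the base URL with the gif names listed in the comment above. It only builds the URLs and does not fetch anything; the function names are hypothetical, and the names of the files inside data/ are not given in this document, so they are not guessed here.

```python
from urllib.parse import urljoin

# Names taken from the comment block in the configuration excerpt.
EXPECTED_GIFS = ["data.gif", "local.gif", "engine.gif", "default.gif", "metac.gif"]


def _with_slash(url):
    """Ensure the base URL ends with '/', so urljoin keeps its last segment."""
    return url if url.endswith("/") else url + "/"


def expected_image_urls(image_dir_url):
    """URLs of the standard images under IMAGE_DIR_URL."""
    base = _with_slash(image_dir_url)
    return [urljoin(base, name) for name in EXPECTED_GIFS]


def data_dir_url(image_dir_url):
    """URL of the data/ directory under IMAGE_DIR_URL."""
    return urljoin(_with_slash(image_dir_url), "data/")


base = "http://www.cs.columbia.edu/~andreas/JAMimages/mango/"
for url in expected_image_urls(base):
    print(url)
print(data_dir_url(base))
```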
    

  14. [OPTIONAL] Set the DOT_SERVER_HOST and DOT_SERVER_PORT variables
    The classifier diagramming tool can either construct graph visuals of classifiers locally or send the task to an outside process. The graph constructor executable has not been ported to all platforms; on unsupported platforms, JAM must use a DotServer to create the classifier graph visual. JAM expects to find the graph constructor executable in the user's home directory under JAM_bin/, where architecture denotes the architecture of the local machine.

    The DOT_SERVER_HOST and DOT_SERVER_PORT variables specify the location of the DotServer process. Note: including the DOT_SERVER_HOST and DOT_SERVER_PORT variables in the configuration file does not automatically cause JAM to contact the DotServer for graph construction. JAM must be started with an option for the construction to be handled by the DotServer; the default is to have graph construction performed locally.
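
    Because these two variables are optional, code that reads the configuration has to cope with their absence. A minimal Python sketch follows. Treating a half-specified pair as an error is an assumption of this example, not documented JAM behavior, and the host name and port in the usage line are made-up placeholders.

```python
def dot_server_address(settings):
    """Return (host, port) of the DotServer, or None if it is not configured.

    Assumption: the two variables are expected to appear together.
    """
    host = settings.get("DOT_SERVER_HOST")
    port = settings.get("DOT_SERVER_PORT")
    if host is None and port is None:
        return None
    if host is None or port is None:
        raise ValueError("DOT_SERVER_HOST and DOT_SERVER_PORT must be set together")
    return host, int(port)


print(dot_server_address({}))  # None
# Placeholder values, for illustration only.
print(dot_server_address({"DOT_SERVER_HOST": "dotserver.example.org",
                          "DOT_SERVER_PORT": "9000"}))
```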

  15. [OPTIONAL] Set the POST_PROCESSING_SCRIPT variable
    The current version of JAM allows the user to use his/her own program or script to process the results generated by the base and meta-classifiers. By defining the name of the "result processing program", the user directs JAM to invoke it after batch prediction on a test file has finished. (Batch prediction creates a .results file with predictions, and the post-processing script outputs its results in a .conclusion file.)
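
    The relationship between the two file types can be sketched in Python as below. This assumes that the .conclusion file shares its stem with the .results file, which the text does not state; the function name and the example file name are hypothetical.

```python
from pathlib import Path


def conclusion_path(results_path):
    """Map a .results file to a .conclusion file with the same stem (assumed)."""
    path = Path(results_path)
    if path.suffix != ".results":
        raise ValueError(f"expected a .results file, got {path.name!r}")
    return path.with_suffix(".conclusion")


print(conclusion_path("predictions.results"))  # predictions.conclusion
```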
    
    
    
    

Columbia University, September 1997. Last Modified: June 5, 1998
andreas@cs.columbia.edu