THE JAM PROJECT

JAM Configuration File

JAM Version 2.11

Information listed in the Configuration File

Each JAM site requires a configuration file called "DataSite.config" to be present in the directory in which the program is invoked. The configuration file contains information necessary for the correct operation of the JAM site, such as:

the name (location, IP address) of the Configuration File Manager (the process which the JAM site contacts to learn the identities and addresses of its peers)
the port on which the Configuration File Manager accepts connections
the name of the dataset -- the stem or root of all files composing the dataset, e.g., "hypo", "sj"
filename extensions for various files in the dataset, e.g., the original data set, training data (building data), data dictionary file, testing data, etc.
the site's unique nickname, e.g., "Marmalade", "Mango", "Dhurian"
the learning algorithm for base and meta learning (specified by the class name)
a URL for the directory containing site-specific images (when JAM site Mango exports a local classifier to another JAM site, Dhurian, the remote site uses Mango's classifier gif when representing the classifier on its graphical interface)
the default values for meta-learning fold, meta-learning level and cross-validation fold
the default values (percentages) for splitting the original data set into training, testing and validation files.

Properties of a Configuration File

JAM requires configuration files to be named "DataSite.config".
Lines which begin with a pound sign (#) are treated as comment lines.
The variables found within a configuration file are of the form: VARIABLE_NAME=variable_value
A configuration file with missing variables will cause JAM to exit during the initialization phase with a message indicating why the configuration file is incomplete or contains invalid information.
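The properties above can be illustrated with a minimal Python sketch of a reader for this format. It is not JAM's own parser: the function names are hypothetical, REQUIRED_VARIABLES is only an illustrative subset of the variables JAM checks, and treating indented "#" lines as comments is an assumption.

```python
# Sketch of a reader for the DataSite.config format (not JAM's own code).
# REQUIRED_VARIABLES is an illustrative subset chosen for this example,
# not the authoritative list that JAM checks during initialization.
REQUIRED_VARIABLES = ("NICKNAME", "CONFIGFILE_HOST", "CONFIGFILE_PORT", "DATANAME")


def parse_config(text):
    """Return a dict mapping VARIABLE_NAME to variable_value."""
    values = {}
    for line_number, raw_line in enumerate(text.splitlines(), start=1):
        line = raw_line.strip()
        # Skip blank lines and comment lines that begin with '#'.
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        name, separator, value = line.partition("=")
        if not separator or not name.strip():
            raise ValueError(
                f"line {line_number}: expected VARIABLE_NAME=variable_value"
            )
        values[name.strip()] = value.strip()
    return values


def missing_variables(values, required=REQUIRED_VARIABLES):
    """List the required variables that the configuration does not define."""
    return [name for name in required if name not in values]


sample = """\
# A unique nickname for this data site
NICKNAME=Mango
CONFIGFILE_HOST=dynamo.cs.columbia.edu
CONFIGFILE_PORT=8175
"""

config = parse_config(sample)
print(config["NICKNAME"])         # Mango
print(missing_variables(config))  # ['DATANAME']
```

A check like missing_variables can be run before starting JAM to catch an incomplete file early.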

Step by Step Instructions for Creating a Configuration File

Create a file called "DataSite.config"
Easy enough, right? Just make sure that this file exists in the directory containing the data files and that this directory is where you will invoke JAM.
Set the NICKNAME variable
Each JAM site is identifiable by a unique nickname. Most examples found in the documentation use names of fruit when referring to JAM sites, e.g., "blueberry", "marmalade", "mango".
```
# A unique nickname for this data site
##############################################
NICKNAME=Mango
```

Set the CONFIGFILE_HOST and CONFIGFILE_PORT variables
CONFIGFILE_HOST is the hostname (or IP address) of the machine running the Configuration File Manager (the process which the JAM site contacts to learn the identities and addresses of its peers). CONFIGFILE_PORT is the port on which the Configuration File Manager accepts connections.
```
# The host for Configuration File Manager
##############################################
CONFIGFILE_HOST=dynamo.cs.columbia.edu

# The port for Configuration File Manager
##############################################
CONFIGFILE_PORT=8175
```

Set the DATANAME variable
All local files will begin with the same DATANAME stem and, depending upon the file's contents, contain varying filename extensions. For example, the file containing building data for the second fold will be something like sj.2.bld.
```
# name of the dataset
DATANAME=sj
```

Set all the filename extension variables

Set the DATANAME_EXT variable

```
# filename extension for the original data set
DATANAME_EXT=dta
```

Set the CLASSIFIER_EXT variable

```
# filename extension for the classifier
CLASSIFIER_EXT=cls
```

Set the TRAINDATA_EXT variable

```
# filename extension for the training data
TRAINDATA_EXT=bld
```

Set the DICT_EXT variable

```
# filename extension for the data dictionary
DICT_EXT=attr
```

Set the DEC_EXT variable

```
# filename extension for the decision file -- the file that contains
# the output of the classifier
DEC_EXT=dec
```

Set the CLASSIFYDATA_EXT variable

```
# filename extension for the classifying data file, which is sometimes
# called the testing data file.
CLASSIFYDATA_EXT=tst
```

Set the METALEARNING_EXT variable

```
# filename extension for files associated with meta-learning
METALEARNING_EXT=m-c
```
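
To show how the DATANAME stem and the extension variables might combine into filenames, here is a small Python sketch. The stem.fold.extension pattern is inferred from the single sj.2.bld example in this document, the form without a fold number is an assumption, and dataset_filename is a hypothetical helper rather than part of JAM.

```python
def dataset_filename(dataname, extension, fold=None):
    """Join the stem, an optional fold number, and the extension with dots.

    The pattern is inferred from the 'sj.2.bld' example; JAM's actual
    naming rules may differ for other file types.
    """
    parts = [dataname]
    if fold is not None:
        parts.append(str(fold))
    parts.append(extension)
    return ".".join(parts)


# Values taken from the example settings in this document.
config = {"DATANAME": "sj", "DATANAME_EXT": "dta", "TRAINDATA_EXT": "bld"}

# Building data for the second fold.
print(dataset_filename(config["DATANAME"], config["TRAINDATA_EXT"], fold=2))  # sj.2.bld
# Original data set, assuming no fold number appears in its name.
print(dataset_filename(config["DATANAME"], config["DATANAME_EXT"]))  # sj.dta
```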

Set the CLASSNAME_LEARNER variable
The CLASSNAME_LEARNER variable tells JAM which learning algorithm to use for its base learner. The value of this variable must be a full classname (i.e., including all packages). In addition, you must ensure that your CLASSPATH environment variable includes the path to the indicated class.
```
# The name of the class which is this datasite's local learning algorithm
##############################################
CLASSNAME_LEARNER=jam.algs.id3.ID3Learner
```
Set the CLASSNAME_METALEARNER variable
The CLASSNAME_METALEARNER variable tells JAM which learning algorithm to use during meta-learning. The value of this variable must be a full classname (i.e., including all packages). In addition, you must ensure that your CLASSPATH environment variable includes the path to the indicated class.
```
# The name of the class which is this datasite's local meta learning algorithm
##############################################
CLASSNAME_METALEARNER=jam.algs.bayes.BayesLearner
```
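One way to check that a class named in CLASSNAME_LEARNER or CLASSNAME_METALEARNER is reachable is to look for its .class file under each CLASSPATH directory. The Python sketch below relies on the standard Java convention that class a.b.C lives at a/b/C.class relative to a class path directory. The helper names are hypothetical, and JAR or ZIP entries on the class path are deliberately not inspected.

```python
import os


def class_file_relpath(classname):
    """Map a fully qualified class name to its relative .class file path."""
    return os.path.join(*classname.split(".")) + ".class"


def find_class_file(classname, classpath):
    """Return the first matching .class file found, or None.

    Only directory entries of the class path are searched; JAR and ZIP
    entries are ignored in this sketch.
    """
    relpath = class_file_relpath(classname)
    for entry in classpath.split(os.pathsep):
        candidate = os.path.join(entry, relpath)
        if entry and os.path.isfile(candidate):
            return candidate
    return None


classname = "jam.algs.id3.ID3Learner"
print(class_file_relpath(classname))
# Prints None when the class is not under any CLASSPATH directory.
print(find_class_file(classname, os.environ.get("CLASSPATH", "")))
```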

Set the CVFOLD variable

```
# default value for cross-validation fold
CVFOLD=2
```

Set the MLFOLD variable

```
# default value for number of metalearning folds
MLFOLD=2
```

Set the MLLEVEL variable

```
# default value for meta-learning level
MLLEVEL=1
```

Set the TRAIN_SPLIT_PERC variable

```
# default value for splitting the original data set into training and
# test files. It makes sense only when CVFOLD is set to 1
TRAIN_SPLIT_PERC=1
```

Set the LOCAL_SPLIT_PERC variable

```
# default value for splitting the training data set into training (for
# training the local base learner) and validation files (for generating the
# meta level data, to be used by the meta learning algorithm). It makes
# sense only when MLFOLD is set to 1
LOCAL_SPLIT_PERC=1
```
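
The relationship between the fold variables and the split percentages can be summarized in a short Python sketch. It encodes only the two rules stated in the comments above (TRAIN_SPLIT_PERC matters when CVFOLD is 1, LOCAL_SPLIT_PERC matters when MLFOLD is 1); the function name is hypothetical and the sketch does not reproduce how JAM itself performs the split.

```python
def applicable_split_settings(config):
    """Return the split-percentage variables that matter for these folds."""
    applicable = []
    # TRAIN_SPLIT_PERC is only meaningful when CVFOLD is set to 1.
    if int(config["CVFOLD"]) == 1:
        applicable.append("TRAIN_SPLIT_PERC")
    # LOCAL_SPLIT_PERC is only meaningful when MLFOLD is set to 1.
    if int(config["MLFOLD"]) == 1:
        applicable.append("LOCAL_SPLIT_PERC")
    return applicable


# With the fold values used in this document, neither percentage applies.
print(applicable_split_settings({"CVFOLD": "2", "MLFOLD": "2"}))  # []
print(applicable_split_settings({"CVFOLD": "1", "MLFOLD": "2"}))  # ['TRAIN_SPLIT_PERC']
```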

Set the IMAGE_DIR_URL and NUM_DATA_IMAGES variables
The graphical JAM interface uses images when representing local and remote agents. The IMAGE_DIR_URL is a base URL for the directory containing all the expected images. JAM will complain, but will not exit, if one of the expected images is missing from the directory.
Within the image directory, JAM expects to see a directory named data/ containing a number of image files which represent various attributes for the local dataset. Set the value of the NUM_DATA_IMAGES variable to be the number of image files within the data/ directory.
```
# The directory containing images for
# standard parts of the animation.
#
#  Note: Expected gifs in this directory include:
#    data.gif     ...representation of data
#    local.gif    ...local representation of this JAM site
#    engine.gif   ...representation of learning engine
#    default.gif  
#    metac.gif    ...meta-classifier image
##############################################
IMAGE_DIR_URL=http://www.cs.columbia.edu/~andreas/JAMimages/mango/
NUM_DATA_IMAGES=6
```
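Since JAM only complains when an image is missing, it can help to list the URLs it is expected to request. The Python sketch below builds them from IMAGE_DIR_URL and the gif names listed in the comment above. The function name is hypothetical, and the files under data/ are not covered because their names are not given here.

```python
# Gif names listed in the configuration comment above.
EXPECTED_IMAGES = ("data.gif", "local.gif", "engine.gif", "default.gif", "metac.gif")


def expected_image_urls(image_dir_url):
    """Return the full URL of each expected image under the base URL."""
    # Make sure the base ends with '/' so names are appended, not merged.
    base = image_dir_url if image_dir_url.endswith("/") else image_dir_url + "/"
    return [base + name for name in EXPECTED_IMAGES]


for url in expected_image_urls("http://www.cs.columbia.edu/~andreas/JAMimages/mango/"):
    print(url)
```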
[OPTIONAL] Set the DOT_SERVER_HOST and DOT_SERVER_PORT variables
The classifier diagramming tool can either construct graph visuals of classifiers locally or send the task to an outside process. The graph constructor executable is not ported to all platforms; on unsupported platforms, JAM must use a DotServer to create the classifier graph visual. JAM expects to find the graph constructor executable in the user's home directory under JAM_bin/, where architecture denotes the architecture of the local machine.
The DOT_SERVER_HOST and DOT_SERVER_PORT variables specify the location of the DotServer process. Note: including the DOT_SERVER_HOST and DOT_SERVER_PORT variables in the configuration file does not automatically cause JAM to contact the DotServer for graph construction. JAM must be started with an option for the construction to be handled by the DotServer; the default is to have graph construction performed locally.
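As an illustration of handling this optional pair, the Python sketch below reads the two variables from an already parsed configuration. The host and port values are hypothetical placeholders, the function name is invented for this example, and requiring the two variables to appear together is an assumption rather than documented JAM behavior.

```python
def dot_server_address(config):
    """Return (host, port) when both DOT_SERVER variables are set, else None."""
    host = config.get("DOT_SERVER_HOST")
    port = config.get("DOT_SERVER_PORT")
    if host is None and port is None:
        return None  # optional variables omitted
    if host is None or port is None:
        # Assumption: the two variables are only useful as a pair.
        raise ValueError("DOT_SERVER_HOST and DOT_SERVER_PORT must be set together")
    return host, int(port)


# Hypothetical values for illustration only; not a real DotServer.
config = {"DOT_SERVER_HOST": "dotserver.example.org", "DOT_SERVER_PORT": "9000"}
print(dot_server_address(config))  # ('dotserver.example.org', 9000)
print(dot_server_address({}))      # None
```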
[OPTIONAL] Set the POST_PROCESSING_SCRIPT variable
The current version of JAM allows the user to supply his/her own program or script to process the results generated by the base and meta-classifiers. By defining the name of the "result processing program", the user directs JAM to invoke it after batch prediction on a test file finishes. (Batch prediction creates a .results file with predictions, and the post-processing script outputs its results in a .conclusion file.)

Columbia University, September 1997. Last Modified: June 5, 1998

andreas@cs.columbia.edu