-------------------------------------------------------------------------
-------------------------------------------------------------------------
By Stuart Andrews (stu@cs.brown.edu)
Brown University, Providence, RI
Last modified: August 2003

I put together this document from a collection of notes made over the past
year, and do not claim that it is entirely consistent. 

I am grateful to the UC Berkeley group that developed the Blobworld system 
that was used to segment images.

Reference:

Support Vector Machines for Multiple-Instance Learning 
 Stuart Andrews, Ioannis Tsochantaridis & Thomas Hofmann 
  Advances in Neural Information Processing Systems (NIPS*15), 2002

-------------------------------------------------------------------------
-------------------------------------------------------------------------

    Files:
    --------
    o README
    o blobworld_2_mil.m
    o generate_bag_set.m
    o generate_one_instance.m
    o max_index.m
    o normalize_bag_set.m
    o write_bag_set.m


    This document includes sections on:
    ---------
    o generation of the data sets
    o blob representation
    o list of features used in data sets
    o some more details
    o appendix: blobworld image, and blob structures

-------------------------------------
Generation of blobworld data-sets
-------------------------------------

We used the blobworld representation to generate several binary MIL image
classification problems.  The data-sets are called CONCEPT.mil, where CONCEPT
is derived from the labelings given in the Corel image database.  Negative
patterns were chosen randomly from the remaining non-CONCEPT images.

A numeric suffix (_#.mil) may be appended to the name to indicate which subset
of the Fourier coefficients in the blobworld representation was included in the
MIL file.  See the list of Fourier coefficients under the feature definitions
below.

The MATLAB code that generates these files is poorly documented.  This
description assumes that each image has already been segmented by Blobworld and
that the result has been written to a <IMGNAME>_bd.mat file.

The script blobworld_2_mil.m may then be used to create the MIL files.  

    blobworld_2_mil(positives_file, negatives_file, mil_file, shape_type)

The positives and negatives files should each list the names (one per line) of
the blobworld output files that make up the positive and negative image sets
for the MIL data set.  The shape_type argument is an integer 0-4 specifying
which, if any, shape parameters are to be included as features for each blob.
After some initial experiments, we chose not to use any shape information
(shape_type = 0).
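
As a rough illustration of how the two list files could be prepared, here is a
minimal Python sketch (not part of the MATLAB code in this directory).  It
assumes you have a mapping from each blobworld output file to its Corel label;
the file names, labels, seed and helper names (split_image_lists, list_text,
write_list) are all made up for the example.  Negatives are drawn at random
from the non-CONCEPT images, as described above.

```python
import random

def split_image_lists(labels, concept, num_negatives, seed=0):
    """Split Blobworld output files into positive and negative lists.

    labels maps a file name to its Corel category (hypothetical input format).
    """
    positives = sorted(n for n, lab in labels.items() if lab == concept)
    remaining = sorted(n for n, lab in labels.items() if lab != concept)
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    negatives = rng.sample(remaining, min(num_negatives, len(remaining)))
    return positives, negatives

def list_text(names):
    """One file name per line, the layout expected for the list files."""
    return "".join(name + "\n" for name in names)

def write_list(path, names):
    """Write a list file to disk."""
    with open(path, "w") as handle:
        handle.write(list_text(names))

# Made-up file names and labels, for illustration only.
labels = {
    "101000_bd.mat": "tiger",
    "101001_bd.mat": "tiger",
    "102000_bd.mat": "beach",
    "103000_bd.mat": "sunset",
    "104000_bd.mat": "beach",
}
positives, negatives = split_image_lists(labels, "tiger", num_negatives=2)
```

The two lists could then be saved with write_list (for example to
positives.txt and negatives.txt, names chosen here for illustration) and those
paths passed to blobworld_2_mil as positives_file and negatives_file.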

Some auxiliary functions are included in the directory.





------------
Representation
------------
    Here's a brief description of the blobworld representation that we started
    with.  Quotes are taken from publications describing the Blobworld system.

    They say: "In this work, we introduce a novel method of scale selection
    which works in tandem with a fairly simple but informative set of texture
    descriptors.  The scale selection method is based on edge/bar polarity
    stabilization, and the texture descriptors arise from the windowed second
    moment matrix.  Both are derived from the gradient of the image intensity,
    which we denote by grad(I).  We compute grad(I) using the first difference
    approximation along each dimension.  ... "

        "... we define the scale to be the width of the Gaussian window within
        which the gradient vectors of the image are pooled.  The second moment
        matrix for the vectors within this window, computed about each pixel in
        the image, can be approx. using M(x,y) = G(x,y) * (grad(I))(grad(I))'
        where G(x,y) is a separable binomial approx. to a Gaussian with
        variance sigma squared."

    That is just to get started.  M(x,y) is positive semi-definite, and we use
    its eigenvalues lambda_1 and lambda_2 below.  Using the polarity (see
    Details, item 1) they choose the scale at which the remaining pixel
    statistics are collected.  Here are the values that comprise the blobworld
    data structure (that we used to make our MIL data set).
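
To make the quoted definition of M(x,y) concrete, here is a small Python
sketch.  It is only an illustration under stated assumptions, not the Blobworld
implementation: the gradient uses forward first differences (zero at the last
row/column), the window is a separable binomial kernel such as [1 2 1]/4, and
window positions that fall outside the image are simply skipped.  The function
names and the test image are made up.

```python
def first_differences(img):
    """Forward first differences (Ix, Iy); the last column/row is set to 0."""
    h, w = len(img), len(img[0])
    ix = [[img[r][c + 1] - img[r][c] if c + 1 < w else 0.0 for c in range(w)]
          for r in range(h)]
    iy = [[img[r + 1][c] - img[r][c] if r + 1 < h else 0.0 for c in range(w)]
          for r in range(h)]
    return ix, iy

def binomial_kernel(n):
    """Row n of Pascal's triangle, normalised to sum to 1 (n=2 gives [1 2 1]/4)."""
    row = [1]
    for _ in range(n):
        row = [a + b for a, b in zip([0] + row, row + [0])]
    total = float(sum(row))
    return [v / total for v in row]

def second_moment_matrix(img, r, c, n=2):
    """Binomially weighted sum of grad(I) * grad(I)' around pixel (r, c).

    n must be even so that the (n+1) x (n+1) window is centred on the pixel.
    Window positions outside the image are skipped (an assumption of this sketch).
    """
    ix, iy = first_differences(img)
    k = binomial_kernel(n)
    half = n // 2
    h, w = len(img), len(img[0])
    mxx = mxy = myy = 0.0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w:
                wgt = k[dr + half] * k[dc + half]
                gx, gy = ix[rr][cc], iy[rr][cc]
                mxx += wgt * gx * gx
                mxy += wgt * gx * gy
                myy += wgt * gy * gy
    return [[mxx, mxy], [mxy, myy]]

# Made-up 5x5 test image: intensity increases by 1 per column (a horizontal ramp).
img = [[float(c) for c in range(5)] for _ in range(5)]
M = second_moment_matrix(img, 2, 2)
```

For the ramp image every gradient in the window is (1, 0), so all of the
energy ends up in the top-left entry of M.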
    

------------
Feature Definitions
------------
    - contrast  (real: [1x1])
        con = 2*sqrt(lambda_1 + lambda_2)
        ... where lambda_1 and lambda_2 are the eigenvalues of M(x,y)

    - anisotropy  (real: [1x1])
        aniso = 1 - (lambda_2 / lambda_1)

    - avg. blob location (real: [2x1])
        x = mean x location for blob
        y = mean y location for blob

    - number of pixels (integer [1x1])
        num_pixels per blob

    - Fourier coeff's (real)    (4 different versions)
        AB  (??)                [1x1]   "_0.mil"
        A   (??)                [1x1]   "_1.mil"
        Fourier mean (x,y)      [2x1]   "_2.mil"
        Fourier ellipse coeff's [20x1]  "_4.mil"
        
    - mean color (real [3x1])
        from L*a*b color space

    - color histogram (integer approx. [215x1])
        - a partial histogram covering the visible portions of the L*a*b color
          space (i.e. only the visible bins (about 215) from [5x10x10] binning)
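
The contrast and anisotropy formulas above can be checked with a few lines of
Python.  This is a sketch, not the Blobworld code: it assumes lambda_1 is the
larger eigenvalue of the symmetric 2x2 matrix M, and it returns an anisotropy
of 0 when lambda_1 is 0 (a convention chosen here to avoid dividing by zero).
The function names and example matrices are made up.

```python
import math

def eigenvalues_sym2(m):
    """Eigenvalues of a symmetric 2x2 matrix, returned as (larger, smaller)."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    mean = 0.5 * (a + d)
    diff = math.sqrt((0.5 * (a - d)) ** 2 + b * b)
    return mean + diff, mean - diff

def contrast_anisotropy(m):
    """con = 2*sqrt(lambda_1 + lambda_2), aniso = 1 - lambda_2/lambda_1."""
    lam1, lam2 = eigenvalues_sym2(m)
    # Clamp at zero in case rounding makes the trace slightly negative.
    con = 2.0 * math.sqrt(max(lam1 + lam2, 0.0))
    # Assumed convention: no gradient energy means no anisotropy.
    aniso = 1.0 - lam2 / lam1 if lam1 > 0.0 else 0.0
    return con, aniso

# All gradients point the same way: maximal anisotropy.
con_edge, aniso_edge = contrast_anisotropy([[1.0, 0.0], [0.0, 0.0]])

# Equal energy in both directions: no preferred orientation.
con_iso, aniso_iso = contrast_anisotropy([[2.0, 0.0], [0.0, 2.0]])
```

The first matrix gives con = 2 and aniso = 1; the second gives con = 4 and
aniso = 0.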


----------
Details
----------
    1. Polarity?
        pol = abs(Eplus - Eminus)/(Eplus + Eminus)

        ... where Eplus  = sum_{x,y} G_{sigma}(x,y) [dot(gradient(Img),n)]+
        and       Eminus = sum_{x,y} G_{sigma}(x,y) [dot(gradient(Img),n)]-

        ... where n is perpendicular to the dominant orientation in the
        neighbourhood and the notation [.]+ / [.]- indicates the rectified
        positive and negative parts of their arguments
        
        In English, Eplus measures "how many gradient vectors in a Gaussian
        defined window are on the positive side of the dominant orientation"
        (similarly for Eminus).

    2. During segmentation, they only use six color/texture values plus the
    location.

        Their sampled pixel statistics are: 
            (L, a, b, con*aniso, con*pol, con, x, y)

        ... where polarity is defined in item 1 above.  They modulate the
        aniso/pol by the contrast because those values have no meaning in
        areas of no contrast.

    3. They did not record the polarity in the blobworld data structure, so we
        are not using it.  

    4. What is L*a*b? 
        Our retinas first receive three color stimuli related to red, green and
        blue light rays. As the stimuli are processed, three sensations are
        generated: red-green, yellow-blue and brightness.  Based on these
        sensations, CIE (Commission Internationale de l'Eclairage) developed a
        complementary color system called "CIELab" in 1976.  CIELab (or "Lab")
        is a global standard, and all colors perceived by the human eye lie
        within its color space. Lab is independent of device-dependent color
        systems such as RGB and CMYK. 
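
The polarity formula in item 1 and the pixel statistics in item 2 can be
sketched in Python as follows.  This is an illustration only, not the Blobworld
code: the gradients and Gaussian window weights are passed in as plain lists,
n is assumed to be a unit vector, and a polarity of 0 is returned when
Eplus + Eminus is 0 (a convention chosen here).  The function names and the
numbers are made up.

```python
def polarity(gradients, weights, n):
    """pol = |Eplus - Eminus| / (Eplus + Eminus).

    gradients: list of (gx, gy) inside the window
    weights:   matching window weights G_sigma(x, y)
    n:         unit vector (nx, ny) perpendicular to the dominant orientation
    """
    e_plus = e_minus = 0.0
    for (gx, gy), w in zip(gradients, weights):
        proj = gx * n[0] + gy * n[1]
        if proj > 0.0:
            e_plus += w * proj        # rectified positive part [.]+
        else:
            e_minus += w * (-proj)    # rectified negative part [.]-
    total = e_plus + e_minus
    # Assumed convention: flat regions (no gradient energy) get polarity 0.
    return abs(e_plus - e_minus) / total if total > 0.0 else 0.0

def pixel_statistics(L, a, b, con, aniso, pol, x, y):
    """The tuple (L, a, b, con*aniso, con*pol, con, x, y) from item 2."""
    return (L, a, b, con * aniso, con * pol, con, x, y)

n = (1.0, 0.0)
pol_edge = polarity([(1.0, 0.0), (1.0, 0.0)], [0.5, 0.5], n)    # same sign
pol_bar = polarity([(1.0, 0.0), (-1.0, 0.0)], [0.5, 0.5], n)    # balanced
pol_mixed = polarity([(3.0, 0.0), (-1.0, 0.0)], [0.5, 0.5], n)  # in between

stats = pixel_statistics(50.0, 1.0, 2.0, con=0.5, aniso=0.2, pol=pol_mixed,
                         x=10.0, y=20.0)
```

Gradients that all fall on one side of the dominant orientation give a
polarity of 1, and a balanced mix gives 0.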




----------
Blobworld representation
----------

The Blobworld output has the following structure for each image.
The running time for images of size 192x128 was about 5 minutes.

% ---------------------------------------------------
image = 

                  CC: [192x128 double]
                Xdim: 128
                Ydim: 192
                  bd: [1x3 struct]
            filename: '/pro/webagents/data/Corel/CD4/103000/103000.JPG'
             support: [192x128 double]
    whole_aniso_hist: [1x21 double]
    whole_color_hist: [1x218 double]
      whole_con_hist: [1x21 double]


% ---------------------------------------------------

bd(1) = 

                           con: 0.0059
                         aniso: 0.0050
                        texbin: 1
                             x: 66.9169
                             y: 106.1740
                        locbin: 1155
            shape_coeffs_saved: 20
                shape_oriented: [4x20 double]
             shape_nonoriented: [4x20 double]
       shape_oriented_size_inv: [4x20 double]
    shape_nonoriented_size_inv: [4x20 double]
                            AB: 5.9534e+03
                             B: 0.6698
                  fourier_mean: [63.1882 93.2629]
                         ABbin: 2
                          Bbin: 4
                        pixels: 11000
                        pixbin: 3
                    mean_color: [27.1439 -2.2117 -2.2129]
                    color_hist: [1x218 double]
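
To show how a blob structure like the one above could become one MIL instance,
and an image one bag, here is a Python sketch.  It is not a transcription of
generate_one_instance.m or generate_bag_set.m: the feature order, the
dictionary layout, the +1/-1 bag labels and the function names are all
assumptions made for the example.  The numeric values are copied from the
bd(1) dump; the histogram contents are not shown there, so zeros stand in for
them.

```python
def blob_to_instance(blob, fourier_field=None):
    """Flatten one blob into a feature vector (hypothetical feature order)."""
    features = [blob["con"], blob["aniso"], blob["x"], blob["y"],
                float(blob["pixels"])]
    if fourier_field is not None:
        value = blob[fourier_field]
        # Scalar fields (e.g. AB) add one feature, vector fields add several.
        features.extend(value if isinstance(value, (list, tuple)) else [value])
    features.extend(blob["mean_color"])
    features.extend(blob["color_hist"])
    return features

def image_to_bag(image, label, fourier_field=None):
    """One bag per image; each blob in image['bd'] becomes one instance."""
    return {"label": label,
            "instances": [blob_to_instance(b, fourier_field)
                          for b in image["bd"]]}

blob = {
    "con": 0.0059,
    "aniso": 0.0050,
    "x": 66.9169,
    "y": 106.1740,
    "pixels": 11000,
    "AB": 5953.4,
    "fourier_mean": [63.1882, 93.2629],
    "mean_color": [27.1439, -2.2117, -2.2129],
    "color_hist": [0.0] * 218,   # placeholder: contents not shown in the dump
}
image = {"bd": [blob]}           # the real image has three blobs (bd: [1x3])
bag = image_to_bag(image, +1, fourier_field="AB")
```

With the AB field the instance has 5 + 1 + 3 + 218 = 227 entries; choosing
fourier_mean instead gives 228.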



