Sonya Allin and Melissa Holcombe
CS4721 Advanced Intelligent Systems
Project 2, part 2
March 9, 1999
NEURAL-NET HAND SIGNAL RECOGNIZER
I. OVERVIEW, WHAT WAS DONE
The goal of our project was to see 1) whether we could get the neural net package to perform on a new set of data and 2) how varying the number of hidden nodes affects network performance.
For the input, we took pictures of our left and right hands forming the numbers 1 to 10 in American Sign Language. We used the data for two different neural net tasks: left/right hand recognition and number recognition.
II. HOW WE DID IT
We used SunVideo to capture the images as grayscale raster
files, with the pixel dimensions set to 40x30. We converted the raster files to
Portable Graymaps with xv, and used a perl script to remove comments (so they could be read by our versions of facetrain) and rename files.
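The comment-stripping step can be sketched as follows. Our actual tool was a perl script; this Python version is an illustrative reconstruction (the function name is ours), and it assumes plain-text (P2) graymaps, since xv writes a "# CREATOR: XV" comment line that our facetrain readers could not handle:

```python
def strip_pgm_comments(pgm_text):
    """Remove '#' comments from the text of a plain (P2) PGM file."""
    kept = []
    for line in pgm_text.splitlines():
        # PGM comments run from '#' to end of line; drop pure comment
        # lines and trim trailing comments on header/data lines.
        stripped = line.split("#", 1)[0].rstrip()
        if stripped:
            kept.append(stripped)
    return "\n".join(kept) + "\n"
```

Running this over a file whose second line is an xv comment leaves just the header and pixel data.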
[Sample images omitted.]
The data was then fed into two distinct sets of neural networks. The first set was made up of networks designed to recognize the specific number being signed in the picture. The second set was designed to recognize whether the person in the picture was signing with their left or right hand. The first set included networks with 10, 15, and 20 hidden nodes, while the second set featured networks with 2, 4, and 10 hidden nodes.
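The two tasks call for different output encodings: ten output units for the 1-in-10 recognizer and a single unit for left vs. right. A sketch of the target vectors, assuming the usual facetrain-style 0.1/0.9 targets (the exact values are our assumption, not recorded in this report):

```python
def number_target(n):
    """1-in-10 task: ten outputs, the unit for number n driven high.
    Assumes facetrain-style targets of 0.9 (on) and 0.1 (off)."""
    return [0.9 if i == n - 1 else 0.1 for i in range(10)]

def hand_target(is_left):
    """Left/right task: one output unit, high for a left hand."""
    return [0.9] if is_left else [0.1]
```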
Training, validation, and test sets were randomly generated with another perl script. The training set had 82 images, the validation set had 41, and the test set had 62.
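The splitting step amounts to a shuffle followed by three slices. Again, our actual tool was a perl script; this Python sketch reproduces the sizes reported above (82 train, 41 validation, remainder test):

```python
import random

def split_dataset(filenames, n_train=82, n_valid=41, seed=None):
    """Shuffle the image filenames and cut them into training,
    validation, and test lists; the test set is the remainder."""
    rng = random.Random(seed)
    names = list(filenames)
    rng.shuffle(names)
    return (names[:n_train],
            names[n_train:n_train + n_valid],
            names[n_train + n_valid:])
```

With 185 images this yields sets of 82, 41, and 62, matching the counts above.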
Finally, results were fed into Mathematica with the help of a third perl script. Graphs of the results were then compared, along with images of the hidden nodes, yielding the following results.
III. THE RESULTS
1 in 10 recognizer:
Note: all graphs for the 1-in-10 recognizer reflect networks that were initialized to zero. Runs were conducted, however, both for networks initialized to zero and for networks initialized randomly, and both sets of data are included in the analysis.
left - right recognizer:
IV. ANALYSIS OF RESULTS
Re: the 1 in 10 recognizer.
Graph Based Analysis:
- In all six tests on the 1-in-10 net, there was a wide discrepancy between the training set results and the test and validation set results. The best results for the training set occurred with 15 hidden nodes and weights initialized randomly; here, the accuracy peaked at 71%. The highest accuracy on the validation and test sets (39% and 30%, respectively) occurred with 20 hidden nodes and random initialization. Both results suggest that 15- and 20-node networks are superior to 10-node networks for our specific 1-in-10 task.
- Furthermore, the accuracy of the 15- and 20-node networks on the test and validation sets was somewhat less sporadic than that of the 10-node network (i.e., the rate of performance change was slightly slower and more constant). This may indicate that 15- and 20-node networks yield more consistent results on test and validation sets through extended training. The 10-node network, by contrast, is volatile and frequently overfits the data.
- All three graphs exhibit local maxima at about round 90. As the networks behind these graphs were initialized to zero, we think this reflects some strange joint overfitting that, at round 90, was discovered by all networks and reversed. Readjustments can be seen to occur shortly after round 90 and are reflected in both the test and validation sets.
- The results do not suggest that random initialization is better or worse than initializing to 0.
Image based analysis:
- Unlike the results from the left-right recognizer, for the 1-in-10 recognizer the background of the hidden node images is neither especially dark nor especially light, which suggests that the weights for those pixels are close to zero (a pixel value of 255 reflects a very positive weight, 0 a very negative one). The hidden nodes have bright and dark spots in the region that corresponds to the fingers. This arrangement of weights makes sense because the background is irrelevant to the target values for the 1-in-10 recognizer -- instead, the target values are determined by the position of the fingers.
Here is an example: [hidden node image omitted]
See the central dark and light spots?
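The grayscale convention used in these hidden node images (255 for the most positive weight, 0 for the most negative) is a simple linear rescaling. A sketch, with the function name our own:

```python
def weights_to_pixels(weights, w_min=None, w_max=None):
    """Linearly map a hidden node's weights to 0..255 grayscale:
    the most negative weight becomes 0 (black), the most positive
    255 (white); near-zero weights land in the middle grays when
    the range is roughly symmetric."""
    if w_min is None:
        w_min = min(weights)
    if w_max is None:
        w_max = max(weights)
    span = (w_max - w_min) or 1.0  # avoid dividing by zero
    return [round(255 * (w - w_min) / span) for w in weights]
```

This is why a mid-gray background in the 1-in-10 images reads as "weights near zero," while extreme blacks and whites mark the pixels the node actually attends to.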
Re: the left-right recognizer.
Graph based analysis:
- The left hand vs. right hand recognizer achieved 100% accuracy after 3 or fewer epochs with 2, 4, and 10 hidden nodes. One element that may have contributed to the ease of the problem is the background against which we photographed the hands: all right-hand images share the same background, as do all left-hand images.
Image based analysis:
- In all three cases the images of the hidden nodes indicate that the
weights take the background at least as much into account as the hand.
Here is an example: [hidden node image omitted]
See how there are very negative weightings in this image both in the
foreground and the background? See how it differs from the image above?
In the image above, the most negative and positive spots are in the center
of the hand and there are no extremes in the background. Here, however,
the background takes on as much importance as the hand...
V. POSSIBLE IMPROVEMENTS
We probably would have achieved better accuracy with the 1-in-10 recognizer if we had been more careful to position our hands in the same part of the frame for each image. The neural net is not able to distinguish how many fingers are held up by counting them the way a human would -- in this implementation, at least, it can only compare pixels against pixel patterns learned from previous examples. Because fingers are skinny, it is hard for the system to distinguish between two fingers and three. Furthermore, if in one picture of the number 3 the hand is a mere finger-width to the right of where it appears in another image, this will have a strong impact on the error.
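One registration step we did not perform, sketched here as a possibility rather than something we implemented: compute the intensity centroid of each (flattened) image and measure how far it sits from the frame center, then shift the image by the negative of that offset so the hand lands in roughly the same place every time.

```python
def centroid_offset(image, width, height):
    """Return (dx, dy): how far the intensity centroid of a flattened
    grayscale image sits from the frame center. Shifting the image by
    (-dx, -dy) would roughly center the bright region (the hand)."""
    total = sum(image) or 1
    cx = sum((i % width) * v for i, v in enumerate(image)) / total
    cy = sum((i // width) * v for i, v in enumerate(image)) / total
    return cx - (width - 1) / 2.0, cy - (height - 1) / 2.0
```

For our 40x30 frames this would at least remove the finger-width translations described above, though it assumes the hand is the brightest region in the frame.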