In this project I investigated different ways of detecting the presence of a user. I tried several algorithms, including background removal, color recognition, color derivative recognition and motion detection. In the end, the best-performing algorithm used a mixture of hue, saturation and texture to detect presence.
The problem of detecting human presence is complex, and researchers have tried many techniques, including color detection, color detection using principal component analysis, motion detection, or combinations of these. Some very interesting ideas have been proposed for detecting faces, such as detecting eye blinking to narrow down where the face is. More creative ideas involve tracking facial features like the mouth, nose and eyes. More complex models use neural network approaches to let machines learn to recognize faces through training. No single method seems reliable on its own, since all of them produce false positives. I believe the best approach is a combination of different algorithms that check each other to avoid false positives.
I experimented with different color algorithms, including detecting excess skin redness, as well as some strategies that did not pan out; for the sake of brevity those were removed from the final project submission. I also tried a motion detection algorithm, but it was very error-prone. The basic idea was to compare a few consecutive frames to see if there was movement, but checking whether something moved is very problematic because (a) lights on devices blink, (b) the colors in the image change constantly, (c) a tree moving behind the window triggers the algorithm, and (d) wind from the central air moves papers around, which also causes false positives. I therefore shifted my focus from detecting whether something moved to detecting whether a person was present.
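To make the motion detection idea concrete, the frame comparison can be sketched as a naive per-pixel difference. This is an illustrative reconstruction, not the project's actual code; the class and method names are made up for the example:

```java
// Illustrative sketch of naive frame differencing (not the project's code).
// Two grayscale frames are compared pixel by pixel; motion is reported when
// the fraction of changed pixels exceeds a sensitivity threshold.
public class FrameDiff {

    // Fraction of pixels whose grayscale values differ by more than `threshold`.
    public static double changedFraction(int[] prev, int[] curr, int threshold) {
        int changed = 0;
        for (int i = 0; i < prev.length; i++) {
            if (Math.abs(prev[i] - curr[i]) > threshold) changed++;
        }
        return (double) changed / prev.length;
    }

    public static boolean motionDetected(int[] prev, int[] curr,
                                         int threshold, double sensitivity) {
        return changedFraction(prev, curr, threshold) > sensitivity;
    }
}
```

A blinking LED or a swaying tree changes only a few pixels yet can still push this fraction over the threshold, which is exactly the false-positive problem described above.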
I am going to focus on the algorithms that showed interesting results. First I will describe the application used to run the algorithms, then delve into the algorithms themselves. There is no need to revisit algorithms that did not produce any significant results, so I omit those details from the writeup.
The application starts up asking the user to select which video source the user would like to use. The application looks like the following:
Once the user selects a video source, the application will pop up two windows. The window on the left shows the camera video in real time. The window on the right shows the snapped frame. The program snaps every 20th frame (to allow enough processing speed) and displays it on the right side. The application looks as follows:
The user can then select an algorithm to run. Each algorithm created for this platform implements an interface called "PDAlgoIntf" (defined in PDAlgoIntf.java), which makes it easy to add new algorithms. Each algorithm returns a "PDResults" object containing the output of the algorithm's analysis. Currently, two algorithms passed the final testing and produced very interesting results. Each algorithm uses a sensitivity threshold to determine whether presence is detected; once presence is detected, the middle box displays "Presence Detected" in blue.
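To illustrate, the plug-in contract might look roughly like the following. Only the type names "PDAlgoIntf" and "PDResults" come from the project; the method names and fields are assumptions for illustration:

```java
// Hypothetical sketch of the algorithm plug-in contract. Only the names
// PDAlgoIntf and PDResults come from the project; the method signatures
// and fields shown here are assumed.
interface PDAlgoIntf {
    // Analyze one snapped frame and report the result.
    PDResults analyze(int[] rgbPixels, int width, int height);
}

class PDResults {
    private final boolean presenceDetected;
    private final double positiveRatio; // fraction of pixels flagged as "person"

    PDResults(boolean presenceDetected, double positiveRatio) {
        this.presenceDetected = presenceDetected;
        this.positiveRatio = positiveRatio;
    }

    boolean isPresenceDetected() { return presenceDetected; }
    double getPositiveRatio()    { return positiveRatio; }
}
```

With a contract like this, the UI can run any selected algorithm against the snapped frame without knowing anything about its internals.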
The downside of this algorithm is that, since it is background-based, it might not work in a chameleon scenario where the background matches the person; it would then be hard to separate the two, though I would guess many algorithms would get confused in a scenario like that.
Another algorithm the user can select is the "color detection" algorithm. It started as a simple excess-red detection algorithm that I kept evolving. At first it did not work very well and produced a lot of false negatives. For the next attempt I looked for a more advanced method of color detection on the web. I found a paper by Jay Kapur (http://www.geocities.com/jaykapur/face.html) which looked very interesting, so I followed the algorithms proposed in that paper, diverging where needed. The idea is to convert the RGB image into the log-opponent (IRgBy) space by deriving the following channels from the main image:
Rg = L(R) - L(G)
By = L(B) - [L(G) + L(R)]/2
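The conversion can be sketched as follows. The writeup does not define L, so the log transform L(x) = 105 * log10(x + 1) and the definition of the intensity channel I used here are assumptions borrowed from the Fleck/Forsyth-style formulation that Kapur's method builds on:

```java
// Sketch of the RGB -> log-opponent (IRgBy) conversion. The transform
// L(x) = 105 * log10(x + 1) and the definition of I are assumptions
// (Fleck/Forsyth-style); only Rg and By come directly from the text.
public class LogOpponent {

    static double L(double x) {
        return 105.0 * Math.log10(x + 1.0);
    }

    // Returns {I, Rg, By} for one pixel with channels in 0..255.
    public static double[] toIRgBy(int r, int g, int b) {
        double i  = (L(r) + L(g) + L(b)) / 3.0; // assumed intensity definition
        double rg = L(r) - L(g);
        double by = L(b) - (L(g) + L(r)) / 2.0;
        return new double[]{i, rg, by};
    }
}
```

The log transform compresses brightness differences so that the opponent channels Rg and By depend mostly on color rather than illumination.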
Then I create a texture amplitude map; from the paper: "A texture amplitude map is used to find regions of low texture information. Skin in images tends to have very smooth texture and so one of the constraints on detecting skin regions is to select only those regions with little texture" (http://www.geocities.com/jaykapur/face.html). To create the amplitude map, I apply a median filter to the intensity (I) pixel array to filter out noise, obtaining a filtered array. I then subtract the filtered array from the original and take the absolute value of the result, which yields the texture map.
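The texture-map step can be sketched like this. The 3x3 median window is an assumption for illustration; the writeup does not state the filter size actually used:

```java
import java.util.Arrays;

// Sketch of the texture amplitude map: median-filter the intensity array,
// then take the absolute difference per pixel. The 3x3 window size is an
// assumption; the project does not state the filter size it used.
public class TextureMap {

    // 3x3 median filter; border pixels are copied through unchanged.
    static double[][] medianFilter(double[][] img) {
        int h = img.length, w = img[0].length;
        double[][] out = new double[h][w];
        for (int y = 0; y < h; y++) out[y] = img[y].clone();
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                double[] win = new double[9];
                int k = 0;
                for (int dy = -1; dy <= 1; dy++)
                    for (int dx = -1; dx <= 1; dx++)
                        win[k++] = img[y + dy][x + dx];
                Arrays.sort(win);
                out[y][x] = win[4]; // median of the 9 samples
            }
        }
        return out;
    }

    // texture = |I - median(I)|: low where the image is smooth (e.g. skin).
    public static double[][] textureAmplitude(double[][] intensity) {
        double[][] filtered = medianFilter(intensity);
        int h = intensity.length, w = intensity[0].length;
        double[][] tex = new double[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                tex[y][x] = Math.abs(intensity[y][x] - filtered[y][x]);
        return tex;
    }
}
```

Smooth regions survive the median filter nearly unchanged, so their texture amplitude is close to zero, while noisy or textured regions produce large differences.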
I then calculated the hue and saturation of the image using the following formulas:
hue = atan2(Rg, By)
saturation = sqrt(Rg^2 + By^2)
Starting from the values used in the paper, I tuned the skin-detection parameters to the following. A pixel is marked "on" if either rule matches, and "off" otherwise:
(1) texture < 4.5, 120 < hue < 160, 10 < saturation < 60
(2) texture < 4.5, 150 < hue < 180, 20 < saturation < 80
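Putting the hue, saturation and texture rules together, the per-pixel test might look like this. Treating hue in degrees is an assumption inferred from the 120-180 ranges above:

```java
// Sketch of the per-pixel skin test using the two tuned rules above.
// Hue is taken in degrees (assumed, given the 120-180 ranges) and
// saturation is the magnitude sqrt(Rg^2 + By^2).
public class SkinPixel {

    public static boolean isSkin(double texture, double rg, double by) {
        double hue = Math.toDegrees(Math.atan2(rg, by));
        double sat = Math.sqrt(rg * rg + by * by);
        boolean rule1 = texture < 4.5 && hue > 120 && hue < 160
                        && sat > 10 && sat < 60;
        boolean rule2 = texture < 4.5 && hue > 150 && hue < 180
                        && sat > 20 && sat < 80;
        return rule1 || rule2;
    }
}
```

Note that the texture bound applies to both rules, so even a perfectly skin-colored pixel in a highly textured region is rejected.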
This effectively transforms the image into an on/off binary image (blue/black in my case). I then dilate the resulting binary image using a dilation filter (a C# example explaining how a dilation filter works can be found at http://www.codersource.net/csharp_image_processing_dilation.aspx). I wrote a custom version of the dilation filter that repeatedly dilates the image until no further dilation is needed.
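A single dilation pass over the binary mask can be sketched as follows. This is an illustrative 4-neighbour version, not the project's custom filter; the writeup does not specify the structuring element or the exact stopping condition used:

```java
// Illustrative single-pass binary dilation (4-neighbour), not the project's
// custom filter. A pixel turns "on" if it or any neighbour is on; the method
// returns true when the pass changed anything, so a caller can loop on it.
public class Dilation {

    public static boolean dilateOnce(boolean[][] mask) {
        int h = mask.length, w = mask[0].length;
        boolean[][] out = new boolean[h][w];
        boolean changed = false;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                boolean on = mask[y][x]
                        || (y > 0 && mask[y - 1][x])
                        || (y < h - 1 && mask[y + 1][x])
                        || (x > 0 && mask[y][x - 1])
                        || (x < w - 1 && mask[y][x + 1]);
                out[y][x] = on;
                if (on != mask[y][x]) changed = true;
            }
        }
        for (int y = 0; y < h; y++) System.arraycopy(out[y], 0, mask[y], 0, w);
        return changed;
    }
}
```

Dilation fills small gaps between detected skin pixels so that a face or hand forms one connected region rather than scattered dots.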
This algorithm worked fairly well but depended on the background not being skin colored, which was a problem in my case because my walls are peach colored. I therefore used the background elimination code from my earlier algorithm to remove the background before running the color detection algorithm. This improved the accuracy of the results tremendously: with the background removed, the person in front of the computer is much easier to detect. The algorithm reports presence when AT LEAST 1 out of every 100 pixels positively indicates presence (fewer positive pixels are needed than in the previous algorithm). Here is a typical analysis:
I have recorded the program in action and put it up online. You can view the two algorithms in action on YouTube (the videos were rather large). Paste the following links into your browser:
A number of people have created presence detection algorithms based on different ideas. I tried to combine some of my own ideas (background filtering) with ideas that others have come up with.
Lines of Code: Not Counted
The main class is called "PresenceDetection".
Please see the USAGE document here
See related work section