Gaze Locking: Passive Eye Contact Detection
for Human–Object Interaction

Eye contact plays a crucial role in our everyday social interactions. The ability of a device to reliably detect when a person is looking at it can lead to powerful human–object interfaces. Today, most gaze-based interactive systems rely on gaze tracking technology. Unfortunately, current gaze tracking techniques require active infrared illumination, calibration, or are sensitive to distance and pose. In this work, we propose a different solution—a passive, appearance-based approach for sensing eye contact in an image. By focusing on gaze locking rather than gaze tracking, we exploit the special appearance of direct eye gaze, achieving a Matthews correlation coefficient (MCC) of over 0.83 at long distances (up to 18 m) and large pose variations (up to ±30° of head yaw rotation) using a very basic classifier and without calibration. To train our detector, we also created a large publicly available gaze data set: 5,880 images of 56 people over varying gaze directions and head poses. We demonstrate how our method facilitates human–object interaction (top-left), user analytics (top-right), image filtering (bottom-left), and gaze-triggered photography (bottom-right).

Publications

"Gaze Locking: Passive Eye Contact Detection for Human?Object Interaction,"
B.A. Smith, Q. Yin, S.K. Feiner and S.K. Nayar,
ACM Symposium on User Interface Software and Technology (UIST),
pp. 271-280, Oct. 2013.
[PDF] [bib] [©]

Images

  Gaze Locking in People:

(a) People are relatively accurate at sensing eye contact, even when the person gazing (i.e., the gazer) is wearing prescription glasses. Even at a distance of 18 m, gazees still achieve MCCs of over 0.2 if the gazer is not wearing glasses. Here, the gazer is at a frontal (0°) head pose. (b) The gazee's accuracy decreases roughly linearly with distance regardless of the gazer's (horizontal) head pose. Head poses that are more off-center (such as ±30°) yield slightly lower MCCs. (c) Gazees are least accurate when the gazer is actually looking at them (the 0° case); that is, the false negative rate is higher than the false positive rate. Interestingly, when the gazer is looking away, the gazee is more accurate when he or she can see only one of the gazer's eyes (the blue line is not strictly above the red and green lines).
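For reference, the Matthews correlation coefficient (MCC) reported throughout these results can be computed directly from a binary confusion matrix. The following is a minimal illustrative Python snippet; the function and variable names are ours, not from the paper.

    import math

    def matthews_corrcoef(tp, fp, tn, fn):
        """Matthews correlation coefficient from binary confusion counts.
        Ranges from -1 (total disagreement) to +1 (perfect prediction);
        0 corresponds to chance-level performance."""
        denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        if denom == 0:
            return 0.0  # conventional value when any marginal count is zero
        return (tp * tn - fp * fn) / denom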

  Gaze Locking Detector Pipeline:

Our gaze locking detector consists of three broad phases, shown here in different colors. In the first phase, we locate the eyes in an image and transform them into a standard coordinate frame. In the second phase, we mask out the eyes' surroundings and assemble pixel-wise features from the eyes' appearance. Finally, we project these features into a low-dimensional space and feed them into a binary classifier to determine whether the face is gaze locking or not.
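As a rough illustration of these three phases, here is a minimal Python sketch. It assumes the eye localization and rectification steps are available as helper functions (detect_eye_corners and rectify_eye are hypothetical placeholders), and it uses scikit-learn's PCA and a linear SVM as stand-ins for the dimensionality reduction and binary classifier; it is a sketch, not the paper's exact implementation.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.svm import LinearSVC

    def extract_features(image, detect_eye_corners, rectify_eye, mask):
        """Phases 1-2: locate both eyes, warp each to a standard frame,
        mask out the surroundings, and concatenate the pixel intensities.
        detect_eye_corners and rectify_eye are hypothetical helpers."""
        corners = detect_eye_corners(image)            # hypothetical landmark detector
        feats = []
        for eye in ("left", "right"):
            patch = rectify_eye(image, corners[eye])   # warp to a fixed-size frame
            feats.append((patch * mask).ravel())       # keep only pixels inside the mask
        return np.concatenate(feats)

    def train_detector(X, y, n_components=100):
        """Phase 3: project features into a low-dimensional space and train a
        binary classifier. X is (n_samples, n_features); y is 1 for gaze-locking
        faces and 0 otherwise."""
        pca = PCA(n_components=n_components).fit(X)
        clf = LinearSVC().fit(pca.transform(X), y)
        return pca, clf

    def is_gaze_locking(features, pca, clf):
        return clf.predict(pca.transform(features.reshape(1, -1)))[0] == 1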

  Rectified Features and Failure Cases:

(a) Examples of rectified and masked features. Each eye has been transformed to a 48×36 px coordinate frame. The crosshairs mark the eye corners detected in the first phase. We mask each eye with a fixed-size ellipse whose shape was optimized offline for accuracy. (b) Two failure cases: strong highlights on glasses (top) and low contrast (bottom).
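For illustration, a fixed elliptical mask over a 48×36 px eye patch can be constructed as in the sketch below; the semi-axis lengths shown are placeholders, not the offline-optimized shape from the paper.

    import numpy as np

    def elliptical_mask(width=48, height=36, semi_x=22, semi_y=14):
        """Boolean mask that keeps pixels inside a centered ellipse.
        The semi-axis lengths here are illustrative placeholders."""
        ys, xs = np.mgrid[0:height, 0:width]
        cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
        return ((xs - cx) / semi_x) ** 2 + ((ys - cy) / semi_y) ** 2 <= 1.0

    # Example: zero out everything outside the ellipse in a rectified eye patch.
    # masked_patch = rectified_eye_patch * elliptical_mask()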

  Gaze Locking Detector Performance:

In these tests, we downsampled our detector's test images to match the resolution seen by the human fovea at the respective distances. (a) Our detector achieves MCCs of over 0.83 at a distance of 18 m, significantly outperforming human accuracy. The detector's accuracy remains fairly constant over distance because our method uses very low-resolution features. The line representing human performance aggregates the lines from Figure (a) of "Gaze Locking in People" above. (b) Our detector's accuracy is also fairly constant across a variety of (horizontal) head poses. (c) As with human vision, our detector's accuracy is worst when people are looking at or very close to the camera; nonetheless, it significantly outperforms human vision.

  Comparison with an Active System:

Here we compare our detector with the eyebox2, which implements an active infrared approach to eye contact detection, in both its Normal (6 m) and Close Range (2 m) modes. Though passive, our detector is more accurate than the eyebox2. The eyebox2's Normal mode appears to be tuned to reduce false positives, and its Close Range mode appears to be tuned to reduce false negatives.

  Application 1: Human–Object Interaction:

Our gaze locking approach allows people to interact with objects just by looking at them. In this proof of concept, we process the video from the built-in cameras of three iPads to sense when each iPad is being looked at. Here, the woman is looking at the iPad in the middle. Since each iPad's camera sits at the far left of the device, she was instructed to look at the left half of each iPad. Our accompanying video shows our detector's output on the actual video feeds.

  Application 2: User Analytics:

Two ordinary webcams are placed above two ads for the same product. By counting the number of times each advertisement is viewed, we can gauge which one is more effective. Since the webcams are mounted above the ads, the counts increment when viewers look at the ads' top halves. Our accompanying video shows our detector's output on the actual video feeds.
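One simple way to turn per-frame detector output into view counts is to count sustained runs of eye contact, so that a single glance increments the counter only once. The sketch below illustrates this idea under that assumption; it is not necessarily the exact counting logic used in the demo.

    def count_views(frame_labels, min_consecutive=3):
        """Count distinct viewing events from per-frame gaze-locking labels.
        A view is counted once at least `min_consecutive` consecutive frames
        are labeled as gaze locking, and not again until the detector reports
        no eye contact."""
        views, run, counted = 0, 0, False
        for locked in frame_labels:
            if locked:
                run += 1
                if run >= min_consecutive and not counted:
                    views += 1
                    counted = True
            else:
                run, counted = 0, False
        return views

    # Example: two separate glances at an ad.
    # count_views([0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0])  ->  2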

  Application 3: Image Search Filter:

Our approach is completely appearance-based and can be applied to any image, including existing images such as ones from the Web. Hence, we can sort these images (A–D) by degree of eye contact to quickly find one where everyone is looking at the camera. These are actual decisions made by our detector.
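Such a filter can be implemented by scoring every detected face with the classifier's decision value and ranking images by their lowest-scoring face. The sketch below assumes a scikit-learn-style decision_function and hypothetical helpers detect_faces and face_features; it illustrates the idea rather than the exact ranking used here.

    def rank_by_eye_contact(images, detect_faces, face_features, pca, clf):
        """Sort images so that those in which every face looks most like eye
        contact come first. `detect_faces` and `face_features` are hypothetical
        helpers; `pca` and `clf` are the trained projection and classifier."""
        def image_score(image):
            faces = detect_faces(image)
            if not faces:
                return float("-inf")  # no faces detected: rank last
            scores = [clf.decision_function(
                          pca.transform(face_features(face).reshape(1, -1)))[0]
                      for face in faces]
            return min(scores)  # the least eye-contact-like face dominates
        return sorted(images, key=image_score, reverse=True)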

  Application 4: Gaze-Triggered Photography:

If a gaze locking detector were incorporated into a consumer-level camera, the camera could automatically take a picture when the entire group is looking straight back, allowing the photographer to join the group and still capture a perfect photo. Our accompanying video shows our detector's output on the camera's feed.
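One plausible trigger condition is that every detected face must be gaze locking for several consecutive frames before the shutter fires, to avoid triggering on blinks or brief glances. The sketch below illustrates this idea; the per-face gaze locking labels are assumed to come from the detector, and the camera integration itself is not shown.

    def should_fire_shutter(frame_history, required_frames=5):
        """frame_history is a list of per-frame results, each a list of booleans
        (one per detected face: True if that face is gaze locking).
        Fire only when at least one face is present and every face has made
        eye contact in each of the last `required_frames` frames."""
        recent = frame_history[-required_frames:]
        if len(recent) < required_frames:
            return False
        return all(faces and all(faces) for faces in recent)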


Video

  UIST 2013 Video:

This video is the supplemental video for our UIST 2013 paper. It contains a brief summary of our approach and demonstrates a few of the applications that gaze locking facilitates. It also shows our detector's output on the feeds and images from Applications 1–4 above.


Database

  Columbia Gaze Data Set:

We have made our data set publicly available for use by researchers. This data set includes 56 people and 5,880 images over varying gaze directions and head poses.

