COMS 6998 Computational Photography

 

 

 

 

[Figure: an example Vidoop challenge image]

VidoopCAPTCHA is a verification solution that uses images of objects, animals, people, or landscapes, instead of distorted text, to distinguish a human from a computer program. By verifying that users are human, it protects the site and its users against malicious bot attacks. VidoopCAPTCHA is more intuitive for the user than traditional text-based CAPTCHAs, and it is presented as a solution to the problems of current CAPTCHAs.

As shown in the figure above, a Vidoop challenge image consists of a combination of pictures representing different categories, each with a letter embedded in it. To pass the challenge, the user must report the letters corresponding to a list of required categories. The robustness of the approach rests on the fact that object recognition is a straightforward, fast task for humans, whereas for a computer it is a fundamentally hard problem: it has been, and still is, a topic of active research in computer vision. What the authors underestimate, though, is that a bot can attempt to access a service thousands of times a day, so recognition rates that the object recognition community would consider quite low (40% or 50%) are still enough for automated attacks on services protected by the image CAPTCHA to be fully successful.
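To make the point concrete, here is a rough back-of-the-envelope estimate (a sketch in Python, assuming three required categories per challenge, as in the challenges we collected, and independent recognition of each one; the daily attempt count is purely illustrative):

    # If a bot recognizes a single required category with probability p, and a
    # challenge asks for three independent categories, the per-challenge pass
    # rate is p**3.  With thousands of attempts per day, even "low" rates suffice.
    for p in (0.4, 0.5):
        pass_rate = p ** 3
        attempts_per_day = 10_000          # illustrative, not a measured figure
        print(f"p = {p:.1f}: pass rate {pass_rate:.1%}, "
              f"~{pass_rate * attempts_per_day:.0f} expected successes per day")
    # p = 0.4 gives roughly a 6.4% pass rate per challenge; p = 0.5 gives 12.5%.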

 

The core idea of the project is to try to break an image-based CAPTCHA, and in particular VidoopCAPTCHA, following the line of work initiated by Mori and Malik. The objective of this system is to show that image-based CAPTCHAs, and the Vidoop one in particular, are not as secure as their authors claim, which in turn implies insecurity for the applications that rely on them. We chose this idea in order to voice our concerns in today's world, where the security methods developed to preserve confidentiality in online systems, of which image-based CAPTCHAs represent one of the latest developments, are not only insecure but prone to attacks by hackers with high success rates.

 

Here is our project proposal.

Here is our Intermediate Milestone Report.


 

First Step: Data Acquisition

We wrote a Perl script to download 200 Vidoop challenges from their website. The images can be found here, together with .txt files containing the corresponding categories required by each challenge, manually annotated ground-truth letters that actually solve the test, and the results of the split and letter detection algorithm.

We discovered that only 26 categories are used in the challenges. Their distribution can be observed in the graph of Figure 1.

 


Figure 1: Distribution of the 26 categories across the 600 requests in 200 Vidoop challenges

We wrote a second Perl script to download images from Flickr for every category, to use as training data. We decided to download 500 images per concept: a number large enough to train a fairly robust classifier, but small enough to keep too many noisy examples out of the training set. Downloading images from Flickr makes it possible to obtain a large amount of data automatically, but many of the examples may not be relevant to the given query. Flickr's search relies on user tags and other textual labels rather than on the actual image content, so mislabeling by users leads to errors, and these errors increase as we move to lower-ranked results in the returned list.
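For reference, a minimal sketch of such a per-category download (in Python rather than the original Perl, using Flickr's public REST API; the API key, output directory, example category name, and counts shown here are placeholders):

    import os
    import requests

    API_URL = "https://api.flickr.com/services/rest/"
    API_KEY = "YOUR_FLICKR_API_KEY"  # placeholder: requires a Flickr API key

    def download_category(category, out_dir, n_images=500, per_page=250):
        """Download up to n_images Flickr results for a text query."""
        os.makedirs(out_dir, exist_ok=True)
        saved, page = 0, 1
        while saved < n_images:
            # Text search over user tags/titles/descriptions, not image content.
            resp = requests.get(API_URL, params={
                "method": "flickr.photos.search",
                "api_key": API_KEY,
                "text": category,
                "per_page": per_page,
                "page": page,
                "format": "json",
                "nojsoncallback": 1,
            }).json()
            photos = resp["photos"]["photo"]
            if not photos:
                break
            for p in photos:
                # Standard Flickr static-image URL pattern (medium size).
                url = (f"https://live.staticflickr.com/{p['server']}/"
                       f"{p['id']}_{p['secret']}.jpg")
                with open(os.path.join(out_dir, f"{p['id']}.jpg"), "wb") as f:
                    f.write(requests.get(url).content)
                saved += 1
                if saved >= n_images:
                    break
            page += 1

    download_category("horse", "flickr_training/horse")  # example category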

 


Test Image Preprocessing

The goal of this step is to split each challenge image into the correct subimages, and then to localize and extract the circular region containing the character within each subimage. The split algorithm localizes the vertical and horizontal lines containing the maximum number of edges in the edge image obtained by applying a Laplacian of Gaussian filter to the original challenge image. Once the image has been split into subimages, a generalized Hough transform (we found the code here) is computed on each subimage to detect circular regions. The circle detected in most of the subimages at approximately the same position and with the same radius is kept as the character's region. Finally, the square with sides of length l = r/sqrt(2) inscribed in the localized circle of radius r is taken as the final character region, which is thresholded into a binary representation. The algorithm, while simple and somewhat ad hoc, is quite effective: it splits and segments subimages and text regions with 100% accuracy. The preprocessing code is here. Figure 2 presents an example of the processing chain for Challenge 1.
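To illustrate the chain described above, here is a simplified sketch in Python with OpenCV (not our actual code: the 3x4 grid assumption, the search window around each nominal split line, the use of OpenCV's Hough-gradient circle detector in place of the generalized Hough transform, and all parameter values are assumptions that would need tuning):

    import cv2
    import numpy as np

    def preprocess_challenge(path, n_rows=3, n_cols=4):
        gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)

        # Laplacian of Gaussian edge image: Gaussian smoothing, then Laplacian.
        log_img = cv2.Laplacian(cv2.GaussianBlur(gray, (5, 5), 1.5), cv2.CV_64F)
        edges = (np.abs(log_img) > np.abs(log_img).mean()).astype(np.uint8)

        def split_positions(strength, n_cells, window=10):
            # Around each nominal grid boundary, keep the line with the most edges.
            n = len(strength)
            cuts = [0]
            for k in range(1, n_cells):
                nominal = k * n // n_cells
                lo, hi = max(nominal - window, 0), min(nominal + window, n - 1)
                cuts.append(lo + int(np.argmax(strength[lo:hi + 1])))
            return cuts + [n]

        ycuts = split_positions(edges.sum(axis=1), n_rows)
        xcuts = split_positions(edges.sum(axis=0), n_cols)
        subimages = [gray[ycuts[i]:ycuts[i + 1], xcuts[j]:xcuts[j + 1]]
                     for i in range(n_rows) for j in range(n_cols)]

        letters = []
        for sub in subimages:
            # Circle detection; the project keeps the circle found at a
            # consistent position/radius across subimages, which this omits.
            circles = cv2.HoughCircles(sub, cv2.HOUGH_GRADIENT, dp=1, minDist=50,
                                       param1=100, param2=30,
                                       minRadius=8, maxRadius=40)
            if circles is None:
                letters.append(None)
                continue
            x, y, r = circles[0, 0]
            # Square of side l = r / sqrt(2) centred in the detected circle,
            # thresholded into a binary letter image.
            half = r / (2.0 * np.sqrt(2.0))
            patch = sub[max(int(y - half), 0):int(y + half),
                        max(int(x - half), 0):int(x + half)]
            _, binary = cv2.threshold(patch, 0, 255,
                                      cv2.THRESH_BINARY + cv2.THRESH_OTSU)
            letters.append(binary)
        return subimages, letters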

 


Figure 2: Preprocessing chain: a) original test image, b) LoG-based edge image, c) split and circle detection result.

 

 


Feature Extraction

We are extracting color histogram, edge histogram, and color moments features to train and test classifiers, as in Assignment 2. We are still extracting features and testing the results, which will be uploaded soon.
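As an illustration of the kind of features involved, here is a sketch in Python with OpenCV/NumPy (the bin counts and the global, whole-image computation are assumptions, not necessarily the exact settings used in Assignment 2):

    import cv2
    import numpy as np

    def color_histogram(img, bins=8):
        # Joint 3-channel color histogram, L1-normalized.
        hist = cv2.calcHist([img], [0, 1, 2], None, [bins] * 3, [0, 256] * 3)
        hist = hist.flatten()
        return hist / (hist.sum() + 1e-9)

    def edge_histogram(img, bins=16):
        # Histogram of gradient orientations, weighted by gradient magnitude.
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).astype(np.float32)
        gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
        gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
        mag, ang = cv2.cartToPolar(gx, gy)
        hist, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi), weights=mag)
        return hist / (hist.sum() + 1e-9)

    def color_moments(img):
        # Mean, standard deviation and (cube-root) skewness of each channel.
        feats = []
        for c in cv2.split(img.astype(np.float32)):
            mu, sigma = c.mean(), c.std()
            skew = np.cbrt(((c - mu) ** 3).mean())
            feats += [mu, sigma, skew]
        return np.array(feats)

    def extract_features(img):
        return np.concatenate([color_histogram(img),
                               edge_histogram(img),
                               color_moments(img)])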

 


Final Report and Presentation

The final presentation and final report can be found here. The folder containing the code is at this link.