3D Scene Reconstruction and Object Recognition

Contact: Michael Reed <m-reed@cs.columbia.edu>

Robot-mounted laser scanning a part.

Generating models of scenes has long been one of the central goals in computer vision. It is usually done in a piece-wise fashion: various parts of the scene are identified, and from those parts a representation of the whole is constructed. In contrast, the method we present here creates a model of an entire scene, which may contain a single object or several. Solid models are constructed by scanning a scene with a laser rangefinder. A distinct model is created from each of multiple views, and the intersection of these models forms a solid representation of the scene from which a CAD model is derived.

In the scanning phase, the laser beam is swept over the object to be imaged, generating an array of depth values. The size of the array is determined by the number of depth values in the X and Y directions, and the precision of each value by the dynamic range of the sensor (i.e., the number of bits of digitization). Scanning is by no means a perfect process: laser rangefinders exhibit noise that depends on the surface material, color, and geometry. Further, the noise is not normally distributed and is therefore difficult to remove.

To produce accurate estimates, the data must be smoothed by applying a filter or using an estimation technique. We have found that simple preprocessing with a median filter, followed by an adaptive planar fit, works reasonably well for polyhedral objects, or for objects whose curvature does not lie in the X-Y plane of the imager.
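As a rough illustration of this preprocessing step (a sketch, not the lab's actual pipeline), the following Python fragment applies a 3x3 median filter to suppress rangefinder dropouts and fits a least-squares plane to a patch of points; the function names and the orthographic (x, y, z) layout are assumptions made for the example.

```python
import numpy as np

def median_filter3(depth):
    """3x3 median filter: suppresses impulse (dropout) noise typical of
    laser rangefinders without blurring step edges between faces."""
    padded = np.pad(depth, 1, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3))
    return np.median(windows, axis=(2, 3))

def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c to an (N, 3) array of
    (x, y, z) points; returns the coefficients and the RMS residual,
    which can be tested against a maximum-fit-error threshold."""
    A = np.c_[points[:, 0], points[:, 1], np.ones(len(points))]
    coeffs, *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)
    residual = points[:, 2] - A @ coeffs
    return coeffs, np.sqrt(np.mean(residual ** 2))
```

A patch whose RMS residual stays below a chosen threshold would be accepted as planar; an adaptive variant could shrink or regrow the patch until that test passes.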

The data collected is dense but contains little information about the complexity and topology of the scanned object. In order to build a high-level model, it is necessary to group adjacent point data into regions of like surface. This segmentation process involves labeling those parts of the image that lie on a common geometric entity. We have found that a combination of region growing and surface normal analysis facilitates rapid segmentation, while still allowing one to adjust the parameters of the segmentation. These parameters include the minimum size of each segment, the maximum deviation of surface normals between adjacent segments, the maximum error of a depth value's fit to its surface patch, the maximum fit error of the entire surface patch, and the convexity of the surface patch.
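A minimal sketch of this idea, under simplifying assumptions: fixed thresholds stand in for the adaptive parameters, and only two of the parameters listed above (normal deviation and minimum segment size) are modeled. Normals are estimated from depth gradients, and a flood fill groups adjacent pixels whose normals agree with the seed's.

```python
import numpy as np
from collections import deque

def estimate_normals(depth):
    """Per-pixel unit surface normals from depth gradients
    (orthographic approximation; assumes unit pixel spacing)."""
    dzdy, dzdx = np.gradient(depth.astype(float))
    n = np.dstack([-dzdx, -dzdy, np.ones_like(depth, dtype=float)])
    return n / np.linalg.norm(n, axis=2, keepdims=True)

def grow_regions(normals, max_angle_deg=5.0, min_size=4):
    """Flood-fill pixels whose normal is within max_angle_deg of the
    seed pixel's normal; regions smaller than min_size are discarded
    (label 0)."""
    h, w, _ = normals.shape
    labels = np.zeros((h, w), dtype=int)
    cos_thresh = np.cos(np.radians(max_angle_deg))
    next_label = 1
    for sy in range(h):
        for sx in range(w):
            if labels[sy, sx]:
                continue  # already assigned (or discarded)
            seed = normals[sy, sx]
            queue, members = deque([(sy, sx)]), [(sy, sx)]
            labels[sy, sx] = next_label
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not labels[ny, nx]
                            and normals[ny, nx] @ seed >= cos_thresh):
                        labels[ny, nx] = next_label
                        queue.append((ny, nx))
                        members.append((ny, nx))
            if len(members) < min_size:
                for y, x in members:
                    labels[y, x] = -1  # too small: mark for discard
            else:
                next_label += 1
    labels[labels < 0] = 0
    return labels
```

On a range image containing two planar faces, this produces one label per face, with sub-threshold fragments left unlabeled.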

Once an incomplete model has been built from each of the range images, the views must be merged in some way to generate a more accurate representation of the scene. The desired features for a representation are that it be effective at modeling the segmented data of each view and that it allow the different views to be quickly merged into a single, accurate representation of the data. Such a representation is the Binary Space Partitioning Tree, or BSPT (see H. Fuchs, Z. Kedem, and B. Naylor, "On Visible Surface Generation by A Priori Tree Structures", in Computer Graphics, 14(3), July 1980). The BSPT has unique advantages as an intermediate representation for the different views: its tree structure allows very efficient algorithms to be developed, it is compact, and it is numerically robust. The ability to quickly merge 3-D data represented as a BSPT is at the heart of our proposed work. We will be able to take a number of views of an object with our laser scanner, segment the data into planar regions, and from these build a BSPT for each view. Each of these views represents a decomposition of 3-D space, and by quickly merging the trees, we can generate a new composite tree that represents the entire object being scanned.
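To convey the flavor of the structure, here is a deliberately simplified sketch: each node holds one partitioning plane, and a chain of nodes encodes a convex solid. This is not the general BSPT merge algorithm (which must split faces that straddle partitioning planes, as in Fuchs et al.); for the convex-chain special case, intersection degenerates to concatenating the two chains of planes. All names here are illustrative.

```python
import numpy as np

class BSPNode:
    """One partitioning plane: normal . x <= d is the 'inside' halfspace,
    and `inside` is the subtree on that side (None is a leaf). A chain
    of such nodes encodes a convex solid."""
    def __init__(self, normal, d, inside=None):
        self.normal = np.asarray(normal, dtype=float)
        self.d = float(d)
        self.inside = inside

def contains(node, point):
    """Classify a point against the convex chain: it is in the solid
    only if it lies inside every partitioning plane."""
    while node is not None:
        if node.normal @ point > node.d:
            return False  # outside this plane, hence outside the solid
        node = node.inside
    return True

def intersect(tree_a, tree_b):
    """Merge two volumes by intersection: graft tree_b's planes below
    tree_a's innermost node. (Mutates tree_a; valid only for this
    convex-chain case.)"""
    if tree_a is None:
        return tree_b
    node = tree_a
    while node.inside is not None:
        node = node.inside
    node.inside = tree_b
    return tree_a
```

For example, intersecting the slab 0 <= x <= 1 with the slab 0 <= y <= 1 yields an infinite square prism, analogous to how the per-view occluding volumes are intersected below.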

In our experimental system, a Servo-Robot laser rangefinder is manipulated by a Puma 560 robot (see the image above). The rangefinder unit scans a laser beam in an arc, so that a single scan line is recovered. The Puma 560 moves the scanning head orthogonal to the scanning plane, thereby allowing the acquisition of a 2-D array of depth values, or a range image, of the part being scanned. The resolution of the laser is adjustable, as is the translation of the robot arm, allowing from very coarse to very fine data to be acquired.

Figure 1. Raw laser range image and corresponding segmented image.

Figure 2. Simulation of model and the actual occluding volume computed from its faces.

The laser acquires an image of the part by scanning one side. A segmentation algorithm is used to group the data and determine polygonal surface regions. The raw scanned data and the segmented image for one view are shown in figure 1. The polygonal surface regions are used to build a BSPT representation of the faces seen by the scanner. A visibility algorithm is then run to determine which regions are occluded by these faces. In figure 2, we show a range image, a simulated model of the faces developed from that image, and the output of a visibility algorithm on this simulated model. The occluded areas can be identified and used to plan the next scan: the volume shown is the region that the scanner is unable to image due to occlusion. At this point, the sensor planning algorithm would compute a new area of the object to be scanned such that the scanner explores as much of the occluded region as it can. The process would then be repeated for the new view. Two other simulated views, chosen by hand, are shown in figures 3 and 4. The final step is to merge the BSPTs together, forming a complete solid model such as the model of the object shown in figure 5.
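The intersection of occluding volumes can be illustrated with a voxel-grid stand-in for the BSPT solids (an analogy only; the actual system intersects BSPTs, not grids). Each view contributes the set of voxels it could not see past the scanned surface, and the object must lie in the intersection of all such sets, so merging views is a logical AND.

```python
import numpy as np

def occluding_volume_z(depth, nz):
    """Occluding volume for a single view looking along +z: voxel
    (y, x, z) is occluded (possibly solid) when z >= depth[y, x]."""
    zs = np.arange(nz).reshape(1, 1, nz)
    return zs >= depth[:, :, None]

def merge_views(view_volumes):
    """Intersect the per-view occluding volumes: the object must lie
    inside every view's occluded region, so merging is elementwise AND."""
    result = view_volumes[0].copy()
    for vol in view_volumes[1:]:
        result &= vol
    return result
```

Each additional view can only carve material away, which is why well-chosen views (the sensor-planning step above) converge quickly toward the true shape.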

Figure 3. Raw laser range image, simulation, and generated occluding volume from second view.

Figure 4. Raw laser range image, simulation, and generated occluding volume from third view.

Figure 5. Simulation of object model generated by intersecting above occluding volumes.
