SIFT in RGB-D (Object recognition)

Marek Jakab

In this example we focus on extending the standard SIFT descriptor vector with two additional dimensions computed from depth map information obtained from a Kinect device. The depth map is used for object segmentation (see: http://vgg.fiit.stuba.sk/2013-07/object-segmentation/) as well as to compute two metrics for the surface around each detected keypoint: the standard deviation and the difference between the minimal and maximal distance. These two metrics are appended to the SIFT descriptor.
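As a rough illustration of the two depth metrics, the dependency-free C++ sketch below treats the surface around a keypoint as a list of 3D points and measures each point's signed distance from the tangent plane through the keypoint, along its unit normal. It returns the population standard deviation of those distances (the same normalisation `cv::meanStdDev` uses) and their maximum minus minimum. `Vec3`, `DepthFeatures` and `depthFeatures` are hypothetical names for this sketch; the original code works on an interpolated surface patch from PCL instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Minimal 3D vector type so the sketch has no PCL/Eigen dependency.
struct Vec3 {
    double x, y, z;
};

static double dot(const Vec3 &a, const Vec3 &b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// The two values that would be appended to the SIFT descriptor.
struct DepthFeatures {
    double stdDev;  // standard deviation of the distances along the normal
    double range;   // maximal minus minimal distance along the normal
};

// Hypothetical helper: signed distance of every surface point from the plane
// through `keypoint` with unit normal `normal`, summarised as std dev and
// max - min. Assumes `points` is non-empty and `normal` has unit length.
DepthFeatures depthFeatures(const std::vector<Vec3> &points,
                            const Vec3 &keypoint, const Vec3 &normal) {
    assert(!points.empty());

    std::vector<double> d;
    d.reserve(points.size());
    for (const Vec3 &p : points) {
        Vec3 offset{p.x - keypoint.x, p.y - keypoint.y, p.z - keypoint.z};
        d.push_back(dot(offset, normal));
    }

    double mean = 0.0;
    double minD = d[0];
    double maxD = d[0];
    for (double v : d) {
        mean += v;
        minD = std::min(minD, v);
        maxD = std::max(maxD, v);
    }
    mean /= static_cast<double>(d.size());

    // Population variance (divide by N), matching cv::meanStdDev.
    double variance = 0.0;
    for (double v : d) {
        variance += (v - mean) * (v - mean);
    }
    variance /= static_cast<double>(d.size());

    return DepthFeatures{std::sqrt(variance), maxD - minD};
}
```

For example, four points at distances 0, 0.02, -0.02 and 0 from the plane give a standard deviation of about 0.0141 and a range of 0.04. Flipping the direction of the normal changes neither value, so the metrics do not depend on how the normal happens to be oriented.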

Functions used: FeatureDetector::detect, DescriptorExtractor::compute, RangeImage::calculate3DPoint

The process

To extract the normal vector and compute the metrics mentioned above around each keypoint, we use the OpenCV and PCL libraries. We perform the following steps:

  1. Perform SIFT keypoint localization on the selected image and mask
  2. Extract SIFT descriptors
    // Detect features and extract descriptors from object intensity image.
    if (siftGpu.empty())
    {
    	featureDetector->detect(intensityImage, objectKeypoints, mask);
    	descriptorExtractor->compute(intensityImage, objectKeypoints, objectDescriptors);
    }
    else
    {
    	runSiftGpu(siftGpu, maskedIntensityImage, objectKeypoints, objectDescriptors, mask);
    }
    
    
  3. For each descriptor:
    1. From the surface around the keypoint position:
      1. Compute the standard deviation
      2. Compute the difference between the minimal and maximal distances (relative to the normal vector)
    2. Append the new values to the current descriptor vector
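    for (std::size_t i = 0; i < keypoints.size(); ++i)
    {
    	const int x = (int)keypoints[i].x;
    	const int y = (int)keypoints[i].y;
    	// Keypoints without a valid depth measurement get a null depth descriptor.
    	if (!rangeImage.isValid(x, y))
    	{
    		setNullDescriptor(descriptor);
    		continue;
    	}
    	// Back-project the keypoint into 3D using its measured range.
    	const pcl::PointWithRange &point_in_image = rangeImage.getPoint(x, y);
    	rangeImage.calculate3DPoint(keypoints[i].x, keypoints[i].y, point_in_image.range, keypointPosition);
    	// Surface normal estimated from the neighbourhood within a 5-pixel radius.
    	rangeImage.getNormal(x, y, 5, normal);
    	// Sample a segmentPixelSize x segmentPixelSize surface patch in the local
    	// frame given by `transformation`. The caller owns the returned array.
    	float *surfaceSegmentPixels = rangeImage.getInterpolatedSurfaceProjection(transformation, segmentPixelSize, segmentWorldSize);
    	// Replace missing (non-finite) samples with maxDistance.
    	for (int j = 0; j < segmentPixelsTotal; ++j)
    	{
    		if (!pcl_isfinite(surfaceSegmentPixels[j]))
    			surfaceSegmentPixels[j] = maxDistance;
    	}
    	cv::Mat surfaceSegment(segmentPixelSize, segmentPixelSize, CV_32FC1, (void *)surfaceSegmentPixels);
    	extractDescriptor(surfaceSegment, descriptor);
    	// Free the patch allocated by getInterpolatedSurfaceProjection.
    	delete[] surfaceSegmentPixels;
    }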
    
    
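    // Computes the two depth values for one keypoint from its surface patch.
    void DepthDescriptor::extractDescriptor(const cv::Mat &segmentSurface, float *descriptor)
    {
    	// Standard deviation of the surface values around the keypoint.
    	cv::Scalar mean;
    	cv::Scalar standardDeviation;
    	cv::meanStdDev(segmentSurface, mean, standardDeviation);
    
    	// Difference between the largest and smallest surface value.
    	double minValue, maxValue;
    	cv::minMaxLoc(segmentSurface, &minValue, &maxValue);
    
    	descriptor[0] = float(standardDeviation[0]);
    	descriptor[1] = float(maxValue - minValue);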
    }
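Step 3.2 only states that the two values are appended. A minimal sketch of that concatenation is shown here, assuming a 128-dimensional SIFT vector and a hypothetical `depthWeight` factor, together with the L2 distance that nearest-neighbour matching would use. The weight is an assumption and not part of the original method: if the range image is in metres, the standard deviation and range of a small patch are of the order of centimetres, while SIFT components are typically of a much larger magnitude, so without scaling the two extra dimensions would barely influence matching. `extendDescriptor`, `l2Distance` and `kSiftDims` are illustrative names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr std::size_t kSiftDims = 128;

// Hypothetical helper: concatenates the 128 SIFT values with the two depth
// values, scaled by `depthWeight` (an assumed tuning parameter).
std::vector<float> extendDescriptor(const std::vector<float> &sift,
                                    float depthStdDev, float depthRange,
                                    float depthWeight) {
    assert(sift.size() == kSiftDims);
    std::vector<float> extended(sift);
    extended.push_back(depthWeight * depthStdDev);
    extended.push_back(depthWeight * depthRange);
    return extended;  // kSiftDims + 2 = 130 values
}

// Euclidean (L2) distance between two descriptors of equal length.
float l2Distance(const std::vector<float> &a, const std::vector<float> &b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float diff = a[i] - b[i];
        sum += diff * diff;
    }
    return std::sqrt(sum);
}
```

With identical SIFT parts, two descriptors whose depth ranges differ by 0.03 end up at distance 3 when `depthWeight` is 100, so the weight directly controls how much the depth dimensions can separate otherwise similar keypoints.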
    
    

Inputs

The color image

The mask of the segmented object.

Output

To extend the SIFT descriptor and still obtain good matching results, we need to evaluate the precision of the selected metrics. We chose to visualize the normal vectors computed from the surface around the keypoints.

Normal vector visualisation
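To show what a surface normal around a keypoint is, and one hypothetical way to put a number on its precision alongside the visualisation, the C++ sketch below estimates a normal from the four 3D neighbours of a pixel using the cross product of two central-difference tangent vectors, then measures the angle to a reference normal (for example the known normal of a flat test surface). This is a simpler stand-in, not what `RangeImage::getNormal` does (that call estimates the normal from a pixel neighbourhood of a given radius). `Vec3`, `estimateNormal` and `angleDegrees` are illustrative names, and the sketch assumes the camera sits at the origin.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 {
    double x, y, z;
};

static Vec3 sub(const Vec3 &a, const Vec3 &b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static double dot(const Vec3 &a, const Vec3 &b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(const Vec3 &a, const Vec3 &b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static Vec3 normalize(const Vec3 &v) {
    const double len = std::sqrt(dot(v, v));
    assert(len > 0.0);  // neighbours must not be collinear
    return {v.x / len, v.y / len, v.z / len};
}

// Normal at `center` from its left/right and up/down neighbours (already
// back-projected to 3D), flipped so that it faces a viewer at the origin.
Vec3 estimateNormal(const Vec3 &center, const Vec3 &left, const Vec3 &right,
                    const Vec3 &up, const Vec3 &down) {
    Vec3 n = normalize(cross(sub(right, left), sub(down, up)));
    if (dot(n, center) > 0.0) {
        n = {-n.x, -n.y, -n.z};
    }
    return n;
}

// Angle in degrees between two unit normals.
double angleDegrees(const Vec3 &a, const Vec3 &b) {
    const double kPi = std::acos(-1.0);
    const double c = std::max(-1.0, std::min(1.0, dot(a, b)));  // guard acos
    return std::acos(c) * 180.0 / kPi;
}
```

Orienting every normal towards the camera makes the sign consistent, so an angle close to 0° means the estimate agrees with the reference instead of depending on which way the cross product happened to point. For a plane tilted by 45° the estimate is 45° away from the normal of a plane facing the camera.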