
Smile detection

Jan Podmajersky

Smile detection is a popular feature of today’s photo cameras. Unlike the widely available face detection, it is not implemented in all cameras because it is more complicated to implement. This project shows a basic algorithm for the task; it can be used, but a few improvements are still necessary. A Sobel filter and thresholding are used. A mask image is compared to every filtered frame from the webcam, and if the images are more than 60% equal, a smile is detected.
Functions used: detectMultiScale, Sobel, medianBlur, threshold, dilate, bitwise_and

The process

  1. convert image from camera to gray scale
    cvtColor( frame, frame_gray, CV_BGR2GRAY );
    
  2. face detection using Haar cascade
    face_cascade.detectMultiScale( frame_gray, faces, 1.3, 4, CV_HAAR_DO_CANNY_PRUNING, Size(50, 50) );
    
  3. adjust size of image just to the detected face
  4. cut out only the lower third of the face, where the mouth is always located
    face = frame_gray( cv::Rect(faces[i].x, faces[i].y + 2 * faces[i].height/3, faces[i].width, faces[i].height/3) );
    
  5. horizontal Sobel filter
    Sobel( face, grad_y, ddepth, 0, 1, 7, scale, delta, BORDER_DEFAULT );
    convertScaleAbs( grad_y, abs_grad_y );   // convert the gradient to 8-bit before weighting
    addWeighted( abs_grad_y, 0.9, abs_grad_y, 0.9, 0, output );
    
  6. Median blur
    medianBlur(output, detected_edges, 5);
    
  7. threshold the image
    threshold(detected_edges, detected_edges, 220, 255, CV_THRESH_BINARY);
    
  8. dilate small parts
    dilate(detected_edges, detected_edges, element);
    
  9. logical AND of the image and the mask image
    bitwise_and(detected_edges, maskImage, result);
    
  10. detect smile
    if the result and the mask image are more than 60% equal, a smile is detected
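
A minimal sketch of step 10, assuming result is the output of bitwise_and from step 9 and maskImage is the binary smile mask (the variable names and the interpretation of the 60% ratio are illustrative, not taken from the original sources):

// Hedged sketch: measure how much of the mask survives the AND with the
// thresholded edge image; "more than 60% equal" is read as a pixel ratio.
int maskPixels  = countNonZero(maskImage);
int matchPixels = countNonZero(result);
double agreement = maskPixels > 0 ? (double)matchPixels / maskPixels : 0.0;
bool smileDetected = agreement > 0.6;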
podmajersky_input
input
podmajersky_sobel
horizontal Sobel filter
Podmajersky_smile
masked image
podmajersky_output
output

SIFT in RGB-D (Object recognition)

Marek Jakab

In this example we focus on enhancing the current SIFT descriptor vector with two additional dimensions, using depth map information obtained from a Kinect device. The depth map is used for object segmentation (see: http://vgg.fiit.stuba.sk/2013-07/object-segmentation/) as well as to compute the standard deviation and the difference between the minimal and maximal distance of the surface around each detected keypoint. These two metrics are used to enhance the SIFT descriptor.

Functions used: FeatureDetector::detect, DescriptorExtractor::compute, RangeImage::calculate3DPoint

The process

To extract the normal vector and compute the mentioned metrics for each keypoint we use the OpenCV and PCL libraries. We perform the following steps:

  1. Perform SIFT keypoint localization on the selected image and mask
  2. Extract SIFT descriptors
    // Detect features and extract descriptors from object intensity image.
    if (siftGpu.empty())
    {
    	featureDetector->detect(intensityImage, objectKeypoints, mask);
    	descriptorExtractor->compute(intensityImage, objectKeypoints, objectDescriptors);
    }
    else
    {
    	runSiftGpu(siftGpu, maskedIntensityImage, objectKeypoints, objectDescriptors, mask);
    }
    
    
  3. For each descriptor
    1. From the surface around the keypoint position:
      1. Compute the standard deviation
      2. Compute the difference between the minimal and maximal distances (based on the normal vector)
    2. Append the new information to the current descriptor vector (a concatenation sketch follows below)
    for (int i = 0; i < keypoints.size(); ++i)
    {
    	if (!rangeImage.isValid((int)keypoints[i].x, (int)keypoints[i].y))
    	{
    		setNullDescriptor(descriptor);
    		continue;
    	}
    	rangeImage.calculate3DPoint(keypoints[i].x, keypoints[i].y, point_in_image.range, keypointPosition);
    	surfaceSegmentPixels = rangeImage.getInterpolatedSurfaceProjection(transformation, segmentPixelSize, segmentWorldSize);
    	rangeImage.getNormal((int)keypoints[i].x, (int)keypoints[i].y, 5, normal);
    	for (int j = 0; j < segmentPixelsTotal; ++j)
    	{
    		if (!pcl_isfinite(surfaceSegmentPixels[j]))
    			surfaceSegmentPixels[j] = maxDistance;
    	}
    	cv::Mat surfaceSegment(segmentPixelSize, segmentPixelSize, CV_32FC1, (void *)surfaceSegmentPixels);
    	extractDescriptor(surfaceSegment, descriptor);
    }
    
    
    void DepthDescriptor::extractDescriptor(const cv::Mat &segmentSurface, float *descriptor)
    {
    	cv::Scalar mean;
    	cv::Scalar standardDeviation;
    	meanStdDev(segmentSurface, mean, standardDeviation);
    
    	double min, max;
    	minMaxLoc(segmentSurface, &min, &max);
    
    	descriptor[0] = float(standardDeviation[0]);
    	descriptor[1] = float(max - min);
    }
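
The appending of the two new values to the standard 128-dimensional SIFT descriptor is not shown above. A minimal sketch of one way to do it, assuming objectDescriptors is the N×128 CV_32F matrix from step 2 and extraDims is an N×2 CV_32F matrix holding the standard deviation and the max-min distance per keypoint (both names are illustrative):

// Extend each 128-D SIFT descriptor with the two depth-based metrics.
cv::Mat enhancedDescriptors;
cv::hconcat(objectDescriptors, extraDims, enhancedDescriptors);  // N x 130 result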
    
    

Inputs

jakab_input
The color image
jakab_mask
The mask of the segmented object.

Output

To be able to enhance the SIFT descriptor and still provide good matching results, we need to evaluate the precision of the selected metrics. We have chosen to visualize the normal vectors computed from the surface around the keypoints.

jakab_output
Normal vector visualisation

Fire detection in video

Stefan Linner

The main aim of this example is to automatically detect fire in video using computer vision methods, implemented in real time with the aid of the OpenCV library. The proposed solution must be applicable in existing security systems, meaning with the use of regular industrial or personal video cameras. A necessary precondition is that the camera is static. From the computer vision and image processing point of view, the stated problem corresponds to the detection of a dynamically changing object based on its color and motion features.

Since static cameras are used, a background subtraction method provides effective segmentation of dynamic objects in the video sequence. Candidate fire-like regions of the segmented foreground objects are determined according to rule-based color detection.

Input

linner_input

Process outline

linner_process

Process steps

  1. Retrieve current video frame
    capture.retrieve(frame);
    
  2. Update the background model and save the foreground mask to fgMaskMOG2
    BackgroundSubtractorMOG2 pMOG2;
    Mat fgMaskMOG2;
    pMOG2(frame, fgMaskMOG2);
    
  3. Convert the current 8-bit BGR frame to a 32-bit floating point YCrCb image.
    frame.convertTo(temp, CV_32FC3, 1/255.0);
    cvtColor(temp, imageYCrCb, CV_BGR2YCrCb);
    
  4. For every frame pixel, check if it is foreground and if it meets the expected fire color features (the channel means used by isFirePixel are sketched below).
    colorMask = Mat(frame.rows, frame.cols, CV_8UC1);
    for (int i = 0; i < imageYCrCb.rows; i++){
    	const uchar* fgMaskValuePt = fgMaskMOG2.ptr<uchar>(i);
    	uchar* colorMaskValuePt = colorMask.ptr<uchar>(i);
    	for (int j = 0; j < imageYCrCb.cols; j++){
    		if (fgMaskValuePt[j] > 0 && isFirePixel(i, j))
    			colorMaskValuePt[j] = 255;
    		else
    			colorMaskValuePt[j] = 0;
    	}
    }
    
    …
    
    const int COLOR_DETECTION_THRESHOLD = 40;
    bool isFirePixel(const int row, const int column){
    	…
    		if (valueY > valueCb
    			&& intValueCr > intValueCb
    			&& (valueY > meanY && valueCb < meanCb && valueCr > meanCr)
    			&& ((abs(valueCb - valueCr) * 255) > COLOR_DETECTION_THRESHOLD))
    			return true;
    
  5. Draw bounding rectangle
    vector<Point> firePixels;
    …
    if (colorMaskPt[j] > 0)
    firePixels.push_back(Point(j, i));
    …
    rectangle(frame, boundingRect(firePixels), Scalar(0, 255, 0), 4, 1, 0);
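
The isFirePixel rule above compares pixel values against channel means (meanY, meanCb, meanCr) whose computation is not shown. A minimal sketch of one way to obtain them, assuming imageYCrCb is the CV_32FC3 image from step 3 (channel order Y, Cr, Cb):

// Global channel means used by the rule-based color check; taking the means
// over the whole frame follows the generic color model of Celik and Demirel.
Scalar channelMeans = mean(imageYCrCb);
float meanY  = (float)channelMeans[0];
float meanCr = (float)channelMeans[1];
float meanCb = (float)channelMeans[2];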
    

Samples

linner_mask
Foreground mask
linner_mask2
Fire region mask
Linner_Fire
Result

References

CELIK, T., DEMIREL, H.: Fire detection in video sequences using a generic color model. In: Fire Safety Journal, 2008, 44.2: 147-158.


Eye-Shape Classification

Veronika Štrbáková

The project shows detection and recognition of the face and eyes from an input image (webcam). For detection and classification I use the Haar cascade files from OpenCV. If eyes are detected, I classify them as open or narrowed. The algorithm uses the OpenCV and SVMLight libraries.

Functions used: CascadeClassifier::detectMultiScale, HOGDescriptor::compute, HOGDescriptor::setSVMDetector, SVMTrainer::writeFeatureVectorToFile

The process:

  1. First I build a positive and a negative dataset. The positive dataset consists of photos of narrowed eyes and the negative dataset of photos of open eyes.
    Strbakova_eye
  2. Then I create a HOGDescriptor and use it to compute a feature vector for every picture. These feature vectors are used to train the SVM and are saved to one file: features.dat
    HOGDescriptor hog;
    vector<float> featureVector;
    SVMLight::SVMTrainer svm("features.dat");
    hog.compute(img, featureVector, Size(8, 8), Size(0, 0));
    svm.writeFeatureVectorToFile(featureVector, true);
    
  3. From the feature vectors I compute a single detector vector and set it in my HOGDescriptor.
    SVMLight::SVMClassifier c("classifier.dat");
    vector<float> descriptorVector = c.getDescriptorVector();
    hog.setSVMDetector(descriptorVector);
    
  4. I detect every face and every eye in the picture. Every found eye region is cropped and the HOGDescriptor is used to detect the narrowed eye shape (a usage sketch follows below).
    strbakova_face_det
    Face, Eye and Mouth detection
    strbakova_cutting_eyes
    Cutting eyes and conversion to grayscale format

    strbakova_narrowed_eyes.jpg
    Finding narrowed eyes
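
A minimal usage sketch for step 4, assuming eyeImage is a cropped grayscale eye region and hog is the HOGDescriptor whose detector vector was set in step 3 (the window stride, padding and scale values are illustrative):

// Run the HOG + linear SVM detector on the cropped eye image.
vector<Rect> detections;
hog.detectMultiScale(eyeImage, detections, 0.0, Size(8, 8), Size(0, 0), 1.05);
bool narrowed = !detections.empty();   // any hit is treated as a narrowed eye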

Tongue tracking

Simek Miroslav

This project is focused on tracking the tongue using just the information from a plain web camera. The majority of approaches tried in this project, including edge detection, morphological reconstruction and point tracking, failed for various reasons, such as the homogeneous and position-variable character of the tongue.

The approach that yields usable results is the Farneback method of optical flow. Using this method we are able to detect the direction of movement in the image, and of the tongue specifically when we apply it to the mouth region alone. However, the mouth area found by the Haar cascade classifier is very shaky, so the key part is to stabilize it.

Functions used: calcOpticalFlowFarneback, CascadeClassifier.detectMultiScale

The process:

  1. Detection of the face and mouth using a Haar cascade classifier, where the mouth is searched for in the middle of the area between the nose and the bottom of the face.
    faceCascade.detectMultiScale(frame, faces, 1.1, 3, 0, Size(200, 200), Size(1000, 1000));
    mouthCascade.detectMultiScale(faceMouthAreaImage, possibleMouths, 1.1, 3, 0, Size(50, 20), Size(250, 150));
    noseCascade.detectMultiScale(faceNoseAreaImage, possibleNoses, 1.1, 3, 0, Size(20, 30), Size(150, 250));
    
  2. Stabilization of the mouth area on which the optical flow will be computed.
    const int movementDistanceThreshold = 40;
    const double movementSpeed = 0.25;
    
    int xDistance = abs(newMouth.x - mouth.x);
    int yDistance = abs(newMouth.y - mouth.y);
    
    if (xDistance + yDistance > movementDistanceThreshold)
    	moveMouthRect = true;
    
    if (moveMouthRect)
    {
    	mouth.x += (int)((double)(newMouth.x - mouth.x) * movementSpeed);
    	mouth.y += (int)((double)(newMouth.y - mouth.y) * movementSpeed);
    }
    
    if (xDistance + yDistance <= 1.0 / movementSpeed)
    	moveMouthRect = false;
    
  3. Optical flow (Farneback) of the current and previous stabilized frames from the camera (a sketch of summarizing the flow direction follows below).
    cvtColor(img1, in1, COLOR_BGR2GRAY);
    cvtColor(img2, in2, COLOR_BGR2GRAY);
    calcOpticalFlowFarneback(in1, in2, opticalFlow, 0.5, 3, 15, 3, 5, 1.2, 0);
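
One possible way to summarize the dense Farneback flow into a single dominant movement direction for the mouth region (this post-processing step is an assumption, not part of the original code; the movement threshold is illustrative):

// Reduce the dense flow field (CV_32FC2 result of calcOpticalFlowFarneback)
// to an average movement vector and its direction.
Scalar meanFlow = mean(opticalFlow);                    // average (dx, dy)
double dx = meanFlow[0], dy = meanFlow[1];
double magnitude = sqrt(dx * dx + dy * dy);
double angleDeg = atan2(-dy, dx) * 180.0 / CV_PI;       // image y axis points down
if (magnitude > 1.0)                                    // assumed movement threshold
	cout << "movement direction: " << angleDeg << " deg" << endl;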
    

Limitations:

  • Head movements must be minimal to none for the tracking to work correctly.
  • The actual position of the tongue is unknown. What is being tracked is the direction of the tongue’s movement at the moment when it moved.

Samples:

Simek_tongue


People detection

Martin Petlus

The goal of this project is the detection of people in images. The persons in the images:

  • are standing
  • can be seen from the front, from the back or from the side
  • can be of different sizes
  • can be in motion
  • there can be several persons in a single image

petlus_detector1

The main challenge of our project was the highest possible precision of detection, i.e. detecting all persons in an image in all possible situations. Persons can also overlap each other.

In our project we have experimented with two different approaches, both based on an SVM classifier. The classifier takes images as input and detects the persons in them. A HOG descriptor is used by the classifier to extract features from the images during classification, and a sliding window is used to detect persons of different sizes. We have experimented with two different classifiers on two different datasets (D1 and D2).
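
The first of them, the detector pre-trained in OpenCV, can be used roughly as in the following sketch; the window stride, padding and scale parameters are illustrative, not the ones used in our experiments:

// OpenCV's pre-trained HOG + linear SVM people detector, applied with a
// sliding window over multiple scales.
HOGDescriptor hog;
hog.setSVMDetector(HOGDescriptor::getDefaultPeopleDetector());

vector<Rect> people;
hog.detectMultiScale(image, people, 0.0, Size(8, 8), Size(32, 32), 1.05, 2.0);
for (size_t k = 0; k < people.size(); k++)
	rectangle(image, people[k], Scalar(0, 255, 0), 2);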

  • Trained classifier from OpenCV
    • Precision:
      • D1: 51.5038%, 3 false positives
      • D2: 56.3511%, 49 false positives
  • Our trained classifier
    • Precision
      • D1: 66.9556%, 87 false positives
      • D2: 40.4521%, 61 false positives

Result of people detection:

patlus_detector2

We see possible improvements in extracting other features from the images, or in using bigger datasets.

void App::trainSVM()
{
	CvSVMParams params;
	/*params.svm_type = CvSVM::C_SVC;
	params.kernel_type = CvSVM::LINEAR;
	params.term_crit = cvTermCriteria(CV_TERMCRIT_ITER, 100, 1e-6);*/
	params.svm_type = SVM::C_SVC;
	params.C = 0.1;
	params.kernel_type = SVM::LINEAR;
	params.term_crit = TermCriteria(CV_TERMCRIT_ITER, (int)1e7, 1e-6);
	int rows = features.size();
	int cols = number_of_features();
	Mat featuresMat(rows, cols, CV_32FC1);
	Mat labelsMat(rows, 1, CV_32FC1);
	for (unsigned i = 0; i<rows; i++)
	{
		for (unsigned j = 0; j<cols; j++)
		{
			featuresMat.at<float>(i, j) = features.at(i).at(j);
		}
	}
	for (unsigned i = 0; i<rows; i++)
	{
		labelsMat.at<float>(i, 0) = labels.at(i);
	}
	SVM.train(featuresMat, labelsMat, Mat(), Mat(), params);
	SVM.getSupportVector(trainedDetector);
	hog.setSVMDetector(trainedDetector);
}

Optical character recognition (OCR)

Robert Cerny

The example below shows the conversion of scanned or photographed images of typewritten text into machine-encoded, computer-readable text. The process is divided into pre-processing, learning and character recognition. The algorithm is implemented in C++ using the OpenCV library.

 The process

  1. Pre-processing – grey-scale, median blur, adaptive threshold, closing
    cvtColor(source_image, gray_image, CV_BGR2GRAY);
    medianBlur(gray_image, blur_image, 3);
    adaptiveThreshold(blur_image, threshold, 255, 1, 1, 11, 2);
    Mat element = getStructuringElement(MORPH_ELLIPSE, Size(3, 3), Point(1, 1));
    morphologyEx(threshold, result, MORPH_CLOSE, element);
    

    cerny_preprocessing
    Before and after pre-processing
  2. Learning – for each character we want to recognize, we need images with different writing styles of that character. For each reference picture we use these methods: findContours, then detect areas that are too small and remove them from the picture.
    vector < vector<Point> > contours;
    vector<Vec4i> hierarchy;
    findContours(result, contours, hierarchy, CV_RETR_CCOMP,
    	CV_CHAIN_APPROX_SIMPLE);
    for (int i = 0; i >= 0 && i < (int)contours.size(); i = hierarchy[i][0]) {	// walk top-level contours via hierarchy
    	Rect r = boundingRect(contours[i]);
    	double area0 = contourArea(contours[i]);
    	if (area0 < 120) {
    		drawContours(thr, contours, i, 0, CV_FILLED, 8, hierarchy);
    		continue;
    	}
    }
    

    The next step is to resize all contours to a fixed size of 50×50 and save each as a new PNG image.

    resize(ROI, ROI, Size(50, 50), CV_INTER_CUBIC);
    imwrite(fullPath, ROI, params);
    

    We get a folder for each character containing its 50×50 images.

    cerny_character

  3. Recognizing – now we know what A, B, C … look like. To recognize each character in our picture we use the steps from the previous stage of the algorithm: we pre-process the picture, find contours and get rid of small areas. The next step is to order the contours so that we can easily output the characters in the right order.
    while (rectangles.size() > 0) {
    	vector<Rect> pom;
    	Rect min = rectangles[rectangles.size() - 1];
    	for (int i = rectangles.size() - 1; i >= 0; i--) {
    		if ((rectangles[i].y < (min.y + min.height / 2)) && (rectangles[i].y >(min.y - min.height / 2))) {
    			pom.push_back(rectangles[i]);
    			rectangles.erase(rectangles.begin() + i);
    		}
    	}
    	results.push_back(pom);
    }
    

    cerny_template_matching

    Template matching is a method for matching two images, where the template is each of our 50×50 images from the learning stage and the other image is an ordered contour (a C++ sketch of this matching loop follows the pseudocode below).

    foreach detected in image
    	foreach template in learned
    		if detected == template
    			break
    		end
    	end
    end
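
    A C++ rendering of the pseudocode above; LearnedTemplate and the two containers are hypothetical names, and the matching test mirrors the matchTemplate call shown in the Results section:

// Illustrative only: try every learned template against each ordered
// contour crop and print the first matching character.
struct LearnedTemplate { string name; Mat image; };        // 50x50 learned image

for (size_t d = 0; d < detectedCharacters.size(); d++) {   // 51x51 contour crops
	for (size_t t = 0; t < learnedTemplates.size(); t++) {
		Mat result;
		matchTemplate(detectedCharacters[d], learnedTemplates[t].image, result, CV_TM_SQDIFF_NORMED);
		double minVal, maxVal;
		minMaxLoc(result, &minVal, &maxVal);
		if (minVal <= 0.1) {                               // low SQDIFF = good match
			cout << learnedTemplates[t].name;
			break;
		}
	}
}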
    

Results

Characters are recognized with template matching over the ordered contour array, where the templates are our learned images of characters. The contour images have to be resized to 51×51 pixels because our templates are 50×50 pixels (the searched image must be at least as large as the template).

matchTemplate(ROI, tpl, result, CV_TM_SQDIFF_NORMED);
minMaxLoc(result, &minVal, &maxVal, &minLoc, &maxLoc, Mat());
if (minVal <= 0.1) {	// threshold: SQDIFF_NORMED is close to 0 for a good match
	cout << name; // print character
	found = true;
	return true;
}

Currently we support only the characters A, B and K. We can see that the character K was recognized only twice out of its 4 occurrences in the image. That is because our set of K writing styles was too small (12 pictures). Recognition of the characters A and B was 100% successful (those sets have 120 writing-style pictures).

cerny_original
Original image
Cerny_ocr
Recognized image

Console output: abkbabaaakabaaaabaaapaa


Tracking the movement of the lips

Peter Demcak

In this project, we aim to recognize gestures made by users moving their lips, for example: closed mouth, open mouth, wide-open mouth, puckered lips. The challenges in this task are the high homogeneity of the observed area and the rapidity of lip movements. Our first attempts at detecting said gestures are based on detecting the lip movements through optical flow with the Farneback method implemented in OpenCV, or alternatively on calculating the motion gradient from a silhouette image. It appears that these methods might not be optimal for solving this problem.

OpenCV functions: cvtColor, Sobel, threshold, accumulateWeighted, calcMotionGradient, calcOpticalFlowPyrLK

Process

  1. Detect the position of the largest face in the image using the OpenCV cascade classifier. Further steps will be applied to the lower half of the found face.
    faceRects = detect(frame, faceClass);
    
  2. Transform the image to the HLS color space and obtain the luminosity channel of the image (a conversion sketch follows below).
  3. Combine the results of horizontal and vertical Sobel methods to detect edges of the face features.
    Sobel(hlsChannels[1], sobelVertical, CV_32F, 0, 1, 9);
    Sobel(hlsChannels[1], sobelHorizontal, CV_32F, 1, 0, 9);
    cartToPolar(sobelHorizontal, sobelVertical, sobelMagnitude, sobelAngle, false);
    
  4. Accumulate the edge-detection frames on top of each other to obtain the silhouette image. To prevent amplified noise in areas without edges, apply a threshold to the Sobel map.
    threshold(sobelMagnitude, sobelMagnitude, norm(sobelMagnitude, NORM_INF)/6, 255, THRESH_TOZERO);
    accumulateWeighted(sobelMagnitude, motionHistoryImage, intensityLoss);
    
  5. Calculate the optical flow with the Farneback method implemented in OpenCV, using the current and previous frames.
    calcOpticalFlowFarneback(prevSobel, sobelMagnitudeCopy, flow, 0.5, 3, 15, 3, 5, 1.2, 0);
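
A minimal sketch for step 2, assuming lowerFace is the lower half of the detected face in BGR (the variable names are illustrative):

// Convert to HLS and take the luminosity channel; in OpenCV's HLS ordering
// the channels are H, L, S, so index 1 is the channel used by the Sobel step.
Mat hls;
vector<Mat> hlsChannels;
cvtColor(lowerFace, hls, COLOR_BGR2HLS);
split(hls, hlsChannels);
Mat luminosity = hlsChannels[1];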
    

Sky detection using SLIC superpixels

Juraj Kostolansky

This project tries to solve the problem of sky detection using the SLIC superpixel segmentation algorithm.

Analysis

The first idea was to use the SLIC superpixel algorithm to segment the input image and merge pairs of adjacent superpixels based on their similarity. We created a simple tool to manually evaluate the hypothesis that the sky can be separated from a photo with a single threshold. In this prototype, we compute the similarity between superpixels as the Euclidean distance between their mean colors in the RGB color space. For most of the images in our dataset we found a threshold which can be used for the sky segmentation process.

Next, we analyzed the colors of the images in our dataset. For each image we saved the superpixel colors of the sky and of the rest of the image in three color spaces (RGB, HSV and Lab) and plotted them. The resulting graphs are shown below (the first row of graphs represents sky colors, the second row represents the colors of the rest of the image). As we can see, the biggest difference is in the HSV and Lab color spaces. Based on this evaluation, we chose Lab as the base working color space for comparing superpixels.

RGB

kostolansky_rgb

HSV

kostolansky_hsv

Lab

kostolansky_lab

Final algorithm

  1. Generating superpixels using the SLIC algorithm
  2. Replacing superpixels with their mean color values
  3. Setting the threshold:
    T = [ d1 + (d2 - d1) * 0.3 ] * 1.15
    

    where:

    • d1 – the average distance between superpixels in the top 10% of the image
    • d2 – the average distance between superpixels in the image, excluding the ⅓ smallest distances

    The values 0.3 and 1.15 were chosen for the best (universal) results on our dataset (see the distance sketch below).

  4. Merging adjacent superpixels
  5. Choosing the sky – the superpixel with the largest number of pixels in the first row
  6. Drawing the sky border (red)
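
A minimal sketch of the superpixel similarity and the merging threshold from step 3, assuming the mean colors are stored as cv::Vec3f values in Lab and that d1 and d2 are computed over the image as described above:

// Euclidean distance between the mean Lab colors of two superpixels.
float labDistance(const cv::Vec3f &a, const cv::Vec3f &b)
{
	cv::Vec3f d = a - b;
	return std::sqrt(d[0] * d[0] + d[1] * d[1] + d[2] * d[2]);
}

// Threshold from step 3; the constants 0.3 and 1.15 were chosen empirically.
float mergeThreshold(float d1, float d2)
{
	return (d1 + (d2 - d1) * 0.3f) * 1.15f;
}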

Sample

kostolansky_a

kostolansky_b

kostolansky_c

kostolansky_d


Cars detection

Adrian Kollar

This project started with car detection using a Haar cascade classifier. We then focused on eliminating false positive results by using road detection. We tested the solution on a recorded video obtained with a car-mounted camera recorder.

Functions used: cvtColor, Canny, countNonZero, threshold, minMaxLoc, split, pow, sqrt, detectMultiScale

The Process

  1. Capture a road sample every n-th frame by sampling a rectangle positioned statically in the frame (the white rectangle in the examples). The road sample shouldn’t contain lane markings; we used Canny and countNonZero to reject samples that contain them.

    kollar_samples
    Road samples
  2. Calculate the average road color from the captured road samples

    kollar_avg_color
    Average road color
  3. Convert the image and the average road sample to the Lab color space.
  4. For each pixel of the input image, calculate (a distance sketch follows below):
    kollar_equation

    where L, A, B are the values from the input image and l, a, b are the values from the average road sample.

  5. Binarize the result using the threshold function.
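
The exact formula is in the figure above; judging from the listed functions (split, pow, sqrt, threshold), it appears to be a per-pixel Euclidean distance in the Lab space, roughly as in this sketch (avgL, avgA, avgB and roadThreshold are illustrative names):

// Assumed per-pixel Lab distance between the frame and the average road color;
// imageLab is the Lab image from step 3, converted to CV_32FC3.
vector<Mat> lab;
split(imageLab, lab);                                    // L, a, b channels

Mat dL = lab[0] - avgL, dA = lab[1] - avgA, dB = lab[2] - avgB;
Mat squared = dL.mul(dL) + dA.mul(dA) + dB.mul(dB);
Mat dist;
sqrt(squared, dist);

Mat roadMask;
threshold(dist, roadMask, roadThreshold, 255, CV_THRESH_BINARY_INV);  // small distance = road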

Example

Kollar_car_detection
Input image, car detected is in red rectangle
kollar_detection
Road detection

Bag of visual words in OpenCV

Jan Kundrac

The bag of visual words (BOW) representation is based on the bag of words used in text processing. For basic use, this method requires the following:

  • an image dataset split into image groups, or
  • a precomputed image dataset and group histogram representation stored in an .xml or .yml file (see the XML/YAML Persistence chapter in the OpenCV documentation)
  • at least one image to compare via BOW

The image dataset is stored in a folder (of any name) with subfolders named after the groups; the subfolders contain the images of each group. BOW generates and stores the descriptors and histograms in the specified output .xml or .yml file.

BOW works as follows (compare with Figures 1 and 2):

  • compute the visual word vocabulary with the k-means algorithm (where k equals the number of visual words in the vocabulary). The vocabulary is stored in the output file. This takes about 30 minutes on 8 CPU cores for k = 500 and 150 images; OpenMP is used to improve performance.
  • compute the group histograms (two methods are implemented for this purpose – median and average histogram; only the median is used because it gives better results). This part requires the vocabulary to be computed. A group histogram is a normalized histogram, meaning that the sum of all its columns equals 1.
  • compute the histogram of the input picture and compare it with all group histograms to determine which group the image belongs to. This is implemented as histogram intersection (a sketch follows below).
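
A minimal sketch of histogram intersection for two normalized histograms stored as CV_32F row vectors (the actual getHistogramIntersection in the sources may differ; OpenCV's compareHist with CV_COMP_INTERSECT computes the same measure):

// Histogram intersection: sum of element-wise minima; for normalized
// histograms the result lies in [0, 1], higher means more similar.
float histogramIntersection(const Mat &h1, const Mat &h2)
{
	float sum = 0.0f;
	for (int i = 0; i < h1.cols; ++i)
		sum += std::min(h1.at<float>(0, i), h2.at<float>(0, i));
	return sum;
}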

As seen in Figure 2, the whole vocabulary and group histogram computation may be skipped if they have already been computed.

BOW
Figure 1: BOW tactic
kandrac_bow_flowchart
Figure 2: Flowchart for whole BOW implementation

To simplify usage I have implemented the BOWProperties class as a singleton, which holds basic information and settings such as the BOWDescriptorExtractor, the BOWTrainer, whether images are read as grayscale, and the method for obtaining descriptors (SIFT and SURF are currently implemented and ready to use). An example of the implementation is here:

BOWProperties* BOWProperties::setFeatureDetector(const string type, int featuresCount)
{
	Ptr<FeatureDetector> featureDetector;
	if (type.compare(SURF_TYPE) == 0)
	{
		if (featuresCount == UNDEFINED) featureDetector = new SurfFeatureDetector();
		else featureDetector = new SurfFeatureDetector(featuresCount);
	}
	...
}

This is how all the other properties are set. The only thing the user has to do is simply set the properties and run the classification.

In my implementation there is, in most cases, a single DataSet object holding references to the groups, and Group objects holding references to the images in each group. Training implementation:

DataSet part:

void DataSet::trainBOW()
{
	BOWProperties* properties = BOWProperties::Instance();
	Mat vocabulary;
	// read the vocabulary from file; if it does not exist, compute it
	if (!Utils::readMatrix(properties->getMatrixStorage(), vocabulary, "vocabulary"))
	{
		for each (Group group in groups)
			group.trainBOW();
		vocabulary = properties->getBowTrainer()->cluster();
		Utils::saveMatrix(properties->getMatrixStorage(), vocabulary, "vocabulary");
	}
	BOWProperties::Instance()
		->getBOWImageDescriptorExtractor()
		->setVocabulary(vocabulary);
}

Group part (notice OpenMP usage for parallelization):

unsigned Group::trainBOW()
{
	unsigned descriptor_count = 0;
	Ptr<BOWKMeansTrainer> trainer = BOWProperties::Instance()->getBowTrainer();
	
	#pragma omp parallel for shared(trainer, descriptor_count)
	for (int i = 0; i < (int)images.size(); i++){
		Mat descriptors = images[i].getDescriptors();
		#pragma omp critical
		{
			trainer->add(descriptors);
			descriptor_count += descriptors.rows;
		}
	}
	return descriptor_count;
}

This part of the code generates and stores the vocabulary. The getDescriptors() method returns the descriptors of the current image via the DescriptorExtractor class. The next part shows how the group histograms are computed:

void Group::trainGroupClassifier()
{
	if (!Utils::readMatrix(properties->getMatrixStorage(), groupClasifier, name))
	{
		groupHistograms = getHistograms(groupHistograms);
		medianHistogram = Utils::getMedianHistogram(groupHistograms, groupClasifier);
		Utils::saveMatrix(properties->getMatrixStorage(), medianHistogram, name);
	}
}

The getMedianHistogram() method generates the median histogram from the histograms representing each image in the current group.

Now the vocabulary and histogram classifiers are computed and stored. The last part is comparing a new image with the classifiers.

Group DataSet::getImageClass(Image image)
{
	// Find the group histogram with the highest intersection with the image histogram.
	double bestFit = 0.0;
	int bestFitPos = 0;
	for (int i = 0; i < (int)groups.size(); i++)
	{
		double currentFit = Utils::getHistogramIntersection(groups[i].getGroupClasifier(), image.getHistogram());
		if (currentFit > bestFit){
			bestFit = currentFit;
			bestFitPos = i;
		}
	}
	return groups[bestFitPos];
}

The returned group is the group to which the image most probably belongs. Nearly every piece of code is slightly simplified but shows the basic idea. For more detailed code, see the sources.


(For the complete code see this GitHub repository: https://github.com/VizGhar/BOW/tree/develop)

[1] http://docs.opencv.org/

[2] http://www.morethantechnical.com/2011/08/25/a-simple-object-classifier-with-bag-of-words-using-opencv-2-3-w-code/

[3] http://gilscvblog.wordpress.com/2013/08/23/bag-of-words-models-for-visual-categorization/


Structure from Motion

Jan Handzus

The main objective of this project was to reconstruct a 3D scene from a set of images or a recorded video. The first step is to find relevant matches between two related images and use these matches to calculate the rotation and translation of the camera for each input image or frame. In the final stage the depth value is extracted with a triangulation algorithm.

INPUT

handzus_input

THE PROCESS

  1. Find features in two related images:
    SurfFeatureDetector detector(400);
    detector.detect(actImg, keypoints1);
    detector.detect(prevImg, keypoints2);
    
  2. Create descriptors for features:
    SurfDescriptorExtractor extractor(48, 18, true);
    extractor.compute(actImg, keypoints1, descriptors1);
    extractor.compute(prevImg, keypoints2, descriptors2);
    
  3. Pair descriptors between two images and find relevant matches:
    BFMatcher matcher(NORM_L2);
    matcher.match(descriptors1, descriptors2, featMatches);
    
  4. After we have removed the irrelevant key-points we need to extract the fundamental matrix:
    vector<Point2f> pts1,pts2;
    keyPointsToPoints(keypoints1, pts1);
    keyPointsToPoints(keypoints2, pts2);
    Fundamental = findFundamentalMat(pts1, pts2, FM_RANSAC, 0.5, 0.99, status);
    
  5. Calculate the essential matrix:
    Essential = (K.t() * Fundamental * K);
    

    where K is the camera calibration matrix.

  6. The first camera matrix is at the starting position, therefore we must calculate the second camera matrix P1 (composing the projection matrices from R and t is sketched below):
    SVD svd(Essential, SVD::MODIFY_A);
    Mat svd_u = svd.u;
    Mat svd_vt = svd.vt;
    Mat svd_w = svd.w;
    Matx33d W(0, -1, 0,
    	1, 0, 0,
    	0, 0, 1);
    //Rotation
    Mat_<double> R = svd_u * Mat(W) * svd_vt;
    //Translation
    Mat_<double> t = svd_u.col(2);
    
  7. Find the depth value for each matching point by triangulation:
    //Make A matrix.
    Matx43d A(u.x*P(2, 0) - P(0, 0), u.x*P(2, 1) - P(0, 1), u.x*P(2, 2) - P(0, 2),
    	u.y*P(2, 0) - P(1, 0), u.y*P(2, 1) - P(1, 1), u.y*P(2, 2) - P(1, 2),
    	u1.x*P1(2, 0) - P1(0, 0), u1.x*P1(2, 1) - P1(0, 1), u1.x*P1(2, 2) - P1(0, 2),
    	u1.y*P1(2, 0) - P1(1, 0), u1.y*P1(2, 1) - P1(1, 1), u1.y*P1(2, 2) - P1(1, 2)
    	);
    //Make B vector.
    Matx41d B(-(u.x*P(2, 3) - P(0, 3)),
    	-(u.y*P(2, 3) - P(1, 3)),
    	-(u1.x*P1(2, 3) - P1(0, 3)),
    	-(u1.y*P1(2, 3) - P1(1, 3)));
    //Solve X.
    Mat_<double> X;
    solve(A, B, X, DECOMP_SVD);
    return X;
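
The projection matrices P and P1 used in the triangulation of step 7 are not composed explicitly above. A minimal sketch of the common convention, with the first camera at the origin and the R and t from step 6 (in a full pipeline these are still combined with the calibration matrix K):

// First camera at the origin: P = [I | 0]; second camera: P1 = [R | t].
Matx34d P(1, 0, 0, 0,
	0, 1, 0, 0,
	0, 0, 1, 0);
Matx34d P1(R(0, 0), R(0, 1), R(0, 2), t(0),
	R(1, 0), R(1, 1), R(1, 2), t(1),
	R(2, 0), R(2, 1), R(2, 2), t(2));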
    

SAMPLE

handzus_sample

CONCLUSION

We have successfully extracted the depth value for each relevant matching point. However, we were not able to visualise the result because of issues with PCL and other external libraries. In the future we will try to use Matlab to validate our results.


SOURCES

http://packtlib.packtpub.com/library/9781849517829/ch04
http://www.morethantechnical.com/2012/02/07/structure-from-motion-and-3d-reconstruction-on-the-easy-in-opencv-2-3-w-code/