
Bottom-up saliency model generation using superpixels


Patrik Polatsek, Wanda Benesova
Slovenska Technicka Univ. (Slovakia)

Abstract. Prediction of human visual attention is increasingly applicable in computer graphics, image processing, human-computer interaction and computer vision. Human attention is influenced by various bottom-up stimuli such as colour, intensity and orientation, as well as by top-down stimuli related to our memory. Saliency models implement bottom-up factors of visual attention and represent the conspicuousness of a given environment using a saliency map. In general, visual attention processing consists of the identification of individual features and their subsequent combination to perceive whole objects. Standard hierarchical saliency methods do not respect the shape of objects and model saliency as the pixel-by-pixel difference between the centre and its surround.
The aim of our work is to improve saliency prediction using a superpixel-based approach whose regions should correspond to object borders. In this paper we propose a novel saliency method that combines hierarchical processing of visual features with superpixel-based segmentation. The proposed method is compared with existing saliency models and evaluated on a publicly available dataset.


The paper will be available in 2015:
P. Polatsek and W. Benesova, “Bottom-up saliency model generation using superpixels,” in Proceedings of the Spring Conference on Computer Graphics 2015.


Accelerated gSLIC for Superpixel Generation used in Object Segmentation


Robert Birkus

Abstract. The goal of our work is to create a robust object segmentation method which is based on superpixels and will be able to run in real-time applications.

The SLIC algorithm proposed by Achanta et al. [1] is a superpixel segmentation algorithm based on k-means clustering which efficiently generates superpixels. It appears to be a good trade-off between time consumption and robustness. An important advancement towards real-time applications using superpixels has been made by the authors of gSLIC, a modified SLIC implementation on the GPU (Graphics Processing Unit) [2].

In this paper, we present a significant acceleration of the gSLIC superpixel segmentation algorithm implemented on the GPU. A different implementation strategy on the GPU speeds up the calculation by a factor of two or more compared to the original GPU implementation. This implementation can work in real time even for high-resolution images. We also present our method for merging similar superpixels, which uses an adaptive decision procedure. The accelerated gSLIC is the first part of the proposed object segmentation method.

References

[1] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk. SLIC superpixels. Technical report, École Polytechnique Fédérale de Lausanne, Report No. EPFL-REPORT-149300, 2010.
[2] C. Y. Ren and I. Reid. gSLIC: a real-time implementation of SLIC superpixel segmentation. Technical report, University of Oxford, Department of Engineering, 2011.


The paper is available in the CESCG proceedings:
http://www.cescg.org/CESCG-2015/papers/Birkus-Accelerated_gSLIC_for_Superpixel_Generation_used_in_Object_Segmentation.pdf

Source code:

Solution (Visual Studio 2012, V11):
https://bitbucket.org/Birky/accelerated-gslic-for-superpixel-generation/src


3D local descriptors used in methods of visual 3D object recognition


Marek Jakab, Wanda Benesova
Slovenska Technicka Univ. (Slovakia)

Abstract. In this paper, we propose an enhanced method of 3D object description and recognition based on local descriptors using the RGB image and depth information (D) acquired by a Kinect sensor. Our main contribution is an extension of the SIFT feature vector by 3D information derived from the depth map (SIFT-D). We also propose a novel local depth descriptor (DD) that includes a 3D description of the key point neighborhood. The 3D descriptor defined in this way can then enter the decision-making process. Two different approaches have been proposed, tested and evaluated in this paper. The first approach deals with an object recognition system using the original SIFT descriptor in combination with our novel 3D descriptor, where the proposed 3D descriptor is responsible for the pre-selection of objects. The second approach demonstrates object recognition using an extension of the SIFT feature vector by the local depth description. In this paper, we present the results of two experiments for the evaluation of the proposed depth descriptors. The results show an improvement in the accuracy of the recognition system that includes the 3D local description compared with the same system without it. Our experimental object recognition system works in near real time.

Keywords: local descriptor, depth descriptor, SIFT, segmentation, Kinect v2, 3D object recognition


The paper is available in the SPIE proceedings:
http://spie.org/EI/conferencedetails/intelligent-robots-computer-vision
Paper 9406-21

Source code:

Solution (VS 2013):
https://www.dropbox.com/s/a9o1tques7ven9d/Object_Detection_Solution.zip?dl=0

Sources Only:
https://www.dropbox.com/s/qyn73e68s3dq1py/Object_Detection_SourcesOnly.zip?dl=0


Pedestrian detection

This project focuses on the preprocessing of training images for pedestrian detection. The goal is to train a pedestrian detection model. The histogram of oriented gradients (HOG) has been used as the image feature descriptor and a support vector machine (SVM) has been used to train the model.

Example:

valko1
Input image

There are several ways to cut a training example from the source image:

  1. using a simple bounding rectangle
  2. adding “padding” around the simple bounding rectangle
  3. preserving a given aspect ratio
  4. using only the upper half of the pedestrian body

At first, the simple bounding rectangle around the pedestrian has been determined. An annotation of the training dataset can be used if it is available; in this case a segmentation annotation in the form of an image mask has been used. The bounding box has been created from the image mask using contours (if multiple contours were found for a pedestrian, they were merged into one).

vector<vector<Point>> contours;
findContours(mask, contours, CV_RETR_TREE, CV_CHAIN_APPROX_SIMPLE);
boundRect[k] = boundingRect(Mat(contours[k]));

The HOG descriptors have been computed using the OpenCV function hog.compute(), where the descriptor parameters have been set as follows:

Size win_size = Size(64, 128); 
HOGDescriptor hog = HOGDescriptor(win_size, Size(16, 16), Size(8, 8), Size(8, 8), 9);

Window width = 64 px, window height = 128 px, block size = 16×16 px, block stride = 8×8 px, cell size = 8×8 px and number of orientation bins = 9.
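
With these parameters, one detection window contains ((64 − 16)/8 + 1) × ((128 − 16)/8 + 1) = 7 × 15 = 105 overlapping blocks, and each block contributes 4 cells × 9 bins, so a single window descriptor has 105 × 4 × 9 = 3780 values.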

Each input image has been re-scaled to fit the window size.

1.) The image cut out using the simple bounding rectangle has been resized to fit the aspect ratio of the descriptor window.

valko2 valko3
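
A minimal sketch of this step, assuming timg is the source training image and boundRect[k] the bounding rectangle found above:

// Crop the simple bounding rectangle and warp it to the 64x128 descriptor window.
Mat patch = timg(boundRect[k]);
Mat sample;
resize(patch, sample, win_size);   // win_size = Size(64, 128); the aspect ratio is not preserved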

2.) In the next approach the simple bounding rectangle has been enlarged to enrich the descriptor vector with background information. Padding of a fixed size has been added to each side of the image. When the new rectangle exceeded the borders of the source image, the source image has been enlarged by replicating its marginal rows and columns.
valko4

if (params.add_padding)
// Apply padding around patches, handle borders of image by replication
{
	l -= horizontal_padding_size;
	if (l < 0)
	{
		int addition_size = -l;
		copyMakeBorder(timg, timg, 0, 0, addition_size, 0, BORDER_REPLICATE);
		l = 0;
		r += addition_size;
	}
	t -= vertical_padding_size;
	if (t < 0)
	{
		int addition_size = -t;
		copyMakeBorder(timg, timg, addition_size, 0, 0, 0, BORDER_REPLICATE);
		t = 0;
		b += addition_size;
	}
	r += horizontal_padding_size;
	if (r >= timg.size().width)
	{
		int addition_size = r - timg.size().width + 1;
		copyMakeBorder(timg, timg, 0, 0, 0, addition_size, BORDER_REPLICATE);
	}
	b += vertical_padding_size;
	if (b >= timg.size().height)
	{
		int addition_size = b - timg.size().height + 1;
		copyMakeBorder(timg, timg, 0, addition_size, 0, 0, BORDER_REPLICATE);
	}
	allBoundBoxesPadding[i] = Rect(Point(l, t), Point(r, b));
}

3.) In the next approach the aspect ratio of the descriptor window has been preserved while creating the cutting bounding rectangle (so that pedestrians were not deformed). In this case only the necessary padding has been added.
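
One possible way to build such a rectangle (a sketch only; l, t, r, b are the box edges used in the code above and win_size is the 64×128 descriptor window):

// Symmetrically widen or heighten the box until it matches the window aspect ratio (64:128).
double target_ratio = (double)win_size.width / win_size.height;   // 0.5
int w = r - l, h = b - t;
if ((double)w / h < target_ratio)
{
	int new_w = (int)(h * target_ratio);   // too narrow: widen
	l -= (new_w - w) / 2;
	r = l + new_w;
}
else
{
	int new_h = (int)(w / target_ratio);   // too wide: heighten
	t -= (new_h - h) / 2;
	b = t + new_h;
}
// Parts exceeding the image borders are handled by replication, as in approach 2.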

4.) In the last approach only the upper half of the pedestrian body has been used.

valko5

int hb = t + ((b - t) / 2);
allBoundBoxes[i] = Rect(Point(l, t), Point(r, hb));

valko6

Negative examples have been generated by sampling random windows that do not overlap any pedestrian bounding box:

RNG rng(12345);
static const int MAX_TRIES = 10;
int examples = 0;
int tries = 0;
int rightBoundary = img.size().width - params.neg_example_width / 2;
int leftBoundary = params.neg_example_width / 2;
int topBoundary = params.neg_example_height / 2;
int bottomBoundary = img.size().height - params.neg_example_height / 2;
while (examples < params.negatives_per_image && tries < MAX_TRIES)
{
	int x = rng.uniform(leftBoundary, rightBoundary);
	int y = rng.uniform(topBoundary, bottomBoundary);
	bool inBoundingBoxes = false;
	for (std::vector<Rect>::iterator it = allBoundBoxes.begin();
		it != allBoundBoxes.end();
		it++)
	{
		if (it->contains(Point(x, y)))
		{
			inBoundingBoxes = true;
			break;
		}
	}
	if (inBoundingBoxes == false) {
		Rect rct = Rect(Point((x - params.neg_example_width / 2), (y - params.neg_example_height / 2)), Point((x + params.neg_example_width / 2), (y + params.neg_example_height / 2)));
		boost::filesystem::path file_neg = (params.negatives_target_dir_path / img_path.stem()).string() + "_" + std::to_string(examples) + img_path.extension().string();
		imwrite(file_neg.string(), img(rct));
		examples++;
	}
	tries++;
}

The SVM model has been trained using the Matlab function fitcsvm(). The single descriptor vector (support-vector weights plus bias) has been computed:

ay = SVMmodel.Alpha .* SVMmodel.SupportVectorLabels;
sv = transpose(SVMmodel.SupportVectors);
single = sv*ay;
% Append bias
single = vertcat(single, SVMmodel.Bias);
% Save vector to file
dlmwrite(model_file, single,'delimiter','\n');

The single descriptor vector has been loaded and set
( hog.setSVMDetector(descriptor_vector) ) in the detection algorithm, which used the OpenCV function hog.detectMultiScale() to detect occurrences at multiple scales within the whole image.
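
A minimal sketch of the detection side (the file name model.txt is illustrative; it stands for the text file written by dlmwrite above, one coefficient per line):

// Load the SVM detection vector produced by the Matlab script (requires <fstream>).
std::vector<float> descriptor_vector;
std::ifstream file("model.txt");
float value;
while (file >> value)
	descriptor_vector.push_back(value);

HOGDescriptor hog(Size(64, 128), Size(16, 16), Size(8, 8), Size(8, 8), 9);
hog.setSVMDetector(descriptor_vector);

// Detect pedestrians at multiple scales and draw the detections.
std::vector<Rect> detections;
hog.detectMultiScale(img, detections);
for (size_t i = 0; i < detections.size(); i++)
	rectangle(img, detections[i], Scalar(0, 255, 0), 2);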

HOG visualization
As a part of the project, a HOG descriptor visualization has been implemented (see the algorithm below). Orientations and magnitudes of gradients are visualized by lines at each position (cell). In the first part of the algorithm, all values produced by normalization over the neighbouring blocks at a given position are merged together; the 9×4 descriptor values for one position thus yield a vector of size 9.

valko7

void visualize_HOG(std::string file_path, cv::Size win_size = cv::Size(64, 128), int
	visualization_scale = 4)
{
	using namespace cv;
	Mat img = imread(file_path, CV_LOAD_IMAGE_GRAYSCALE);
	// resize image (size must be multiple of block size)
	resize(img, img, win_size);
	HOGDescriptor hog(win_size, Size(16, 16), Size(8, 8), Size(8, 8), 9);
	vector<float> descriptors;
	hog.compute(img, descriptors, Size(0, 0), Size(0, 0));
	size_t cell_cols = hog.winSize.width / hog.cellSize.width;
	size_t cell_rows = hog.winSize.height / hog.cellSize.height;
	size_t bins = hog.nbins;
	// block has size: 2*2 cell
	size_t block_rows = cell_rows - 1;
	size_t block_cols = cell_cols - 1;
	size_t block_cell_cols = hog.blockSize.width / hog.cellSize.width;
	size_t block_cell_rows = hog.blockSize.height / hog.cellSize.height;
	size_t binspercellcol = block_cell_rows * bins;
	size_t binsperblock = block_cell_cols * binspercellcol;
	size_t binsperblockcol = block_rows * binsperblock;
	struct DescriptorSum
	{
		vector<float> bin_values;
		int components = 0;
		DescriptorSum(int bins)
		{
			bin_values = vector<float>(bins, 0.0f);
		}
	};
	vector<vector<DescriptorSum>> average_descriptors(cell_cols, vector<DescriptorSum>(cell_rows, DescriptorSum(bins)));
	// iterate over block columns
	for (size_t col = 0; col < block_cols; col++)
	{
		// iterate over block rows
		for (size_t row = 0; row < block_rows; row++)
		{
			// iterate over cell columns of block
			for (size_t cell_col = 0; cell_col < block_cell_cols; cell_col++)
			{
				// iterate over cell rows of block
				for (size_t cell_row = 0; cell_row < block_cell_rows; cell_row++)
				{
					// iterate over bins of cell
					for (size_t bin = 0; bin < bins; bin++)
					{
						average_descriptors[col + cell_col][row + cell_row].bin_values[bin] += descriptors[(col*binsperblockcol) + (row*binsperblock) + (cell_col*binspercellcol) + (cell_row*bins) + (bin)];
					}
					average_descriptors[col + cell_col][row + cell_row].components++;
				}
			}
		}
	}
	resize(img, img, Size(hog.winSize.width * visualization_scale, hog.winSize.height * visualization_scale));
	cvtColor(img, img, CV_GRAY2RGB);
	Scalar drawing_color(0, 0, 255);
	float line_scale = 2.f;
	int cell_half_width = hog.cellSize.width / 2;
	int cell_half_height = hog.cellSize.height / 2;
	double rad_per_bin = M_PI / bins;
	double rad_per_halfbin = rad_per_bin / 2;
	int max_line_length = hog.cellSize.width;
	// iterate over columns
	for (size_t col = 0; col < cell_cols; col++)
	{
		// iterate over cells in column
		for (size_t row = 0; row < cell_rows; row++)
		{
			// iterate over orientation bins
			for (size_t bin = 0; bin < bins; bin++)
			{
				float actual_bin_strength = average_descriptors[col][row].bin_values[bin] / average_descriptors[col][row].components;
				// draw lines
				if (actual_bin_strength == 0)
					continue;
				int length = static_cast<int>(actual_bin_strength * max_line_length * visualization_scale * line_scale);
				double angle = bin * rad_per_bin + rad_per_halfbin + (M_PI / 2.f);
				double yrange = sin(angle) * length;
				double xrange = cos(angle) * length;
				Point cell_center;
				cell_center.x = (col * hog.cellSize.width + cell_half_width) * visualization_scale;
				cell_center.y = (row * hog.cellSize.height + cell_half_height) * visualization_scale;
				Point start;
				start.x = cell_center.x + static_cast<int>(xrange / 2);
				start.y = cell_center.y + static_cast<int>(yrange / 2);
				Point end;
				end.x = cell_center.x - static_cast<int>(xrange / 2);
				end.y = cell_center.y - static_cast<int>(yrange / 2);
				line(img, start, end, drawing_color);
			}
		}
	}
	const char* window = "HOG visualization";
	cv::namedWindow(window, CV_WINDOW_AUTOSIZE);
	cv::imshow(window, img);
	while (true)
	{
		int c;
		c = waitKey(20);
		if ((char)c == 32)
		{
			break;
		}
	}
}

KEGA 068UK-4/2011

Cultural and Educational Grant Agency MŠVVaŠ SR (KEGA): Integration of visual information studies and creation of comprehensive multimedia study materials.

Visual detection and object recognition are the main challenges of computer vision. Detection and recognition systems can be used in surveillance, medicine, robotics or augmented reality applications. Image pre-processing – e.g. removal of signal noise, edge detection or blur removal – is used in order to obtain images suitable for further processing.

The next step in the pipeline is the extraction of relevant and discriminative features for classification. The role of classification is to assign the object to the correct class of objects. The last part of the book deals with colour theory, describing colorimetric functions and colour spaces for computer vision. Lastly, visual perception theory, including eye tracking and visual saliency detection methods, is described.

The book is intended not only for students, but also for a broad technical audience.

http://vgg.fiit.stuba.sk/kniha/


Detection of objects in soccer

Lukas Sekerak

Project idea

The aim is to detect objects (players, the soccer ball, referees, the goalkeeper) in a soccer match, to detect their position and movement, and to show a selected object in a ROI area. More information can be found in the presentation and description documents.

Requirements

  • OpenCV 2.4
  • log4cpp

Dataset videos

Operation Agreement CNR-FIGC

T. D’Orazio, M. Leo, N. Mosca, P. Spagnolo, P. L. Mazzeo, “A Semi-Automatic System for Ground Truth Generation of Soccer Video Sequences”, in Proceedings of the 6th IEEE International Conference on Advanced Video and Signal Surveillance, Genoa, Italy, September 2–4, 2009.

Setup

  1. Clone this repository into workspace
  2. Download external requirements + dataset
  3. Build project
  4. Run project

Control keys

  • W – turn on/off ROI area
  • Q, E – switch between detected ROIs
  • S – pause frame processing
  • F – turn on/off debug drawing

License

This software is released under the MIT License.

Credits

  • Ing. Wanda Benešová, PhD. – Supervisor



Project repository: https://github.com/sekys/sk.seky.soccerball


Stereo reconstruction

Ondrej Galbavy

This example presents a straightforward process to determine the depth of points (a sparse depth map) from a stereo image pair using stereo reconstruction. The example is implemented in Python 2.

Stereo calibration process

We need to obtain multiple stereo pairs with a chessboard visible in both images.

galbavy_chessboard
Chessboard
galbavy_chessboard_pattern
Detected chessboard pattern
  1. For each stereo pair we need to do:
    1. Find chessboard: cv2.findChessboardCorners
    2. Find subpixel coordinates: cv2.cornerSubPix
    3. If both chessboards are found, store keypoints
    4. Optionally draw chessboard pattern on image
  2. Compute calibration: cv2.stereoCalibrate
    1. We get:
      1. Camera matrices and distortion coefficients
      2. Rotation matrix
      3. Translation vector
      4. Essential and fundamental matrices
  3. Store calibration data for used camera setup
        for paths in calib_files:
            
            left_img = cv2.imread(paths.left_path, cv2.CV_8UC1)
            right_img = cv2.imread(paths.right_path, cv2.CV_8UC1)
    
            image_size = left_img.shape
    
            find_chessboard_flags = cv2.CALIB_CB_ADAPTIVE_THRESH | cv2.CALIB_CB_NORMALIZE_IMAGE | cv2.CALIB_CB_FAST_CHECK
    
            left_found, left_corners = cv2.findChessboardCorners(left_img, pattern_size, flags = find_chessboard_flags)
            right_found, right_corners = cv2.findChessboardCorners(right_img, pattern_size, flags = find_chessboard_flags)
    
            if left_found:
                cv2.cornerSubPix(left_img, left_corners, (11,11), (-1,-1), (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.1))
            if right_found:
                cv2.cornerSubPix(right_img, right_corners, (11,11), (-1,-1), (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.1))
    
            if left_found and right_found:
                img_left_points.append(left_corners)
                img_right_points.append(right_corners)
                obj_points.append(pattern_points)
    
            cv2.imshow("left", left_img)
            cv2.drawChessboardCorners(left_img, pattern_size, left_corners, left_found)
            cv2.drawChessboardCorners(right_img, pattern_size, right_corners, right_found)
    
            cv2.imshow("left chess", left_img)
            cv2.imshow("right chess", right_img)
    
        stereocalib_criteria = (cv2.TERM_CRITERIA_MAX_ITER + cv2.TERM_CRITERIA_EPS, 100, 1e-5)
        stereocalib_flags = cv2.CALIB_FIX_ASPECT_RATIO | cv2.CALIB_ZERO_TANGENT_DIST | cv2.CALIB_SAME_FOCAL_LENGTH | cv2.CALIB_RATIONAL_MODEL | cv2.CALIB_FIX_K3 | cv2.CALIB_FIX_K4 | cv2.CALIB_FIX_K5
        stereocalib_retval, cameraMatrix1, distCoeffs1, cameraMatrix2, distCoeffs2, R, T, E, F = cv2.stereoCalibrate(obj_points, img_left_points, img_right_points, image_size, criteria = stereocalib_criteria, flags = stereocalib_flags)
    

Stereo rectification process

We need to:

  1. Compute rectification matrices: cv2.stereoRectify
  2. Prepare undistortion maps for both cameras: cv2.initUndistortRectifyMap
  3. Remap each image: cv2.remap
rectify_scale = 0 # 0=full crop, 1=no crop
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(data["cameraMatrix1"], data["distCoeffs1"], data["cameraMatrix2"], data["distCoeffs2"], (640, 480), data["R"], data["T"], alpha = rectify_scale)
left_maps = cv2.initUndistortRectifyMap(data["cameraMatrix1"], data["distCoeffs1"], R1, P1, (640, 480), cv2.CV_16SC2)
right_maps = cv2.initUndistortRectifyMap(data["cameraMatrix2"], data["distCoeffs2"], R2, P2, (640, 480), cv2.CV_16SC2)

for pair in pairs:
    left_img_remap = cv2.remap(pair.left_img, left_maps[0], left_maps[1], cv2.INTER_LANCZOS4)
    right_img_remap = cv2.remap(pair.right_img, right_maps[0], right_maps[1], cv2.INTER_LANCZOS4)
galbavy_chessboard_raw
Raw images
galbavy_chessboard_rectified
Rectified images with no crop

Stereo pairing

Standard object matching is used: keypoint detection (Harris, SIFT, SURF, …), descriptor extraction (SIFT, SURF) and matching (FLANN, brute force, …). Matches are filtered so that both keypoints lie on (nearly) the same row, to remove mismatches.

detector = cv2.FeatureDetector_create("HARRIS")
extractor = cv2.DescriptorExtractor_create("SIFT")
matcher = cv2.DescriptorMatcher_create("BruteForce")

for pair in pairs:
    left_kp = detector.detect(pair.left_img_remap)
    right_kp = detector.detect(pair.right_img_remap)
    l_kp, l_d = extractor.compute(left_img_remap, left_kp)
    r_kp, r_d = extractor.compute(right_img_remap, right_kp)
    matches = matcher.match(l_d, r_d)
    sel_matches = [m for m in matches if abs(l_kp[m.queryIdx].pt[1] - r_kp[m.trainIdx].pt[1]) < 3]
galbavy_keypoint_matches
Raw keypoint matches on cropped rectified images
galbavy_same_matches
Same line matches

Triangulation

How do we get the depth of a point? Disparity is the difference of the x coordinates of the same keypoint in the two images. Closer points have greater disparity and distant points have almost zero disparity. Depth can be defined as:

galbavy_equation
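
Written out, this is the standard stereo triangulation formula (with f and T as defined below, and x1 − x2 being the disparity):

Z = (f · T) / (x1 − x2)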

Where:

  • f – focal length
  • T – baseline (distance between the cameras)
  • x1, x2 – x coordinates of the same keypoint
  • Z – depth of point
for m in sel_matches:
        left_pt = l_kp[m.queryIdx].pt
        right_pt = r_kp[m.trainIdx].pt
        disparity = abs(left_pt[0] - right_pt[0])
        z = triangulation_constant / disparity
galbavy_disparity
Disparity illustration

Result

Stereo
Resulting depth in centimeters of keypoints

Motion Analysis & Object Tracking

Pavol Zbell

Introduction

In our work we focus on the basics of motion analysis and object tracking. We compare the MeanShift (non-parametric; finds an object on a back-projection image) and CamShift (continuously adaptive mean shift; finds an object's centre, size and orientation) algorithms and utilize them to perform simple object tracking. In case these algorithms fail to track the desired object, or the object travels out of the window scope, we try to find another object to track. To achieve this, we use a background subtractor based on a Gaussian Mixture background/foreground segmentation algorithm to identify the next possible object to track. There are two suitable implementations of this algorithm in OpenCV – BackgroundSubtractorMOG and BackgroundSubtractorMOG2. We also compare the performance of these two implementations.

Used functions: calcBackProject, calcHist, CamShift, cvtColor, inRange, meanShift, moments, normalize

Solution

  1. Initialize tracking window:
    • Set tracking window near frame center
  2. Track object utilizing MeanShift / CamShift
    • Calculate HSV histogram of region of interest (ROI) and track
    int dims = 1;
    int channels[] = {0};
    int hist_size[] = {180};
    float hranges[] = {0, 180};
    const float *ranges[] = {hranges};
    roi = frame(track_window);
    cvtColor(roi, roi_hsv, cv::COLOR_BGR2HSV);
    // clamp > H: 0 - 180, S: 60 - 255, V: 32 - 255
    inRange(roi_hsv, Scalar(0.0, 60.0, 32.0), Scalar(180.0, 255.0, 255.0), mask);
    calcHist (&roi_hsv, 1, channels, mask, roi_hist, dims, hist_size, ranges);
    normalize(roi_hist, roi_hist, 0, 255, NORM_MINMAX);
    ...
    Mat hsv, dst;
    cvtColor(frame, hsv, cv::COLOR_BGR2HSV);
    calcBackProject(&hsv, 1, channels, roi_hist, dst, ranges, 1);
    clamp_rect(track_window, bounds);
    print_rect("track-window", track_window);
    Mat result = frame.clone();
    if (use_camshift) {
    	RotatedRect rect = CamShift(dst, track_window, term_criteria);
    	draw_rotated_rect(result, rect, Scalar(0, 0, 255));
    }
    else {
    	meanShift(dst, track_window, term_criteria);
    	draw_rect(result, track_window, Scalar(0, 0, 255));
    }
    
  3. Lost the tracked object?
    • In other words, is the centroid of the MOG mask outside the tracking window?
    bool contains;
    if (use_camshift) {
    	contains = rect.boundingRect().contains(center);
    }
    else {
    	contains = center.inside(track_window);
    }
    
  4. When lost, reinitialize the tracking window:
    • Set the tracking window to the centroid of the MOG mask (a sketch of the centroid computation is shown below)
    • Go back to 2. and repeat
    mog->operator()(frame, mask);
    center = compute_centroid(mask);
    track_window = RotatedRect(Point2f(center.x, center.y), Size2f(100, 50), 0).boundingRect();
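
The compute_centroid helper used in step 4 is not shown above; a minimal sketch based on cv::moments (listed among the used functions) might look like this:

// Centroid of the white pixels in a binary foreground mask.
Point compute_centroid(const Mat& mask)
{
	Moments m = moments(mask, true);   // true: treat the mask as a binary image
	if (m.m00 == 0)
		return Point(mask.cols / 2, mask.rows / 2);   // empty mask: fall back to the frame centre
	return Point((int)(m.m10 / m.m00), (int)(m.m01 / m.m00));
}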
    

Samples

As seen in Fig. 1, MeanShift (left) operates with a fixed-size tracking window which cannot be rotated. In contrast, CamShift (right) utilizes the full potential of dynamically sized rotated rectangles. Working with CamShift yielded significantly better tracking results in general. On the other hand, we recommend using MeanShift when the object stays at a constant distance from the camera and moves without rotation (or is represented by a circle); in such cases MeanShift performs faster than CamShift and produces sufficient results without any rotation or size-change noise.

zbell_meanshift_camshift
Fig. 1: MeanShift vs. CamShift.

A comparison of BackgroundSubtractorMOG and BackgroundSubtractorMOG2 is depicted in Fig. 2. The MOG approach is simpler than MOG2, as it considers only binary masks, whereas MOG2 operates on full grayscale masks. Experiments showed that in our specific case MOG performed better, as it yielded less noise than MOG2. MOG2 would probably produce better results than MOG if utilized more effectively than in our initial approach (simple centroid extraction from the mask).

zbell_mog
Fig. 2: MOG vs. MOG2.

Summary

In this project we explored the possibilities of simple object tracking via OpenCV APIs, utilizing various algorithms such as MeanShift and CamShift and the background extractors MOG and MOG2, which we also compared. Our solution performs relatively well, but it could certainly be improved by fine-tuning the histogram calculation, the MOG parameters and others. Further improvements could be made in the MOG usage, as the objects are currently only recognized by finding the centroids of the MOG mask. This also calls for a better tracking window initialization process.


Signature recognition

Matej Stetiar

The main purpose of this project was to recognise signatures. For this purpose we used a descriptor derived from the bottom line of the signature and the Mahalanobis distance to identify signatures.

Image preprocessing

We have worked with 2 sets of signatures, each containing about 200 pictures of signatures. Examples of these signatures are shown below.

stetiar_input
Input signatures

The signatures had varying quality, so we decided to extract their skeletons.

Mat skel(tmp_image.size(), CV_8UC1, Scalar(0));
Mat tmp(tmp_image.size(), CV_8UC1);

Mat structElem = getStructuringElement(MORPH_CROSS, Size(3, 3));

bool done = false;
do
{
	morphologyEx(tmp_image, tmp, MORPH_OPEN, structElem);
	bitwise_not(tmp, tmp);
	bitwise_and(tmp_image, tmp, tmp);
	bitwise_or(skel, tmp, skel);
	erode(tmp_image, tmp_image, structElem);

	double max;
	minMaxLoc(tmp_image, 0, &max);
	done = (max == 0);
} while (!done);
stetiar_sketelon
Skeleton of the signature

Then we decided to find contours and filter them according to their size to remove the noise.

vector<vector<Point>> contours;
vector<Vec4i> hierarchy;

findContours(image, contours, hierarchy, CV_RETR_TREE, CV_CHAIN_APPROX_SIMPLE);

Mat drawing = Mat::zeros(image.size(), CV_8UC1);
for (int i = 0; i< contours.size(); i++){
	if (contours[i].size() < 10.0) continue;
	Scalar color = Scalar(255);
	drawContours(drawing, contours, i, color, CV_FILLED, 8, hierarchy);
}

The result of an image preprocessed in this way can be seen below. This picture has very thick lines, so we decided to combine the contour image and the skeleton image.

stetiar_contour
Contour image

To combine the two images we used the logical function AND.

bitwise_and(newImage, skeleton, contours);

The result of this process was a signature with thin lines and no noise.

Creating descriptors

To create the descriptors we used the bottom line of the signature. To reduce the influence of the signature length we always divided the signature into 25 similar pieces; the spacing between these pieces was calculated dynamically. The descriptor was obtained as the maximum (lowest) white point position at each of the 25 division points.

To reduce the effect of a signature being written at an angle, we transformed the points to lower positions. To do so we gathered the points within a 10° range from the lowest point and calculated their average. Then we used linear regression, based on the maximal point and the average of the other points, to apply a correction coefficient to all points.

The descriptor we created thus accounted for different lengths and angles of signatures. The last step was to normalise the height: we subtracted the minimum point of the signature from all points and then divided all points by the maximum of the descriptor.
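
A simplified sketch of the bottom-profile extraction described above (25 sample columns; the angle correction and normalisation steps are omitted here):

// Sample the lowest white pixel in 25 evenly spaced columns of the binary signature image.
std::vector<float> bottomProfile(const cv::Mat& signature, int samples = 25)
{
	std::vector<float> descriptor(samples, 0.0f);
	int step = signature.cols / samples;   // spacing computed from the signature length
	if (step == 0) step = 1;
	for (int s = 0; s < samples; s++)
	{
		int col = s * step;
		if (col >= signature.cols) break;
		for (int row = signature.rows - 1; row >= 0; row--)   // search from the bottom upwards
		{
			if (signature.at<uchar>(row, col) > 0)
			{
				descriptor[s] = (float)row;   // lowest white point in this column
				break;
			}
		}
	}
	return descriptor;
}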

Learning phase

We created 2 sets of descriptors, each with 180 examples. From these descriptors we created 2 objects of the class Signature.

class Signature
{
	std::string name; //signature name
	cv::Mat centroid; //centroid created from the learning set
	cv::Mat covarMat; //covariance matrix created from the learning set
};

For recognition of the signatures we wanted to use the Mahalanobis distance. To do so we needed the centroid of our data set and the inverse covariance matrix. We calculated these using the functions:

cv::calcCovarMatrix(samples, this->covarMat, this->centroid, CV_COVAR_NORMAL | CV_COVAR_ROWS);
cv::invert(this->covarMat, this->covarMat, cv::DECOMP_SVD);

In the code above, the variable samples represents the matrix of all training samples.

Testing

After we created the inverse covariance matrix and the centroid, we could start testing. Testing a signature consisted of creating its descriptor using the same steps as for the training set. Then we could call the function to calculate the Mahalanobis distance.

Mahalanobis(testSample, this->centroid, this->covarMat);

Using this algorithm we were able to identify some of the signatures, but the algorithm is very sensitive to changes in image quality and to the number of items in the training set.
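
A minimal sketch of the final decision between the two learned Signature objects (assuming their members are made accessible, e.g. via getters; sig1 and sig2 are illustrative names):

// Assign the test descriptor to the class with the smaller Mahalanobis distance.
double dist1 = Mahalanobis(testSample, sig1.centroid, sig1.covarMat);
double dist2 = Mahalanobis(testSample, sig2.centroid, sig2.covarMat);
std::string result = (dist1 < dist2) ? sig1.name : sig2.name;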


Tracking moving object

This example shows how to separate and track a moving object using OpenCV. First, the background of the video is estimated and moving objects are detected; the resulting mask is then filtered and the objects are tracked.

Used: cv::BackgroundSubtractorMOG2; cv::getStructuringElement; cv::morphologyEx; cv::BackgroundSubtractorMOG2::operator()

The process

  1. Initialize the background extraction object
    BackgroundSubtractorMOG2 bg( 500, 64, false);
    
  2. Process the video frame with the background extraction object via its operator() method and receive a mask of the moving object
    bg.operator()( origi, mask);
    

    dzurilla_operator

  3. Process the mask with morphologyEx's open operation to remove noise from the mask
    morphologyEx( mask, mask, MORPH_OPEN, element1 );
    
  4. Process the mask with morphologyEx's close operation to close gaps in the mask
    morphologyEx( mask, mask, MORPH_CLOSE, element2 );
    

    dzurilla_mask

  5. Apply mask on video frame
    origi.copyTo( proci0, mask);
    
  6. Find good features to track and apply KLTracker
    goodFeaturesToTrack( proci0, points[0], MAX_COUNT, 0.1, 10, Mat(), 3, 0, 0.04);
    calcOpticalFlowPyrLK( proci1, proci0, points[0], points[1], status, err, winSize, 3, termcrit, 0, 0.00001);
    

    dzurilla_KLTracker

  7. Combine with initial frame
    size_t i, k;
    for (i = k = 0; i < points[1].size(); i++)
    {
    	if (!status[i]) continue;
    	points[1][k++] = points[1][i];
    	circle(finI, points[1][i], 3, Scalar(0, 255, 0), -1, 8);
    	circle(procI2, points[1][i], 3, Scalar(0, 255, 0), -1, 8);
    }
    points[1].resize(k);
    

    dzurilla_combination

  8. Find contours of the mask, find their bounding rectangles and draw them onto the output frame
    findContours(procI3, contours, CV_RETR_LIST, CV_CHAIN_APPROX_SIMPLE);
    if (contours.size() > 0)
    for (int i = 0; i < (int)contours.size(); i++)
    	rectangle(finI, boundingRect(contours[i]), Scalar(0, 255, 0));
    

    dzurilla_output

The bounding rectangle hints at the position of the moving object in the scene and could be used to approximate its coordinates.


Object removing in image/video

Marek Grznar

Introduction

In our project we focus on simple object recognition, then on tracking the recognized object, and finally on deleting this object from the video. For object recognition we used local feature-based methods; we compare the SIFT and SURF methods for detection and description. The homography is computed with the RANSAC algorithm. In case these algorithms successfully find the object, we create a mask in which the recognized object is a white area and the rest is black. For object tracking we compared two approaches. The first approach is based on calculating optical flow using the iterative Lucas-Kanade method with pyramids. The second approach is based on the CamShift tracking algorithm. For deleting the object from the video we use an algorithm that restores the selected region of an image using the region neighborhood.

Used functions: floodFill, findHomography, match, fillPoly, goodFeaturesToTrack, calcOpticalFlowPyrLK, inpaint, mixChannels, calcHist, CamShift

Solution

  1. Opening video file, retrieve the next frame (picture), converting from color image to grayscale
    cap.open("Video1.mp4");
    cap >> frame; 
    frame.copyTo(image); 
    cvtColor(image, gray, COLOR_BGR2GRAY);
    
  2. Find object in frame (picture)
    1. Keypoints detection and description (SIFT/SURF)
      // SiftFeatureDetector detector( minHessian ); 
      SurfFeatureDetector detector( minHessian );
      
      std::vector<KeyPoint> keypoints_object, keypoints_scene;
      
      detector.detect(img_object, keypoints_object); 
      detector.detect(img_scene, keypoints_scene);
      
      // SiftDescriptorExtractor extractor; 
      SurfDescriptorExtractor extractor;
      
      Mat descriptors_object, descriptors_scene;
      
      extractor.compute(img_object, keypoints_object, descriptors_object); 
      extractor.compute(img_scene, keypoints_scene, descriptors_scene);
      
    2. Matching keypoints
      FlannBasedMatcher matcher;
      std::vector< DMatch > matches; 
      matcher.match( descriptors_object, descriptors_scene, matches );
      
    3. Homography calculating
      Mat H = findHomography( obj, scene, CV_RANSAC );
      
    4. Mask creating
      cv::Mat mask(img_scene.size().height,img_scene.size().width,CV_8UC1);
      mask.setTo(Scalar::all(0));
      cv::fillPoly(mask,&pts, &n, 1, Scalar::all(255));
      

First tracking approach

  1. Find significant points in the current frame (using the mask with the recognized object)
  2. Track the significant points from the previous frame to the next
  3. Delete the object from the image:
    1. Calculate the mask of the current object position
    2. Modify the mask of the current object position
    3. Restore the selected region of the image using the region neighborhood (see the sketch below)
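
A rough sketch of how these steps could be combined (variable names prev_gray, next_gray, frame and mask are illustrative; goodFeaturesToTrack, calcOpticalFlowPyrLK and inpaint are among the functions listed above):

// 1. Significant points inside the recognized object (mask from the recognition step).
std::vector<Point2f> prev_pts, next_pts;
goodFeaturesToTrack(prev_gray, prev_pts, 100, 0.01, 5, mask);

// 2. Track the points into the next frame with pyramidal Lucas-Kanade optical flow.
std::vector<uchar> status;
std::vector<float> err;
calcOpticalFlowPyrLK(prev_gray, next_gray, prev_pts, next_pts, status, err);

// 3. Build a mask around the tracked points and restore that region from its neighborhood.
Mat object_mask = Mat::zeros(next_gray.size(), CV_8UC1);
for (size_t i = 0; i < next_pts.size(); i++)
	if (status[i])
		circle(object_mask, Point((int)next_pts[i].x, (int)next_pts[i].y), 15, Scalar(255), -1);
Mat restored;
inpaint(frame, object_mask, restored, 3, INPAINT_TELEA);   // requires opencv2/photo/photo.hpp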

Second tracking approach

  1. Calculate the histogram of the ROI
  2. Calculate the back projection of the histogram
  3. Track the object using CamShift

Object recognition

Input

grznar_input

Outputs

grznar_surf
SURF (the recognized object is in the black rectangle)
grznar_sift
SIFT (the black dot marks the recognized object)

Tracking object

Input (tracked object)

grznar_input2

Outputs

grznar_approach1
First approach
grznar_approach2
Second approach

Modifying mask for deleting object

Input

grznar_mask1

Output

grznar_mask2

Deleting object

Input

grznar_input3

Output

Object_remove


Local Descriptors in OpenCv

Tomas Martinkovic

The project shows detection of a chocolate wrapper in an input image or video frame. For each video or image, various combinations of detector and descriptor may be chosen. For matching the chocolate wrapper with the input frame or image, either FlannBasedMatcher or BruteForceMatcher is used automatically, depending on whether the SurfDescriptorExtractor or the FREAK algorithm has been chosen.

Functions used: SurfFeatureDetector, FastFeatureDetector, SiftFeatureDetector, StarFeatureDetector, SurfDescriptorExtractor, FREAK, FlannBasedMatcher, BruteForceMatcher, findHomography

Process

  1. Preprocessing – Conversion to grayscale
    cvtColor(frame, img_scene, CV_BGR2GRAY);
    
  2. Detect the keypoints
    detector_Surf = getSurfFeatureDetector();
    detector_Surf.detect( img_object, keypoints_object );
    detector_Surf.detect( img_scene, keypoints_scene );
    
  3. Compute local descriptors
    extractor_freak.compute( img_object, keypoints_object, descriptors_object );
    extractor_freak.compute( img_scene, keypoints_scene, descriptors_scene );
    
  4. Matching local descriptors
    BruteForceMatcher<Hamming> matcher;
    matcher.match( descriptors_object, descriptors_scene, matches );
    
  5. Draw good matches to the frame (the good_matches filtering is sketched below)
    drawMatches( img_object, keypoints_object, img_scene, keypoints_scene, good_matches, img_matches, Scalar::all(-1), Scalar::all(-1), vector<char>(), DrawMatchesFlags::NOT_DRAW_SINGLE_POINTS );
    
  6. Finding homography and drawing frame of video
    Mat H = findHomography( obj, scene, CV_RANSAC );
    perspectiveTransform( obj_corners, scene_corners, H);
    imshow( "Object detection on video", img_matches );
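
The good_matches vector used in step 5 is not computed in the snippets above; a common way to obtain it (a sketch, filtering by a multiple of the smallest descriptor distance) is:

// Keep only matches whose distance is close to the best (smallest) distance found.
// Requires <cfloat> for DBL_MAX and <algorithm> for std::max.
double min_dist = DBL_MAX;
for (size_t i = 0; i < matches.size(); i++)
	if (matches[i].distance < min_dist)
		min_dist = matches[i].distance;

std::vector<DMatch> good_matches;
for (size_t i = 0; i < matches.size(); i++)
	if (matches[i].distance <= std::max(3 * min_dist, 0.02))
		good_matches.push_back(matches[i]);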
    

Sample

Martinkovic
Matching local descriptors in the image.
Martinkovic2
Matching local descriptors in the video.