
Bag of Words Classifier

In computer vision and object recognition, there are three main tasks: object classification, detection and segmentation. Classification only assigns an image to a class (for example bicycle, dog, cactus, etc.), detection additionally locates the position of the object in the image, and segmentation finds the detailed contours of the object. Bag of words is a method that addresses the classification task.

Algorithm steps

  1. Find key points in images using Harris detector.
    Ptr<DescriptorMatcher> matcher = DescriptorMatcher::create("FlannBased");
    Ptr<DescriptorExtractor> extractor = DescriptorExtractor::create("SIFT");
    Ptr<FeatureDetector> detector = FeatureDetector::create("HARRIS");
    
  2. Extract SIFT local feature vectors from the set of images.
    // Extract SIFT local feature vectors from set of images
    extractTrainingVocabulary("data/train", extractor, detector, bowTrainer);
    
  3. Put all the local feature vectors into a single set.
    vector<Mat> descriptors = bowTrainer.getDescriptors();
    
  4. Apply a k-means clustering algorithm over the set of local feature vectors in order to find centroid coordinates. This set of centroids will be the vocabulary.
    cout << "Clustering " << count << " features" << endl;
    Mat dictionary = bowTrainer.cluster();
    
    cout << "dictionary.rows == " << dictionary.rows << ", dictionary.cols == " << dictionary.cols << endl;
    
  5. Compute the histogram that counts how many times each centroid occurred in each image. To compute the histogram, find the nearest centroid for each local feature vector.
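    A minimal sketch of this step (not the project's exact code), using OpenCV's BOWImgDescriptorExtractor, which performs the nearest-centroid assignment internally; the image path below is only an assumed example:
    BOWImgDescriptorExtractor bowExtractor(extractor, matcher);
    bowExtractor.setVocabulary(dictionary);
    Mat image = imread("data/train/bonsai/image_0001.jpg", 0);   // assumed example path
    vector<KeyPoint> keypoints;
    detector->detect(image, keypoints);
    Mat bowHistogram;   // one row with dictionary.rows bins
    bowExtractor.compute(image, keypoints, bowHistogram);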

Histogram

We trained our model on 240 images from 3 different classes: bonsai, Buddha and porcupine. We then computed the histogram that counts how many times each centroid occurred in each image. To obtain the histogram values, we compared each local feature vector with every centroid and incremented the bin of the centroid closest to that feature vector. We used 1000 cluster centers.


Concrete Analysis

Description

In this work we detect metallic wires in images of sliced concrete. The metal parts are distributed randomly; two wires may lie right next to each other, and a wire may also be cut along its length. Some wires are almost invisible due to poor image quality. We applied filters from the OpenCV library to the images and created an application that can recognize about 90% of the wires.

Input

Processing

  1. Create a marker image (ImReconstruct below is a custom morphological reconstruction helper; a sketch of one possible implementation follows this list)
    cv::erode(_grayScale, marker,
        cv::getStructuringElement(cv::MORPH_ELLIPSE,
        cv::Size(20, 20), cv::Point(-1,-1)), cv::Point(-1,-1), 2,
        cv::borderInterpolate(1, 15, cv::BORDER_ISOLATED));
    ImReconstruct(&(IplImage)marker, &(IplImage)_grayScale);
    

  2. Subtract the marker image from the grayscale image
    grayScale = _grayScale - marker;
    

  3. Apply morphological operations and find the contours
    // Closing, erode, threshold
    cv::findContours(grayScale.clone(), contours, CV_RETR_TREE, CV_CHAIN_APPROX_SIMPLE, cv::Point(0,0) );
    
  4. Detailed analysis of the wires in specific cases

  5. Final output
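
ImReconstruct used in step 1 is not a standard OpenCV function. A minimal sketch of morphological reconstruction by iterative geodesic dilation (assuming 8-bit single-channel marker and mask images) might look like this:

    void morphReconstruct(cv::Mat& marker, const cv::Mat& mask)
    {
        cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3, 3));
        cv::Mat prev;
        do {
            marker.copyTo(prev);
            cv::dilate(marker, marker, kernel);     // geodesic dilation step
            cv::min(marker, mask, marker);          // keep the result under the mask
        } while (cv::countNonZero(marker != prev) > 0);
    }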


Dominant Orientation Templates

Description

Dominant orientation templates (DOT) is a method for real-time object detection which works well for untextured objects and is related to the Histogram of Oriented Gradients (HOG) method. DOT is based neither on statistical learning of object shapes nor on feature point detection; instead, it uses real-time template matching with the locally most dominant orientations from HOG.

OpenCV functions used

cvCaptureFromAVI, cvtColor

The process

  1. Computation of gradients for each pixel in the template and the input image (a gradient-computation sketch follows this list)
    1. Provided by convolution kernel
    2. For each pixel
    3. Gradient is defined by magnitude and direction
    4. 0-180° instead of 0-360° range
    5. Directions can be discretized from 0-180° into bins (e.g. 9 bins by 20° )
    // accumulate gradient magnitudes into the orientation histogram of each region
    for (int r = 0; r <= area.rows - region_size; r += region_size)
    {
        for (int s = 0; s <= area.cols - region_size; s += region_size)
        {
            int hx = r / region_size, hy = s / region_size;
            for (int i = r; i < r + region_size; i++)
                for (int j = s; j < s + region_size; j++)
                {
                    int mag = gradienty_template.gradient[i][j].magnitude;
                    int n0 = histogram_group(gradienty_template.gradient[i][j].direction);
                    if (mag > min_magnitude)
                        template_hist.hist_matrix[hx][hy].bins[n0] += mag;
                }
        }
    }
    
  2. Dividing pixels into regions

    for (int r=0;r<=area.rows-region_size;r+=region_size)
        // moving in the picture with step size of 7 or 9
    
  3. Computing most dominant gradient orientations for each region

    if (template_hist.hist_matrix[i][j].bins[k]>max) //now only the most dominant
    {
        if  (template_hist.hist_matrix[i][j].bins[k]>min_magnitude)
        {
            max=template_hist.hist_matrix[i][j].bins[k];
            max_index=k;
        }
    }
    
  4. Template matching and comparison of the most dominant orientations.
  5. Evaluating the comparison.
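
A hedged sketch of the gradient computation in step 1 (not the project's exact code): per-pixel gradient magnitude and direction via cv::Sobel, with directions folded into the 0-180° range and discretized into 9 bins of 20°:

    cv::Mat dx, dy, magnitude, direction;
    cv::Sobel(gray, dx, CV_32F, 1, 0, 3);                     // horizontal derivative
    cv::Sobel(gray, dy, CV_32F, 0, 1, 3);                     // vertical derivative
    cv::cartToPolar(dx, dy, magnitude, direction, true);      // direction in degrees, 0-360
    cv::Mat bin(gray.size(), CV_8U);
    for (int y = 0; y < gray.rows; y++)
        for (int x = 0; x < gray.cols; x++)
        {
            float dir = direction.at<float>(y, x);
            if (dir >= 180.0f) dir -= 180.0f;                 // fold into 0-180°
            bin.at<uchar>(y, x) = (uchar)(dir / 20.0f) % 9;   // 9 bins of 20°
        }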


Eye Blinking Detection

Description

The main purpose of this work is to detect eyes and recognize when they are open and when they are closed. For this purpose we need a video camera or a video file showing a person's face.

Eye Detection

To detect eye blinking we need to recognize the face and eyes in the image. For this we use the Viola-Jones algorithm, which detects these features and bounds them with rectangles.
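
A minimal sketch of this detection step using OpenCV's CascadeClassifier; the cascade file names are the stock OpenCV ones, not necessarily those used in this project:

    CascadeClassifier faceCascade("haarcascade_frontalface_alt.xml");   // assumed cascade files
    CascadeClassifier eyeCascade("haarcascade_eye.xml");
    vector<Rect> faces, eyes;
    faceCascade.detectMultiScale(gray, faces, 1.1, 3, 0, Size(80, 80));
    for (size_t i = 0; i < faces.size(); i++)
    {
        vector<Rect> eyesInFace;
        eyeCascade.detectMultiScale(gray(faces[i]), eyesInFace, 1.1, 3, 0, Size(20, 20));
        for (size_t j = 0; j < eyesInFace.size(); j++)
            eyes.push_back(eyesInFace[j] + faces[i].tl());   // back to image coordinates
    }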

Because this algorithm is computationally expensive, we use tracking, which is much faster. The Good Features to Track algorithm returns a set of points suitable for tracking, and the Lucas-Kanade tracker then tracks them in every frame.

// track points
calcOpticalFlowPyrLK(prevGray, gray, features, cornersB, status, error, Size(31, 31), 1000);

if(!calculateIntegralHOG(gray(rectangleFace)))
    text = "CLOSED";

Some points are not precisely matched in the next frame, so we remove them from the set of tracking points. When there are not enough points left to track, we run the eye detection step again.

Eye Blinking Detection

To detect whether the eyes are open or closed we use the HOG descriptor, which returns an array of floats representing line orientations. Because the HOG descriptor works only on images of a specific resolution, we use a sliding window of this resolution that covers our image.

cv::gpu::HOGDescriptor gpu_hog(win_size, Size(16, 16), Size(8, 8), Size(8, 8), 9, 0.8, 0.00015, true);

// calculate HOG for every window
GpuMat gpuMat;
gpuMat.upload(cropped);
GpuMat descriptors;
gpu_hog.getDescriptors(gpuMat, win_size, descriptors);
Mat descriptorMat = Mat(descriptors);

In the next step we take the array of floats returned by the HOG descriptor and transform it into a histogram. We noticed that when the eye is closed, the local maximum of this histogram is much lower than the local maximum for an opened eye, so we define a value that separates opened and closed eyes.

Because we use a sliding window, we average all these local maxima and, based on the resulting value, decide whether the specified area contains open or closed eyes.
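
A minimal sketch of the per-window decision (an assumed helper, not the project's exact code): build a histogram of the HOG descriptor values and compare its peak against an empirically chosen threshold:

    bool isEyeOpen(const Mat& descriptorMat, float openThreshold)
    {
        int histSize = 16;                        // 16 bins over values in [0, 1]
        int channels[] = { 0 };
        float range[] = { 0.0f, 1.0f };
        const float* histRange[] = { range };
        Mat hist;
        calcHist(&descriptorMat, 1, channels, Mat(), hist, 1, &histSize, histRange);
        double maxVal;
        minMaxLoc(hist, 0, &maxVal);              // local maximum of the histogram
        return maxVal > openThreshold;            // a higher peak suggests an open eye
    }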


Dices Result Recognition

Description

The goal of this project is to implement an algorithm that finds dots on dice. The motivation was the question of how to create a home-made random number generator: we throw the dice and our application recognizes the total value they show.

This program uses the fitEllipse() function to find dots on dice. The basic steps are as follows:

Process

  1. Open video stream
    CvCapture* capture = cvCaptureFromCAM( CV_CAP_ANY );
    
  2. Query a single frame from the stream (repeated for the whole stream)
    IplImage* frame = cvQueryFrame( capture );
    
  3. Invert colors (see the sketch after this list)
  4. Use an adaptive threshold
    adaptiveThreshold(image, bimage, 255, ADAPTIVE_THRESH_GAUSSIAN_C, CV_THRESH_BINARY, 15, -10);
    
  5. We can use morphological operations (dilation, erosion) to expand or shrink contours (see the sketch after this list)
  6. To find circles we use following:
    Mat pointsf;
    Mat(contours[i]).convertTo(pointsf, CV_32F);
    RotatedRect box = fitEllipse(pointsf);
    
  7. If the difference between box.size.width and box.size.height is lower than a threshold, we consider the ellipse to be a circle.
  8. At this point we have a lot of “circles”. Experimenting helped us determine which circles in the picture are real dots on a die. Based on the size of real dice dots we can isolate only the real ones.
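
Steps 3 and 5 have no snippet above; a minimal sketch of the inversion and of the morphological clean-up (the kernel size is an assumed value) might look like this:

    cv::Mat inverted;
    cv::bitwise_not(image, inverted);                     // step 3: invert colors
    // step 5: dilate/erode with a small elliptical kernel (3x3 is an assumption)
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(3, 3));
    cv::dilate(bimage, bimage, kernel);
    cv::erode(bimage, bimage, kernel);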

Example of the process

  • Original grayscale image
  • Inverted colors
  • Adaptive threshold – 255 (inverted)
  • Histogram of dot sizes
  • Lots of custom settings
  • Result

Using custom settings we are able to improve results in specific situations.

findContours(bimage, contours, CV_RETR_LIST, CV_CHAIN_APPROX_NONE);

vector<pair<RotatedRect*, int> > vec, finalVec;
Mat cimage = Mat::zeros(bimage.size(), CV_8UC3);
int i;
int w, h, wAhThr, angleThr, centerThr;
int  hwDifferenceThreshold, histThreshold;						
centerThr = settCenterThr;
wAhThr = settwAhThr;
hwDifferenceThreshold = settHWdifferenceThr;
histThreshold = settHistogramThr;

for(i = 0; i < contours.size(); i++)
{
	size_t count = contours[i].size();
	if( count > 50 || count  < 6)
	      continue;

	Mat pointsf;
	Mat(contours[i]).convertTo(pointsf, CV_32F);
	RotatedRect box = fitEllipse(pointsf);

	w = box.size.width;
	h = box.size.height;

	int hwDifference = abs(h - w);
	if (hwDifference > hwDifferenceThreshold)
	      continue;

	if (w < wAhThr || h < wAhThr)
	      continue;

	vec.push_back(pair<RotatedRect*, int>(new RotatedRect(box), i));
}

vector<pair<RotatedRect*, int> >::iterator it, iend, it2;
int MAXHIST = 200;
int* histVals = new int[MAXHIST];
for (int i = 0; i < MAXHIST; i++)
	histVals[i] = 0;

int histIter = 0;
RotatedRect * box;
RotatedRect * box2;
int maxWidth = 0;
int distanceOfCenters;
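// when two detected ellipses have nearly the same center, keep only the wider one (the other is marked with -1)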
for (it = vec.begin(), iend = vec.end(); it != iend; it++)
{
	box = (it->first);
	for (it2 = it + 1; it2 != iend; it2++)
	{
		box2 = (it2->first);
		distanceOfCenters = (int)std::sqrt((box->center.x - box2->center.x) * (box->center.x - box2->center.x) + (box->center.y - box2->center.y)  * (box->center.y - box2->center.y));
		if (distanceOfCenters < centerThr)
		{
			if (box->size.width > box2->size.width)
				it2->second = -1;
			else
				it->second = -1;
			break;
		}

	}
}

CSS – Curvature Scale Space in OpenCV

Description

The goal of this project is to implement algorithm that creates curvature scale space (CSS) image of given shape using OpenCV library. “The CSS image consists of several arch-shape contours representing the inflection points of the shape as it is smoothed. The maxima of these contours are used to represent a shape. The CSS representation is robust with respect to scale, noise and change in orientation.”[1]

CSS representations for various curve modifications [1]

Process

  1. Find contour coordinates of the given shape:
    findContours(im, contours, CV_RETR_LIST, CV_CHAIN_APPROX_NONE);
    
  2. The following steps are repeated with increasing sigma until there are no zero-crossing points:

  3. Gaussian kernel is the base for upcoming steps:
    transpose(getGaussianKernel(width, sigma, CV_64FC1), G);
    
  4. Curve evolution can be computed by convolving the contour points with the Gaussian kernel. The smoothed contour is not needed for the CSS computation; it is used only to visualize the process:

    filter2D(X, Xsmooth, X.depth(), G);
    filter2D(Y, Ysmooth, Y.depth(), G);
    

    Curve evolution with increasing sigma [2]
  5. To compute the 1st and 2nd derivatives of the contour points, derivatives of the Gaussian kernel will be needed:
    Sobel(G, dG, G.depth(), 1, 0, 3);
    Sobel(G, ddG, G.depth(), 2, 0, 3);
    
  6. Convolution of contour points using derivatives of the Gaussian kernel. According to the OpenCV documentation, filter2D actually computes correlation, not convolution; the kernel is not mirrored around the anchor point. If you need a real convolution, flip the kernel using flip() and set the new anchor to (kernel.cols - anchor.x - 1, kernel.rows - anchor.y - 1):
    flip(dG, dG, 1);    // the kernels are single-row, so flip horizontally
    flip(ddG, ddG, 1);
    Point anchor(dG.cols - fwhm - 1, dG.rows - 0 - 1);
    filter2D(X, dX, X.depth(), dG, anchor);
    filter2D(Y, dY, Y.depth(), dG, anchor);
    filter2D(X, ddX, X.depth(), ddG, anchor);
    filter2D(Y, ddY, Y.depth(), ddG, anchor);
    
  7. Finally, we calculate the curvature and find zero crossings (a sketch of this step follows the list):

    Curvature and inflection points of curve smoothed with sigma=16
  8. Zero-crossing points are plotted into the final CSS image. The x-axis represents the position of the point on the curve; the y-axis represents the value of sigma:
    Final CSS image with zero-crossing points for all sigmas
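
A minimal sketch of the curvature computation in step 7, assuming dX, dY, ddX, ddY are 1 x N CV_64F matrices holding the convolved derivatives of the N contour points:

    vector<double> kappa(N);
    vector<int> zeroCrossings;
    for (int i = 0; i < N; i++)
    {
        double dx = dX.at<double>(0, i),   dy = dY.at<double>(0, i);
        double ddx = ddX.at<double>(0, i), ddy = ddY.at<double>(0, i);
        // curvature of a parametric curve: k = (x'y'' - y'x'') / (x'^2 + y'^2)^(3/2)
        kappa[i] = (dx * ddy - dy * ddx) / pow(dx * dx + dy * dy, 1.5);
        // a zero crossing lies between consecutive points where the curvature changes sign
        if (i > 0 && kappa[i - 1] * kappa[i] < 0.0)
            zeroCrossings.push_back(i);
    }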

Practical Applications

  • Finding similar shapes  (Used as shape descriptor in MPEG-7 standard)
  • Corner detection
Example of corner detection

References

[1] Sadegh Abbasi, Farzin Mokhtarian, Josef Kittler: Curvature Scale Space Image in Shape Similarity Retrieval. Multimedia Syst. 7(6): 467-476 (1999)

[2] Farzin Mokhtarian, Alan K. Mackworth: A Theory of Multiscale, Curvature-Based Shape Representation for Planar Curves. IEEE Trans. Pattern Anal. Mach. Intell. 14(8): 789-805 (1992)


Detection and removal of circular artifacts from photographs

Description

A flash reflecting from dust, snowflakes or raindrops can produce irritating circular artifacts. To detect and remove them we propose a process that improves circle detection beyond using the houghCircles function alone. For removing the detected artifacts we use morphological reconstruction.

Functions used

adaptiveThreshold, Canny, HoughCircles, findContours, fitEllipse, ImReconstruct

Process

Greyscale input image with circular artifacts.
Output image.

Limitation: minimal circle size 15 px, maximal circle size 30 px

  1. Preprocessing – Adaptive threshold
    medianBlur()
    adaptiveThreshold()
    OutputImg := InputImg + FilteredImg
    
  2. Detection with HoughCircles (see the sketch after this list)
    Canny()
    GaussianBlur()
    HoughCircles()
    Accept/ignore circles (based on size)
    
  3. Detection with Morphological reconstruction and contour analysis
    mask := InputImg
    marker := InputImg - degreeOfMorphreduct
    marker := inv(marker)
    morphologicalReconstruction(marker, mask)
    differenceImg := marker2 - marker1
    differenceImg := medianBlur(differenceImg)
    differenceImg := threshold(differenceImg)
    contour[] := findContours(differenceImg)
    ellipse[i] := fitEllipse(contour[i])
    accept/ignore circles (based on size and ellipse axes)
    draw white ellipse[i]
    draw black contour[i]
    crop Regions Of Interest
    opening(regionOfInterest[i])
    if countNonZero(regionOfInterest[i]) > threshold then accept; else ignore;
    
  4. Result
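
A hedged sketch of the HoughCircles detection in step 2; the parameter values are assumptions rather than the exact values used in the project, with the radius limits following the 15-30 px size limitation above (assuming it refers to radii):

    Mat blurred;
    GaussianBlur(inputImg, blurred, Size(9, 9), 2.0);
    vector<Vec3f> circles;
    HoughCircles(blurred, circles, CV_HOUGH_GRADIENT, 1, 20, 100, 30, 15, 30);
    for (size_t i = 0; i < circles.size(); i++)
    {
        float radius = circles[i][2];
        if (radius < 15 || radius > 30)     // accept/ignore circles based on size
            continue;
        // circle accepted as a candidate artifact
    }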


Convolutional neural networks

Description

Convolutional Neural Networks (CNNs) are multi-layered neural networks with standard hidden layers and at least one convolutional layer. They are suitable for visual processing because they exploit the topology of the inputs.

Because we are interested in more general structures of networks and layers, the CNN is implemented with automatic differentiation (AD) in mind. This means that one only has to provide the implementation of the forward pass of any structure. It is important to note that AD is based on a different principle than finite differences; an important property of AD is that it yields exact derivatives.

Architecture

The architecture of convolutional networks is shown in figure 1. Convolutional layers are located right after inputs. After each convolutional layer the process of subsampling is performed. By subsampling we improve translational invariance and significantly reduce complexity. After convolutional layers, standard fully connected hidden layers are used. These are then mapped to desired outputs.
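
For illustration, a minimal sketch (not the actual network code) of one convolutional layer followed by 2x2 subsampling, expressed with OpenCV primitives; the kernel values and the nonlinearity are placeholders, since the real kernels are learned during training:

    Mat input = Mat::zeros(16, 16, CV_32F);           // placeholder input image
    Mat kernel = Mat::ones(5, 5, CV_32F) / 25.0f;     // placeholder 5x5 convolution kernel
    Mat featureMap;
    filter2D(input, featureMap, CV_32F, kernel);      // convolutional layer
    max(featureMap, 0.0, featureMap);                 // nonlinearity (placeholder choice)
    Mat subsampled;                                    // 2x2 subsampling (average pooling)
    resize(featureMap, subsampled, Size(), 0.5, 0.5, INTER_AREA);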

Architecture of convolutional networks.

Once the signal has passed into a standard hidden layer, it is no longer meaningful to pass it into another convolutional layer. However, it can be meaningful to have heterogeneous layers consisting of both convolutional and standard units.

Experiments

Note that it is meaningful for the input to be two-dimensional. We used a CNN for handwritten digit recognition on US Postal Service data.

  • 2.5% human error rate
  • 2.0% best error rate: a combination of multiple classifiers
  • 4.7% best result of our CNN

We note that CNNs are sensitive to parameter settings, including the number and size of the convolutional kernels. However, when these are set properly, CNNs perform well. An example output of the convolutional layers can be seen in figure 2.

Example of features extracted in the first convolutional layer.

Hand Tracking and Gesture Recognition Using Echo State Neural Networks

Peter Fillo

Tracking an object in a video sequence is a complex problem and one of the fundamental tasks of image processing. One of its many use cases is control by hand gestures in human-computer interaction. This paper introduces real-time hand recognition and tracking in a video sequence with classification of the performed hand gestures. Hand recognition is based on foreground segmentation and skin region detection. Attributes of hand movements are recorded and used as input to an echo state neural network, which performs the hand gesture classification. The work presents the proposed tracking algorithm and first results of gesture recognition.