Tracking in video – Vision & Graphics Group

Posted on 5. June 2016 by Dipl.-Ing. Wanda Benešová, PhD.

Motion analysis in CCTV records

Filip Mazan

This project analyzes CCTV video recordings to detect people's motion and extract their trajectories over time. The first output is a relatively short video file containing only those frames of the original in which movement was detected, with the trajectories of people drawn in. The second output is a cumulative image of all trajectories, which can later be used to classify trajectories as suspicious or not.

Each frame of the input video is converted to grayscale and median filtered to remove noise
The first 30 seconds of video are used as a learning phase for the MOG2 background subtractor
For each subsequent frame, the MOG2 mask is calculated and morphological closing is applied to it
If the count of non-zero pixels is greater than a set threshold, we claim that movement is present
1. Good features to track are found if not many are left
2. Optical flow is calculated
3. Each point that has moved is stored along with its frame number
If there is no movement in the current frame, the last movement interval (if any) is post-processed
1. All stored tracking points (x, y, frame number) from the previous phase are clustered by k-means into a variable number of centroids
2. All centroids are sorted by their frame number dimension
3. The trajectory is drawn onto the output frame
4. The movement sequence is written into the output video file along with the continuously drawn trajectory
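
The movement test and the ordering of centroids can be sketched without OpenCV. This is a minimal illustration, not the project's code: the names `TrackPoint`, `hasMovement`, `orderTrajectory` and the parameter `minPixels` are hypothetical, and the mask is assumed to be a flat array of 8-bit pixels.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// One cluster centroid in (x, y, frame number) space (hypothetical type).
struct TrackPoint {
    float x;
    float y;
    int frame;
};

// True when the foreground mask has more non-zero pixels than minPixels.
// minPixels stands in for the "set threshold" and is an assumed parameter.
bool hasMovement(const std::vector<std::uint8_t>& mask, std::size_t minPixels) {
    const auto nonZero = std::count_if(mask.begin(), mask.end(),
                                       [](std::uint8_t v) { return v != 0; });
    return static_cast<std::size_t>(nonZero) > minPixels;
}

// Orders centroids by frame number so consecutive points can be joined
// into a polyline. stable_sort keeps the input order for equal frames.
std::vector<TrackPoint> orderTrajectory(std::vector<TrackPoint> centroids) {
    std::stable_sort(centroids.begin(), centroids.end(),
                     [](const TrackPoint& a, const TrackPoint& b) {
                         return a.frame < b.frame;
                     });
    return centroids;
}
```

Drawing the trajectory then amounts to connecting each ordered centroid to the next one.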

The following image shows the sum of all trajectories found in a 2-hour input video. This can be used to classify trajectories as (not) suspicious.

Posted on 5. June 2016 (updated 9. June 2016) by Dipl.-Ing. Wanda Benešová, PhD.

Car detection in videos

Peter Horvath

We detect cars in videos recorded by dash cameras mounted in cars. Because the camera itself is moving, we decided to train and use a Haar cascade classifier. The classifier alone returns many false positives, so we improved it by removing false positives using road detection.

Functions used: cvtColor, split, Rect, inRange, equalizeHist, detectMultiScale, rectangle, bitwise_and

Process

1st part – training the Haar cascade classifier

Collect a set of positive samples and negative samples. Make a list file of both (positives.dat and negatives.dat). Then use the opencv_createsamples utility with parameters to make a single .vec file with all positive samples.

opencv_createsamples -info positives.dat -vec samples.vec -num 500 -w 20 -h 20

Now train a cascade classifier using HAAR features

opencv_traincascade -data classifier -featureType HAAR -vec samples.vec -bg negatives.dat -numPos 500 -numNeg 850 -numStages 15 -precalcValBufSize 1000 -precalcIdxBufSize 1000 -minHitRate 0.999 -maxFalseAlarmRate 0.5 -mode ALL -w 20 -h 20

The output of this procedure is a trained classifier – an XML file.

2nd part – using the classifier in C++ code to detect cars, improved by road detection

Open video file using VideoCapture. For every video frame do:

Convert the current video frame to the HSV color model
```
cvtColor(frame, frame_hsv, CV_BGR2HSV);
```

Sum the H, S and V values over the captured road sample, then calculate the average hue, saturation and value of the sample.

int averageHue = sumHue / (rectangle_hsv_channels[0].rows*rectangle_hsv_channels[0].cols);
int averageSat = sumSat / (rectangle_hsv_channels[1].rows*rectangle_hsv_channels[1].cols);
int averageVal = sumVal / (rectangle_hsv_channels[2].rows*rectangle_hsv_channels[2].cols);

Use the inRange function to make a binary result – the road is white, everything else is black

inRange(frame_hsv, cv::Scalar(averageHue - 180, averageSat - 15, averageVal - 20), cv::Scalar(averageHue + 180, averageSat + 15, averageVal + 20), final);

Convert the current video frame to grayscale

cvtColor(frame, image_gray, CV_BGR2GRAY);

Create an instance of CascadeClassifier

String car_cascade_file = "classifier.xml";
CascadeClassifier car_classifier;
car_classifier.load(car_cascade_file);

Detect cars in grayscale video frame using classifier

car_classifier.detectMultiScale(image_gray, cars, 1.1, 2, 0 | CV_HAAR_SCALE_IMAGE, Size(20, 20));

The result has a lot of false positives

Make a black image with white squares at the locations returned by the cascade classifier. Compute the logical AND between it and the image with the detected road.
Accept only squares in which at least 20% of the pixels are white.
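
The 20% rule can be illustrated with a small stdlib-only sketch. It assumes the road mask is a row-major binary image (non-zero = road) and computes the white fraction directly inside each detection rectangle, which is equivalent to the AND-and-count step described above. The names `Box`, `Mask`, `roadFraction` and `keepOnRoad` are hypothetical and not taken from the project.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Detection rectangle in pixel coordinates (hypothetical type).
struct Box {
    int x, y, width, height;
};

// Row-major binary image: non-zero = road, 0 = everything else.
struct Mask {
    int width;
    int height;
    std::vector<std::uint8_t> data;  // size == width * height
};

// Fraction of the pixels inside the box (clipped to the image) that are road.
double roadFraction(const Mask& road, const Box& box) {
    const int x0 = std::max(box.x, 0);
    const int y0 = std::max(box.y, 0);
    const int x1 = std::min(box.x + box.width, road.width);
    const int y1 = std::min(box.y + box.height, road.height);
    if (x1 <= x0 || y1 <= y0) return 0.0;  // box lies outside the image

    std::size_t white = 0;
    for (int y = y0; y < y1; ++y)
        for (int x = x0; x < x1; ++x)
            if (road.data[static_cast<std::size_t>(y) * road.width + x] != 0) ++white;

    const double area = static_cast<double>(x1 - x0) * (y1 - y0);
    return static_cast<double>(white) / area;
}

// Keeps only detections whose road fraction reaches minFraction (0.2 = 20%).
std::vector<Box> keepOnRoad(const std::vector<Box>& detections, const Mask& road,
                            double minFraction = 0.2) {
    std::vector<Box> kept;
    for (const Box& box : detections)
        if (roadFraction(road, box) >= minFraction) kept.push_back(box);
    return kept;
}
```

Detections floating in the sky or on buildings overlap little or no road and are dropped, while cars standing on the road keep enough white pixels around their lower part to pass.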

Limitations:

The cascade classifier was trained with only 560 positive and 860 negative samples – it detects cars only at close distance
Road detection fails when some object (car, road line) enters the blue rectangle (which is supposed to be the road sample)
Dirt has a similar saturation to the road – it is detected as road

Posted on 22. April 2015 (updated 5. October 2015) by Dipl.-Ing. Wanda Benešová, PhD.

Detection of objects in soccer

Lukas Sekerak

Project idea

Try to detect objects (players, soccer ball, referees, goalkeeper) in a soccer match. Detect their position and movement, and show the picked object in the ROI area. More info in a presentation and description document.

Requirements

OpenCV 2.4
log4cpp

Dataset videos

Operation Agreement CNR-FIGC

T. D'Orazio, M. Leo, N. Mosca, P. Spagnolo, P. L. Mazzeo: A Semi-Automatic System for Ground Truth Generation of Soccer Video Sequences. In Proceedings of the 6th IEEE International Conference on Advanced Video and Signal Based Surveillance, Genoa, Italy, September 2-4, 2009.

Setup

Clone this repository into workspace
Download external requirements + dataset
Build project
Run project

Control keys

W – turn on/off ROI area
Q,E – switch between detected ROI
S – pause of processing frames
F – turn on/off debug draw

License

This software is released under the MIT License.

Credits

Ing. Wanda Benešová, PhD. – Supervisor

Project repository: https://github.com/sekys/sk.seky.soccerball

Posted on 23. February 2015 (updated 14. October 2015) by Dipl.-Ing. Wanda Benešová, PhD.

Motion Analysis & Object Tracking

Pavol Zbell

Introduction

In our work we focus on the basics of motion analysis and object tracking. We compare the MeanShift (non-parametric, finds an object on a back projection image) and CamShift (continuously adaptive mean shift, finds an object's center, size, and orientation) algorithms and use them to perform simple object tracking. If these algorithms fail to track the desired object, or the object travels out of the window scope, we try to find another object to track. To achieve this, we use a background subtractor based on a Gaussian Mixture Background / Foreground Segmentation Algorithm to identify the next possible object to track. There are two suitable implementations of this algorithm in OpenCV – BackgroundSubtractorMOG and BackgroundSubtractorMOG2. We also compare the performance of both implementations.

Used functions: calcBackProject, calcHist, CamShift, cvtColor, inRange, meanShift, moments, normalize

Solution

Initialize tracking window:
- Set tracking window near frame center

Track object utilizing MeanShift / CamShift

Calculate HSV histogram of region of interest (ROI) and track

int dims = 1;
int channels[] = {0};
int hist_size[] = {180};
float hranges[] = {0, 180};
const float *ranges[] = {hranges};
roi = frame(track_window);
cvtColor(roi, roi_hsv, cv::COLOR_BGR2HSV);
// clamp > H: 0 - 180, S: 60 - 255, V: 32 - 255
inRange(roi_hsv, Scalar(0.0, 60.0, 32.0), Scalar(180.0, 255.0, 255.0), mask);
calcHist (&roi_hsv, 1, channels, mask, roi_hist, dims, hist_size, ranges);
normalize(roi_hist, roi_hist, 0, 255, NORM_MINMAX);
...
Mat hsv, dst;
cvtColor(frame, hsv, cv::COLOR_BGR2HSV);
calcBackProject(&hsv, 1, channels, roi_hist, dst, ranges, 1);
clamp_rect(track_window, bounds);
print_rect("track-window", track_window);
Mat result = frame.clone();
if (use_camshift) {
	RotatedRect rect = CamShift(dst, track_window, term_criteria);
	draw_rotated_rect(result, rect, Scalar(0, 0, 255));
}
else {
	meanShift(dst, track_window, term_criteria);
	draw_rect(result, track_window, Scalar(0, 0, 255));
}

Lost tracked object?

In other words, is the centroid of the MOG mask outside the tracking window?

bool contains;
if (use_camshift) {
	contains = rect.boundingRect().contains(center);
}
else {
	contains = center.inside(track_window);
}

When lost, reinitialize tracking window:

Set the tracking window to the centroid of the MOG mask
Go back to the tracking step (MeanShift / CamShift) and repeat

mog->operator()(frame, mask);
center = compute_centroid(mask);
track_window = RotatedRect(Point2f(center.x, center.y), Size2f(100, 50), 0).boundingRect();
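
The helper `compute_centroid` is not shown in the post. A plausible stdlib-only sketch, assuming the mask is treated as a binary row-major image and the centroid is taken from the image moments (m10/m00, m01/m00), could look as follows. The names `Window`, `Centroid`, `computeCentroid` and `insideWindow` are hypothetical; the containment test mirrors the usual rectangle convention of inclusive top-left and exclusive bottom-right edges.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Axis-aligned tracking window in pixel coordinates (hypothetical type).
struct Window {
    int x, y, width, height;
};

struct Centroid {
    double x;
    double y;
    bool valid;  // false when the mask has no foreground pixels
};

// Centroid of all non-zero pixels: (m10 / m00, m01 / m00).
Centroid computeCentroid(const std::vector<std::uint8_t>& mask, int width, int height) {
    double m00 = 0.0, m10 = 0.0, m01 = 0.0;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (mask[static_cast<std::size_t>(y) * width + x] != 0) {
                m00 += 1.0;  // pixel count
                m10 += x;    // sum of x coordinates
                m01 += y;    // sum of y coordinates
            }
        }
    }
    if (m00 == 0.0) return {0.0, 0.0, false};
    return {m10 / m00, m01 / m00, true};
}

// True when the centroid exists and lies inside the window.
bool insideWindow(const Centroid& c, const Window& w) {
    return c.valid && c.x >= w.x && c.x < w.x + w.width &&
           c.y >= w.y && c.y < w.y + w.height;
}
```

When `insideWindow` returns false, the tracked object is considered lost and the window is re-centered on the centroid, as in the snippet above.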

Samples

As seen in Fig. 1, MeanShift (left) operates with fixed-size tracking windows which cannot be rotated. CamShift (right), by contrast, utilizes the full potential of dynamically sized, rotated rectangles. Working with CamShift yielded significantly better tracking results in general. On the other hand, we recommend using MeanShift when the object stays at a constant distance from the camera and moves without rotation (or is represented by a circle); in such a case MeanShift runs faster than CamShift and produces sufficient results without any rotation or size change noise.

Fig. 1: MeanShift vs. CamShift.

A comparison of BackgroundSubtractorMOG and BackgroundSubtractorMOG2 is depicted in Fig. 2. The MOG approach is simpler than MOG2, as it considers only binary masks whereas MOG2 operates on full grayscale masks. Experiments showed that in our specific case MOG performed better, as it yielded less information noise than MOG2. MOG2 will probably produce better results than MOG when utilized more effectively than in our initial approach (simple centroid extraction from the mask).

Summary

In this project we explored the possibilities of simple object tracking via OpenCV APIs utilizing various algorithms such as MeanShift and CamShift, and the background subtractors MOG and MOG2, which we also compared. Our solution performs relatively well, but we can certainly improve it by fine-tuning histogram calculation, MOG, and other parameters. Other improvements can be made in MOG usage, as the objects are currently recognized only by finding MOG mask centroids. This also calls for a better tracking window initialization process.

Posted on 23. February 2015 (updated 14. October 2015) by Dipl.-Ing. Wanda Benešová, PhD.

Tracking moving object

This example shows how to separate and track a moving object using OpenCV. First, the background of the video is calculated and moving objects are detected; the result is then filtered and tracked.

Used: cv::BackgroundSubtractorMOG2; cv::getStructuringElement; cv::morphologyEx; cv::BackgroundSubtractorMOG2.operator();

The process

Initialize the background extraction object

BackgroundSubtractorMOG2 bg( 500, 64, false);

Process the video frame with the background extraction object via its operator() method and receive the mask of the moving object
```
bg.operator()( origi, mask);
```
Apply a morphological opening (morphologyEx with MORPH_OPEN) to remove noise in the mask
```
morphologyEx( mask, mask, MORPH_OPEN, element1 );
```
Apply a morphological closing (morphologyEx with MORPH_CLOSE) to close gaps in the mask
```
morphologyEx( mask, mask, MORPH_CLOSE, element2 );
```
Apply the mask to the video frame
```
origi.copyTo( proci0, mask);
```

Find good features to track and apply the KLT tracker

goodFeaturesToTrack( proci0, points[0], MAX_COUNT, 0.1, 10, Mat(), 3, 0, 0.04);
calcOpticalFlowPyrLK( proci1, proci0, points[0], points[1], status, err, winSize, 3, termcrit, 0, 0.00001);

Combine with initial frame

size_t i, k;
for (i = k = 0; i < points[1].size(); i++)
{
	if (!status[i]) continue;
	points[1][k++] = points[1][i];
	circle(finI, points[1][i], 3, Scalar(0, 255, 0), -1, 8);
	circle(procI2, points[1][i], 3, Scalar(0, 255, 0), -1, 8);
}
points[1].resize(k);

Find the contours of the mask, find their bounding rectangles and draw them onto the output frame

findContours(procI3, contours, CV_RETR_LIST, CV_CHAIN_APPROX_SIMPLE);
if (contours.size() > 0)
for (int i = 0; i < (int)contours.size(); i++)
	rectangle(finI, boundingRect(contours[i]), Scalar(0, 255, 0));

The bounding rectangle indicates the position of the moving object in the scene and can be used to approximate its coordinates

Posted on 23. February 2015 (updated 14. October 2015) by Dipl.-Ing. Wanda Benešová, PhD.

Tongue tracking

Simek Miroslav

This project focuses on tracking the tongue using only the information from a plain web camera. The majority of approaches tried in this project failed, including edge detection, morphological reconstruction and point tracking, for various reasons such as the homogeneous and position-variable character of the tongue.

The approach that yields usable results is the Farneback method of optical flow. Using this method we are able to detect the direction of movement in the image, and of the tongue specifically when we apply it to an image of the mouth alone. However, the mouth area found by the Haar cascade classifier is very shaky, so the key part is to stabilize it.

Functions used: calcOpticalFlowFarneback, CascadeClassifier.detectMultiScale

The process:

Detection of the face and mouth using a Haar cascade classifier, where the mouth is searched for in the middle of the area between the nose and the bottom of the face.

faceCascade.detectMultiScale(frame, faces, 1.1, 3, 0, Size(200, 200), Size(1000, 1000));
mouthCascade.detectMultiScale(faceMouthAreaImage, possibleMouths, 1.1, 3, 0, Size(50, 20), Size(250, 150));
noseCascade.detectMultiScale(faceNoseAreaImage, possibleNoses, 1.1, 3, 0, Size(20, 30), Size(150, 250));

Stabilization of mouth area on which optical flow will be used.

const int movementDistanceThreshold = 40;
const double movementSpeed = 0.25;

int xDistance = abs(newMouth.x - mouth.x);
int yDistance = abs(newMouth.y - mouth.y);

if (xDistance + yDistance > movementDistanceThreshold)
	moveMouthRect = true;

if (moveMouthRect)
{
	mouth.x += (int)((double)(newMouth.x - mouth.x) * movementSpeed);
	mouth.y += (int)((double)(newMouth.y - mouth.y) * movementSpeed);
}

if (xDistance + yDistance <= 1.0 / movementSpeed)
	moveMouthRect = false;

Optical flow (Farneback) of the current and previous stabilized frames from camera.

cvtColor(img1, in1, COLOR_BGR2GRAY);
cvtColor(img2, in2, COLOR_BGR2GRAY);
calcOpticalFlowFarneback(in1, in2, opticalFlow, 0.5, 3, 15, 3, 5, 1.2, 0);
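
The post does not show how a direction is derived from the dense flow field. One possible way, given here only as a hedged sketch, is to average all flow vectors whose magnitude exceeds a noise threshold and classify the result by its larger component. The names `Direction`, `FlowVector`, `dominantDirection` and the parameter `minMagnitude` are assumptions for illustration; image coordinates are assumed, so a positive dy points down.

```cpp
#include <cmath>
#include <vector>

enum class Direction { None, Left, Right, Up, Down };

// Per-pixel displacement between two frames (hypothetical type).
struct FlowVector {
    float dx;
    float dy;
};

// Sums all vectors at least minMagnitude long and reports the dominant axis
// of the sum. Returns None when nothing moved or the motion cancels out.
Direction dominantDirection(const std::vector<FlowVector>& flow, float minMagnitude) {
    double sumX = 0.0, sumY = 0.0;
    for (const FlowVector& v : flow) {
        if (std::hypot(v.dx, v.dy) >= minMagnitude) {
            sumX += v.dx;
            sumY += v.dy;
        }
    }
    if (sumX == 0.0 && sumY == 0.0) return Direction::None;
    if (std::fabs(sumX) >= std::fabs(sumY))
        return sumX > 0.0 ? Direction::Right : Direction::Left;
    return sumY > 0.0 ? Direction::Down : Direction::Up;  // image y axis points down
}
```

Ignoring short vectors keeps small jitter of the stabilized mouth rectangle from being reported as tongue movement.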

Limitation:

Head movements must be minimal to none for the method to work correctly.
The actual position of the tongue is unknown. What is being tracked is the direction of the tongue's movement at the moment the tongue moves.

Samples:

Posted on 23. February 2015 (updated 16. October 2015) by Dipl.-Ing. Wanda Benešová, PhD.

Tracking the movement of the lips

Peter Demcak

In this project, we aim to recognize gestures that users make by moving their lips, for example: closed mouth, open mouth, wide open mouth, puckered lips. The challenges in this task are the high homogeneity of the observed area and the rapidity of lip movements. Our first attempts at detecting these gestures are based on detecting lip movements through optical flow with the Farneback method implemented in OpenCV, or alternatively on calculating the motion gradient from a silhouette image. It appears that these methods might not be optimal for solving this problem.

OpenCV functions: cvtColor, Sobel, threshold, accumulateWeighted, calcMotionGradient, calcOpticalFlowPyrLK

Process

Detect the position of the largest face in the image using OpenCV cascade classifier. Further steps will be applied using the lower half of the found face.
```
faceRects = detect(frame, faceClass);
```
Transform the image to the HLS color space and obtain the lightness map (L channel) of the image

Combine the results of horizontal and vertical Sobel methods to detect edges of the face features.

Sobel(hlsChannels[1], sobelVertical, CV_32F, 0, 1, 9);
Sobel(hlsChannels[1], sobelHorizontal, CV_32F, 1, 0, 9);
cartToPolar(sobelHorizontal, sobelVertical, sobelMagnitude, sobelAngle, false);

Accumulate the edge detection frames on top of each other to obtain the silhouette image. To prevent increased noise in areas without edges, apply a threshold to the Sobel map.
```
threshold(sobelMagnitude, sobelMagnitude, norm(sobelMagnitude, NORM_INF)/6, 255, THRESH_TOZERO);
accumulateWeighted(sobelMagnitude, motionHistoryImage, intensityLoss);
```
Calculate the flow using the Farneback method implemented in OpenCV using the current and previous frame
```
calcOpticalFlowFarneback(prevSobel, sobelMagnitudeCopy, flow, 0.5, 3, 15, 3, 5, 1.2, 0);
```

Posted on 21. July 2013 (updated 3. November 2015) by Dipl.-Ing. Wanda Benešová, PhD.

Tracking people in video with calculating the average speed of the monitored points

This example shows a new method for tracking significant points in video, which represent people or moving objects. The method uses several OpenCV functions.

The process

Opening the video file

VideoCapture MojeVideo("path to file");

Retrieve the next frame (picture)

Mat FarebnaSnimka;
MojeVideo >> FarebnaSnimka;

Converting the color image to a grayscale image

Mat Snimka1;
cvtColor(FarebnaSnimka, Snimka1, CV_BGR2GRAY);  // VideoCapture delivers frames in BGR order

Getting significant (well observable) points

vector<cv::Point2f> VyznacneBody;
goodFeaturesToTrack(Snimka1, VyznacneBody, 300, 0.06, 0);

Getting the next frame and converting it

Finding the significant points of the previous frame in the next frame

vector<cv::Point2f> PosunuteBody;
vector<uchar> PlatneBody;
vector<float> err;  // per-point tracking error
calcOpticalFlowPyrLK(Snimka1, Snimka2, VyznacneBody, PosunuteBody, PlatneBody, err);

Calculation of the velocity vector for each significant point
Clustering of significant points according to their average velocity vectors
Visualization
1. Assigning a color to each cluster
2. Plotting points on a frame
3. Plotting arrows at the center points of clusters – the average of the average velocity vectors
Keeping the clusters so that points can be classified into them again (to preserve the color of each cluster), plus creating new clusters when needed
The number of significant points declines over time, so after some time they need to be re-detected
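
The post does not say which clustering algorithm is used. As a rough sketch under that caveat, the following stdlib-only code computes an average velocity per point and groups points greedily: each velocity joins the nearest existing cluster if its mean is within `maxDiff`, otherwise it starts a new cluster. `Vec2`, `averageVelocity`, `clusterByVelocity` and `maxDiff` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 {
    float x;
    float y;
};

// Average velocity in pixels per frame from a point's position history.
// The mean of the per-frame displacements equals (last - first) / (n - 1).
Vec2 averageVelocity(const std::vector<Vec2>& positions) {
    assert(positions.size() >= 2);
    const float steps = static_cast<float>(positions.size() - 1);
    return {(positions.back().x - positions.front().x) / steps,
            (positions.back().y - positions.front().y) / steps};
}

// Greedy clustering: returns one cluster label per input velocity.
std::vector<int> clusterByVelocity(const std::vector<Vec2>& velocities, float maxDiff) {
    struct Cluster {
        Vec2 mean;
        int count;
    };
    std::vector<Cluster> clusters;
    std::vector<int> labels;
    labels.reserve(velocities.size());

    for (const Vec2& v : velocities) {
        int best = -1;
        float bestDist = maxDiff;
        for (std::size_t c = 0; c < clusters.size(); ++c) {
            const float d = std::hypot(v.x - clusters[c].mean.x, v.y - clusters[c].mean.y);
            if (d <= bestDist) {
                bestDist = d;
                best = static_cast<int>(c);
            }
        }
        if (best < 0) {
            clusters.push_back({v, 1});  // start a new cluster
            best = static_cast<int>(clusters.size()) - 1;
        } else {
            Cluster& cl = clusters[static_cast<std::size_t>(best)];
            cl.mean.x = (cl.mean.x * cl.count + v.x) / (cl.count + 1);  // running mean
            cl.mean.y = (cl.mean.y * cl.count + v.y) / (cl.count + 1);
            ++cl.count;
        }
        labels.push_back(best);
    }
    return labels;
}
```

Points moving in the same direction at a similar speed end up with the same label, which can then be mapped to a color for visualization.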

Result

This method is faster than the OpenCV method for detecting people.
It also works when only part of a person is visible, the position is unusual or the person is rotated.
A person is divided into parts.
It does not distinguish between persons and other moving objects.

Posted on 29. June 2013 (updated 8. October 2015) by Dipl.-Ing. Wanda Benešová, PhD.

Google Street View Video

Description

The goal of this project is to create a program that can stitch a sequence of images from Google Street View and make a movie from it. The idea came to my mind when I needed to check the crossroads and traffic signals along a route I had never driven before. The method was to interpolate a few more images between two consecutive views to simulate a moving car. To do that, I had to do the following steps:

Process

Remove UI elements from images
Find homography between following images
Interpolate homography between them
Put images into movie

Removing UI elements from images

Removing UI elements is important because in later steps I need to find similar areas, and those elements can spoil the match-up. First I cycled through all images and marked areas with the same color in black. The resulting image was accumulated from all the differences; black areas represent pixels that were the same in all images. To improve the mask, I inverted the image and applied thresholding, Gaussian blur and thresholding again. The result was a mask used by the inpaint method to fill in those regions with color, leaving the images without UI elements.

Example of process

Finding homography

The homography between two images was found using the SURF detector. I improved the detection by using a mask of similar areas, as in the previous step. I did this because many key-points were detected on the sky or on far objects, and the results were generally worse. The last step was to interpolate from one image to another using the homography. In my program I used 25 steps between two pictures. Those pictures were stacked into a movie and saved.

Mat homo = findMatch(pic1, pic2);
//-- Get the corners from the image_1 ( the object to be "detected" )
vector<Point2f> obj_corners(4);
obj_corners[0] = cvPoint(0,0);
obj_corners[1] = cvPoint( pic2.cols, 0 );
obj_corners[2] = cvPoint( pic2.cols, pic2.rows );
obj_corners[3] = cvPoint( 0, pic2.rows );


vector<Point2f> corners(4);
vector<Point2f> inter_corners(4);
perspectiveTransform( obj_corners, corners, homo);
...
...
for(int i=0; i<4; i++) {
 inter_corners[i].x = corners[i].x - distance[i].x*j;
 inter_corners[i].y = corners[i].y - distance[i].y*j;
}
Mat interHomo = findHomography( corners, inter_corners, 0 );
Mat transformed;
warpPerspective(pic1, transformed, interHomo, Size(600,350));
result[j] = transformed;
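
The snippet above elides how `distance` is computed. Purely as an assumption, the sketch below treats it as the per-step offset between each transformed corner and the matching original corner, so that step j moves the corners a fraction j/steps of the way. `Pt`, `Quad`, `stepOffsets` and `interpolateCorners` are hypothetical names, and the code is stdlib-only rather than OpenCV-based.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct Pt {
    float x;
    float y;
};
using Quad = std::array<Pt, 4>;

// Assumed meaning of "distance": per-step offset from the transformed corners
// back to the original corners when the motion is split into `steps` parts.
Quad stepOffsets(const Quad& corners, const Quad& objCorners, int steps) {
    assert(steps > 0);
    Quad d{};
    for (std::size_t i = 0; i < 4; ++i) {
        d[i].x = (corners[i].x - objCorners[i].x) / steps;
        d[i].y = (corners[i].y - objCorners[i].y) / steps;
    }
    return d;
}

// Corner positions at interpolation step j, mirroring
// inter_corners[i] = corners[i] - distance[i] * j from the snippet above.
Quad interpolateCorners(const Quad& corners, const Quad& distance, int j) {
    Quad out{};
    for (std::size_t i = 0; i < 4; ++i) {
        out[i].x = corners[i].x - distance[i].x * j;
        out[i].y = corners[i].y - distance[i].y * j;
    }
    return out;
}
```

With 25 steps, as used in the program, j would run over 25 intermediate positions, and each set of interpolated corners would be passed to findHomography to obtain the intermediate warp.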

Posted on 29. June 2013 (updated 27. September 2015) by Dipl.-Ing. Wanda Benešová, PhD.

Moving Vehicle Detection

The goal of this project is to implement an algorithm that segments the foreground using the OpenCV library. We assume that the background is static, objects in the foreground are moving, and the video is taken from a static camera. We detect moving vehicles (foreground) with 2 methods.

Background detection

The first method computes an average image from the video frames.

Each frame is added to an accumulator with a small weight (0.05 or smaller)
1. accumulateWeighted(frame, background, alpha);
2. at this point Mat background holds the current background image
To detect the foreground, we compute the difference between the current frame and the current accumulated background image
1. absdiff(frame, background, diff);
2. Mat diff is a color image, so we need to transform it into a grayscale image
To detect relevant changes (greater than a given threshold) we use a simple threshold
1. threshold(diff, foregroundMask, 20, 255, CV_THRESH_BINARY);
2. Mat foregroundMask now holds the foreground mask
We can use morphological operations (dilation, erosion) to expand the foreground region

Steps 1-4 are illustrated in the following figures
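
The arithmetic behind the first three steps can be written out without OpenCV. This is a simplified sketch on single-channel images stored as flat arrays, so the color-to-grayscale conversion is skipped. It follows the documented behavior of accumulateWeighted (background = (1 - alpha) * background + alpha * frame) and of a binary threshold (strictly greater than the threshold gives 255). The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Running average: background = (1 - alpha) * background + alpha * frame.
void accumulateWeightedGray(const std::vector<float>& frame,
                            std::vector<float>& background, float alpha) {
    assert(frame.size() == background.size());
    for (std::size_t i = 0; i < frame.size(); ++i)
        background[i] = (1.0f - alpha) * background[i] + alpha * frame[i];
}

// Absolute difference followed by a binary threshold: a pixel becomes 255
// when |frame - background| is strictly greater than thresh, otherwise 0.
std::vector<std::uint8_t> foregroundMask(const std::vector<float>& frame,
                                         const std::vector<float>& background,
                                         float thresh) {
    assert(frame.size() == background.size());
    std::vector<std::uint8_t> mask(frame.size(), 0);
    for (std::size_t i = 0; i < frame.size(); ++i)
        if (std::fabs(frame[i] - background[i]) > thresh) mask[i] = 255;
    return mask;
}
```

A small alpha makes the background adapt slowly, so a passing vehicle barely changes it and still stands out in the difference image.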

Frames history

The second method compares the current frame with an older frame. A stack of 5-20 elements is enough; the right size depends on the speed of the vehicles.

Add the current frame to the stack (if the stack is full, the first element is erased and the rest is shifted)
1. framesHistory.add(frame);
Compute the difference between the current frame and the first element in the stack
1. first = framesHistory.first();
2. absdiff(frame, first, diff);
To detect relevant changes (greater than a given threshold) we use a simple threshold
1. threshold(diff, foregroundMask, 20, 255, CV_THRESH_BINARY);
2. Mat foregroundMask now holds the foreground mask
We can use morphological operations (dilation, erosion) to expand the foreground region

Steps 1-4 are illustrated in the following figures. LocalBackground is the older frame (5 frames old)
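
The `framesHistory` container is not shown in the post. A minimal sketch of a fixed-capacity history with the same `add` and `first` operations might look like this; the class name, the `Frame` alias and the use of `std::deque` are assumptions, not the project's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

using Frame = std::vector<std::uint8_t>;  // stand-in for an image

// Keeps the most recent `capacity` frames; the oldest one is dropped first.
class FrameHistory {
public:
    explicit FrameHistory(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);
    }

    // Appends the newest frame, discarding the oldest when the history is full.
    void add(Frame frame) {
        if (frames_.size() == capacity_) frames_.pop_front();
        frames_.push_back(std::move(frame));
    }

    // Oldest stored frame, used as the local background.
    const Frame& first() const {
        assert(!frames_.empty());
        return frames_.front();
    }

    std::size_t size() const { return frames_.size(); }

private:
    std::size_t capacity_;
    std::deque<Frame> frames_;
};
```

A deque avoids shifting all elements when the oldest frame is removed, which matters little for 5-20 frames but keeps the code simple.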

Combination of methods

These 2 methods are combined to create a more accurate output. In the next picture we can see that the first method (computing an average image) creates "tails" behind vehicles. Comparing with an older frame doesn't create "tails".

We combine the binary masks with a simple bitwise AND

Mat sumMask = mask1 & mask2;

How to create more precise segmentation, future work

To create a more precise vehicle segmentation we have to use other methods. Shadows and lights of cars make these 2 methods hard to use. We can only segment the region where a car could be; for more precise detection we would have to use template matching (chamfer matching) or a graph cut method. In this project we experimented with these two as well, but they were too complex, so their time complexity was unacceptable for use on video.