Motion analysis in CCTV records

Filip Mazan

This project analyses CCTV video recordings to detect people's motion and extract their trajectories over time. The first output is a relatively short video file containing only those frames of the original in which movement was detected, with the trajectories of people drawn on them. The second output is a cumulative image of all trajectories, which can later be used to classify trajectories as suspicious or not.

  1. Each frame of the input video is converted to grayscale and median filtered to remove noise
  2. The first 30 seconds of the video are used as a learning phase for the MOG2 background subtractor
  3. For each subsequent frame the MOG2 mask is calculated and morphological closing is applied to it
  4. If the count of non-zero pixels in the mask is greater than a set threshold, we consider movement to be present
    1. Good features to track are found if too few tracked points are left
    2. Optical flow is calculated
    3. Each point that has moved is stored along with its frame number
  5. If there is no movement in the current frame, the last movement interval (if any) is post-processed
    1. All stored tracking points (x, y, frame number) from the previous phase are clustered by k-means into a variable number of centroids
    2. The centroids are sorted by their frame number dimension
    3. The trajectory is drawn onto the output frame
    4. The movement sequence is written to the output video file along with the progressively drawn trajectory
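
The post-processing in step 5 can be sketched roughly as follows. This is only an illustration in plain C++ without OpenCV: the Centroid record and the orderTrajectory function are hypothetical names, and the k-means clustering itself is assumed to have already produced the centroids.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical record for one k-means centroid: image position plus the
// frame-number coordinate of the tracking points assigned to it.
struct Centroid {
    double x;
    double y;
    double frame;
};

// Orders the centroids by their frame-number dimension so that consecutive
// elements can be joined by line segments to draw the trajectory.
std::vector<Centroid> orderTrajectory(std::vector<Centroid> centroids) {
    assert(!centroids.empty());
    std::stable_sort(centroids.begin(), centroids.end(),
                     [](const Centroid& a, const Centroid& b) {
                         return a.frame < b.frame;
                     });
    return centroids;
}
```

Drawing the trajectory then amounts to connecting element i with element i + 1 of the returned vector.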

The following image shows the sum of all trajectories found in a 2-hour-long input video. This can be used to classify trajectories as (not) suspicious.



Car detection in videos

Peter Horvath

We detect cars in videos recorded by dash cameras mounted in cars. Because this type of camera is moving, we decided to train and use a Haar cascade classifier. The classifier on its own returns many false positives, so we improved it by removing false positives using road detection.

Functions used: cvtColor, split, Rect, inRange, equalizeHist, detectMultiScale, rectangle, bitwise_and


1st part – training haar cascade classifier

Collect a set of positive samples and a set of negative samples. Make a list file for each (positives.dat and negatives.dat). Then use the opencv_createsamples utility with the following parameters to make a single .vec file with all positive samples.

opencv_createsamples -info positives.dat -vec samples.vec -num 500 -w 20 -h 20

Now train a cascade classifier using HAAR features:

opencv_traincascade -data classifier -featureType HAAR -vec samples.vec -bg negatives.dat -numPos 500 -numNeg 850 -numStages 15 -precalcValBufSize 1000 -precalcIdxBufSize 1000 -minHitRate 0.999 -maxFalseAlarmRate 0.5 -mode ALL -w 20 -h 20

The output of this procedure is the trained classifier, stored as an XML file.

2nd part – using classifier in C++ code to detect cars, improved by road detection

Open video file using VideoCapture. For every video frame do:

  1. Convert the current video frame to the HSV color model
    cvtColor(frame, frame_hsv, CV_BGR2HSV);
  2. Sum the H, S and V values over the captured road sample and calculate its average hue, saturation and value
    int averageHue = sumHue / (rectangle_hsv_channels[0].rows*rectangle_hsv_channels[0].cols);
    int averageSat = sumSat / (rectangle_hsv_channels[1].rows*rectangle_hsv_channels[1].cols);
    int averageVal = sumVal / (rectangle_hsv_channels[2].rows*rectangle_hsv_channels[2].cols);
  3. Use the inRange function to produce a binary result in which the road is white and everything else is black
    inRange(frame_hsv, cv::Scalar(averageHue - 180, averageSat - 15, averageVal - 20), cv::Scalar(averageHue + 180, averageSat + 15, averageVal + 20), final);


  4. Convert the current video frame to grayscale
    cvtColor(frame, image_gray, CV_BGR2GRAY);
  5. Create an instance of CascadeClassifier and load the trained cascade
    String car_cascade_file = "classifier.xml";
    CascadeClassifier car_classifier;
    car_classifier.load(car_cascade_file);
  6. Detect cars in the grayscale video frame using the classifier
    car_classifier.detectMultiScale(image_gray, cars, 1.1, 2, 0 | CV_HAAR_SCALE_IMAGE, Size(20, 20));

    The result contains many false positives


  7. Make a black image with white squares at the locations returned by the cascade classifier and compute the logical AND between it and the image with the detected road
  8. Accept only the squares in which at least 20% of the pixels are white
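
Steps 7 and 8 boil down to measuring how much of each detected square lies on the road mask. The following is a minimal sketch in plain C++ without OpenCV. The Box type, the function names and the flat row-major mask layout are assumptions made for illustration; only the 20% threshold comes from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical axis-aligned rectangle (top-left corner, width, height).
struct Box {
    int x, y, width, height;
};

// Fraction of non-zero (road) pixels inside `box` of a row-major binary mask.
double roadFraction(const std::vector<std::uint8_t>& mask, int cols, const Box& box) {
    assert(cols > 0 && box.width > 0 && box.height > 0);
    assert(box.x >= 0 && box.y >= 0 && box.x + box.width <= cols);
    assert(static_cast<std::size_t>(box.y + box.height) * cols <= mask.size());
    int white = 0;
    for (int r = box.y; r < box.y + box.height; ++r)
        for (int c = box.x; c < box.x + box.width; ++c)
            if (mask[static_cast<std::size_t>(r) * cols + c] != 0) ++white;
    return static_cast<double>(white) / (box.width * box.height);
}

// Keeps only the detections whose road fraction reaches `minFraction`
// (0.2 corresponds to the 20% rule in step 8).
std::vector<Box> keepDetectionsOnRoad(const std::vector<std::uint8_t>& mask, int cols,
                                      const std::vector<Box>& detections,
                                      double minFraction = 0.2) {
    std::vector<Box> kept;
    for (const Box& b : detections)
        if (roadFraction(mask, cols, b) >= minFraction) kept.push_back(b);
    return kept;
}
```

Counting pixels inside each rectangle gives the same decision as building the AND image first and is easier to follow in a sketch.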


  • The cascade classifier was trained with only 560 positive and 860 negative samples, so it detects cars only at close range
  • Road detection fails when some object (car, road marking) enters the blue rectangle (which is supposed to contain the road sample)
  • Dirt has a similar saturation to the road, so it is detected as road

Detection of objects in soccer

Lukas Sekerak

Project idea

The goal is to detect objects (players, soccer ball, referees, goalkeeper) in a soccer match, determine their position and movement, and show a selected object in the ROI area. More information is available in the presentation and the description document.


  • Opencv 2.4
  • log4cpp

Dataset videos

Operation Agreement CNR-FIGC

T. D’Orazio, M. Leo, N. Mosca, P. Spagnolo, P. L. Mazzeo: A Semi-Automatic System for Ground Truth Generation of Soccer Video Sequences. In Proceedings of the 6th IEEE International Conference on Advanced Video and Signal Based Surveillance, Genoa, Italy, September 2-4, 2009.


  1. Clone this repository into workspace
  2. Download external requirements + dataset
  3. Build project
  4. Run project

Control keys

  • W – turn the ROI area on/off
  • Q, E – switch between detected ROIs
  • S – pause frame processing
  • F – turn debug drawing on/off


This software is released under the MIT License.


  • Ing. Wanda Benešová, PhD. – Supervisor


Project repository:


Motion Analysis & Object Tracking

Pavol Zbell


In our work we focus on the basics of motion analysis and object tracking. We compare the MeanShift (non-parametric, finds an object on a back projection image) and CamShift (continuously adaptive mean shift, finds an object's center, size, and orientation) algorithms and use them to perform simple object tracking. If these algorithms fail to track the desired object, or the object travels out of the window scope, we try to find another object to track. To achieve this, we use a background subtractor based on the Gaussian Mixture Background / Foreground Segmentation Algorithm to identify the next possible object to track. There are two suitable implementations of this algorithm in OpenCV: BackgroundSubtractorMOG and BackgroundSubtractorMOG2. We also compare the performance of both implementations.

Used functions: calcBackProject, calcHist, CamShift, cvtColor, inRange, meanShift, moments, normalize


  1. Initialize tracking window:
    • Set tracking window near frame center
  2. Track object utilizing MeanShift / CamShift
    • Calculate HSV histogram of region of interest (ROI) and track
    int dims = 1;
    int channels[] = {0};
    int hist_size[] = {180};
    float hranges[] = {0, 180};
    const float *ranges[] = {hranges};
    roi = frame(track_window);
    cvtColor(roi, roi_hsv, cv::COLOR_BGR2HSV);
    // clamp > H: 0 - 180, S: 60 - 255, V: 32 - 255
    inRange(roi_hsv, Scalar(0.0, 60.0, 32.0), Scalar(180.0, 255.0, 255.0), mask);
    calcHist (&roi_hsv, 1, channels, mask, roi_hist, dims, hist_size, ranges);
    normalize(roi_hist, roi_hist, 0, 255, NORM_MINMAX);
    Mat hsv, dst;
    cvtColor(frame, hsv, cv::COLOR_BGR2HSV);
    calcBackProject(&hsv, 1, channels, roi_hist, dst, ranges, 1);
    clamp_rect(track_window, bounds);
    print_rect("track-window", track_window);
    Mat result = frame.clone();
    if (use_camshift) {
    	RotatedRect rect = CamShift(dst, track_window, term_criteria);
    	draw_rotated_rect(result, rect, Scalar(0, 0, 255));
    } else {
    	meanShift(dst, track_window, term_criteria);
    	draw_rect(result, track_window, Scalar(0, 0, 255));
    }
  3. Lost tracked object?
    • In other words, is the centroid of the MOG mask outside the tracking window?
    bool contains;
    if (use_camshift) {
    	contains = rect.boundingRect().contains(center);
    } else {
    	contains = center.inside(track_window);
    }
  4. When lost, reinitialize tracking window:
    • Set tracking window to the centroid of the MOG mask
    • Go back to 2. and repeat
    mog->operator()(frame, mask);
    center = compute_centroid(mask);
    track_window = RotatedRect(Point2f(center.x, center.y), Size2f(100, 50), 0).boundingRect();
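
The compute_centroid helper used above is not shown in the post. A plausible reading, given that moments appears in the list of used functions, is the centroid of the foreground pixels obtained from image moments, (m10/m00, m01/m00). The sketch below implements that idea in plain C++ without OpenCV; the MaskPoint and Window types and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct MaskPoint { double x, y; };
struct Window { int x, y, width, height; };

// Centroid of the non-zero pixels of a row-major binary mask, computed from
// the moments m00 (pixel count), m10 (sum of x) and m01 (sum of y).
// Returns false when the mask contains no foreground pixel.
bool maskCentroid(const std::vector<std::uint8_t>& mask, int cols, MaskPoint& out) {
    assert(cols > 0 && mask.size() % static_cast<std::size_t>(cols) == 0);
    double m00 = 0.0, m10 = 0.0, m01 = 0.0;
    for (std::size_t i = 0; i < mask.size(); ++i) {
        if (mask[i] == 0) continue;
        m00 += 1.0;
        m10 += static_cast<double>(i % cols);
        m01 += static_cast<double>(i / cols);
    }
    if (m00 == 0.0) return false;
    out.x = m10 / m00;
    out.y = m01 / m00;
    return true;
}

// True when the point lies inside the window (left/top edges inclusive,
// right/bottom edges exclusive).
bool windowContains(const Window& w, const MaskPoint& p) {
    return p.x >= w.x && p.x < w.x + w.width &&
           p.y >= w.y && p.y < w.y + w.height;
}
```

The tracked object counts as lost when maskCentroid succeeds but windowContains returns false for the current tracking window.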


As seen in Fig. 1, MeanShift (left) operates with fixed-size tracking windows that cannot be rotated. In contrast, CamShift (right) uses dynamically sized, rotated rectangles. Working with CamShift yielded significantly better tracking results in general. On the other hand, we recommend MeanShift when the object stays at a constant distance from the camera and moves without rotation (or is represented by a circle); in that case MeanShift runs faster than CamShift and produces sufficient results without any noise from rotation or size changes.

Fig. 1: MeanShift vs. CamShift.

A comparison of BackgroundSubtractorMOG and BackgroundSubtractorMOG2 is depicted in Fig. 2. The MOG approach is simpler than MOG2, as it considers only binary masks, whereas MOG2 operates on full grayscale masks. Experiments showed that in our specific case MOG performed better, as it yielded less noise than MOG2. MOG2 would probably produce better results than MOG if used more effectively than in our initial approach (simply extracting the centroid from the mask).

Fig. 2: MOG vs. MOG2.


In this project we explored the possibilities of simple object tracking via OpenCV APIs using algorithms such as MeanShift and CamShift and the background subtractors MOG and MOG2, which we also compared. Our solution performs relatively well, but it can certainly be improved by fine-tuning the histogram calculation, MOG, and other parameters. Further improvements are possible in the use of MOG, as objects are currently recognized only by finding the centroid of the MOG mask. This also calls for a better tracking window initialization process.


Tracking moving object

This example shows how to separate and track a moving object using OpenCV. First, the background of the video is estimated and moving objects are detected; the result is then filtered and tracked.

Used: cv::BackgroundSubtractorMOG2; cv::getStructuringElement; cv::morphologyEx; cv::BackgroundSubtractorMOG2.operator();

The process

  1. Initialize the background extraction object
    BackgroundSubtractorMOG2 bg( 500, 64, false);
  2. Process each video frame with the background extraction object through its operator() method to obtain a mask of the moving object
    bg.operator()( origi, mask);


  3. Apply a morphological opening (morphologyEx) to remove noise in the mask
    morphologyEx( mask, mask, MORPH_OPEN, element1 );
  4. Apply a morphological closing (morphologyEx) to close gaps in the mask
    morphologyEx( mask, mask, MORPH_CLOSE, element2 );


  5. Apply mask on video frame
    origi.copyTo( proci0, mask);
  6. Find good features to track and apply the KLT tracker (pyramidal Lucas-Kanade optical flow)
    goodFeaturesToTrack( proci0, points[0], MAX_COUNT, 0.1, 10, Mat(), 3, 0, 0.04);
    calcOpticalFlowPyrLK( proci1, proci0, points[0], points[1], status, err, winSize, 3, termcrit, 0, 0.00001);


  7. Combine with initial frame
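    size_t i, k;
    for (i = k = 0; i < points[1].size(); i++) {
    	// skip points for which the flow was not found
    	if (!status[i]) continue;
    	points[1][k++] = points[1][i];
    	circle(finI, points[1][i], 3, Scalar(0, 255, 0), -1, 8);
    	circle(procI2, points[1][i], 3, Scalar(0, 255, 0), -1, 8);
    }
    // keep only the successfully tracked points
    points[1].resize(k);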


  8. Find the contours of the mask, compute their bounding rectangles and draw them onto the output frame
    findContours(procI3, contours, CV_RETR_LIST, CV_CHAIN_APPROX_SIMPLE);
    if (contours.size() > 0)
    for (int i = 0; i < (int)contours.size(); i++)
    	rectangle(finI, boundingRect(contours[i]), Scalar(0, 255, 0));
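
To make steps 3 and 4 more concrete, here is a small sketch of what opening and closing do to a binary mask. It is written in plain C++ without OpenCV and assumes a fixed 3x3 square structuring element and a border rule in which pixels outside the image are ignored; the element1 and element2 used above may well differ. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using BinaryMask = std::vector<std::uint8_t>;  // row-major, values 0 or 255

// 3x3 erosion (takeMin = true) or dilation (takeMin = false).
// Neighbours that fall outside the image are ignored.
BinaryMask morph3x3(const BinaryMask& src, int rows, int cols, bool takeMin) {
    assert(rows > 0 && cols > 0);
    assert(src.size() == static_cast<std::size_t>(rows) * cols);
    BinaryMask dst(src.size());
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            std::uint8_t best = takeMin ? 255 : 0;
            for (int dr = -1; dr <= 1; ++dr) {
                for (int dc = -1; dc <= 1; ++dc) {
                    const int rr = r + dr, cc = c + dc;
                    if (rr < 0 || rr >= rows || cc < 0 || cc >= cols) continue;
                    const std::uint8_t v = src[static_cast<std::size_t>(rr) * cols + cc];
                    best = takeMin ? std::min(best, v) : std::max(best, v);
                }
            }
            dst[static_cast<std::size_t>(r) * cols + c] = best;
        }
    }
    return dst;
}

// Opening = erosion followed by dilation: removes small specks of noise.
BinaryMask openMask(const BinaryMask& src, int rows, int cols) {
    return morph3x3(morph3x3(src, rows, cols, true), rows, cols, false);
}

// Closing = dilation followed by erosion: fills small gaps in the mask.
BinaryMask closeMask(const BinaryMask& src, int rows, int cols) {
    return morph3x3(morph3x3(src, rows, cols, false), rows, cols, true);
}
```

Applying the opening first and the closing second, as in the steps above, removes isolated noise pixels before the remaining blobs are consolidated.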


The bounding rectangle indicates the position of the moving object in the scene and can be used to approximate its coordinates.


Tongue tracking

Simek Miroslav

This project focuses on tracking the tongue using only the information from a plain web camera. Most of the approaches tried in this project failed, including edge detection, morphological reconstruction and point tracking, for reasons such as the homogeneous appearance and variable position of the tongue.

The approach that yields usable results is the Farneback method of optical flow. With this method we are able to detect the direction of movement in the image, and of the tongue specifically when we apply it to an image of the mouth alone. However, the mouth area found by the Haar cascade classifier is very shaky, so the key part is to stabilize it.

Functions used: calcOpticalFlowFarneback, CascadeClassifier.detectMultiScale

The process:

  1. Detection of the face and mouth using a Haar cascade classifier; the mouth is searched for in the middle of the area between the nose and the bottom of the face.
    faceCascade.detectMultiScale(frame, faces, 1.1, 3, 0, Size(200, 200), Size(1000, 1000));
    mouthCascade.detectMultiScale(faceMouthAreaImage, possibleMouths, 1.1, 3, 0, Size(50, 20), Size(250, 150));
    noseCascade.detectMultiScale(faceNoseAreaImage, possibleNoses, 1.1, 3, 0, Size(20, 30), Size(150, 250));
  2. Stabilization of mouth area on which optical flow will be used.
    const int movementDistanceThreshold = 40;
    const double movementSpeed = 0.25;
    int xDistance = abs(newMouth.x - mouth.x);
    int yDistance = abs(newMouth.y - mouth.y);
    // start following the detection only after a large enough jump
    if (xDistance + yDistance > movementDistanceThreshold)
    	moveMouthRect = true;
    if (moveMouthRect) {
    	mouth.x += (int)((double)(newMouth.x - mouth.x) * movementSpeed);
    	mouth.y += (int)((double)(newMouth.y - mouth.y) * movementSpeed);
    }
    // stop once the remaining distance is too small for a further step
    if (xDistance + yDistance <= 1.0 / movementSpeed)
    	moveMouthRect = false;
  3. Optical flow (Farneback) of the current and previous stabilized frames from camera.
    cvtColor(img1, in1, COLOR_BGR2GRAY);
    cvtColor(img2, in2, COLOR_BGR2GRAY);
    calcOpticalFlowFarneback(in1, in2, opticalFlow, 0.5, 3, 15, 3, 5, 1.2, 0);
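
The stabilization in step 2 behaves like a small state machine: the stored mouth rectangle ignores jitter below the threshold and, after a larger jump, glides towards the new detection by a fixed fraction per frame. The sketch below wraps the same logic in a hypothetical MouthStabilizer struct so that it can be tried without a camera. The struct and type names are made up; the constants are those from the snippet.

```cpp
#include <cassert>
#include <cstdlib>

struct MouthRect { int x, y; };

// Hypothetical wrapper around the stabilization logic from step 2.
struct MouthStabilizer {
    MouthRect mouth{0, 0};       // stabilized position used for optical flow
    bool moving = false;         // corresponds to moveMouthRect
    int distanceThreshold = 40;  // movementDistanceThreshold
    double speed = 0.25;         // movementSpeed

    void update(const MouthRect& detected) {
        assert(speed > 0.0 && speed <= 1.0);
        const int dx = std::abs(detected.x - mouth.x);
        const int dy = std::abs(detected.y - mouth.y);
        // A large jump starts the movement towards the detection.
        if (dx + dy > distanceThreshold) moving = true;
        if (moving) {
            mouth.x += static_cast<int>((detected.x - mouth.x) * speed);
            mouth.y += static_cast<int>((detected.y - mouth.y) * speed);
        }
        // Close enough: further steps would round to zero, so stop.
        if (dx + dy <= 1.0 / speed) moving = false;
    }
};
```

Because each step is truncated to an integer, the stabilized rectangle stops a few pixels short of the detection, which is harmless here since the goal is only a steady crop.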


  • Head movement must be minimal or absent for the method to work correctly.
  • The actual position of the tongue is unknown. What is tracked is the direction of the tongue’s movement at the moment the tongue moves.




Tracking the movement of the lips

Peter Demcak

In this project, we aim to recognize gestures that users make by moving their lips, for example: closed mouth, open mouth, wide-open mouth, puckered lips. The challenges in this task are the high homogeneity of the observed area and the rapidity of lip movements. Our first attempts at detecting these gestures are based on detecting lip movements through optical flow with the Farneback method implemented in OpenCV, or alternatively on calculating the motion gradient from a silhouette image. It appears that these methods might not be optimal for solving this problem.

OpenCV functions: cvtColor, Sobel, threshold, accumulateWeighted, calcMotionGradient, calcOpticalFlowPyrLK


  1. Detect the position of the largest face in the image using OpenCV cascade classifier. Further steps will be applied using the lower half of the found face.
    faceRects = detect(frame, faceClass);
  2. Transform the image to the HLS color space and obtain the lightness channel of the image
  3. Combine the results of horizontal and vertical Sobel methods to detect edges of the face features.
    Sobel(hlsChannels[1], sobelVertical, CV_32F, 0, 1, 9);
    Sobel(hlsChannels[1], sobelHorizontal, CV_32F, 1, 0, 9);
    cartToPolar(sobelHorizontal, sobelVertical, sobelMagnitude, sobelAngle, false);
  4. Accumulate the edge detection images of successive frames on top of each other to obtain the silhouette image. To prevent noise from building up in areas without edges, apply a threshold to the Sobel map first.
    threshold(sobelMagnitude, sobelMagnitude, norm(sobelMagnitude, NORM_INF)/6, 255, THRESH_TOZERO);
    accumulateWeighted(sobelMagnitude, motionHistoryImage, intensityLoss);
  5. Calculate the flow using the Farneback method implemented in OpenCV using the current and previous frame
    calcOpticalFlowFarneback(prevSobel, sobelMagnitudeCopy, flow, 0.5, 3, 15, 3, 5, 1.2, 0);
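
Steps 3 and 4 combine the two Sobel responses into a gradient magnitude and then zero out everything weaker than one sixth of the strongest response. The following sketch reproduces that arithmetic in plain C++ on flat float arrays, without OpenCV. The function names are hypothetical; the divisor 6 is taken from the threshold call above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-pixel gradient magnitude sqrt(gx^2 + gy^2) from the horizontal and
// vertical derivative maps (the magnitude output of cartToPolar).
std::vector<float> gradientMagnitude(const std::vector<float>& gx,
                                     const std::vector<float>& gy) {
    assert(gx.size() == gy.size());
    std::vector<float> mag(gx.size());
    for (std::size_t i = 0; i < gx.size(); ++i) mag[i] = std::hypot(gx[i], gy[i]);
    return mag;
}

// To-zero threshold relative to the strongest response: values that are not
// greater than max|v| / divisor are set to 0, the rest are kept unchanged.
void suppressWeakEdges(std::vector<float>& mag, float divisor = 6.0f) {
    assert(divisor > 0.0f);
    float maxAbs = 0.0f;
    for (float v : mag) maxAbs = std::max(maxAbs, std::fabs(v));
    const float thresh = maxAbs / divisor;
    for (float& v : mag)
        if (!(v > thresh)) v = 0.0f;
}
```

Tying the threshold to the maximum response makes it adapt to the overall contrast of each frame instead of relying on a fixed absolute value.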

Tracking people in video with calculating the average speed of the monitored points

This example shows a new method for tracking significant points in video that represent people or moving objects. The method uses several OpenCV functions.

The process

  1. Open the video file
    VideoCapture MojeVideo("path/to/file");
  2. Retrieve the next frame (picture)
    Mat FarebnaSnimka;
    MojeVideo >> FarebnaSnimka;
  3. Convert the color frame to a grayscale image (frames read by VideoCapture are in BGR order)
    Mat Snimka1;
    cvtColor(FarebnaSnimka, Snimka1, CV_BGR2GRAY);
  4. Get significant (well observable) points
    vector<cv::Point2f> VyznacneBody;
    goodFeaturesToTrack(Snimka1, VyznacneBody, 300, 0.06, 0);
  5. Get the next frame and convert it in the same way
  6. Find the significant points from the previous frame in the next frame
    vector<cv::Point2f> PosunuteBody;
    vector<uchar> PlatneBody;
    vector<float> err;
    calcOpticalFlowPyrLK(Snimka1, Snimka2, VyznacneBody, PosunuteBody, PlatneBody, err);
  7. Calculate the velocity vector of each significant point
  8. Cluster the significant points according to their average velocity vectors
  9. Visualization
    1. Assign a color to each cluster
    2. Plot the points on the frame
    3. Plot arrows at the center points of the clusters, showing the average of the average velocity vectors
  10. Keep the clusters between frames so that points can be classified into them again (to preserve the color of each cluster), and create new clusters when needed
  11. Significant points are lost over time, so they need to be re-detected from time to time
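
Steps 7 and 8 can be illustrated with a short sketch in plain C++ without OpenCV. The velocity of a point is simply its displacement between two frames. The clustering method is not named in the post, so the greedy "leader" clustering below is an assumption chosen for brevity, and all type and function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };

// Velocity (pixels per frame) of every point that was tracked successfully;
// points with valid[i] == 0 are skipped.
std::vector<Vec2> pointVelocities(const std::vector<Vec2>& prev,
                                  const std::vector<Vec2>& next,
                                  const std::vector<unsigned char>& valid) {
    assert(prev.size() == next.size() && prev.size() == valid.size());
    std::vector<Vec2> velocity;
    for (std::size_t i = 0; i < prev.size(); ++i) {
        if (!valid[i]) continue;
        Vec2 v{next[i].x - prev[i].x, next[i].y - prev[i].y};
        velocity.push_back(v);
    }
    return velocity;
}

// Greedy leader clustering: a velocity joins the first cluster whose leader
// differs by at most `tolerance`, otherwise it starts a new cluster.
// Returns one cluster label per input velocity.
std::vector<int> clusterByVelocity(const std::vector<Vec2>& velocity, float tolerance) {
    assert(tolerance >= 0.0f);
    std::vector<Vec2> leaders;
    std::vector<int> labels(velocity.size(), -1);
    for (std::size_t i = 0; i < velocity.size(); ++i) {
        for (std::size_t c = 0; c < leaders.size(); ++c) {
            const float d = std::hypot(velocity[i].x - leaders[c].x,
                                       velocity[i].y - leaders[c].y);
            if (d <= tolerance) {
                labels[i] = static_cast<int>(c);
                break;
            }
        }
        if (labels[i] < 0) {
            leaders.push_back(velocity[i]);
            labels[i] = static_cast<int>(leaders.size()) - 1;
        }
    }
    return labels;
}
```

In the method described above the clustering works on velocities averaged over several frames; the same function can be fed those averages instead of single-frame displacements.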


  • This method is faster than the OpenCV method for detecting people.
  • It also works when only part of a person is visible, when the position is unusual or when the person is rotated.
  • A person may be divided into several parts.
  • It does not distinguish between persons and other moving objects.

Google Street View Video


The goal of this project is to create a program that can stitch a sequence of images from Google Street View and make a movie from it. The idea came to my mind when I needed to check the crossroads and traffic signals along a route I had never driven before. The method was to interpolate a few more images between two consecutive views to simulate a moving car. To do that, I had to perform the following steps:


  1. Remove UI elements from images
  2. Find homography between following images
  3. Interpolate homography between them
  4. Put images into movie

Removing UI elements from images

Removing UI elements is important because in later steps I need to find similar areas, and those elements can spoil the match. First I cycled through all images and accumulated the differences between them; in the resulting image, black areas represent pixels that were the same in all images. To improve the mask, I inverted the image and applied thresholding, Gaussian blur and thresholding again. The result was a mask used by the inpaint method to fill in the regions occupied by UI elements with the surrounding color.

Example of process

Finding homography

The homography between two images was found using the SURF detector. I improved the detection by using a mask of similar areas, as in the previous step, because many key-points were detected on the sky or on distant objects and the results were generally worse. The last step was to interpolate from one image to the other using the homography. In my program I used 25 steps between two pictures. The resulting pictures were stacked into a movie and saved.

Mat homo = findMatch(pic1, pic2);
//-- Get the corners from the image_1 ( the object to be "detected" )
vector<Point2f> obj_corners(4);
obj_corners[0] = cvPoint(0,0);
obj_corners[1] = cvPoint( pic2.cols, 0 );
obj_corners[2] = cvPoint( pic2.cols, pic2.rows );
obj_corners[3] = cvPoint( 0, pic2.rows );

vector<Point2f> corners(4);
vector<Point2f> inter_corners(4);
perspectiveTransform( obj_corners, corners, homo);
// j is the index of the interpolation step and distance[i] is presumably the
// per-step shift of corner i; both come from the surrounding code (not shown)
for(int i=0; i<4; i++) {
 inter_corners[i].x = corners[i].x - distance[i].x*j;
 inter_corners[i].y = corners[i].y - distance[i].y*j;
}
Mat interHomo = findHomography( corners, inter_corners, 0 );
Mat transformed;
warpPerspective(pic1, transformed, interHomo, Size(600,350));
result[j] = transformed;
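
The snippet above does not show how distance is computed. One plausible reading, stated here purely as an assumption, is that distance[i] is the offset between the projected corner and the original corner divided by the number of steps, so that the corners move linearly from one set to the other. The sketch below shows that interpolation in plain C++ without OpenCV; the Corner type and the function name are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Corner { float x, y; };

// Corner positions at interpolation step j out of `steps`.
// Assumption: the per-step shift is (corners[i] - objCorners[i]) / steps, so
// j = 0 returns `corners` and j = steps returns `objCorners`.
std::vector<Corner> interpolateCorners(const std::vector<Corner>& objCorners,
                                       const std::vector<Corner>& corners,
                                       int j, int steps) {
    assert(objCorners.size() == corners.size());
    assert(steps > 0 && j >= 0 && j <= steps);
    std::vector<Corner> inter(corners.size());
    for (std::size_t i = 0; i < corners.size(); ++i) {
        const float dx = (corners[i].x - objCorners[i].x) / steps;
        const float dy = (corners[i].y - objCorners[i].y) / steps;
        inter[i].x = corners[i].x - dx * j;
        inter[i].y = corners[i].y - dy * j;
    }
    return inter;
}
```

With 25 steps, as used in the program, each intermediate set of corners would then be passed to findHomography and warpPerspective to render one in-between frame.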

Moving Vehicle Detection

The goal of this project is to implement an algorithm that segments the foreground using the OpenCV library. We assume that the background is static, objects in the foreground are moving and the video is taken from a static camera. We detect moving vehicles (foreground) with 2 methods.

Background detection

The first method computes an average image from the video frames.

  1. Each frame is added into an accumulator with a certain small weight (0.05 and smaller)
    1. accumulateWeighted(frame, background, alpha);
    2. at this point Mat background holds the current background image
  2. To detect the foreground we compute the difference between the current frame and the current accumulated background image
    1. absdiff(frame, background, diff);
    2. Mat diff is a color image, so we need to convert it to a grayscale image
  3. To detect relevant changes (greater than a given threshold) we use a simple threshold
    1. threshold(diff, foregroundMask, 20, 255, CV_THRESH_BINARY);
    2. Mat foregroundMask now holds the foreground mask
  4. We can use morphological operations (dilation, erosion) to expand the foreground region
Steps 1-4 are illustrated in the following figures.

Frames history

The second method compares the current frame with an older frame. A stack of 5-20 elements is sufficient; the right size depends on the speed of the vehicles.

  1. Add the current frame to the stack (if the stack is full, the first element is erased and the rest are shifted)
    1. framesHistory.add(frame);
  2. Compute the difference between the current frame and the first element in the stack
    1. first = framesHistory.first();
    2. absdiff(frame, first, diff);
  3. To detect relevant changes (greater than a given threshold) we use a simple threshold
    1. threshold(diff, foregroundMask, 20, 255, CV_THRESH_BINARY);
    2. Mat foregroundMask now holds the foreground mask
  4. We can use morphological operations (dilation, erosion) to expand the foreground region
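
The framesHistory object used in steps 1 and 2 is not shown in the post. A minimal sketch of such a fixed-size history, assuming it only needs add() and first(), could look like this in plain C++. The class below is a hypothetical stand-in, with frames represented as flat byte arrays instead of Mat.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

using GrayFrame = std::vector<std::uint8_t>;

// Fixed-capacity history of frames: adding to a full history drops the
// oldest frame, so first() always returns the oldest frame still kept.
class FramesHistory {
public:
    explicit FramesHistory(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);
    }

    void add(const GrayFrame& frame) {
        if (frames_.size() == capacity_) frames_.pop_front();
        frames_.push_back(frame);
    }

    const GrayFrame& first() const {
        assert(!frames_.empty());
        return frames_.front();
    }

    std::size_t size() const { return frames_.size(); }

private:
    std::size_t capacity_;
    std::deque<GrayFrame> frames_;
};
```

A std::deque avoids shifting the remaining elements when the oldest frame is dropped. With real Mat frames, a copy (for example via clone()) would be needed when adding, because Mat assignment shares the pixel buffer.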

Steps 1-4 are illustrated in the following figures. LocalBackgound is the older frame (5 frames old).

Combination of methods

These 2 methods are combined to create a more accurate output. In the next picture we can see that the first method (computing an average image) creates “tails” behind vehicles. Comparing with an older frame does not create “tails”.

We combine the two binary masks with a bitwise AND, which keeps only the pixels marked as foreground by both methods

Mat sumMask = mask1 & mask2;

How to create more precise segmentation, future work

To create a more precise vehicle segmentation we have to use other methods. Shadows and lights of cars make these 2 methods hard to use. We can only segment the region where a car could be; for more precise detection we would have to use template matching (chamfer matching) or a graph cut method. In this project we experimented with these two methods as well, but they were too complex, so their time cost was unacceptable for use on video.