Tongue tracking

Simek Miroslav

This project tracks the tongue using only the image from a plain web camera. Most of the approaches tried in this project failed, including edge detection, morphological reconstruction and point tracking, for reasons such as the homogeneous appearance and constantly changing position of the tongue.

The approach that yields usable results is the Farneback method of optical flow. It detects the direction of movement in an image, and the movement of the tongue specifically when it is applied to an image of the mouth alone. However, the mouth area found by the Haar cascade classifier is very shaky, so the key step is to stabilize it.

Functions used: calcOpticalFlowFarneback, CascadeClassifier.detectMultiScale

The process:

  1. Detection of the face and mouth using Haar cascade classifiers; the mouth is searched for in the middle of the area between the nose and the bottom of the face.
    faceCascade.detectMultiScale(frame, faces, 1.1, 3, 0, Size(200, 200), Size(1000, 1000));
    mouthCascade.detectMultiScale(faceMouthAreaImage, possibleMouths, 1.1, 3, 0, Size(50, 20), Size(250, 150));
    noseCascade.detectMultiScale(faceNoseAreaImage, possibleNoses, 1.1, 3, 0, Size(20, 30), Size(150, 250));
  2. Stabilization of the mouth area on which optical flow will be computed.
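    const int movementDistanceThreshold = 40;
    const double movementSpeed = 0.25;
    // Distance between the new detection and the stabilized rectangle.
    int xDistance = abs(newMouth.x - mouth.x);
    int yDistance = abs(newMouth.y - mouth.y);
    // Start following only after a large jump; smaller shifts are treated as jitter.
    if (xDistance + yDistance > movementDistanceThreshold)
        moveMouthRect = true;
    // While following, move by a fraction of the remaining offset on each frame.
    if (moveMouthRect) {
        mouth.x += (int)((double)(newMouth.x - mouth.x) * movementSpeed);
        mouth.y += (int)((double)(newMouth.y - mouth.y) * movementSpeed);
    }
    // Stop following once the rectangle is close to the detection.
    if (xDistance + yDistance <= 1.0 / movementSpeed)
        moveMouthRect = false;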
  3. Optical flow (Farneback) between the previous and current stabilized frames from the camera.
    cvtColor(img1, in1, COLOR_BGR2GRAY);
    cvtColor(img2, in2, COLOR_BGR2GRAY);
    calcOpticalFlowFarneback(in1, in2, opticalFlow, 0.5, 3, 15, 3, 5, 1.2, 0);
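The stabilization in step 2 can be sketched as a small self-contained class. This is only an illustration under assumptions: `Box` is a stand-in for `cv::Rect` so that the code compiles without OpenCV, and `MouthStabilizer`, `update` and `moving` are hypothetical names that do not appear in the project. The threshold of 40 and the speed of 0.25 are the values from the snippet above.

```cpp
#include <cassert>
#include <cstdlib>

// Stand-in for cv::Rect (assumption: only x and y matter for stabilization).
struct Box { int x, y, width, height; };

// Hypothetical helper that keeps the stabilized mouth rectangle between frames.
class MouthStabilizer {
public:
    MouthStabilizer(Box initial, int distanceThreshold = 40, double speed = 0.25)
        : mouth_(initial), threshold_(distanceThreshold), speed_(speed) {
        assert(speed > 0.0 && speed <= 1.0);
    }

    // Feed the raw detection of the current frame; returns the stabilized box.
    Box update(const Box& detected) {
        const int dx = detected.x - mouth_.x;
        const int dy = detected.y - mouth_.y;
        const int distance = std::abs(dx) + std::abs(dy);

        // Small shifts are treated as detector jitter and ignored.
        if (distance > threshold_) moving_ = true;

        // While following, close a fixed fraction of the gap per frame.
        if (moving_) {
            mouth_.x += static_cast<int>(dx * speed_);
            mouth_.y += static_cast<int>(dy * speed_);
        }

        // Stop following once the rectangle is close to the detection.
        if (distance <= 1.0 / speed_) moving_ = false;
        return mouth_;
    }

    bool moving() const { return moving_; }

private:
    Box mouth_;
    int threshold_;
    double speed_;
    bool moving_ = false;
};
```

In use, one would presumably construct the stabilizer from the first mouth detection and call `update` once per frame with the new detection, cropping the frame with the returned box before computing optical flow. The hysteresis (start above the threshold, stop only when nearly aligned) is what keeps the crop from jumping on every noisy detection.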


Limitations:

  • Head movement must be minimal or absent for the method to work correctly.
  • The actual position of the tongue is unknown. What is tracked is the direction of the tongue's movement at the moment the tongue moves.
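The post does not show how a single direction is derived from the dense flow field, so the following is only one possible sketch, not the project's method. It assumes the flow has been copied into a list of per-pixel displacement vectors (with OpenCV these would come from the two-channel float matrix that `calcOpticalFlowFarneback` writes). `FlowVec`, `dominantDirection` and both threshold parameters are hypothetical. Image coordinates are assumed, so positive `dy` points down.

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// One displacement vector of the flow field, in pixels per frame.
struct FlowVec { float dx, dy; };

// Hypothetical classifier: sums all vectors longer than minMagnitude and
// reports the dominant axis direction, or "none" if too few pixels moved.
std::string dominantDirection(const std::vector<FlowVec>& flow,
                              float minMagnitude = 1.0f,
                              double minMovingFraction = 0.05) {
    double sumX = 0.0, sumY = 0.0;
    std::size_t moving = 0;
    for (const FlowVec& v : flow) {
        // Skip near-static pixels so that noise does not dominate the sum.
        if (std::hypot(v.dx, v.dy) < minMagnitude) continue;
        sumX += v.dx;
        sumY += v.dy;
        ++moving;
    }
    // Too little motion in the mouth area: report no movement.
    if (flow.empty() || moving < minMovingFraction * flow.size()) return "none";
    // Opposing motions that cancel out exactly give no usable direction.
    if (sumX == 0.0 && sumY == 0.0) return "none";

    if (std::fabs(sumX) >= std::fabs(sumY)) return sumX > 0 ? "right" : "left";
    return sumY > 0 ? "down" : "up";
}
```

Because only the moment of movement carries information, such a function would return "none" for most frames and a direction only while the tongue is actually moving. Whether "left" in the image corresponds to the user's left depends on whether the camera image is mirrored.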