Posted on

Bag of Words algorithm

Tomas Drutarovsky

We implement well-known Bag of Words algorithm (BoW) in order to perform image classification of tiger cat images. In the work, we use a subset of publicly available ImageNet dataset and divide data on two sets – tiger cats and non-cat objects, which consist of images of 10 random chosen object types.

The main processing algorithm is performed by these steps:

  1. Choose a suitable subset of images from a large dataset
    • We use around 100 000 unique images
  2. Detect keypoints
    • We detect keypoints using SIFT or Dense keypoint extractor
    DenseFeatureDetector dense(20.0f, 3, 2, 10, 4);
    BOWKMeansTrainer bowTrainer(dictionarySize, tc, retries, flags);
    for (int i = 0; i < list.count(); i++){
    	Mat img = imread(, CV_LOAD_IMAGE_COLOR);
    	dense.detect(img, keypoints);

    Keypoints detected using SIFT detect function – more than 500 keypoints.
  3. Describe keypoints using SIFT
    • SIFT descriptor produces description for each keypoint separately
      sift.compute(img, keypoints, descriptor);
  4. Cluster descriptors using k-means
    • Around 10 million of keypoints are chosen to cluster
    • Clustering results in 1000 clusters represented by centroids (visual words)
    Mat vocabulary = bowTrainer.cluster();
  5. Calculate BoW descriptors
    • Each keypoint from an input image is then evaluated for response from 1000 visual words or represents
    • Histogram of reponse is normalized for each image
    Ptr<DescriptorMatcher> matcher(new FlannBasedMatcher);
    Ptr<FeatureDetector> detector(new SiftFeatureDetector());
    BOWImgDescriptorExtractor bowExtractor(detector, matcher);
    bowExtractor.compute(img, keypoints, descriptor);

    BoW descriptor of 200 ats visualized over 1000 clustered visual words vocabulary
  6. Train SVM using BoW descriptors
    • Calculated histograms or BoW descriptors are trained using linear SVM
    • Suitable rate between positive and negative subset needs to be chosen
  7. Test images using SVM
    • Response of test images is used to evaluate algorithm
    • Our model shows accuracy of (62% of positive set and 58% of negative set)
    • Better results are achievable using larger datasets, but both time and computational power are necessary