Bag of Words algorithm – Vision & Graphics Group

Tomas Drutarovsky

We implement the well-known Bag of Words (BoW) algorithm to classify images of tiger cats. We use a subset of the publicly available ImageNet dataset and divide the data into two sets: tiger cats and non-cat objects, the latter consisting of images of 10 randomly chosen object types.

The processing pipeline consists of these steps:

Choose a suitable subset of images from a large dataset
- We use around 100 000 unique images

Detect keypoints

We detect keypoints using either the SIFT detector or the dense keypoint detector.

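```
// Dense detector: initial feature scale 20, 3 scale levels, scale multiplier 2,
// grid step 10 px, image border 4 px
DenseFeatureDetector dense(20.0f, 3, 2, 10, 4);
BOWKMeansTrainer bowTrainer(dictionarySize, tc, retries, flags);

vector<KeyPoint> keypoints;
for (int i = 0; i < list.count(); i++){
	Mat img = imread(list.at(i), CV_LOAD_IMAGE_COLOR);
	if (img.empty()) continue; // skip files that could not be read

	dense.detect(img, keypoints);
}
```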

Keypoints detected using the SIFT detect function (more than 500 keypoints).
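Unlike SIFT detection, dense extraction does not look at image content: it places keypoints on a regular grid. The following is a minimal single-scale sketch of that idea in plain C++, not the OpenCV implementation. The names `GridPoint` and `denseGrid` are hypothetical, and the multi-scale behaviour of `DenseFeatureDetector` is left out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a keypoint: position plus feature scale.
struct GridPoint {
    int x;
    int y;
    float scale;
};

// Place one point every `step` pixels, keeping `border` pixels free at each image edge.
std::vector<GridPoint> denseGrid(int width, int height, int step, int border, float scale) {
    assert(step > 0 && border >= 0);
    std::vector<GridPoint> points;
    for (int y = border; y < height - border; y += step) {
        for (int x = border; x < width - border; x += step) {
            points.push_back(GridPoint{x, y, scale});
        }
    }
    return points;
}
```

With a step of 10 and a border of 4, a 100 x 50 image yields a 10 x 5 grid of points, so the number of keypoints depends only on the image size and the grid parameters.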

Describe keypoints using SIFT
- The SIFT descriptor produces a description for each keypoint separately
```
sift.compute(img, keypoints, descriptor);
bowTrainer.add(descriptor);
```
Cluster descriptors using k-means
- Around 10 million keypoints are chosen for clustering
- Clustering results in 1000 clusters represented by centroids (visual words)
```
Mat vocabulary = bowTrainer.cluster();
```
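To illustrate what the clustering step does, here is a minimal sketch of Lloyd's k-means in plain C++. It is not the OpenCV implementation behind `BOWKMeansTrainer`: the names `Descriptor` and `kmeans` are hypothetical, the initial centroids are passed in by the caller, and the loop runs for a fixed number of iterations instead of using a termination criterion.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Descriptor = std::vector<float>;

// Squared Euclidean distance between two descriptors of equal length.
float squaredDistance(const Descriptor& a, const Descriptor& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Alternate between assigning descriptors to the nearest centroid and
// moving each centroid to the mean of its assigned descriptors.
std::vector<Descriptor> kmeans(const std::vector<Descriptor>& data,
                               std::vector<Descriptor> centroids,
                               int iterations) {
    assert(!data.empty() && !centroids.empty());
    const std::size_t dim = data[0].size();
    for (int it = 0; it < iterations; ++it) {
        std::vector<Descriptor> sums(centroids.size(), Descriptor(dim, 0.0f));
        std::vector<std::size_t> counts(centroids.size(), 0);
        for (std::size_t i = 0; i < data.size(); ++i) {
            std::size_t best = 0;
            float bestDist = std::numeric_limits<float>::max();
            for (std::size_t c = 0; c < centroids.size(); ++c) {
                const float dist = squaredDistance(data[i], centroids[c]);
                if (dist < bestDist) { bestDist = dist; best = c; }
            }
            for (std::size_t j = 0; j < dim; ++j) sums[best][j] += data[i][j];
            ++counts[best];
        }
        for (std::size_t c = 0; c < centroids.size(); ++c) {
            if (counts[c] == 0) continue;  // an empty cluster keeps its old centroid
            for (std::size_t j = 0; j < dim; ++j) {
                centroids[c][j] = sums[c][j] / static_cast<float>(counts[c]);
            }
        }
    }
    return centroids;
}
```

The returned centroids play the role of the visual words: each one summarizes a group of similar descriptors.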
Calculate BoW descriptors
- Each keypoint descriptor from an input image is matched against the 1000 visual words (cluster representatives)
- The histogram of responses is normalized for each image
```
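Ptr<DescriptorMatcher> matcher(new FlannBasedMatcher);
// BOWImgDescriptorExtractor expects a descriptor extractor, not a feature detector
Ptr<DescriptorExtractor> extractor(new SiftDescriptorExtractor());
BOWImgDescriptorExtractor bowExtractor(extractor, matcher);
// The vocabulary must be set before BoW descriptors can be computed
bowExtractor.setVocabulary(vocabulary);
bowExtractor.compute(img, keypoints, descriptor);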
```
BoW descriptors of 200 cats visualized over the vocabulary of 1000 clustered visual words
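The histogram computation can be sketched in plain C++ as follows. This is an illustration under simplifying assumptions, not the OpenCV code: the function name `bowHistogram` is hypothetical, the nearest visual word is found by brute force instead of FLANN, and normalization divides each bin by the number of descriptors.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Each descriptor votes for its nearest visual word; the vote counts are then
// divided by the number of descriptors so that the histogram sums to 1.
std::vector<float> bowHistogram(const std::vector<std::vector<float>>& descriptors,
                                const std::vector<std::vector<float>>& vocabulary) {
    assert(!vocabulary.empty());
    std::vector<float> histogram(vocabulary.size(), 0.0f);
    if (descriptors.empty()) return histogram;  // no keypoints: all-zero histogram

    for (const std::vector<float>& d : descriptors) {
        std::size_t best = 0;
        float bestDist = std::numeric_limits<float>::max();
        for (std::size_t w = 0; w < vocabulary.size(); ++w) {
            assert(vocabulary[w].size() == d.size());
            float dist = 0.0f;  // squared Euclidean distance to visual word w
            for (std::size_t j = 0; j < d.size(); ++j) {
                const float diff = d[j] - vocabulary[w][j];
                dist += diff * diff;
            }
            if (dist < bestDist) { bestDist = dist; best = w; }
        }
        histogram[best] += 1.0f;
    }
    for (float& bin : histogram) bin /= static_cast<float>(descriptors.size());
    return histogram;
}
```

Because of the normalization, images with different numbers of keypoints produce comparable histograms, which is what the classifier in the next step relies on.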
Train SVM using BoW descriptors
- A linear SVM is trained on the calculated histograms (BoW descriptors)
- A suitable ratio between the positive and negative subsets needs to be chosen
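As an illustration of what training a linear SVM involves, the sketch below minimizes the regularized hinge loss with plain subgradient descent. This is a swapped-in teaching technique, not the solver used by OpenCV, and the names `LinearSvm`, `decision` and `trainLinearSvm` as well as the hyperparameters are hypothetical. Labels are assumed to be +1 for tiger cats and -1 for other objects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Linear model: score(x) = w . x + b; the sign of the score is the predicted class.
struct LinearSvm {
    std::vector<double> w;
    double b;
};

double decision(const LinearSvm& model, const std::vector<double>& x) {
    assert(model.w.size() == x.size());
    double score = model.b;
    for (std::size_t j = 0; j < x.size(); ++j) score += model.w[j] * x[j];
    return score;
}

// Subgradient descent on lambda/2 * |w|^2 + hinge loss, one sample at a time.
LinearSvm trainLinearSvm(const std::vector<std::vector<double>>& X,
                         const std::vector<int>& y,
                         double lambda, double learningRate, int epochs) {
    assert(!X.empty() && X.size() == y.size());
    LinearSvm model;
    model.w.assign(X[0].size(), 0.0);
    model.b = 0.0;
    for (int e = 0; e < epochs; ++e) {
        for (std::size_t i = 0; i < X.size(); ++i) {
            // A sample contributes to the gradient only if it violates the margin.
            const bool violated = y[i] * decision(model, X[i]) < 1.0;
            for (std::size_t j = 0; j < model.w.size(); ++j) {
                double grad = lambda * model.w[j];
                if (violated) grad -= y[i] * X[i][j];
                model.w[j] -= learningRate * grad;
            }
            if (violated) model.b += learningRate * y[i];
        }
    }
    return model;
}
```

In this formulation every sample has the same weight, so a strongly unbalanced training set tends to pull the decision boundary towards the smaller class. That is one reason the ratio between positive and negative examples matters.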
Test images using SVM
- The SVM response on test images is used to evaluate the algorithm
- Our model reaches an accuracy of 62% on the positive set and 58% on the negative set
- Better results are achievable with larger datasets, at the cost of more time and computational power
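Reporting accuracy separately for the positive and the negative set can be sketched as below. The names `ClassAccuracy` and `perClassAccuracy` are hypothetical, and labels are assumed to be +1 for tiger cats and -1 for other objects. The sketch only shows how such per-class figures are computed; it does not reproduce the numbers above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ClassAccuracy {
    double positive;  // fraction of positive images classified as positive
    double negative;  // fraction of negative images classified as negative
};

// Compare true labels with predictions and count correct answers per class.
ClassAccuracy perClassAccuracy(const std::vector<int>& labels,
                               const std::vector<int>& predictions) {
    assert(labels.size() == predictions.size());
    std::size_t posTotal = 0, posCorrect = 0, negTotal = 0, negCorrect = 0;
    for (std::size_t i = 0; i < labels.size(); ++i) {
        if (labels[i] > 0) {
            ++posTotal;
            if (predictions[i] > 0) ++posCorrect;
        } else {
            ++negTotal;
            if (predictions[i] <= 0) ++negCorrect;
        }
    }
    ClassAccuracy acc;
    // An empty class is reported as 0 to avoid dividing by zero.
    acc.positive = posTotal ? static_cast<double>(posCorrect) / posTotal : 0.0;
    acc.negative = negTotal ? static_cast<double>(negCorrect) / negTotal : 0.0;
    return acc;
}
```

Keeping the two figures apart avoids a misleading overall number when the test sets differ in size.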