Tomas Drutarovsky
We implement well-known Bag of Words algorithm (BoW) in order to perform image classification of tiger cat images. In the work, we use a subset of publicly available ImageNet dataset and divide data on two sets – tiger cats and non-cat objects, which consist of images of 10 random chosen object types.
The main processing algorithm is performed by these steps:
- Choose a suitable subset of images from a large dataset
- We use around 100 000 unique images
- Detect keypoints
- We detect keypoints using SIFT or Dense keypoint extractor
DenseFeatureDetector dense(20.0f, 3, 2, 10, 4); BOWKMeansTrainer bowTrainer(dictionarySize, tc, retries, flags); for (int i = 0; i < list.count(); i++){ Mat img = imread(list.at(i), CV_LOAD_IMAGE_COLOR); dense.detect(img, keypoints); }
- Describe keypoints using SIFT
- SIFT descriptor produces description for each keypoint separately
sift.compute(img, keypoints, descriptor); bowTrainer.add(descriptor);
- SIFT descriptor produces description for each keypoint separately
- Cluster descriptors using k-means
- Around 10 million of keypoints are chosen to cluster
- Clustering results in 1000 clusters represented by centroids (visual words)
Mat vocabulary = bowTrainer.cluster();
- Calculate BoW descriptors
- Each keypoint from an input image is then evaluated for response from 1000 visual words or represents
- Histogram of reponse is normalized for each image
Ptr<DescriptorMatcher> matcher(new FlannBasedMatcher); Ptr<FeatureDetector> detector(new SiftFeatureDetector()); BOWImgDescriptorExtractor bowExtractor(detector, matcher); bowExtractor.compute(img, keypoints, descriptor);
- Train SVM using BoW descriptors
- Calculated histograms or BoW descriptors are trained using linear SVM
- Suitable rate between positive and negative subset needs to be chosen
- Test images using SVM
- Response of test images is used to evaluate algorithm
- Our model shows accuracy of (62% of positive set and 58% of negative set)
- Better results are achievable using larger datasets, but both time and computational power are necessary