Image Retrieval Using Customized Bag of Features

This example shows how to create a content-based image retrieval (CBIR) system using a customized bag-of-features workflow.

Introduction

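Content-based image retrieval (CBIR) systems are used to find images that are visually similar to a query image. CBIR systems are applied in many areas, including web-based product search, surveillance, and visual place identification. A common technique for implementing a CBIR system is bag of visual words, also known as bag of features [1,2]. Bag of features adapts a technique from document retrieval to image retrieval: instead of actual words, it uses image features as the visual words that describe an image.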

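Image features are an important part of CBIR systems. These features are used to gauge the similarity between images and can include global image features such as color, texture, and shape. They can also be local image features such as Speeded Up Robust Features (SURF), histogram of oriented gradients (HOG), or local binary patterns (LBP). The benefit of the bag-of-features approach is that the type of features used to create the visual word vocabulary can be customized to fit the application.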

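The speed and efficiency of the image search are also important in CBIR systems. For example, in a small collection of fewer than 100 images it may be acceptable to perform a brute-force search, in which the features from the query image are compared to the features of every image in the collection. For larger collections, a brute-force search is not feasible and more efficient search techniques must be used. Bag of features provides a concise encoding scheme that represents a large collection of images using a sparse set of visual word histograms. This enables compact storage and efficient search through an inverted index data structure.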
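To make the inverted index idea concrete, the following is a minimal sketch in base MATLAB. It is not the data structure used internally by the toolbox; the histogram counts and the variable names (wordHistograms, invertedIndex, queryWords) are hypothetical and chosen only for illustration.

```matlab
% Toy collection: rows are images, columns are visual words.
% The counts are made up for illustration.
wordHistograms = sparse([ ...
    2 0 0 1 0; ...
    0 3 0 0 0; ...
    0 0 1 1 0; ...
    1 0 0 0 4]);

numWords = size(wordHistograms,2);

% Inverted index: for each visual word, list the images that contain it.
invertedIndex = cell(numWords,1);
for w = 1:numWords
    invertedIndex{w} = find(wordHistograms(:,w))';
end

% A query that contains visual words 1 and 4 only needs to be compared
% against the images that share at least one of those words.
queryWords = [1 4];
candidateImages = unique([invertedIndex{queryWords}]);
disp(candidateImages)
```

In this toy collection, image 2 shares no visual word with the query, so it is never compared. On a large, sparse collection this pruning is what makes the search cheaper than a brute-force comparison.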

Computer Vision Toolbox™ provides a customizable bag-of-features framework for implementing an image retrieval system. The following steps outline the procedure:

1. Select the image features for retrieval
2. Create a bag of features
3. Index the images
4. Search for similar images

This example walks through these steps to create an image retrieval system for searching the Flower Dataset [3]. This data set contains about 3670 images of 5 different types of flowers.

Download this data set for use in the rest of the example.

% Location of the compressed data set
url = 'http://download.tensorflow.org/example_images/flower_photos.tgz';

% Store the output in a temporary folder
downloadFolder = tempdir;
filename = fullfile(downloadFolder,'flower_dataset.tgz');

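Note that downloading the data set from the web can take a long time, depending on the speed of your Internet connection. The commands below block MATLAB for that period. Alternatively, you can use your web browser to first download the data set to your local disk. If you choose that route, re-point the 'url' variable above to the file that you downloaded.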

% Uncompressed data set
imageFolder = fullfile(downloadFolder,'flower_photos');

if ~exist(imageFolder,'dir') % download only once
    disp('Downloading Flower Dataset (218 MB)...');
    websave(filename,url);
    untar(filename,downloadFolder)
end

flowerImageSet = imageDatastore(imageFolder,'LabelSource','foldernames','IncludeSubfolders',true);

% Total number of images in the data set
numel(flowerImageSet.Files)

ans = 3670

Step 1 - Select the Image Features for Retrieval

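The type of feature used for retrieval depends on the type of images in the collection. For example, when searching an image collection made up of scenes (beaches, cities, highways), it is preferable to use a global image feature, such as a color histogram that captures the color content of the entire scene. However, if the goal is to find specific objects within the image collection, local image features extracted around object keypoints are a better choice.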
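As an illustration of a global image feature, the following sketch computes a coarse color histogram in base MATLAB. It is not part of the example's workflow. The synthetic image, the bin count numBins, and the variable names are assumptions made for illustration.

```matlab
% Synthetic 4-by-4 truecolor image (uint8) standing in for a real scene:
% the top half is pure red and the bottom half is pure blue.
I = zeros(4,4,3,'uint8');
I(1:2,:,1) = 255;
I(3:4,:,3) = 255;

% Quantize each channel into a small number of bins (assumed value).
numBins = 4;
binIdx = floor(double(I) / 256 * numBins) + 1;   % values in 1..numBins

% Combine the three channel bins into a single color bin index.
r = binIdx(:,:,1);
g = binIdx(:,:,2);
b = binIdx(:,:,3);
colorIdx = sub2ind([numBins numBins numBins], r(:), g(:), b(:));

% Count the pixels per color bin and normalize so the histogram sums to 1.
colorHist = accumarray(colorIdx, 1, [numBins^3 1]);
colorHist = colorHist / sum(colorHist);
```

The resulting vector describes the overall color content of the image but discards where each color occurs. That is often enough for scene-level retrieval, and it is also why the extractor in this example adds location information to its color features.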

Start by looking at one of the images to get an idea of how to approach the problem.

% Display one of the flower images
figure
I = imread(flowerImageSet.Files{1});
imshow(I);

The displayed image is by Mario.

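In this example, the goal is to search for similar flowers in the data set using the color information in the query image. A simple image feature based on the spatial layout of color is a good place to start.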

The following function describes the algorithm used to extract color features from a given image. This function is used as the extractorFcn within bagOfFeatures to extract color features.

type exampleBagOfFeaturesColorExtractor.m

function [features, metrics] = exampleBagOfFeaturesColorExtractor(I) 
% Example color layout feature extractor. Designed for use with bagOfFeatures.
%
% Local color layout features are extracted from truecolor image, I and
% returned in features. The strength of the features are returned in
% metrics.

%   Copyright 2014-2020 The MathWorks, Inc.

[~,~,P] = size(I);

isColorImage = P == 3; 

if isColorImage
    
    % Convert RGB images to the L*a*b* colorspace. The L*a*b* colorspace
    % enables you to easily quantify the visual differences between colors.
    % Visually similar colors in the L*a*b* colorspace will have small
    % differences in their L*a*b* values.
    Ilab = rgb2lab(I);                                                                             
      
    % Compute the "average" L*a*b* color within 16-by-16 pixel blocks. The
    % average value is used as the color portion of the image feature. An
    % efficient method to approximate this averaging procedure over
    % 16-by-16 pixel blocks is to reduce the size of the image by a factor
    % of 16 using IMRESIZE. 
    Ilab = imresize(Ilab, 1/16);
    
    % Note, the average pixel value in a block can also be computed using
    % standard block processing or integral images.
    
    % Reshape L*a*b* image into "number of features"-by-3 matrix.
    [Mr,Nr,~] = size(Ilab);    
    colorFeatures = reshape(Ilab, Mr*Nr, []); 
           
    % L2 normalize color features
    rowNorm = sqrt(sum(colorFeatures.^2,2));
    colorFeatures = bsxfun(@rdivide, colorFeatures, rowNorm + eps);
        
    % Augment the color feature by appending the [x y] location within the
    % image from which the color feature was extracted. This technique is
    % known as spatial augmentation. Spatial augmentation incorporates the
    % spatial layout of the features within an image as part of the
    % extracted feature vectors. Therefore, for two images to have similar
    % color features, the color and spatial distribution of color must be
    % similar.
    
    % Normalize pixel coordinates to handle different image sizes.
    xnorm = linspace(-0.5, 0.5, Nr);      
    ynorm = linspace(-0.5, 0.5, Mr);    
    [x, y] = meshgrid(xnorm, ynorm);
    
    % Concatenate the spatial locations and color features.
    features = [colorFeatures y(:) x(:)];
    
    % Use color variance as feature metric.
    metrics  = var(colorFeatures(:,1:3),0,2);
else
    
    % Return empty features for non-color images. These features are
    % ignored by bagOfFeatures.
    features = zeros(0,5);
    metrics  = zeros(0,1);     
end
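The comments in the extractor note that the block average is approximated with imresize and could instead be computed exactly. The following base MATLAB sketch shows one way to compute the exact average over non-overlapping blocks and then append normalized [y x] locations, mirroring the spatial augmentation step. The synthetic single-channel input A and the assumption that the image size is a multiple of the block size are simplifications for illustration.

```matlab
% Synthetic 32-by-48 single-channel image standing in for one L*a*b*
% channel. Each column holds its own column index.
blockSize = 16;
A = repmat(1:48, 32, 1);

% Exact block average. This sketch assumes that both image dimensions
% are multiples of blockSize.
[M, N] = size(A);
Mr = M / blockSize;               % number of block rows
Nr = N / blockSize;               % number of block columns
blocks = reshape(A, blockSize, Mr, blockSize, Nr);
blockMean = squeeze(mean(mean(blocks, 1), 3));   % Mr-by-Nr

% Normalized block locations, as in the extractor above.
xnorm = linspace(-0.5, 0.5, Nr);
ynorm = linspace(-0.5, 0.5, Mr);
[x, y] = meshgrid(xnorm, ynorm);

% One row per block: [average value, y location, x location].
features = [blockMean(:) y(:) x(:)];
```

Because the location columns are part of each feature vector, two blocks with the same average color but different positions map to different points in feature space.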

Step 2 - Create a Bag of Features

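With the feature type defined, the next step is to learn the visual vocabulary within bagOfFeatures using a set of training images. The code shown below picks a random subset of images from the data set for training and then trains bagOfFeatures using the 'CustomExtractor' option.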

Set doTraining to false to load a pretrained bagOfFeatures. doTraining is set to false because the training process takes several minutes. The rest of the example uses the pretrained bagOfFeatures to save time. To recreate colorBag locally, set doTraining to true and consider adjusting the Computer Vision Toolbox preferences to reduce processing time.

doTraining = false;

if doTraining
    % Pick a random subset of the flower images.
    trainingSet = splitEachLabel(flowerImageSet, 0.6, 'randomized');
    
    % Specify the number of levels and branching factor of the vocabulary
    % tree used within bagOfFeatures. Empirical analysis is required to
    % choose optimal values.
    numLevels = 1;
    numBranches = 5000;
    
    % Create a custom bag of features using the 'CustomExtractor' option.
    colorBag = bagOfFeatures(trainingSet, ...
        'CustomExtractor', @exampleBagOfFeaturesColorExtractor, ...
        'TreeProperties', [numLevels numBranches]);
else
    % Load a pretrained bagOfFeatures.
    load('savedColorBagOfFeatures.mat','colorBag');
end

Step 3 - Index the Images

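Now that the bagOfFeatures is created, the entire flower image set can be indexed for search. The indexing procedure extracts features from each image using the custom extractor function from step 1. The extracted features are encoded into a visual word histogram and added to the image index.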
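The encoding step can be pictured as assigning each extracted feature to its nearest visual word and counting how often each word occurs. The following base MATLAB sketch illustrates that idea with a tiny hypothetical vocabulary. It is not the toolbox implementation, and the L2 normalization at the end is an assumption made for illustration.

```matlab
% Hypothetical vocabulary of 3 visual words in a 2-D feature space.
vocabulary = [0 0; 1 1; 5 5];

% Hypothetical features extracted from one image (one feature per row).
features = [0.1 0.0; 0.9 1.2; 1.1 0.8; 4.8 5.1];

numWords = size(vocabulary,1);
numFeatures = size(features,1);

% Assign each feature to its nearest visual word, using the squared
% Euclidean distance.
wordIdx = zeros(numFeatures,1);
for k = 1:numFeatures
    diffs = vocabulary - repmat(features(k,:), numWords, 1);
    [~, wordIdx(k)] = min(sum(diffs.^2, 2));
end

% Count the occurrences of each visual word, then L2 normalize.
wordHistogram = accumarray(wordIdx, 1, [numWords 1])';
wordHistogram = wordHistogram / norm(wordHistogram);
```

With a vocabulary of several thousand words, most histogram bins of a single image are zero, which is what makes a sparse representation worthwhile.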

if doTraining
    % Create a search index.
    flowerImageIndex = indexImages(flowerImageSet,colorBag,'SaveFeatureLocations',false);
else
    % Load a saved index
    load('savedColorBagOfFeatures.mat','flowerImageIndex');
end

Because the indexing step processes thousands of images, the rest of this example uses a saved index to save time. You can recreate the index locally by setting doTraining to true.

Step 4 - Search for Similar Images

The final step is to use the retrieveImages function to search for similar images.

% Define a query image
queryImage = readimage(flowerImageSet,200);

figure
imshow(queryImage)

The displayed image is by RetinaFunk.

% Search for the top 5 images with similar color content
[imageIDs, scores] = retrieveImages(queryImage, flowerImageIndex,'NumResults',5);

retrieveImages returns the image IDs and the score of each result. The scores are sorted from highest to lowest.
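One way to picture how such a ranked result list arises is to score each indexed histogram against the query histogram and sort. The following base MATLAB sketch assumes a cosine similarity score and ignores any word weighting that a real index might apply. The histograms and variable names are hypothetical, and the sketch is not the retrieveImages implementation.

```matlab
% Hypothetical visual word histograms, one row per indexed image.
indexedHistograms = [2 1 0; 0 1 0; 1 1 0; 0 0 1];

% L2 normalize each row so that a dot product equals the cosine similarity.
rowNorms = sqrt(sum(indexedHistograms.^2, 2));
indexedHistograms = indexedHistograms ./ ...
    repmat(rowNorms, 1, size(indexedHistograms,2));

% Hypothetical query histogram, also L2 normalized.
queryHistogram = [1 1 0] / sqrt(2);

% Score every indexed image and sort from highest to lowest.
similarity = indexedHistograms * queryHistogram';
[sortedScores, sortedIDs] = sort(similarity, 'descend');

% Keep the best matches, analogous to the 'NumResults' option.
numResults = 2;
topIDs = sortedIDs(1:numResults);
topScores = sortedScores(1:numResults);
```

In this toy index, image 3 has the same word distribution as the query and ranks first with a score of about 1. Image 4 shares no words with the query and scores 0.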

scores

scores = 5×1

    0.4776
    0.2138
    0.1386
    0.1382
    0.1317

The imageIDs correspond to the images within the image set that are similar to the query image.

% Display results using montage. 
figure
montage(flowerImageSet.Files(imageIDs),'ThumbnailSize',[200 200])

The displayed images are by RetinaFunk, Jenny Downing, Mayeesherr, daBinsi, and Steve Snodgrass.

Summary

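This example showed how to customize bagOfFeatures and how to use indexImages and retrieveImages to create an image retrieval system based on color features. The techniques shown here can be extended to other feature types by further customizing the features used within bagOfFeatures.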
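As a sketch of what another custom extractor might look like, the following function returns simple intensity statistics per image block using only base MATLAB. The function name exampleBlockIntensityExtractor and its design are hypothetical. It follows the same [features, metrics] output convention as the color extractor in step 1, but it has not been tuned or evaluated for retrieval quality.

```matlab
function [features, metrics] = exampleBlockIntensityExtractor(I)
% Hypothetical custom extractor: mean and standard deviation of the
% intensity within non-overlapping 16-by-16 pixel blocks.

blockSize = 16;

% Collapse color images to one intensity channel by averaging channels.
gray = mean(double(I), 3);

% Crop so that both dimensions are multiples of the block size.
Mr = floor(size(gray,1) / blockSize);
Nr = floor(size(gray,2) / blockSize);
gray = gray(1:Mr*blockSize, 1:Nr*blockSize);

% Rearrange so that each column holds the pixels of one block.
blocks = reshape(gray, blockSize, Mr, blockSize, Nr);
blocks = permute(blocks, [1 3 2 4]);
blocks = reshape(blocks, blockSize^2, Mr*Nr);

% One feature row per block: [mean intensity, standard deviation].
features = [mean(blocks,1)' std(blocks,0,1)'];

% Use the block standard deviation as the feature strength.
metrics = features(:,2);
end
```

To try it, save the function in its own file and pass @exampleBlockIntensityExtractor as the 'CustomExtractor' value when creating the bagOfFeatures, in place of the color extractor.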

References

[1] Sivic, J., Zisserman, A.: Video Google: A text retrieval approach to object matching in videos. In: ICCV. (2003) 1470-1477

[2] Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Object retrieval with large vocabularies and fast spatial matching. In: CVPR. (2007)

[3] TensorFlow: How to Retrain an Image Classifier for New Categories.