How to change architecture of conditional GAN to generate 224x224x3 images?

Question

Alok on 4 Aug 2022

https://jp.mathworks.com/matlabcentral/answers/1773905-how-to-change-architecture-of-conditional-gan-to-generate-224x224x3-images

Answered: Ayush Aniket on 9 May 2025

I am following the MATLAB example on conditional GANs at https://www.mathworks.com/help/deeplearning/ug/train-conditional-generative-adversarial-network.html

That example uses an image size of 64x64x3. What changes are needed in layersGenerator and layersDiscriminator to generate 224x224x3 images?

My target is inputSize = [224 224 3] or [256 256 3].

Note that if Factor=2 (below), I get an image size of 128x128x3. If Factor=4, the generated image size is 256x256x3. However, during the training loop, I get an error saying that trainedVariance is negative.

This is my code:

inputSize = [64 64 3];
Factor = 4; % Factor = 2 generates 128x128x3 images; Factor = 4 generates 256x256x3
inputSize = Factor*inputSize(1:2); % note: keeps only [height width]
numClasses = 2;
augimds = augmentedImageDatastore(inputSize(1:2),XTrain,YTrain);
augimdsValidation = augmentedImageDatastore(inputSize(1:2),XValidation,YValidation);
numLatentInputs = 100;
embeddingDimension = 50;
numFilters = Factor*64;
filterSize = 5;
projectionSize = Factor*[4 4 1024]; % note: also scales the channel count
layersGenerator = [
    featureInputLayer(numLatentInputs)
    fullyConnectedLayer(prod(projectionSize))
    functionLayer(@(X) feature2image(X,projectionSize),Formattable=true)
    concatenationLayer(3,2,Name="cat");
    transposedConv2dLayer(filterSize,4*numFilters,Stride=2,Cropping="same")
    batchNormalizationLayer
    reluLayer
    transposedConv2dLayer(filterSize,2*numFilters,Stride=2,Cropping="same")
    batchNormalizationLayer
    reluLayer
    transposedConv2dLayer(filterSize,numFilters,Stride=2,Cropping="same")
    batchNormalizationLayer
    reluLayer
    transposedConv2dLayer(filterSize,3,Stride=2,Cropping="same")
    tanhLayer];
lgraphGenerator = layerGraph(layersGenerator);
layers = [
    featureInputLayer(1)
    embeddingLayer(embeddingDimension,numClasses)
    fullyConnectedLayer(prod(projectionSize(1:2)))
    functionLayer(@(X) feature2image(X,[projectionSize(1:2) 1]),Formattable=true,Name="emb_reshape")];
lgraphGenerator = addLayers(lgraphGenerator,layers);
lgraphGenerator = connectLayers(lgraphGenerator,"emb_reshape","cat/in2");
netG = dlnetwork(lgraphGenerator);
dropoutProb = 0.75;
%numFilters = 64;
scale = 0.2;
filterSize = 5;
layersDiscriminator = [
    imageInputLayer(inputSize,Normalization="none")
    dropoutLayer(dropoutProb)
    concatenationLayer(3,2,Name="cat")
    convolution2dLayer(filterSize,numFilters,Stride=2,Padding="same")
    leakyReluLayer(scale)
    convolution2dLayer(filterSize,2*numFilters,Stride=2,Padding="same")
    batchNormalizationLayer
    leakyReluLayer(scale)
    convolution2dLayer(filterSize,4*numFilters,Stride=2,Padding="same")
    batchNormalizationLayer
    leakyReluLayer(scale)
    convolution2dLayer(filterSize,8*numFilters,Stride=2,Padding="same")
    batchNormalizationLayer
    leakyReluLayer(scale)
    convolution2dLayer(Factor*4,1)];
lgraphDiscriminator = layerGraph(layersDiscriminator);
layers = [
    featureInputLayer(1)
    embeddingLayer(embeddingDimension,numClasses)
    fullyConnectedLayer(prod(inputSize(1:2)))
    functionLayer(@(X) feature2image(X,[inputSize(1:2) 1]),Formattable=true,Name="emb_reshape")];
lgraphDiscriminator = addLayers(lgraphDiscriminator,layers);
lgraphDiscriminator = connectLayers(lgraphDiscriminator,"emb_reshape","cat/in2");
netD = dlnetwork(lgraphDiscriminator);

However, the above code gives an error at

[~,~,gradientsG,gradientsD,stateG,scoreG,scoreD] = ...
            dlfeval(@modelLoss2,netG,netD,X,T,Z,flipFactor);

The size of the generated image at

[XGenerated,stageG] = forward(netG,Z,T);

is 256x256x3. However, an error is raised stating that trainedVariance is not positive.

Could you tell me which transposedConv2dLayer to change so that the output size is 224x224x3 or 256x256x3?

Thanks for your help

Answer 1

Ayush Aniket on 9 May 2025

https://jp.mathworks.com/matlabcentral/answers/1773905-how-to-change-architecture-of-conditional-gan-to-generate-224x224x3-images#answer_1564956

If you use a projection size that doesn't align with the upsampling path, the generator output won't match the expected image size, which can cause downstream errors (such as negative variance or shape mismatches).

Each transposedConv2dLayer with Stride=2 and Cropping="same" doubles the spatial resolution. The number of upsampling layers and the initial projection size must align so that, after all upsampling, you reach your desired output size. The general rule: if your initial projection size is [h, w, c] and you have n upsampling layers (each with Stride=2), your output size will be [h*2^n, w*2^n, outputChannels]. Therefore, for the two target sizes:

1. 256x256x3 Output -

Start with: [4, 4, ...] projection size
Number of upsampling layers: 4
Calculation: 4 → 8 → 16 → 32 → 64, so 4 upsampling layers from [4,4] give only [64,64].
To go from 4 to 256 you would need 6 layers: 4 → 8 → 16 → 32 → 64 → 128 → 256 (4*2^6 = 256).
However, your code uses 4 upsampling layers, so your projection should be [16,16, ...] for 256x256 output: 16 → 32 → 64 → 128 → 256 (4 layers, 16*2^4 = 256)

2. 224x224x3 Output -

224 is not a power of 2, so you need to start with a projection size that, after upsampling, results in 224.
224 = 14 * 2^4
So, start with [14,14, ...] and 4 upsampling layers.
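
The arithmetic above can be checked with a short script. This is only a sketch using plain arithmetic, with no Deep Learning Toolbox calls. It assumes every transposedConv2dLayer uses Stride=2 with Cropping="same" and every strided convolution2dLayer in the discriminator uses Stride=2 with Padding="same", as in the question's code. The names numUpsampling, numDownsampling, generatorOutputSize, and requiredProjection are made up for illustration and are not part of the example.

```matlab
% Sketch only: size bookkeeping, not the network itself.
numUpsampling = 4;   % stride-2 transposed convolutions in layersGenerator

% Generator: spatial size after n doublings of an h-by-w projection.
generatorOutputSize = @(projectionHW,n) projectionHW .* 2^n;

% Inverse: projection needed to reach a target size with n doublings.
requiredProjection = @(targetHW,n) targetHW ./ 2^n;

targets = [256 256; 224 224];
for k = 1:size(targets,1)
    targetHW = targets(k,:);
    projHW = requiredProjection(targetHW,numUpsampling);
    % A fractional value means the target is unreachable with this layer count.
    assert(all(projHW == floor(projHW)), ...
        "Target %d is not divisible by 2^%d.",targetHW(1),numUpsampling);
    outHW = generatorOutputSize(projHW,numUpsampling);
    fprintf("target %dx%d: projection [%d %d], generator output %dx%d\n", ...
        targetHW,projHW,outHW);
end

% Discriminator (assumption): each stride-2 "same" convolution maps
% a spatial size s to ceil(s/2).
numDownsampling = 4;
finalHW = [224 224];
for n = 1:numDownsampling
    finalHW = ceil(finalHW/2);
end
% finalHW is the filter size the last convolution2dLayer would need
% to reduce the feature map to a single 1x1 score.
fprintf("final discriminator feature map: %dx%d\n",finalHW);
```

Under these assumptions, the 224x224 case would use projectionSize(1:2) = [14 14] in the generator, and the last convolution2dLayer in the discriminator would need a filter size of 14 in place of Factor*4.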
