How much GPU do I need?

Question

0 votes

Greetings,

I am trying to train an FCDD anomaly detector using Inception V3 as the backbone network.

When I increase the image size above [540 960 3], I get a "GPU ran out of memory" error.

How can I know how much GPU memory I need?

In deep learning training, what characteristics of the training process affect how much GPU memory is needed?

Is it the image size and the minibatch size?

For a given JPEG image size [x y z] and minibatch size n, can I calculate analytically how much GPU memory is needed to train a network like the one described in the first sentence of this post?

Thank you,

MATLAB deep learning enthusiast.


Answer 1

Mrutyunjaya Hiremath, 1 Sep 2023

1 vote

Basic Formula to Estimate GPU Memory Requirement

Memory Required ≈ Model Size + Batch Size × (Forward Pass Memory + Backward Pass Memory), where the forward and backward pass terms are per sample.

Model Size: Memory to store the model weights. If the model has N parameters and each parameter is of size S bytes (usually 4 bytes for float32), then the model size is N×S.
Forward Pass Memory: Memory to store the intermediate activations during a forward pass. This depends on the model architecture and input size.
Backward Pass Memory: Memory to store gradients during backpropagation. This is roughly equal to the forward pass memory.
Batch Size: Number of samples processed in parallel.

Let's take an example with hypothetical values for an Inception V3 model to illustrate:

Model Parameters (Inception V3): Approx 21.8M
Data Type: float32 (4 bytes)
Input Size: 540x960x3
Batch Size: 32

For simplicity, let's assume that the forward and backward pass each require memory roughly equal to the input size times the number of feature maps at each layer.

Model Size = 21.8M parameters × 4 bytes/parameter = 87.2 MB

Forward Pass Memory (for the whole batch) = Batch Size × Input Size × Feature Maps × 4 bytes

Backward Pass Memory ≈ Forward Pass Memory

Assuming that the feature maps are roughly the same size as the input image (another gross simplification), and that there are about 1000 feature maps (across all layers):

Forward Pass Memory = 32 × 540 × 960 × 3 × 1000 × 4 bytes ≈ 199.1 GB

Total Memory = Model Size + Forward Pass Memory + Backward Pass Memory = 87.2 MB + 2 × 199.1 GB ≈ 398.2 GB

This is a very crude estimate and the actual memory requirement will likely be different due to various optimizations that deep learning frameworks employ.
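
The arithmetic of this crude estimate can be reproduced with a short Python sketch. The function name and argument names are illustrative only, and the assumptions are the same gross simplifications stated above: every feature map is as large as the input image, there are about 1000 of them, and the backward pass costs as much as the forward pass.

```python
def estimate_training_memory(num_params, batch_size, height, width, channels,
                             feature_maps, bytes_per_value=4):
    """Return (model, forward, total) in bytes under the crude assumptions."""
    model = num_params * bytes_per_value
    # Assumption: every feature map is as large as the input image.
    forward = (batch_size * height * width * channels
               * feature_maps * bytes_per_value)
    # Assumption: the backward pass needs about as much as the forward pass.
    backward = forward
    return model, forward, model + forward + backward


model, forward, total = estimate_training_memory(
    num_params=21_800_000, batch_size=32,
    height=540, width=960, channels=3, feature_maps=1000)

print(f"model:   {model / 1e6:.1f} MB")    # 87.2 MB
print(f"forward: {forward / 1e9:.1f} GB")  # 199.1 GB
print(f"total:   {total / 1e9:.1f} GB")    # 398.2 GB
```

Changing `batch_size` or the image dimensions in the call shows how strongly the result depends on them. The absolute numbers should not be read as a prediction of real GPU usage.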

So, to directly answer your question: with these assumed values you would need on the order of 400 GB of GPU memory, which is currently infeasible on a single GPU. You would have to adjust your model, data, or training regime to fit training onto a GPU that you can realistically acquire.

Solutions for Memory Errors

Reduce Batch Size: The easiest way to reduce memory usage.
Use Gradient Accumulation: Run the forward and backward pass on several smaller batches, accumulate their gradients, and update the weights once per accumulated group.
Use Mixed Precision Training: Utilizes both float16 and float32 to make training more memory-efficient.
Use a Simpler Model: Smaller architectures require less memory.
Distributed Training: Split the model and data across multiple GPUs.
Check for Memory Leaks: Make sure that you're not unintentionally holding onto tensors that you no longer need.
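
To illustrate the gradient accumulation item in the list above, here is a minimal pure-Python sketch. The toy model (y = w * x with a mean squared error loss), the data, and the function names are made up for the illustration; they are not part of FCDD or any MATLAB API. It only shows that averaging the gradients of equal-sized micro-batches gives the same gradient as one large batch, while only one micro-batch needs to be held in memory at a time.

```python
def grad_mse(w, xs, ys):
    """Gradient of the mean squared error of y = w * x with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch):
    """Average the gradients of equal-sized micro-batches."""
    if len(xs) % micro_batch != 0:
        raise ValueError("batch must split into equal micro-batches")
    steps = len(xs) // micro_batch
    total = 0.0
    for i in range(0, len(xs), micro_batch):
        # Only this slice of the data is processed at one time.
        total += grad_mse(w, xs[i:i + micro_batch], ys[i:i + micro_batch])
    return total / steps


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 15.9]
w = 0.5

full = grad_mse(w, xs, ys)                          # one batch of 8
accum = accumulated_grad(w, xs, ys, micro_batch=2)  # four batches of 2
print(abs(full - accum) < 1e-9)
```

In a real training loop the weight update would happen once after the accumulated gradient is formed, so the effective batch size stays at 8 while the activations in memory correspond to a batch of 2.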


Answer 2

Alex Taylor, 1 Sep 2023

0 votes

To the above answer I would add:

1) I'm assuming that you are using pretrainedEncoderNetwork or another method to "cut" the inceptionV3 at a given point. So, the parameter count in the network needs to be based on the set of Learnables in the backbone, not the full inceptionV3 network.

2) In practice, this kind of analytic memory analysis can be complicated: it requires implementation details about how ops are implemented for forward/backward and how memory is cached between the two passes. It's often easier to start with a small spatial input size, say 224x224x3, and a low batch size. Then increase either the batch size or the spatial dimensions, one at a time, to see how GPU memory use scales.

3) Using a smaller backbone such as a MobileNet-type architecture, a lower batch size, or lower spatial dimensions are all ways of reducing peak memory consumption during training.

4) FCDD in particular often works well when you divide your training data into patches. Since it is a fully convolutional architecture, you can train on smaller patches at a given scale and then run inference on full-sized images at the same scale. This lets you work within memory limits at training time without being tied to the same spatial dimensions during inference. To do this you will need a representative set of good patches/tiles and a small set of bad patches/tiles from your full-sized images.
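
As a sketch of the patching idea in point 4, the Python snippet below computes tile positions for a full-sized image. The function names, the 540x960 tile size, and the choice to shift the last tile back so it stays inside the image are assumptions made for this illustration; none of them come from FCDD itself, and the snippet does not decide which tiles are good or bad.

```python
def tile_starts(length, patch, stride):
    """Start offsets of patches covering one image dimension."""
    if patch > length:
        raise ValueError("patch is larger than the image dimension")
    starts = list(range(0, length - patch + 1, stride))
    # If the stride does not land on the edge, add one last patch that is
    # shifted back so it ends exactly at the image border (it overlaps).
    if starts[-1] != length - patch:
        starts.append(length - patch)
    return starts


def tile_boxes(height, width, patch_h, patch_w, stride_h, stride_w):
    """Return (top, left, patch_h, patch_w) for every tile."""
    return [(top, left, patch_h, patch_w)
            for top in tile_starts(height, patch_h, stride_h)
            for left in tile_starts(width, patch_w, stride_w)]


# Hypothetical example: non-overlapping 540x960 tiles of a 2160x3840 image.
boxes = tile_boxes(2160, 3840, 540, 960, 540, 960)
print(len(boxes))  # 16
```

Each tile here has the same size as an image that already fits in memory during training, so the training footprint would be governed by the tile size rather than by the full image size.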

7 comments

Alex Taylor, 11 Oct 2023

1) A typical pattern for using a backbone with FCDD is to pick a CNN backbone and take its output activations from just before a point where the spatial dimensions are downsampled (commonly by a stride-2 max pooling operation).

pretrainedEncoderNetwork(networkName,depth) does this automatically for you. The second input, depth, specifies how many downsampling operations are performed on the input data. You will see if you increase depth that you'll end up with more Layers and more Learnables in the output dlnetwork accordingly.

My point to the other reply was that you aren't using the full set of Learnables in inceptionv3 because you are using a truncated form of the backbone as the feature extractor, so if you were to rigorously calculate memory use you'd need to account for that.

3) Hard to say without experimentation. But choosing the lightest-weight backbone that still gives you acceptable detection metrics (precision, recall, accuracy) is good practice for inference speed and memory consumption.

4) Yes, patching is more complicated and in practice requires more detailed labeling of your data, because you need to know which regions of your training images have defects and which don't.

As an alternative, since R2023a there are two new anomaly detectors in the Visual Inspection Library:

https://www.mathworks.com/help/vision/ref/fastflowanomalydetector.html

https://www.mathworks.com/help/vision/ref/patchcoreanomalydetector.html

These detectors typically have extremely good detection metrics but are heavier than FCDD in memory use, so they may not be what you're going for, but they are potentially worth checking out. Both have the advantage of being trained strictly on good/normal data, which can be more convenient than FCDD in some uses, and both choose reasonable backbones for you by default.

Alex Taylor, 17 Oct 2023

I generally use the CUDA tool nvidia-smi to monitor GPU memory use:

https://developer.download.nvidia.com/compute/DCGM/docs/nvidia-smi-367.38.pdf

You can have it run in a loop with the -l option.
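
If you would rather log the numbers than watch them, a small Python wrapper around nvidia-smi is one option. This is a sketch only: the function names are made up, it assumes nvidia-smi is on the PATH, and it assumes every GPU reports plain integers for both fields. It uses the `--query-gpu` and `--format=csv,noheader,nounits` options so that each GPU yields one "used, total" line in MiB.

```python
import subprocess


def parse_memory_csv(text):
    """Parse 'used, total' MiB pairs, one GPU per line."""
    readings = []
    for line in text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        readings.append((used, total))
    return readings


def query_gpu_memory():
    """Ask nvidia-smi for used/total memory (MiB) of every GPU."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True)
    return parse_memory_csv(result.stdout)
```

Calling `query_gpu_memory()` once per second from a separate process while training runs, and keeping the maximum of the first value, gives a rough peak figure to compare across image sizes.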

William, 19 Oct 2023

Edited: William, 19 Oct 2023

Hey Alex,

I attempted to do what you suggested in number 2).

So I decided to watch the Performance tab in Windows Task Manager.

I watched the value labelled "Dedicated GPU memory".

Is that OK?

I tested 2 image sizes. I did these 2:

[270 480 3]
[540 960 3]

minibatch size was 7 for both.

The data, training parameters, and everything else were the same for both.

The only difference was the resizing of the images.

Before each test, I ran gpuDevice(1) in the command window to clear the GPU.

I also closed MATLAB, shut down, and restarted the computer.

I did this to make sure I had a completely fresh start each time.

For [270 480 3], GPU memory in use before running the training section of the code was 1.3 GB.

During training, for size [270 480 3], the peak dedicated GPU memory appeared to be 4.5 GB.

For [540 960 3], GPU memory in use before running the training section of the code was 1.8 GB.

During training, for size [540 960 3], the peak dedicated GPU memory appeared to be 7.8 GB.

I have a total of 8 GB of dedicated GPU memory, which probably explains why I get GPU out-of-memory errors during training when I try image sizes above [540 960 3].

[540 960 3] is twice the height and width of [270 480 3], i.e. four times the pixels.

It sort of looks like it used about twice the GPU memory, which might be a coincidence.

Anyway, the real image size I'd like to be able to train on is [2160 3840 3]

All images are jpg.

[2160 3840 3] is exactly 4 times the height and width of [540 960 3], i.e. 16 times the pixels.

Would this mean I need 4 times the GPU?

At least 32 gigs of dedicated GPU?

thank you,

Wade
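
As a rough check on the scaling question, the two peak readings above can be pushed through a straight-line extrapolation in Python. Everything about the model is an assumption, not a measurement: peak memory is taken to be a fixed overhead plus a cost proportional to the pixel count, and a line through two points fits them exactly without validating anything. The 7.8 GB reading also sits right at the 8 GB limit of the card, so the real demand at that size may be higher.

```python
def pixels(size):
    """Pixel count of a [height width channels] image size."""
    height, width, _ = size
    return height * width


def fit_line(p1, m1, p2, m2):
    """Slope and intercept of the line through two (pixels, GB) points."""
    slope = (m2 - m1) / (p2 - p1)
    intercept = m1 - slope * p1
    return slope, intercept


small, medium, target = (270, 480, 3), (540, 960, 3), (2160, 3840, 3)

# Peak dedicated GPU memory readings from the post, in GB.
slope, intercept = fit_line(pixels(small), 4.5, pixels(medium), 7.8)
estimate_gb = intercept + slope * pixels(target)

print(pixels(medium) / pixels(small))   # 4.0
print(pixels(target) / pixels(medium))  # 16.0
print(f"{estimate_gb:.1f} GB")          # 73.8 GB
```

Under these assumptions the full-size image would need far more than 32 GB at minibatch size 7. More measurements at intermediate sizes, or at a smaller minibatch size, would be needed before trusting any such number.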
