How much GPU do I need?

Greetings,
I am trying to train an FCDD anomaly detector using Inception-v3 as the backbone network.
When I increase the image size above [540 960 3], I get a "GPU ran out of memory" error.
How can I know how much GPU memory I need?
In deep learning training, what characteristics of the training process affect how much GPU memory is needed?
Is it the image size and the minibatch size?
For a given JPEG image size [x y z] and minibatch size n, can I calculate analytically how much GPU memory is needed to train the network described in my first sentence?
Thank you,
MATLAB deep learning enthusiast.

Answers (2)

Mrutyunjaya Hiremath, 1 September 2023

1 vote

Basic Formula to Estimate GPU Memory Requirement
Memory Required = Model Size + Batch Size × (Forward Pass Memory + Backward Pass Memory), where the forward and backward terms are per sample.
  1. Model Size: Memory to store the model weights. If the model has N parameters and each parameter is of size S bytes (usually 4 bytes for float32), then the model size is N×S.
  2. Forward Pass Memory: Memory to store the intermediate activations during a forward pass. This depends on the model architecture and input size.
  3. Backward Pass Memory: Memory to store gradients during backpropagation. This is roughly equal to the forward pass memory.
  4. Batch Size: Number of samples processed in parallel.
Let's take an example with hypothetical values for an Inception V3 model to illustrate:
  • Model Parameters (Inception V3): Approx 21.8M
  • Data Type: float32 (4 bytes)
  • Input Size: 540x960x3
  • Batch Size: 32
For simplicity, let's assume that the forward and backward passes each require memory roughly equal to the input size times the number of feature maps, summed over all layers. The batch size is folded into these totals below, so it is not multiplied in a second time at the end.
Model Size = 21.8M parameters × 4 bytes/parameter = 87.2 MB
Forward Pass Memory (whole batch) = Batch Size × Input Size × Feature Maps × 4 bytes
Backward Pass Memory ≈ Forward Pass Memory
Assuming that the feature maps are roughly the same size as the input image (another gross simplification), and that there are about 1000 feature maps (across all layers):
Forward Pass Memory = 32 × 540 × 960 × 3 × 1000 × 4 bytes ≈ 199.1 GB
Total Memory = Model Size + Forward Pass Memory + Backward Pass Memory = 87.2 MB + 2 × 199.1 GB ≈ 398.2 GB
This is a very crude estimate and the actual memory requirement will likely be different due to various optimizations that deep learning frameworks employ.
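The arithmetic of this crude estimate can be reproduced with a few lines of code, which makes it easy to try other batch or image sizes. The following is only a sketch in Python (the same arithmetic works in MATLAB); the function name `estimate_training_memory_bytes` is made up for illustration, and the inputs are the hypothetical values from the example, not measurements.

```python
# Crude GPU memory estimate, following the simplified formula in this answer.
# All inputs are the hypothetical example values, not measured quantities.

BYTES_PER_FLOAT32 = 4


def estimate_training_memory_bytes(num_params, batch_size, height, width,
                                   channels, feature_maps):
    """Return (model, forward, backward, total) memory in bytes."""
    model = num_params * BYTES_PER_FLOAT32
    # Simplifying assumption from the text: every feature map is as large
    # as the input image, and the batch size is included in this total.
    forward = (batch_size * height * width * channels
               * feature_maps * BYTES_PER_FLOAT32)
    backward = forward  # assumed roughly equal to the forward pass
    return model, forward, backward, model + forward + backward


model, forward, backward, total = estimate_training_memory_bytes(
    num_params=21_800_000, batch_size=32,
    height=540, width=960, channels=3, feature_maps=1000)

GB = 1e9  # decimal gigabytes
print(f"model:   {model / GB:.4f} GB")
print(f"forward: {forward / GB:.1f} GB")
print(f"total:   {total / GB:.1f} GB")
```

Because the estimate is linear in every input, halving the batch size or the pixel count halves the activation terms, while the model term stays fixed.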
So, to directly answer your question: by this estimate the current setup would need roughly 400 GB of GPU memory, which is far more than any single GPU offers. You would have to adjust your model, data, or training regime to fit into a GPU that you can realistically acquire.
Solutions for Memory Errors
  1. Reduce Batch Size: The easiest way to reduce memory usage.
  2. Use Gradient Accumulation: Run the forward and backward pass on several smaller batches, accumulate their gradients, and update the weights once per group.
  3. Use Mixed Precision Training: Utilizes both float16 and float32 to make training more memory-efficient.
  4. Use a Simpler Model: Smaller architectures require less memory.
  5. Distributed Training: Split the model and data across multiple GPUs.
  6. Check for Memory Leaks: Make sure that you're not unintentionally holding onto tensors that you no longer need.
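To make the gradient accumulation item concrete, here is a minimal sketch in plain Python on a toy one-parameter model, with the gradient written out by hand. The function names are hypothetical and it does not use any deep learning framework; whether a given training function exposes gradient accumulation is a separate question that this sketch does not answer.

```python
# Gradient accumulation on a toy model y = w * x with mean squared error.
# Only one micro-batch needs to be held in memory at a time.


def grad_sum(w, xs, ys):
    """Sum (not mean) of d/dw (w*x - y)^2 over the given samples."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys))


def full_batch_step(w, xs, ys, lr):
    """One gradient-descent step using the whole batch at once."""
    return w - lr * grad_sum(w, xs, ys) / len(xs)


def accumulated_step(w, xs, ys, lr, micro_batch_size):
    """Same step, but gradients are accumulated over micro-batches."""
    accumulated = 0.0
    for start in range(0, len(xs), micro_batch_size):
        stop = start + micro_batch_size
        accumulated += grad_sum(w, xs[start:stop], ys[start:stop])
    # A single weight update after every micro-batch has contributed.
    return w - lr * accumulated / len(xs)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]
w_full = full_batch_step(0.0, xs, ys, lr=0.01)
w_accum = accumulated_step(0.0, xs, ys, lr=0.01, micro_batch_size=2)
print(w_full, w_accum)
```

For this toy model the two updates agree up to floating-point rounding. In networks with batch normalization the match is only approximate, because the normalization statistics are computed per micro-batch.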
Alex Taylor, 1 September 2023

0 votes

To the above answer I would add:
1) I'm assuming that you are using pretrainedEncoderNetwork or another method to "cut" the inceptionV3 at a given point. So, the parameter count in the network needs to be based on the set of Learnables in the backbone, not the full inceptionV3 network.
2) In practice, this kind of analytic memory analysis can be complicated, because it requires implementation details about how ops are implemented for forward/backward and how memory is cached between the two passes. It is often easier to start with a small spatial input size, say 224x224x3, and a low batch size, then increase either the batch size or the spatial dimensions one at a time to see how GPU memory use scales.
3) Using a smaller backbone like a mobilenet type architecture, or using lower batch size or lower spatial dims will be ways of reducing peak memory consumption during training.
4) FCDD in particular often works well if you divide your training data into patches. Since it is a fully convolutional architecture, you can train on smaller patches at a given scale and then run full-sized inference at the same scale. That lets you work within memory limits at training time without being tied to the same spatial dimensions during inference. To do this you will need to obtain a representative set of good patches/tiles and a small set of bad patches/tiles from your full-sized images.
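As an illustration of the patching idea in point 4, the sketch below computes tile boxes that cover a full-sized image. It is plain Python with hypothetical helper names (`tile_origins`, `tile_boxes`), and the image and tile sizes are assumptions chosen for the example. It only produces coordinates: cropping the images and labelling each tile as good or bad still has to be done separately.

```python
# Compute tile boxes that cover an image, so a fully convolutional
# detector can be trained on small patches instead of full frames.


def tile_origins(image_size, tile_size, stride):
    """Top-left offsets along one axis; the last tile is shifted to fit."""
    if tile_size > image_size:
        raise ValueError("tile is larger than the image")
    origins = list(range(0, image_size - tile_size + 1, stride))
    if origins[-1] != image_size - tile_size:
        origins.append(image_size - tile_size)  # cover the remaining edge
    return origins


def tile_boxes(height, width, tile_h, tile_w, stride_h, stride_w):
    """Return (row, col, tile_h, tile_w) boxes covering the whole image."""
    return [(r, c, tile_h, tile_w)
            for r in tile_origins(height, tile_h, stride_h)
            for c in tile_origins(width, tile_w, stride_w)]


# Assumed example: a 2160x3840 image cut into non-overlapping 540x960 tiles.
boxes = tile_boxes(2160, 3840, 540, 960, 540, 960)
print(len(boxes), boxes[0], boxes[-1])
```

Choosing a stride smaller than the tile size gives overlapping tiles, which can help when a defect straddles a tile boundary.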

7 comments

William, 10 October 2023
Hello Alex,
I have some additional questions based on your reply.
1) Actually, I believe I am not doing what you suggested in number 1. Are there any links to examples? To know where to "cut", do I just view the activations at various convolution layers, see where I think the network is activating, and cut at the later point where I see it no longer activating?
2) This does sound a bit simpler; I can give it a try.
3) The features I am looking for are quite subtle. Inception seems to have pretty good accuracy. Wouldn't changing the model entirely potentially put that at risk?
4) This sounds like a lot of work. Do you have an example link? I might try this later if nothing else works.
Currently, though, I am considering buying more GPU memory, and I am hoping to get an idea of how much I need before I spend money on the physical GPU.
Thank you,
Wade
William, 10 October 2023
Update: I am using net = pretrainedEncoderNetwork('inceptionv3',3);
Alex Taylor, 11 October 2023
1) A typical pattern for using a backbone with FCDD is to pick a CNN backbone and take its output activations just before the spatial dimensions are downsampled (commonly by a stride-2 max-pooling operation).
pretrainedEncoderNetwork(networkName,depth) does this automatically for you. The second input, depth, specifies how many downsampling operations are performed on the input data. You will see if you increase depth that you'll end up with more Layers and more Learnables in the output dlnetwork accordingly.
My point to the other reply was that you aren't using the full set of Learnables in inceptionv3 because you are using a truncated form of the backbone as the feature extractor, so if you were to rigorously calculate memory use you'd need to account for that.
3) Hard to say without experimentation. But choosing the lightest-weight backbone that still gives you acceptable detection metrics (precision/recall/accuracy) is good practice for inference speed and memory use.
4) Yes, patching is more complicated and in practice requires more detailed labeling of your data, because you need to know which regions of your input training images have defects and which don't.
As an alternative, since R2023a there are two new anomaly detectors in the Visual Inspection Library.
These detectors typically have extremely good detection metrics but are heavier than FCDD in memory use, so they may not be what you're going for, but they are potentially worth checking out. They both have the advantage of being trained strictly on good/normal data, which can be more convenient than FCDD in some uses, and they both choose reasonable backbones for you by default.
William, 11 October 2023
Just an additional comment: part of the reason I need FCDD is that I actually need to know where in the image the anomaly is. With the heatmap overlay, I can probably get that.
Also, the anomalies can literally be something entirely new every day, so something like YOLO wouldn't really help, since it requires labels for different types of objects.
I strictly need something that can look at an image, find something "not normal" in it, and then tell me where the "not normal" thing is.
I will read up on the other detectors you mentioned. I appreciate you helping me out with advice.
Thank you,
Wade
William, 17 October 2023
I can't find an example of how to monitor GPU memory use while MATLAB is training a deep learning network.
I can see ways to do it in other situations, but they require being inside a for loop, and I just want to know the maximum amount of GPU memory used while the trainFCDDAnomalyDetector() function runs.
Could the MATLAB Profiler help me with this? I tried it a couple of times, but it seems to only look at code execution time.
Alex Taylor, 17 October 2023
I generally use the CUDA tool nvidia-smi to monitor GPU memory use. You can have it run in a loop with the -l option.
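If you want the peak value rather than a scrolling display, one option is to poll nvidia-smi from a small script that runs alongside training. The sketch below is in Python and assumes nvidia-smi is on the system path; the helper names (`parse_memory_mib`, `watch_peak`) are made up for illustration. Because it samples at intervals, a short spike between samples can be missed.

```python
import subprocess
import time

# Query used and total memory per GPU as bare numbers in MiB.
QUERY = ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_memory_mib(output):
    """Parse 'used, total' lines into a list of (used, total) per GPU."""
    rows = []
    for line in output.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows


def watch_peak(duration_s=60.0, interval_s=1.0, gpu_index=0):
    """Poll nvidia-smi and return the highest memory.used seen, in MiB."""
    peak = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        result = subprocess.run(QUERY, capture_output=True, text=True,
                                check=True)
        used, _total = parse_memory_mib(result.stdout)[gpu_index]
        peak = max(peak, used)
        time.sleep(interval_s)
    return peak


# Example: start this in a separate terminal just before training begins.
# print("peak MiB:", watch_peak(duration_s=600))
```

The reported figure covers every process using the GPU, so close other GPU applications if you want a number that reflects training alone.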
William, 19 October 2023 (edited 19 October 2023)
Hey Alex,
I attempted to do what you suggested in number 2).
I watched the Performance tab in Windows Task Manager, specifically the value labelled "Dedicated GPU memory".
Is that OK?
I tested two image sizes:
  • [270 480 3]
  • [540 960 3]
The minibatch size was 7 for both.
The data and all training parameters were the same for both runs; the only difference was the image resizing.
Before each test, I ran gpuDevice(1) in the command window to clear the GPU.
I also closed MATLAB and restarted the computer, to make sure I had a completely fresh start each time.
For [270 480 3], dedicated GPU memory before running the training section of the code was 1.3 GB.
During training at [270 480 3], the peak dedicated GPU memory appeared to be 4.5 GB.
For [540 960 3], dedicated GPU memory before running the training section of the code was 1.8 GB.
During training at [540 960 3], the peak dedicated GPU memory appeared to be 7.8 GB.
I have 8 GB of dedicated GPU memory in total, which probably explains why I get GPU out-of-memory errors when I try to train on images larger than [540 960 3].
[540 960 3] is twice the height and width of [270 480 3] (four times the pixels).
It sort of looks like it used about twice the GPU memory too, which might be a coincidence.
Anyway, the real image size I'd like to be able to train on is [2160 3840 3].
All images are JPEG.
[2160 3840 3] is exactly 4 times the height and width of [540 960 3] (16 times the pixels).
Would this mean I need 4 times the GPU memory?
At least 32 GB of dedicated GPU memory?
thank you,
Wade
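One way to turn two measurements like these into a rough projection is a straight-line fit of training memory against pixel count. The Python sketch below uses the peak-minus-starting figures reported in this comment. It rests on strong assumptions: memory grows linearly with pixel count, the minibatch size stays at 7, and both readings are accurate. The second reading sits close to the 8 GB limit and may understate the real demand, so treat the result as a lower-confidence guide rather than a purchase specification.

```python
# Straight-line extrapolation of training memory from two measurements.
# Two points cannot confirm that the relationship is really linear.


def fit_line(x1, y1, x2, y2):
    """Return (slope, intercept) of the line through two points."""
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


def pixels(size):
    """Pixel count (height * width) of an [h w c] image size."""
    height, width, _channels = size
    return height * width


small, medium, target = (270, 480, 3), (540, 960, 3), (2160, 3840, 3)

# Peak minus starting "Dedicated GPU memory", in GB, as reported above.
used_small = 4.5 - 1.3
used_medium = 7.8 - 1.8

slope, intercept = fit_line(pixels(small), used_small,
                            pixels(medium), used_medium)
predicted = intercept + slope * pixels(target)

print(f"pixel ratio medium/small:  {pixels(medium) / pixels(small):.0f}x")
print(f"pixel ratio target/medium: {pixels(target) / pixels(medium):.0f}x")
print(f"extrapolated training memory at target: {predicted:.1f} GB")
```

With these inputs the fit lands near 62 GB on top of the baseline usage. A third measurement at an intermediate size would show whether the straight-line assumption holds.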

Asked: 1 September 2023
Edited: 19 October 2023
