How much GPU do I need?
Greetings,
I am trying to train an FCDD anomaly detector using Inception-v3 as the backbone network.
When I increase the image size above 540x960x3, I get a "GPU ran out of memory" error.
How can I know how much GPU memory I need?
In deep learning training, which characteristics of the training process affect how much GPU memory is needed?
Is it the image size and the mini-batch size?
For a given JPEG image size [x y z] and mini-batch size n, can I calculate analytically how much GPU memory is needed to train a network like the one described in the first sentence of this post?
Thank you,
Matlab deep learning enthusiast.
Answers (2)
Mrutyunjaya Hiremath
1 September 2023
1 vote
Basic Formula to Estimate GPU Memory Requirement
Memory Required = Model Size + Batch Size × (Forward Pass Memory + Backward Pass Memory)
- Model Size: Memory to store the model weights. If the model has N parameters and each parameter is of size S bytes (usually 4 bytes for float32), then the model size is N×S.
- Forward Pass Memory: Memory to store the intermediate activations of one sample during a forward pass. This depends on the model architecture and input size.
- Backward Pass Memory: Memory to store the gradients of those activations for one sample during backpropagation. This is roughly equal to the forward pass memory.
- Batch Size: Number of samples processed in parallel.
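The formula above can be written as a small helper. This is a minimal sketch, not a profiler: the function name and arguments are hypothetical, you have to supply the parameter count and a guess for the number of activation values stored per sample, and it keeps the simplification that backward-pass memory equals forward-pass memory. It also ignores weight gradients and optimizer state (Adam, for example, stores two extra values per parameter), which a real training run needs.

```python
def estimate_training_memory_bytes(num_params, activations_per_sample, batch_size,
                                   param_bytes=4, activation_bytes=4):
    """Crude estimate of training memory in bytes, following

        Memory = Model Size + Batch Size x (Forward Pass Memory + Backward Pass Memory)

    num_params:             N, the number of learnable parameters
    activations_per_sample: activation values kept from the forward pass for ONE sample
    batch_size:             mini-batch size
    param_bytes:            S, bytes per parameter (4 for float32)
    activation_bytes:       bytes per activation (4 for float32, 2 for float16)

    Not included: weight gradients and optimizer state.
    """
    model_size = num_params * param_bytes
    forward_per_sample = activations_per_sample * activation_bytes
    # Same simplification as the text: backward memory is about equal to forward memory.
    backward_per_sample = forward_per_sample
    return model_size + batch_size * (forward_per_sample + backward_per_sample)


if __name__ == "__main__":
    # Values from the worked example in this answer (all rough guesses).
    total = estimate_training_memory_bytes(21_800_000, 540 * 960 * 3 * 1000, 32)
    print(f"{total / 1e9:.1f} GB")  # → 398.2 GB (decimal gigabytes)
```

Setting activation_bytes=2 gives a rough feel for the savings from keeping activations in float16, as mixed precision training largely does; it is only an approximation of how any particular framework allocates memory.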
Let's take an example with hypothetical values for an Inception V3 model to illustrate:
- Model Parameters (Inception V3): Approx 21.8M
- Data Type: float32 (4 bytes)
- Input Size: 540x960x3
- Batch Size: 32
For simplicity, let's assume that the forward and backward passes each require memory roughly equal to the input size times the total number of feature maps.
Model Size = 21.8M parameters × 4 bytes/parameter = 87.2 MB
Forward Pass Memory (per sample) = Input Size × Feature Maps × 4 bytes
Backward Pass Memory ≈ Forward Pass Memory
Assuming that the feature maps are roughly the same size as the input image (another gross simplification, since Inception-v3 shrinks its feature maps spatially as depth increases), and that there are about 1000 feature maps (across all layers):
Batch Size × Forward Pass Memory = 32 × 540 × 960 × 3 × 1000 × 4 bytes ≈ 199.1 GB
Total Memory = Model Size + Batch Size × (Forward Pass Memory + Backward Pass Memory) ≈ 87.2 MB + 2 × 199.1 GB ≈ 398.2 GB
This is a very crude estimate and the actual memory requirement will likely be different due to various optimizations that deep learning frameworks employ.
So, to answer your question directly: with the current setup, this crude estimate says you would need a GPU with roughly 400 GB of memory, which is currently infeasible for a single GPU. You would have to adjust your model, data, or training regime to fit it into a GPU that you can realistically acquire.
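Inverting the same crude formula answers the practical question of how large a batch fits on a given card. This sketch assumes a hypothetical 24 GiB GPU and reuses the example's guess of 540 × 960 × 3 × 1000 activation values per sample; the function name is illustrative, and the answer is only as good as that guess.

```python
def max_batch_size(gpu_bytes, num_params, activations_per_sample,
                   param_bytes=4, activation_bytes=4):
    """Largest batch size that fits in gpu_bytes under the crude formula
    Memory = Model Size + Batch Size x (Forward + Backward), with Backward ~= Forward.
    Returns 0 when not even a single sample fits.
    """
    model_size = num_params * param_bytes
    per_sample = 2 * activations_per_sample * activation_bytes  # forward + backward
    spare = gpu_bytes - model_size
    if spare < per_sample:
        return 0
    return int(spare // per_sample)


if __name__ == "__main__":
    # Hypothetical 24 GiB card, example's activation guess for a 540x960x3 input.
    print(max_batch_size(24 * 1024**3, 21_800_000, 540 * 960 * 3 * 1000))  # → 2
    # Same guess for a 1080x1920x3 input (4x the pixels): nothing fits.
    print(max_batch_size(24 * 1024**3, 21_800_000, 1080 * 1920 * 3 * 1000))  # → 0
```

Because activation memory in this model is proportional to pixels × batch size, doubling both image dimensions cuts the feasible batch size by roughly a factor of four.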
Solutions for Memory Errors
- Reduce Batch Size: The easiest way to reduce memory usage.
- Use Gradient Accumulation: Run the forward and backward passes on several smaller batches, sum their gradients, and update the weights only once; this mimics a larger batch without its memory cost.
- Use Mixed Precision Training: Utilizes both float16 and float32 to make training more memory-efficient.
- Use a Simpler Model: Smaller architectures require less memory.
- Distributed Training: Split the model and data across multiple GPUs.
- Check for Memory Leaks: Make sure that you're not unintentionally holding onto tensors that you no longer need.
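Gradient accumulation does not depend on the framework, so here is a minimal pure-Python sketch on a toy one-parameter linear model with a mean-squared-error loss (a stand-in, not FCDD or Inception-v3; all names are hypothetical). Each micro-batch gets its own forward and backward pass, the gradients are summed, and the weight is updated once. With equal-sized micro-batches the averaged gradient equals the full-batch gradient, while only one micro-batch's activations need to be in memory at a time.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to the scalar weight w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_step(w, xs, ys, micro_batch, lr):
    """One SGD update using gradient accumulation over equal-sized micro-batches."""
    assert len(xs) % micro_batch == 0, "sketch assumes equal-sized micro-batches"
    total_grad = 0.0
    n_micro = 0
    for start in range(0, len(xs), micro_batch):
        # Forward + backward on one small micro-batch; only its gradient is kept.
        total_grad += grad_mse(w, xs[start:start + micro_batch], ys[start:start + micro_batch])
        n_micro += 1
    # Average, so the single update matches a full-batch step on the mean loss.
    return w - lr * total_grad / n_micro


def full_batch_step(w, xs, ys, lr):
    """Reference: one SGD update computed on the whole batch at once."""
    return w - lr * grad_mse(w, xs, ys)


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0, 4.1, 5.9, 8.2]  # toy data
    w_acc = accumulated_step(0.5, xs, ys, micro_batch=2, lr=0.01)
    w_full = full_batch_step(0.5, xs, ys, lr=0.01)
    print(w_acc, w_full)  # equal up to floating-point rounding
```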
Alex Taylor
1 September 2023
0 votes
To the above answer I would add:
1) I'm assuming that you are using pretrainedEncoderNetwork or another method to "cut" Inception-v3 at a given point. If so, the parameter count should be based on the Learnables of the truncated backbone, not of the full Inception-v3 network.
2) In practice, rather than this kind of analytic memory analysis, which is complicated and depends on implementation details such as how ops are implemented for the forward and backward passes and how memory is cached, it is often easier to start with a small spatial input size (say 224x224x3) and a low batch size. Then increase either the batch size or the spatial dimensions, one at a time, to see how GPU memory use scales.
3) Using a smaller backbone such as a MobileNet-type architecture, using a lower batch size, or using smaller spatial dimensions will each reduce peak memory consumption during training.
4) FCDD in particular often works well when you divide your training data into patches. Because it is a fully convolutional architecture, you can train on smaller patches at a given scale and then run inference on full-sized images at the same scale. This lets you work within memory limits at training time without being tied to the same spatial size at inference. To do this, you will need to extract a representative set of good patches/tiles and a small set of bad patches/tiles from your full-sized images.
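The measure-and-scale approach in item 2 can be turned into a rough extrapolation. This sketch assumes that peak memory grows linearly with batch size × height × width (a constant framework overhead plus activations proportional to the number of input pixels), an assumption that caching and workspace allocations can break. The function names are hypothetical, and the measurements must come from your own runs.

```python
def fit_linear(xs, ys):
    """Ordinary least-squares fit of ys ~ a + b * xs; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two different batch*height*width values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def predict_peak_memory(measurements, batch_size, height, width):
    """Extrapolate peak memory for a new configuration.

    measurements: list of (batch_size, height, width, peak_bytes) you recorded,
    e.g. by watching GPU memory while training a few iterations.
    Assumes peak memory is linear in batch_size * height * width.
    """
    xs = [b * h * w for b, h, w, _peak in measurements]
    ys = [peak for _b, _h, _w, peak in measurements]
    a, slope = fit_linear(xs, ys)
    return a + slope * batch_size * height * width


if __name__ == "__main__":
    # Fill in your own (batch_size, height, width, peak_bytes) readings,
    # changing one variable at a time as suggested above.
    my_measurements = []
    if len(my_measurements) >= 2:
        print(predict_peak_memory(my_measurements, 8, 540, 960))
```

If predictions from small sizes drift badly from a measured larger size, the linear assumption does not hold for that setup and you should keep measuring directly.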
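For the patch-based approach in item 4, you first need tile coordinates that cover each full-sized image. This sketch computes top-left corners for square tiles; the 224-pixel tile size and the non-overlapping stride are assumptions (224 simply mirrors the starting size from item 2), the last row and column of tiles are shifted inward so no padding is needed, and the function names are hypothetical. Cropping the tiles and labeling them as good or bad is left to your own code.

```python
def tile_starts(length, tile, stride):
    """Start offsets along one axis so that tiles of size `tile` cover [0, length).

    If the regular grid leaves a strip uncovered, one extra tile is placed so that
    it ends exactly at the border (overlapping its neighbour instead of padding).
    """
    if tile >= length:
        return [0]  # a single (clipped) tile covers the whole axis
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] + tile < length:
        starts.append(length - tile)
    return starts


def tile_boxes(height, width, tile=224, stride=224):
    """Return (row, col, tile_height, tile_width) boxes covering a height x width image."""
    rows = tile_starts(height, tile, stride)
    cols = tile_starts(width, tile, stride)
    th, tw = min(tile, height), min(tile, width)
    return [(r, c, th, tw) for r in rows for c in cols]


if __name__ == "__main__":
    boxes = tile_boxes(540, 960)
    print(len(boxes))  # → 15 (3 rows x 5 columns)
```

Using a stride smaller than the tile size gives overlapping tiles, which yields more training patches per image at the cost of redundancy.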
7 comments
William
10 October 2023
Alex Taylor
11 October 2023
1) A typical pattern for using a backbone with FCDD is to take a CNN backbone and use its output activations from just before the spatial dimensions are downsampled (commonly by a stride-2 max pooling operation).
pretrainedEncoderNetwork(networkName,depth) does this automatically for you. The second input, depth, specifies how many downsampling operations are applied to the input data. If you increase depth, the output dlnetwork will have correspondingly more Layers and more Learnables.
My point in the earlier reply was that you aren't using the full set of Learnables in inceptionv3, because you are using a truncated form of the backbone as the feature extractor. A rigorous memory calculation would need to account for that.
3) It's hard to say without experimentation. But choosing the lightest-weight backbone that still gives you acceptable detection metrics (precision, recall, accuracy) is good practice for both inference speed and memory use.
4) Yes, patching is more complicated and in practice requires more detailed labeling of your data, because you need to know which regions of your training images have defects and which don't.
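To get a feel for what the depth argument changes, this sketch computes the spatial size of the activations after a given number of downsampling steps. It assumes every step halves height and width, rounding up (as a stride-2 layer with "same" padding does); Inception-v3 also uses unpadded layers, so its real sizes differ somewhat. The function name is hypothetical, and this is not what pretrainedEncoderNetwork computes internally.

```python
def encoder_output_size(height, width, depth):
    """Spatial size after `depth` downsampling steps.

    Assumes each step maps n -> ceil(n / 2), as a stride-2 layer with "same"
    padding does. Real backbones (Inception-v3 included) also use unpadded
    layers, so their exact sizes differ.
    """
    for _ in range(depth):
        height = (height + 1) // 2  # integer ceil(height / 2)
        width = (width + 1) // 2
    return height, width


if __name__ == "__main__":
    for depth in range(1, 4):
        h, w = encoder_output_size(540, 960, depth)
        print(f"depth {depth}: {h}x{w}")
```

For a 540x960 input, depths 1 to 3 give 270x480, 135x240 and 68x120. Each step cuts the number of spatial positions by about 4x, but a deeper tap point also brings in more layers (and more Learnables) whose activations must be kept for the backward pass.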
As an alternative, since R2023a there are two new anomaly detectors in the Visual Inspection Library:
These detectors typically have extremely good detection metrics but use more memory than FCDD, so they may not be what you're going for, but they are potentially worth checking out. Both have the advantage of being trained strictly on good/normal data, which can be more convenient than FCDD in some uses, and both choose reasonable backbones for you by default.
William
11 October 2023
William
17 October 2023
Alex Taylor
17 October 2023
I generally use NVIDIA's nvidia-smi command-line tool to monitor GPU memory use. You can have it run in a loop with the -l option.
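If you would rather log the numbers than watch a terminal, the same query can be scripted. This sketch assumes an NVIDIA driver with nvidia-smi on the PATH and uses its --query-gpu and --format=csv options, which report memory in MiB; the helper names are hypothetical. For a quick look in a terminal, nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1 prints the same fields every second.

```python
import subprocess

# Query only the fields we need; "nounits" drops the " MiB" suffix.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text):
    """Turn lines such as '0, 1234, 24576' into (gpu_index, used_mib, total_mib) tuples."""
    rows = []
    for line in text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        rows.append((int(index), int(used), int(total)))
    return rows


def read_gpu_memory():
    """Run nvidia-smi once and return its parsed memory report.

    Raises FileNotFoundError if nvidia-smi is not installed and
    subprocess.CalledProcessError if it exits with an error.
    """
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory_csv(result.stdout)
```

Calling read_gpu_memory() every second or so while training runs, and keeping the largest used value, gives a rough peak reading for each configuration you try. Note that nvidia-smi reports memory allocated to processes, which can include memory a framework has cached rather than actively uses.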