Slow Training of RL Agent on HPC Compared to Local Machine

Gaurav on 6 Jun 2024
Commented: Harald on 7 Jun 2024
I am currently running a MATLAB R2021a script (execute.m, attached for reference) to train a reinforcement learning (RL) agent in Simulink to control a drone. On my local machine, training connects to 6 workers and runs much faster than on the HPC cluster, where it connects to 12 workers. I have ensured that the whole node, with 28 cores in total, is assigned to the job.
Here is the SLURM script:
#!/bin/bash -l
#SBATCH -J MATLAB_Execute # Job name
#SBATCH -N 1 # Number of nodes
#SBATCH -n 1 # Number of tasks (1 instance of the program)
#SBATCH -c 28 # Number of CPU cores per task
#SBATCH --gres=gpu:0 # Number of GPUs per node
#SBATCH --time=1:00:0 # Time limit (1 hour)
#SBATCH -p batch -C skylake # Partition (batch), constrained to Skylake nodes
export JAVA_LOG_DIR=/scratch/users/gshetty/java_logs
mkdir -p $JAVA_LOG_DIR
# Load the MATLAB module
module load math/MATLAB/2021a
module load openssl/1.1.1k
export LD_PRELOAD=/usr/lib64/libcrypto.so.1.1
# Run the MATLAB script
srun matlab -nodisplay -nosplash -r execute -logfile execute.out
What could be the reason for this?
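One thing that may be worth ruling out is whether the MATLAB process actually sees all 28 cores requested with -c 28. The following is a minimal sketch, not a confirmed diagnosis: report_cpus is a hypothetical helper name, and it assumes a Linux node where nproc is available. It prints the CPU count Slurm recorded for the task (SLURM_CPUS_PER_TASK) next to the count the current process is allowed to use.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original job script):
# compare the CPUs requested from Slurm with the CPUs this process may use.
report_cpus() {
    # Set by Slurm when -c/--cpus-per-task is given; "unset" outside a job.
    local requested="${SLURM_CPUS_PER_TASK:-unset}"
    # nproc reports the processing units available to the calling process,
    # which can be fewer than the cores physically present on the node.
    local visible
    visible="$(nproc)"
    echo "requested=${requested} visible=${visible}"
}

report_cpus
```

Running this through srun just before the MATLAB line would show what the job step is bound to. If the visible count turns out much smaller than the requested one, the 12 workers could be sharing only a few cores, which would be one possible explanation for the slowdown.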
4 Comments
Gaurav on 6 Jun 2024
I should also mention that I use R2021a because that is the version available on my HPC cluster.
Harald on 7 Jun 2024
Hi,
that's a big difference, indeed. If it takes hours on HPC, I am surprised that it finishes at all since you have specified a time limit.
If you get error messages, please copy the precise error message you get and the code that throws them. That makes it easier to investigate.
Assuming that we are speaking of run time and not any time that your job may be queued, waiting for resources to become available, I cannot imagine why it would take that long on HPC.
If there are no further ideas here, it may be an idea to reach out to Technical Support: https://www.mathworks.com/support/contact_us.html
Best wishes,
Harald
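To separate queue time from run time as described above, the Submit, Start and End timestamps of the job can be compared; sacct can report them for a finished job (for example with --format=JobID,Submit,Start,End,Elapsed). Below is a minimal sketch of the arithmetic, assuming GNU date is available. split_times is a hypothetical helper, and the timestamps in the call are made up for illustration, not taken from the actual job.

```shell
#!/bin/bash
# Hypothetical helper: split a job's wall time into queue wait and run time.
# Arguments: submit, start and end timestamps (e.g. 2024-06-06T10:00:00).
split_times() {
    local submit start end
    # Convert each timestamp to epoch seconds (GNU date; treated as UTC).
    submit="$(date -u -d "$1" +%s)"
    start="$(date -u -d "$2" +%s)"
    end="$(date -u -d "$3" +%s)"
    echo "queued=$((start - submit))s ran=$((end - start))s"
}

# Illustrative values only, not real job data.
split_times 2024-06-06T10:00:00 2024-06-06T10:05:00 2024-06-06T11:05:00
```

If the run time itself is close to the time limit, the slowdown is in the training; if most of the delay is queue time, the training speed may be fine.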


Answers (0)