I get the illegal instruction error while running Matlab code on HPC

Hi,
I am trying to run MATLAB code through a Slurm script on an HPC system. While running the code, I get the following error:
**************************************************************************************************************************************************
--------------------------------------------------------------------------------
Illegal instruction detected at 2022-03-06 20:42:06 -0600
--------------------------------------------------------------------------------
Configuration:
Crash Decoding : Disabled - No sandbox or build area path
Crash Mode : continue (default)
Default Encoding : UTF-8
Deployed : false
GNU C Library : 2.28 stable
Graphics Driver : Unknown software
Graphics card 1 : 0x102b ( 0x102b ) 0x536 Version 0.0.0.0 (0-0-0)
Java Version : Java 1.8.0_202-b08 with Oracle Corporation Java HotSpot(TM) 64-Bit Server VM mixed mode
MATLAB Architecture : glnxa64
MATLAB Entitlement ID : 1496533
MATLAB Root : /scratch/tacc/apps/matlab/2021b
MATLAB Version : 9.11.0.1769968 (R2021b)
OpenGL : software
Operating System : "Rocky Linux release 8.4 (Green Obsidian)"
Process ID : 3288597
Processor ID : x86 Family 175 Model 1 Stepping 1, AuthenticAMD
Session Key : aea2a8a5-7374-4c84-85a7-169f2b6bc6d7
Static TLS mitigation : Disabled: Unnecessary
Window System : No active display
Fault Count: 1
Abnormal termination:
Illegal instruction
Current Thread: 'MCR 0 interpret' id 22481996596992
Register State (from fault):
RAX = 0000146f3e473000 RBX = 0000146f3e474000
RCX = 000000000000000c RDX = 0000146f3e4362c0
RSP = 000014727f6b7840 RBP = 000014727f6b874e
RSI = 0000000000000000 RDI = 000014727f6b7910
R8 = 0000000000000140 R9 = 0000146f3e474000
R10 = 00000000000001e0 R11 = 00000000000000a0
R12 = 000000000000000a R13 = 000000000000000c
R14 = 0000000000000008 R15 = 000000000000000a
RIP = 0000146ed8b7ea62 EFL = 0000000000010212
CS = 0033 FS = 0000 GS = 0000
Stack Trace (from fault):
[ 0] 0x0000146ed8b7ea62 /opt/intel/compilers_and_libraries_2020.1.217/linux/mkl/lib/intel64/libmkl_core.so+57678434 mkl_blas_cnr_def_dgemm_kernel_bdz+00000210
[ 1] 0x0000146ed8b94d48 /opt/intel/compilers_and_libraries_2020.1.217/linux/mkl/lib/intel64/libmkl_core.so+57769288 mkl_blas_cnr_def_xdgemm_bdz+00001320
[ 2] 0x0000146ed8b95231 /opt/intel/compilers_and_libraries_2020.1.217/linux/mkl/lib/intel64/libmkl_core.so+57770545 mkl_blas_cnr_def_xdgemm+00000337
[ 3] 0x0000146ed56a7280 /opt/intel/compilers_and_libraries_2020.1.217/linux/mkl/lib/intel64/libmkl_core.so+02269824 mkl_blas_xdgemm+00000256
[ 4] 0x0000147075d997cf /opt/intel/compilers_and_libraries_2020.1.217/linux/mkl/lib/intel64/libmkl_intel_thread.so+03327951 mkl_blas_dgemm_omp_driver_v1+00005103
[ 5] 0x0000147075d68450 /opt/intel/compilers_and_libraries_2020.1.217/linux/mkl/lib/intel64/libmkl_intel_thread.so+03126352 mkl_blas_dgemm+00001584
[ 6] 0x000014707ca72547 /opt/intel/compilers_and_libraries_2020.1.217/linux/mkl/lib/intel64/libmkl_intel_ilp64.so+01635655 dgemm_+00000391
[ 7] 0x000014707d521230 /opt/intel/compilers_and_libraries_2020.1.217/linux/mkl/lib/intel64/libmkl_rt.so+01552944 dgemm_+00000128
[ 8] 0x000014730165d1b6 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwmathlinalg.so+03678646
[ 9] 0x00001472a112b8b5 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwm_lxe.so+10782901
[ 10] 0x00001472a112beb7 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwm_lxe.so+10784439
[ 11] 0x00001472a112bfa9 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwm_lxe.so+10784681
[ 12] 0x00001472a13cf943 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwm_lxe.so+13551939
[ 13] 0x00001472a13d3937 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwm_lxe.so+13568311
[ 14] 0x00001472a0e8eeb5 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwm_lxe.so+08044213
[ 15] 0x00001472a0e90d64 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwm_lxe.so+08052068
[ 16] 0x00001472a0e8ddf1 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwm_lxe.so+08039921
[ 17] 0x00001472a0e83ca5 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwm_lxe.so+07998629
[ 18] 0x00001472a0e84191 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwm_lxe.so+07999889
[ 19] 0x00001472a0e8d63a /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwm_lxe.so+08037946
[ 20] 0x00001472a0e8d736 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwm_lxe.so+08038198
[ 21] 0x00001472a0fbbd7b /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwm_lxe.so+09276795
[ 22] 0x00001472a0fbeca0 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwm_lxe.so+09288864
[ 23] 0x00001472a11ee836 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwm_lxe.so+11581494
[ 24] 0x00001472a119a7cc /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwm_lxe.so+11237324
[ 25] 0x00001472a119b48c /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwm_lxe.so+11240588
[ 26] 0x00001472a123c9e4 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwm_lxe.so+11901412
[ 27] 0x00001472a123cba9 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwm_lxe.so+11901865
[ 28] 0x000014769c135679 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwiqm.so+00587385
[ 29] 0x000014769c194ed2 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwiqm.so+00978642 _ZN3iqm14UserEvalPlugin7executeEP15inWorkSpace_tag+00000642
[ 30] 0x000014769c1703af /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwiqm.so+00828335
[ 31] 0x000014769c17bf26 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwiqm.so+00876326
[ 32] 0x000014769c140706 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwiqm.so+00632582
[ 33] 0x00001476bc06ac02 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwbridge.so+00302082
[ 34] 0x00001476bc06b543 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwbridge.so+00304451
[ 35] 0x00001476bc0717f2 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwbridge.so+00329714 _Z22mnGetCommandLineBufferbRbN7mwboost8optionalIKP15inWorkSpace_tagEEbRKNS0_9function2IN6mlutil14cmddistributor17inExecutionStatusERKNSt7__cxx1112basic_stringIDsSt11char_traitsIDsESaIDsEEES4_EE+00000274
[ 36] 0x00001476bc071a92 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwbridge.so+00330386 _Z8mnParserv+00000482
[ 37] 0x000014768c1569d7 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwmcr.so+00940503
[ 38] 0x00001476d912abe0 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwmvm.so+02825184 _ZNK7mwboost9function0IvEclEv+00000032
[ 39] 0x00001476d9132740 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwmvm.so+02856768 _ZN14cmddistributor15PackagedTaskIIP10invokeFuncIN7mwboost8functionIFvvEEEEENS2_10shared_ptrINS2_13unique_futureIDTclfp_EEEEEERKT_+00000048
[ 40] 0x00001476d91327e8 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwmvm.so+02856936 _ZNSt17_Function_handlerIFN7mwboost3anyEvEZN14cmddistributor15PackagedTaskIIP10createFuncINS0_8functionIFvvEEEEESt8functionIS2_ET_EUlvE_E9_M_invokeERKSt9_Any_data+00000024
[ 41] 0x000014769c1908bb /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwiqm.so+00960699 _ZN3iqm18PackagedTaskPlugin7executeEP15inWorkSpace_tag+00000091
[ 42] 0x000014768c154695 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwmcr.so+00931477
[ 43] 0x000014769c1703af /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwiqm.so+00828335
[ 44] 0x000014769c13e5bc /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwiqm.so+00624060
[ 45] 0x000014769c13f050 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwiqm.so+00626768
[ 46] 0x000014768c13eace /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwmcr.so+00842446
[ 47] 0x000014768c13f0ec /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwmcr.so+00844012
[ 48] 0x000014768c13f362 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwmcr.so+00844642
[ 49] 0x00001476d72888a7 /scratch/tacc/apps/matlab/2021b/bin/glnxa64/libmwboost_thread.so.1.72.0+00063655
[ 50] 0x00001476d87e015a /usr/lib64/libpthread.so.0+00033114
[ 51] 0x00001476d7f74dd3 /usr/lib64/libc.so.6+01035731 clone+00000067
** This crash report has been saved to disk as /home1/07861/arya073/matlab_crash_dump.3288597-1 **
MATLAB is exiting because of fatal error
/tmp/slurmd/job95801/slurm_script: line 29: 3288597 Killed matlab -nodisplay -nodesktop -nosplash < /work/07861/arya073/ls6/NM_project/run_folder_R4200_G10000/run_ccama_TACC_1.m >> /work/07861/arya073/ls6/NM_project/output_final_R4200/output1
*********************************************************************************************************************************
When I reduce the computational size of the problem, the code runs for longer, but in the end I still get the error. I also tried running MATLAB with -nojvm, but then I got a segmentation fault. Can you tell me what the issue might be?
1 Comment
Star Strider on 7 Mar 2022
This is likely not something the volunteers here can help with.
Please Contact Support and provide a link to this thread in your message to them so it will not be necessary to repeat everything.


Accepted Answer

Seyedalireza Abootorabi on 8 Mar 2022
Edited: Seyedalireza Abootorabi on 8 Mar 2022
The problem is solved! Based on the error messages, it was an issue with Intel's MKL library. I added the following lines to my Slurm script before the line that launches MATLAB, and the problem went away:
unset BLAS_VERSION
unset LAPACK_VERSION
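For anyone wondering where these two lines sit in a batch script, here is a minimal sketch of the same idea wrapped in a shell helper. The function name `run_with_bundled_mkl` is made up for illustration; only the two `unset` lines come from the fix described here, and the assumption is that the job environment had exported these variables before the script started.

```shell
#!/bin/sh
# Hypothetical helper (not part of MATLAB or Slurm): clear the two
# environment variables named in this answer, then run the given command.
run_with_bundled_mkl() {
    # A function runs in the current shell, so these unsets also apply
    # to every command started afterwards in the same script.
    unset BLAS_VERSION
    unset LAPACK_VERSION
    # Run whatever command line was passed in, arguments preserved.
    "$@"
}
```

In the job script this could be called as `run_with_bundled_mkl matlab -nodisplay -nodesktop -nosplash < my_script.m >> output.log`, where `my_script.m` and `output.log` are placeholder names.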
Side note: the MATLAB version is R2021b.
Thanks to everyone.
3 Comments
Seyedalireza Abootorabi on 8 Mar 2022
Actually, no. I was running the code on an HPC system and I'm not very familiar with its internal settings.
Heiko Weichelt on 10 Mar 2022
Edited: Heiko Weichelt on 10 Mar 2022
It might be worth investigating this with your HPC admin, as this setting can easily cause issues.
Until R2021b, we still shipped MKL2019U3, as newer MKL versions failed our intensive evaluation. R2022a, released yesterday, now uses oneMKL2021U3.
Using any other version, like the MKL2020U1 shown in the stack trace, is not recommended. In fact, we evaluated MKL2020U1 and decided on purpose not to ship it.
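Before talking to the admin, it may help to record what the job environment actually sets. The following is a rough sketch, assuming the overrides arrive through the `BLAS_VERSION` and `LAPACK_VERSION` environment variables mentioned in the accepted answer; the function names and the output wording are invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: describe one override variable.
# $1 = variable name, $2 = its value (may be empty), $3 = MATLAB root
describe_override() {
    if [ -z "$2" ]; then
        echo "$1: not set"
    else
        case "$2" in
            # Value is a path below the MATLAB installation directory.
            "${3%/}"/*) echo "$1: $2 (under ${3%/})" ;;
            # Anything else: a path elsewhere or a bare library name.
            *)          echo "$1: $2 (not under ${3%/})" ;;
        esac
    fi
}

# Hypothetical helper: report both variables for a given MATLAB root.
report_blas_overrides() {
    describe_override BLAS_VERSION "${BLAS_VERSION:-}" "$1"
    describe_override LAPACK_VERSION "${LAPACK_VERSION:-}" "$1"
}
```

Calling `report_blas_overrides /scratch/tacc/apps/matlab/2021b` inside the job script, before MATLAB starts, prints one line per variable. Inside MATLAB itself, `version -blas` and `version -lapack` should show which libraries were actually loaded.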


More Answers (1)

Heiko Weichelt on 8 Mar 2022
Please reach out to technical support, as already suggested, for more help.
Most likely the issue comes from the MKL that is being used. The stack trace indicates that MATLAB seems to be using
'/opt/intel/compilers_and_libraries_2020.1.217/linux/mkl/lib/intel64/libmkl_rt.so'
which is NOT the MKL that ships with MATLAB. Furthermore, 'libmkl_rt.so' is just all of MKL, which means it includes both LP64 and ILP64 symbols (compare https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_models).
MATLAB, however, only uses ILP64 mode. Because the LP64 and ILP64 symbols have the same names, bad things can happen when both are present, and crashes are nearly inevitable.
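The check described above (spotting MKL libraries in the stack trace that do not live under the MATLAB root) can be roughly automated. This is only a sketch: the function name `list_foreign_mkl` is made up, and it assumes the crash dump keeps the stack-frame layout shown in the question, with the library path followed by `+offset`.

```shell
#!/bin/sh
# Hypothetical helper: print each libmkl_*.so path found in a crash dump
# that is NOT located under the given MATLAB root directory.
# $1 = crash dump file, $2 = MATLAB root (e.g. the "MATLAB Root" line)
list_foreign_mkl() {
    foreign_mkl_root="${2%/}/"
    # Pull out absolute paths ending in libmkl_<name>.so, drop duplicates,
    # then keep only those that do not start with the MATLAB root prefix.
    grep -oE '/[^ +]*libmkl_[A-Za-z0-9_]+\.so' "$1" \
        | sort -u \
        | awk -v root="$foreign_mkl_root" 'index($0, root) != 1'
}
```

For the dump in this thread, the call would look like `list_foreign_mkl matlab_crash_dump.3288597-1 /scratch/tacc/apps/matlab/2021b`. Any path it prints is an MKL library that was loaded from outside the MATLAB installation.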
1 Comment
Seyedalireza Abootorabi on 8 Mar 2022
Edited: Seyedalireza Abootorabi on 8 Mar 2022
I agree. I was actually able to run the same code on other HPC systems. The only clear difference I can see is that the other systems also have older versions of MATLAB, whereas this particular one is new and only has MATLAB R2021b.
