Perhaps you are mistaken. Most high-level tools in MATLAB do not directly, intentionally use parallel processing by splitting the problem up. It is the lower-level computations that do so, and that is where you see the gains. And you can check when that is happening. It is quite easy to create a situation where MATLAB will use all the CPU power you have available. For example, just form a matrix multiply between two very large matrices. MATLAB passes that operation to the BLAS, lower-level routines that can intelligently use multiple cores to do the work more efficiently. So if you watch a CPU monitor when that happens, suddenly your computer will get very busy. But it was not really MATLAB that did the parallel split; it was done more deeply under the hood.
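You can watch this happen from inside MATLAB too. A minimal sketch, using maxNumCompThreads to pin MATLAB to one computational thread and timeit to compare (the matrix size 4000 is an arbitrary choice, just large enough to keep the BLAS busy):

```matlab
% Sketch: a large matrix multiply is handed off to multithreaded BLAS.
A = randn(4000);
B = randn(4000);

% Restrict MATLAB to a single computational thread and time the multiply.
% maxNumCompThreads returns the previous setting so we can restore it.
nOld = maxNumCompThreads(1);
t1 = timeit(@() A*B);

% Restore the default thread count and time the same multiply again.
maxNumCompThreads(nOld);
tAll = timeit(@() A*B);

fprintf('1 thread: %.2f s, all threads: %.2f s\n', t1, tAll)
```

On a multicore machine the single-threaded time should be noticeably larger, even though the MATLAB code itself never mentions parallelism at all.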
Another case where I frequently see this happen is in a large computation where I might want to do a powermod operation, computing mod(b^n,p) for huge values of n, and for many millions of primes in the vector p. (Did you know there are roughly 51 million primes less than 1e9? I do know that.) My system fan kicks on immediately when I do these computations, with all cores running flat out. But again, it is not high-level MATLAB code that decides when to parallelize; it is the low-level routines that recognize a computation that can be split efficiently, when that split makes sense.
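As a sketch of the sort of computation I mean (this version assumes the Symbolic Math Toolbox, which provides powermod; the bound of 1e6 is just to keep the example quick, and I believe powermod applies elementwise over the vector of moduli):

```matlab
% Sketch: compute mod(b^n, p) for every prime p below some bound,
% without ever forming the astronomically large number b^n itself.
b = 3;
n = 2^40;          % a huge exponent; b^n would be hopeless to form directly
p = primes(1e6);   % the 78498 primes below 1e6 (scale the bound up as desired)

r = powermod(b, n, p);   % vectorized modular exponentiation over all p
```

Run this with a bound of 1e9 instead and watch a CPU monitor; the work fans out across every core you have, with no parfor in sight.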
These computations are only split when the problem becomes sufficiently large, of course. For small problems, the overhead of the parallelization costs more effort than the gain would be worth. Add 3 or 4 numbers together, and one core suffices, done in the blink of a computational eye. Add a billion numbers together, and there is real gain to be found in splitting the problem up.
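That crossover is easy to see with an element-wise operation like exp, which MATLAB multithreads only above some internal size threshold. A rough sketch, again leaning on maxNumCompThreads and timeit (the array sizes here are arbitrary choices of mine):

```matlab
% Sketch: multithreading only pays off above a size threshold.
small = randn(1, 100);    % far too small to be worth splitting up
big   = randn(1, 1e8);    % large enough that splitting the work wins

nOld  = maxNumCompThreads(1);     % force a single computational thread
tBig1 = timeit(@() exp(big));
maxNumCompThreads(nOld);          % restore the default

tBigAll = timeit(@() exp(big));   % all cores available
tSmall  = timeit(@() exp(small)); % tiny either way

fprintf('big, 1 thread: %.3f s\nbig, all threads: %.3f s\nsmall: %.2g s\n', ...
    tBig1, tBigAll, tSmall)
```

The large array should show a clear gap between the one-thread and all-thread timings, while the small array finishes so fast that the split would never be worth the overhead.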
There are also huge problems where no simple parallelization seems to be available. For example, perform a VERY large symbolic computation. Much of the time, that is not so easily broken up. So I can give examples where my computer might be seen to be running flat out, but in only one thread, sometimes for hours at a time, so only one core is used for the entire operation. And of course, any code with lots of branches will defeat parallelized processing, which absolutely thrives on streaming long lists of numbers through adds or multiplies.
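To illustrate that last point about branches: here is the same computation written first with an explicit if/else inside a loop, and then in a branch-free vectorized form that simply streams the whole array through the math units. A sketch:

```matlab
% Sketch: branchy scalar code versus branch-free streaming code.
x = randn(1, 1e7);

% Branchy version: a decision is made for every single element.
y1 = zeros(size(x));
for k = 1:numel(x)
    if x(k) > 0
        y1(k) = sqrt(x(k));
    else
        y1(k) = 0;
    end
end

% Branch-free version: max replaces the if/else, so the low-level code
% can feed the entire array through sqrt with no per-element decisions.
y2 = sqrt(max(x, 0));
```

Here isequal(y1, y2) holds, but the second form is the one the underlying routines can split and stream efficiently.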