You will need to decompose your matrix multiplication into its equivalent scalar operations.
Even the simplest matrix multiplications require a large number of scalar multiplies; for example, the straightforward algorithm for multiplying a 3x3 matrix by a 3x1 column vector requires 9 scalar multiplications (plus 6 additions). Multipliers are relatively scarce on FPGAs, so designers need to be conscious of their design's resource usage and structure their model accordingly.
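To make the multiplier count concrete, here is a minimal sketch of the scalar decomposition for the 3x3 by 3x1 case. It is written in Python purely as a behavioral illustration, not as HDL Coder input, and the function name `matvec_3x3_scalar` is hypothetical.

```python
def matvec_3x3_scalar(a, x):
    """Multiply a 3x3 matrix by a 3x1 column using only scalar operations.

    Returns the result vector and the number of scalar multiplies used.
    Illustrative model only; not generated by or tied to any tool.
    """
    y = [0, 0, 0]
    multiplies = 0
    for i in range(3):          # one output element per matrix row
        acc = 0
        for j in range(3):      # three products per output element
            acc += a[i][j] * x[j]
            multiplies += 1
        y[i] = acc
    return y, multiplies


a = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
x = [1, 0, -1]
y, count = matvec_3x3_scalar(a, x)
print(y, count)  # [-2, -2, -2] 9
```

If every one of those products is mapped to its own hardware multiplier, the design uses 9 multipliers for this one small operation, and the count grows with the product of the matrix dimensions.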
HDL Coder can share similar multipliers in a design, time-multiplexing the limited hardware resources to meet the design requirements and device constraints. However, this sharing needs to be guided by the designer; a blind expansion of a matrix multiply is unlikely to satisfy most users.
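The trade-off behind time-multiplexing can be sketched with a simple behavioral model. This is an assumption-laden illustration in Python, not how HDL Coder implements sharing: the function name `matvec_shared` and the parameter `num_multipliers` are hypothetical, and the model ignores pipeline latency, adder sharing, and control overhead.

```python
def matvec_shared(a, x, num_multipliers):
    """Model a matrix-vector product computed on a limited multiplier pool.

    Each simulated cycle issues at most `num_multipliers` scalar products.
    Returns the result vector and the number of cycles consumed.
    Hypothetical model for illustration only.
    """
    if num_multipliers < 1:
        raise ValueError("need at least one multiplier")
    rows, cols = len(a), len(x)
    # Flatten the work into a list of (row, column) scalar products.
    products = [(i, j) for i in range(rows) for j in range(cols)]
    y = [0] * rows
    cycles = 0
    for start in range(0, len(products), num_multipliers):
        # All products in this slice share the same time slot.
        for i, j in products[start:start + num_multipliers]:
            y[i] += a[i][j] * x[j]
        cycles += 1
    return y, cycles


a = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
x = [1, 0, -1]
for n in (9, 3, 1):
    print(n, matvec_shared(a, x, n))
# 9 ([-2, -2, -2], 1)
# 3 ([-2, -2, -2], 3)
# 1 ([-2, -2, -2], 9)
```

The result is the same in every case; what changes is how many multipliers are consumed versus how many cycles the computation takes. Choosing that balance for a given throughput requirement and device is the part the designer has to decide.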