## Procrustes Analysis

### Compare Landmark Data

The `procrustes` function analyzes the distribution of a set of shapes using Procrustes analysis. This analysis method matches landmark data (geometric locations representing significant features in a given shape) to calculate the best shape-preserving Euclidean transformations. These transformations minimize the differences in location between compared landmark data.

Procrustes analysis is also useful in conjunction with multidimensional scaling. The example Construct a Map Using Multidimensional Scaling observes that the orientation of the reconstructed points is arbitrary. Two different applications of multidimensional scaling could therefore produce reconstructed points that have nearly the same configuration but look different because they have different orientations. The `procrustes` function transforms one set of points to make them more comparable to the other.

### Data Input

The `procrustes` function takes two matrices as input:

• The target shape matrix X has dimension `n` × `p`, where `n` is the number of landmarks in the shape and `p` is the number of measurements per landmark.

• The comparison shape matrix Y has dimension `n` × `q` with `q` ≤ `p`. If there are fewer measurements per landmark for the comparison shape than the target shape (`q` < `p`), the function adds columns of zeros to Y, yielding an `n` × `p` matrix.
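The zero-padding step can be illustrated with a minimal Python sketch. The helper name `pad_columns` and the sample coordinates are hypothetical, chosen only for illustration; this is not the `procrustes` implementation itself.

```python
def pad_columns(Y, p):
    """Return a copy of Y (a list of rows) with zero columns appended
    so that every row has p entries."""
    q = len(Y[0])
    if q > p:
        raise ValueError("Y must not have more columns than the target (q <= p)")
    return [list(row) + [0.0] * (p - q) for row in Y]

# Target shape: n = 3 landmarks, p = 3 measurements per landmark.
X = [[0.0, 0.0, 1.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
# Comparison shape: the same 3 landmarks, but only q = 2 measurements each.
Y = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]

Y_padded = pad_columns(Y, p=len(X[0]))
print(Y_padded)
```

After padding, both matrices are `n` × `p`, so they can be compared landmark by landmark.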

The equation to obtain the transformed shape, Z, is

 $Z=bYT+c$ (1)

where:

• b is a scaling factor that stretches (b > 1) or shrinks (b < 1) the points.

• T is the orthogonal rotation and reflection matrix.

• c is a matrix with constant values in each column, used to shift the points.
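To make the roles of b, T, and c concrete, the following Python sketch applies the transformation to a small two-dimensional shape. The helper `transform_shape` and all numeric values are assumptions made for illustration, and c is stored as a single row of per-column shifts instead of a full matrix.

```python
import math

def transform_shape(Y, b, T, c):
    """Return Z = b*Y*T + c for Y (n x p), T (p x p), and c (one shift per column)."""
    p = len(T)
    return [
        [b * sum(row[k] * T[k][j] for k in range(p)) + c[j] for j in range(p)]
        for row in Y
    ]

# Orthogonal matrix that rotates row vectors by 90 degrees counterclockwise.
theta = math.pi / 2
T = [[math.cos(theta), math.sin(theta)],
     [-math.sin(theta), math.cos(theta)]]

b = 2.0          # b > 1 stretches the points
c = [1.0, 1.0]   # shift applied to every landmark (one value per column)

Y = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
Z = transform_shape(Y, b, T, c)
for row in Z:
    print([round(v, 6) for v in row])
```

With these values, the three landmarks move to approximately (1, 3), (-1, 1), and (1, -1): each point is rotated, doubled in distance from the origin, and then shifted by one unit along each axis.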

The `procrustes` function chooses b, T, and c to minimize the distance between the target shape X and the transformed shape Z as measured by the least squares criterion:

 $\sum_{i=1}^{n}\sum_{j=1}^{p}{\left(X_{ij}-Z_{ij}\right)}^{2}$
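The criterion itself is simple to evaluate. The Python sketch below computes it for two candidate transformed shapes; the helper name `sum_squared_error` and the coordinates are hypothetical, and the sketch only scores given candidates, it does not search for the optimal b, T, and c.

```python
def sum_squared_error(X, Z):
    """Sum of squared differences between matching entries of X and Z."""
    return sum(
        (x - z) ** 2
        for x_row, z_row in zip(X, Z)
        for x, z in zip(x_row, z_row)
    )

# Target shape with n = 3 landmarks and p = 2 measurements each.
X = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]

# Two made-up candidates for the transformed shape Z.
Z_close = [[0.1, 0.0], [2.0, 0.1], [0.0, 1.9]]
Z_far = [[1.0, 0.0], [2.0, 1.0], [0.0, 1.0]]

print(sum_squared_error(X, Z_close))
print(sum_squared_error(X, Z_far))
```

The candidate whose landmarks lie nearer to the target yields the smaller value, which is the sense in which the chosen transformation is the best fit.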

### Preprocess Data for Accurate Results

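Procrustes analysis is appropriate when all `p` measurement dimensions have similar scales. The analysis would be inaccurate, for example, if the columns of the shape matrices had very different scales, because the column with the largest range would dominate the least squares criterion: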

• The first column is measured in milliliters ranging from 2,000 to 6,000.

• The second column is measured in degrees Celsius ranging from 10 to 25.

• The third column is measured in kilograms ranging from 50 to 230.

In such cases, standardize your variables by:

1. Subtracting the sample mean from each variable.

2. Dividing each resultant variable by its sample standard deviation.

Use the `zscore` function to perform this standardization.
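The two standardization steps can be sketched in Python as follows. The helper `standardize_columns` is a hypothetical stand-in that mirrors the steps above, not the `zscore` function itself, and the data rows are made-up values within the ranges listed earlier.

```python
import statistics

def standardize_columns(data):
    """Center each column on its sample mean, then divide by its
    sample standard deviation (n - 1 in the denominator)."""
    columns = list(zip(*data))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.stdev(col) for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stdevs)]
        for row in data
    ]

# Columns: volume (mL), temperature (degrees Celsius), mass (kg).
data = [
    [2000.0, 10.0, 50.0],
    [4000.0, 17.5, 140.0],
    [6000.0, 25.0, 230.0],
]

standardized = standardize_columns(data)
for row in standardized:
    print(row)
```

After standardization, every column has a sample mean of 0 and a sample standard deviation of 1, so no single measurement dominates the fit. Note that this sketch fails on a constant column, whose standard deviation is zero.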