This corresponds to the amount of information remembered between time steps (the hidden state). The hidden state can contain information from all previous time steps, regardless of the sequence length.
If the number of hidden units is too large, then the layer might overfit to the training data.
Output Size in fullyConnectedLayer -
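In the case of sequence-to-sequence regression, it refers to the number of responses predicted at each time step. The length of the output sequence is not set here; it matches the length of the input sequence.
In the case of sequence classification, it refers to the number of classes.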
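
As a rough sketch of where these two sizes go, here are two layer arrays in the style of the Deep Learning Toolbox examples. The variable names and values (numFeatures = 12, numHiddenUnits = 100, numClasses = 9, numResponses = 1) are illustrative assumptions, not values from your problem, and the output layers assume a workflow based on trainNetwork.

```matlab
% Illustrative sizes only; replace them with values from your own data.
numFeatures    = 12;   % channels in each input time step
numHiddenUnits = 100;  % size of the LSTM hidden state
numClasses     = 9;    % sequence classification: number of classes
numResponses   = 1;    % seq-to-seq regression: responses per time step

% Sequence classification: one label per sequence, so the LSTM layer
% outputs only its last time step.
layersClassification = [
    sequenceInputLayer(numFeatures)
    lstmLayer(numHiddenUnits, 'OutputMode', 'last')
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];

% Sequence-to-sequence regression: one prediction per time step, so the
% LSTM layer outputs the full sequence.
layersRegression = [
    sequenceInputLayer(numFeatures)
    lstmLayer(numHiddenUnits, 'OutputMode', 'sequence')
    fullyConnectedLayer(numResponses)
    regressionLayer];
```

In both arrays the argument to fullyConnectedLayer sets the size of the output at each step, while numHiddenUnits only controls the internal state of the LSTM layer.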