I guess this ultimately comes down to how we view the initial state of the integrator stages. If we view the N integrator stages as starting with "valid" input values of 0, then we will always clock out N zeros at the start.
This makes a lot of practical sense, as it's often convenient to keep the output data length the same as the input length (or, for an interpolator, exactly R times longer). To do that, we have to make some assumption about the signal before the start and/or after the end of the input.
Similar assumptions are made by the filter function, and I find they work very well in practice. If and when we care about group delay, it can be managed separately.
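
To make the zero-initial-state view concrete, here is a minimal Python sketch. It assumes each integrator has a registered output, so every stage adds one sample of delay; that is my assumption for illustration, not necessarily how the code under discussion is structured. The function name `delayed_integrator_cascade` and the test input are hypothetical.

```python
def delayed_integrator_cascade(samples, num_stages):
    """Run samples through num_stages integrators that all start at zero.

    Assumption (illustrative only): each stage outputs its register value
    *before* updating it, so each stage contributes one sample of delay.
    """
    registers = [0] * num_stages  # every stage starts from a "valid" zero
    output = []
    for x in samples:
        stage_input = x
        for i in range(num_stages):
            stage_output = registers[i]   # value clocked out this cycle
            registers[i] += stage_input   # accumulate for the next cycle
            stage_input = stage_output    # feed the next stage
        output.append(stage_input)
    return output


if __name__ == "__main__":
    # Three stages, constant input of ones.
    print(delayed_integrator_cascade([1, 1, 1, 1, 1], 3))  # [0, 0, 0, 1, 4]
```

With N = 3 the first three outputs are zero and the output has the same length as the input, which is the behaviour described above. Trimming or compensating for those leading samples is the separate group-delay question.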