## Nonstationary Gabor Frames and the Constant-Q Transform

Nonstationary Gabor frames enable you to implement time-adaptive or frequency-adaptive analysis of signals. The functions `cqt` and `icqt` use nonstationary Gabor frames to obtain a constant-Q (frequency-adaptive) transform (CQT) of a signal. A notable strength of nonstationary Gabor frames is that they enable the construction of stable inverses, yielding perfect reconstruction.

The theory of nonstationary Gabor frames and efficient algorithms for their implementation are due to Dörfler, Holighaus, Grill, and Velasco [1][2]. The algorithms in [1] and [2] implement a phase-locked version of the CQT that does not preserve the same phases that would be obtained by naïve convolution. In [3], Schörkhuber, Klapuri, Holighaus, and Dörfler develop efficient algorithms for the CQT and inverse CQT that do mimic the coefficients obtained by naïve convolution. The Large Time-Frequency Analysis Toolbox [4] provides an extensive set of algorithms for nonstationary Gabor analysis and synthesis.

In standard Gabor analysis, a window of fixed size tiles the time-frequency plane. A nonstationary Gabor frame is a collection of windowing functions of various sizes that are used to tile the time-frequency plane, much as wavelet analysis does. You have the flexibility to change the sampling density in time or frequency. Nonstationary Gabor frames are useful in areas such as audio signal processing, where fixed-size time-frequency windows are not optimal. Unlike the short-time Fourier transform, the windows used in the constant-Q transform have adaptable bandwidth and sampling density. In the frequency domain, the windows are centered at logarithmically spaced center frequencies.

### Decomposing the Time-Frequency Plane

The Fourier transform of *f(t)* is the correlation of *f(t)* with $$e^{j\omega t}$$:

$$F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-j\omega t}\, dt.$$

Since $$e^{j\omega t}$$ does not have compact support, the Fourier transform is not an ideal
choice for studying nonstationary signals. If the frequency content of a signal changes over
time, the Fourier transform does not capture what those changes are or when they
occur. The partition of the time-frequency plane shown here represents this Fourier
transform behavior.
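A small numerical illustration of this limitation, written as a sketch in plain Python: a naive O(N²) DFT stands in for the continuous transform, and the helper name `dft` and the two-tone test signal are illustrative choices, not part of `cqt`. A signal whose low tone precedes a high tone and its time-reversed copy have identical magnitude spectra, so the spectrum alone cannot tell which event came first.

```python
import cmath
import math

def dft(x):
    """Naive DFT: X[k] = sum_n x[n] * exp(-j 2 pi k n / N)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]

N = 128
half = N // 2
# Low tone (bin 8) during the first half, high tone (bin 24) during the second half.
low_then_high = [math.cos(2 * math.pi * (8 if n < half else 24) * n / N) for n in range(N)]
# Time-reversed copy: now the high tone comes first.
high_then_low = low_then_high[::-1]

mag_a = [abs(v) for v in dft(low_then_high)]
mag_b = [abs(v) for v in dft(high_then_low)]

# Both tones show up clearly, but the magnitude spectra agree bin by bin.
max_diff = max(abs(a - b) for a, b in zip(mag_a, mag_b))
print(f"peaks: bin 8 -> {mag_a[8]:.1f}, bin 24 -> {mag_a[24]:.1f}")
print(f"largest magnitude difference between the two orderings: {max_diff:.1e}")
```

For a real signal, time reversal only conjugates the spectrum and multiplies it by a linear phase, which is why the magnitudes match exactly.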

To perform a time-frequency analysis of a nonstationary signal, start with a real-valued,
even windowing function $$g(t)$$ that is effectively nonzero over only a finite interval and has unit
norm. In addition, the Fourier transform of $$g(t)$$ is centered at zero and is lowpass. Next, window *f(t)* with translates of $$g(t)$$, and then take the Fourier transform of the result:

$$SF(u,\zeta) = \int f(t)\, g(t-u)\, e^{-j\zeta t}\, dt.$$

Correlating *f(t)* with the Gabor atoms $$g(t-u){e}^{j\zeta t}$$ is standard Gabor analysis. By varying *u*, you consider only values of *f(t)* near time *u*; the support of $$g(t)$$ determines the size of that neighborhood. The Fourier transform of $${g}_{u,\zeta}(t)=g(t-u){e}^{j\zeta t}$$ is, up to a phase factor, the Fourier transform of $$g(t)$$ translated by ζ:

$${\widehat{g}}_{u,\zeta}(\omega )={e}^{-ju(\omega -\zeta )}\widehat{g}(\omega -\zeta ).$$
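The translation-with-phase relation can be checked in a discrete setting. The sketch below assumes a periodic (circular) signal of length N, an integer time shift u in samples, and an integer frequency shift ζ in DFT bins, so the identity becomes $$\hat{g}_{u,\zeta}[k] = e^{-j2\pi u(k-\zeta)/N}\,\hat{g}[k-\zeta]$$. The Gaussian-shaped window and the helper names are illustrative assumptions.

```python
import cmath
import math

def dft(x):
    """Naive DFT: X[k] = sum_n x[n] * exp(-j 2 pi k n / N)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]

N = 64
u, zeta = 10, 5          # time shift in samples, frequency shift in DFT bins

def g(n):
    """Real, even, lowpass window centred at n = 0, treated as N-periodic."""
    m = (n + N // 2) % N - N // 2            # wrap n into [-N/2, N/2)
    return math.exp(-0.5 * (m / 4.0) ** 2)

window = [g(n) for n in range(N)]
atom = [g(n - u) * cmath.exp(2j * math.pi * zeta * n / N) for n in range(N)]   # g_{u,ζ}[n]

G = dft(window)
G_atom = dft(atom)

# Right-hand side: phase factor times the spectrum of g translated by ζ bins.
predicted = [cmath.exp(-2j * math.pi * u * (k - zeta) / N) * G[(k - zeta) % N]
             for k in range(N)]
max_err = max(abs(a - b) for a, b in zip(G_atom, predicted))
print(f"max deviation from the identity: {max_err:.1e}")
```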

The energy of $${\widehat{g}}_{u,\zeta}(\omega )$$ is concentrated around ζ with variance σ_{ω}^{2}. If the window $${g}_{u,\zeta}(t)=g(t-u){e}^{j\zeta t}$$ is shifted on a regular grid of (*u*, ζ) values, the Fourier transform of the product of the
shifted window and *f(t)* is the short-time Fourier transform (STFT). The STFT tiling of the
time-frequency plane can be represented as a grid of boxes, each centered at (*u*, ζ).
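A minimal STFT sketch, assuming a 64-sample Hann window, a hop of 64 samples, and a test signal that switches from one tone to another halfway through (all illustrative choices): evaluating |SF(u, ζ)| on a regular grid of time positions u shows the dominant frequency bin changing with u, which the plain Fourier transform cannot reveal. Only magnitudes are kept, so measuring the exponent's time from the window start rather than from t = 0 does not matter here.

```python
import cmath
import math

def stft_magnitude(x, center, win_len, bins):
    """|SF(u, ζ)| at time position u = center for each frequency bin in `bins`."""
    start = center - win_len // 2
    mags = []
    for k in bins:
        acc = 0j
        for m in range(win_len):
            n = start + m
            if 0 <= n < len(x):                                      # zero outside the signal
                w = 0.5 - 0.5 * math.cos(2 * math.pi * m / win_len)  # periodic Hann window
                acc += x[n] * w * cmath.exp(-2j * math.pi * k * m / win_len)
        mags.append(abs(acc))
    return mags

N, WIN, HOP = 1024, 64, 64
# 8/64 cycles per sample during the first half, 20/64 during the second half.
x = [math.cos(2 * math.pi * (8 if n < N // 2 else 20) * n / WIN) for n in range(N)]

bins = list(range(WIN // 2 + 1))                      # non-negative frequency bins
grid = list(range(WIN // 2, N - WIN // 2 + 1, HOP))   # regular grid of time positions u
peak_bin = {}
for u in grid:
    mags = stft_magnitude(x, u, WIN, bins)
    peak_bin[u] = max(bins, key=lambda k: mags[k])

for u in (grid[3], grid[12]):
    print(f"u = {u:4d}: dominant bin {peak_bin[u]}")
```

Every window on this grid lies entirely inside one half of the signal, so each column of the resulting "spectrogram" sees a single pure tone.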

With (*u*, ζ) sampled on a sufficiently dense grid, the set of functions $$\left\{{g}_{u,\zeta}\right\}$$ is known as a *Gabor frame*, and its elements are called *Gabor atoms*. A frame is a set of functions $$\{h_k(t)\}$$ for which there exist constants 0 < A ≤ B < ∞ such that, for any function *f(t)*,

$$A\Vert f\Vert^{2}\le \sum_{k}|\langle f,h_{k}\rangle|^{2}\le B\Vert f\Vert^{2}.$$
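For intuition about frame bounds, consider a finite-dimensional example that is not a Gabor frame: three unit vectors in ℝ² spaced 120° apart. This sketch estimates A and B as the extreme values of Σ_k|⟨f,h_k⟩|²/‖f‖² over random vectors f. For this particular set the ratio is 3/2 for every f, so A = B and the frame is *tight*; a tight frame allows the simple reconstruction f = (1/A) Σ_k ⟨f,h_k⟩h_k even though the three vectors are redundant.

```python
import math
import random

# Three unit vectors in R^2 spaced 120 degrees apart (a redundant frame).
frame = [(math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3)) for k in range(3)]

def inner(f, h):
    return f[0] * h[0] + f[1] * h[1]

random.seed(0)
ratios = []
max_recon_err = 0.0
for _ in range(1000):
    f = (random.gauss(0, 1), random.gauss(0, 1))
    coeffs = [inner(f, h) for h in frame]                     # analysis: <f, h_k>
    ratios.append(sum(c * c for c in coeffs) / inner(f, f))   # sum |<f,h_k>|^2 / ||f||^2
    # Synthesis for a tight frame with bound A = 3/2: f = (1/A) * sum_k <f,h_k> h_k
    rec = [sum(c * h[i] for c, h in zip(coeffs, frame)) / 1.5 for i in range(2)]
    max_recon_err = max(max_recon_err, abs(rec[0] - f[0]), abs(rec[1] - f[1]))

A, B = min(ratios), max(ratios)
print(f"estimated frame bounds: A = {A:.6f}, B = {B:.6f}")
print(f"worst reconstruction error: {max_recon_err:.1e}")
```

Sampling random vectors only estimates the bounds in general; for a finite frame the exact bounds are the smallest and largest eigenvalues of the frame operator Σ_k h_k h_kᵀ, which here is (3/2)·I.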

The energy of $$g(t)$$ is concentrated in time with variance σ_{t}^{2}, and the energy of $$\widehat{g}(\omega )$$ is concentrated in frequency with variance σ_{ω}^{2}. These spreads determine how well the window localizes the
signal in time and frequency. By the time-frequency uncertainty principle, there is a limit
to how well you can localize in both domains simultaneously:

$${\sigma}_{t}{\sigma}_{\omega}\ge \frac{1}{2}.$$

Narrowing the window in one domain results in poorer localization in the other. Gabor showed that the time-frequency area σ_{t}σ_{ω} of the window is minimal, attaining the bound, when $$g(t)$$ is Gaussian.
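The bound can be checked numerically. This sketch assumes the Fourier convention used above, for which Parseval's theorem gives (1/2π)∫ω²|ĝ(ω)|²dω = ∫g′(t)²dt, so σ_{ω} can be computed in the time domain. The two windows, a Gaussian and a Hann bump cos²(πt/2) on [-1, 1], are illustrative choices; integrals use the trapezoidal rule and g′ uses central differences.

```python
import math

def spread_product(g, t_min, t_max, n=20001):
    """Return σ_t·σ_ω for a real window g centred at t = 0.

    σ_t² = ∫ t² g(t)² dt / ‖g‖²   and   σ_ω² = ∫ g'(t)² dt / ‖g‖²  (via Parseval).
    """
    h = (t_max - t_min) / (n - 1)
    ts = [t_min + i * h for i in range(n)]

    def trapz(values):
        return h * (sum(values) - 0.5 * (values[0] + values[-1]))

    d = 1e-5                                   # step for the central difference
    energy = trapz([g(t) ** 2 for t in ts])
    time_var = trapz([t * t * g(t) ** 2 for t in ts]) / energy
    freq_var = trapz([((g(t + d) - g(t - d)) / (2 * d)) ** 2 for t in ts]) / energy
    return math.sqrt(time_var * freq_var)

gaussian = lambda t: math.exp(-t * t / 2)
hann = lambda t: math.cos(math.pi * t / 2) ** 2 if abs(t) <= 1 else 0.0

gauss_product = spread_product(gaussian, -10.0, 10.0)
hann_product = spread_product(hann, -1.0, 1.0)
print(f"Gaussian: {gauss_product:.4f}")   # 0.5000: the lower bound is attained
print(f"Hann:     {hann_product:.4f}")
```

The Hann window comes out at roughly 0.513, slightly above the bound, which is consistent with the Gaussian being the minimizer.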

### Constant-Q Transform

In the CQT, the bandwidth and sampling density in frequency are varied. The windows are constructed and applied directly in the frequency domain. Different windows have different center frequencies and bandwidths, but the ratio of the center frequency to bandwidth remains constant. Maintaining a constant ratio implies:

- Resolution in time improves at higher frequencies.
- Resolution in frequency improves at lower frequencies.

Because of the uncertainty principle, the time shift for each window depends on its bandwidth.

The CQT depends on:

- The window functions *g*_{k}. These are real-valued, even functions. In the frequency domain, the Fourier transform of *g*_{k} is defined on the interval [-ζ_{s}/2, ζ_{s}/2].
- The sampling rate, ζ_{s}.
- The number of bins per octave, *b*.
- The minimum and maximum frequencies, ζ_{min} and ζ_{max}.

Choose a minimum frequency ζ_{min} and number of bins per octave *b*. Next, form a sequence of geometrically spaced frequencies,

$$\zeta_k = \zeta_{\min}\, 2^{k/b}, \quad k = 0, \ldots, K,$$

where *K* is the integer such that ζ_{K} is the largest frequency strictly less than the Nyquist frequency ζ_{s}/2. The bandwidth at the *k*th frequency is set to Ω_{k} = ζ_{k+1} - ζ_{k-1} = ζ_{k}(2^{1/b} - 2^{-1/b}). With this sampling, the ratio of the *k*th center frequency to the window bandwidth is independent of *k*:

$$Q = \frac{\zeta_k}{\Omega_k} = \left(2^{1/b} - 2^{-1/b}\right)^{-1}.$$

To ensure perfect reconstruction, the DC component and Nyquist frequency are prepended and appended, respectively, to the sequence.
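The frequency grid described above can be sketched directly. The sampling rate (8000 Hz), minimum frequency (27.5 Hz), and *b* = 12 are illustrative values, and `cqt_center_frequencies` is a hypothetical helper, not part of `cqt`. The sketch stops at the Nyquist frequency as described; it does not model a separate ζ_{max} cap.

```python
import math

def cqt_center_frequencies(f_min, bins_per_octave, fs):
    """ζ_k = ζ_min * 2**(k/b) for k = 0, 1, ..., K, where ζ_K is the last value below fs/2."""
    freqs = []
    k = 0
    while f_min * 2 ** (k / bins_per_octave) < fs / 2:
        freqs.append(f_min * 2 ** (k / bins_per_octave))
        k += 1
    return freqs

fs, f_min, b = 8000.0, 27.5, 12
zeta = cqt_center_frequencies(f_min, b, fs)

# Ω_k = ζ_{k+1} - ζ_{k-1}, computed from the neighbouring grid points themselves.
omega = [f_min * 2 ** ((k + 1) / b) - f_min * 2 ** ((k - 1) / b) for k in range(len(zeta))]

Q = 1 / (2 ** (1 / b) - 2 ** (-1 / b))
q_values = [z / w for z, w in zip(zeta, omega)]      # each ratio ζ_k / Ω_k

# Prepend DC and append Nyquist so the filter bank can cover the whole spectrum.
centers = [0.0] + zeta + [fs / 2]

print(f"K = {len(zeta) - 1}, ζ_K = {zeta[-1]:.2f} Hz, Q = {Q:.4f}")
print(f"largest deviation of ζ_k/Ω_k from Q: {max(abs(q - Q) for q in q_values):.1e}")
```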

A prototype function *W*(ω) generates the window functions *g*_{k}. *W*(ω) is a real-valued, even, continuous function that is centered at 0, positive on the interval (-½, ½), and 0 elsewhere. *W*(ω) is translated to each center frequency ζ_{k} and then scaled by the bandwidth Ω_{k}. Evaluating this translated and scaled version of *W*(ω) yields the filter coefficients

$$g_k[m] = W\!\left(\frac{m\,\zeta_s/L - \zeta_k}{\Omega_k}\right), \quad m = 0, \ldots, L-1,$$

where *L* is the signal length. By default, `cqt` uses the `'hann'` window.
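Sampling one filter makes the formula concrete. This is a sketch under stated assumptions: the prototype W(ω) = 0.5 + 0.5·cos(2πω) on |ω| < ½ is a Hann-shaped bump chosen for illustration and may differ in detail from the window that `cqt` actually uses; `hann_prototype` and `cq_filter` are hypothetical names; ζ_{s} = 8000 Hz, *L* = 4096, ζ_{k} = 440 Hz, and *b* = 12 are example values.

```python
import math

def hann_prototype(w):
    """Hypothetical W(ω): Hann-shaped, positive on (-1/2, 1/2), zero elsewhere."""
    return 0.5 + 0.5 * math.cos(2 * math.pi * w) if abs(w) < 0.5 else 0.0

def cq_filter(zeta_k, omega_k, fs, L, W=hann_prototype):
    """g_k[m] = W((m * fs / L - ζ_k) / Ω_k) for m = 0, ..., L - 1."""
    return [W((m * fs / L - zeta_k) / omega_k) for m in range(L)]

fs, L, b = 8000.0, 4096, 12
zeta_k = 440.0
omega_k = zeta_k * (2 ** (1 / b) - 2 ** (-1 / b))   # Ω_k = ζ_{k+1} - ζ_{k-1}
g_k = cq_filter(zeta_k, omega_k, fs, L)

support = [m for m, v in enumerate(g_k) if v > 0]
peak = max(range(L), key=lambda m: g_k[m])
print(f"peak at bin {peak} ({peak * fs / L:.2f} Hz); "
      f"{len(support)} nonzero bins covering about {omega_k:.1f} Hz")
```

The filter is nonzero only on the DFT bins whose frequencies m·ζ_{s}/*L* lie within Ω_{k}/2 of ζ_{k}, which is what makes applying it in the frequency domain cheap.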

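By the uncertainty principle, the bandwidth constrains the time shifts. The filtered spectrum in channel *k* occupies a band of width Ω_{k}, so its coefficients need to be sampled at a rate of at least about Ω_{k}. To satisfy the frame inequality, the time shift a_{k} of g_{k}, in samples of the original signal, must satisfy

$$a_k \le \frac{\zeta_s}{\Omega_k}.$$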
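The consequence for the time grid can be tabulated with a short sketch, assuming the example parameters ζ_{s} = 8000 Hz, ζ_{min} = 27.5 Hz, and *b* = 12, and taking the largest integer hop that satisfies the inequality (a real implementation may choose smaller hops).

```python
import math

fs, f_min, b = 8000.0, 27.5, 12

# Largest K such that ζ_K is still below the Nyquist frequency.
K = 0
while f_min * 2 ** ((K + 1) / b) < fs / 2:
    K += 1
zeta = [f_min * 2 ** (k / b) for k in range(K + 1)]
omega = [f_min * (2 ** ((k + 1) / b) - 2 ** ((k - 1) / b)) for k in range(K + 1)]

# Largest integer hop, in samples of the original signal, with a_k <= ζ_s / Ω_k.
hops = [math.floor(fs / w) for w in omega]

for k in range(0, K + 1, 2 * b):            # one line every second octave
    print(f"ζ_k = {zeta[k]:7.1f} Hz   Ω_k = {omega[k]:6.1f} Hz   a_k <= {hops[k]}")
```

Because Ω_{k} grows in proportion to ζ_{k}, the admissible hop shrinks at higher frequencies: the high-frequency channels are sampled densely in time, which is the improved time resolution described earlier.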

As mentioned previously, the windows are applied in the frequency domain. The filters g_{k}, centered at ζ_{k}, are formed and multiplied with the Fourier transform of the signal. Taking the
inverse Fourier transform of each filtered spectrum yields the constant-Q coefficients.
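The whole analysis path can be sketched in a few lines. This is a simplified, undecimated illustration under stated assumptions, not the algorithm inside `cqt`: it uses a naive DFT instead of an FFT, keeps every coefficient instead of subsampling channel *k* by a_{k}, uses the hypothetical Hann-shaped prototype from before, and picks small example sizes (ζ_{s} = 1024 Hz, *L* = 256, *b* = 4, ζ_{min} = 32 Hz) so that it runs quickly.

```python
import cmath
import math

def dft(x):
    """Naive forward DFT: X[m] = sum_n x[n] * exp(-j 2 pi m n / L)."""
    L = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * m * n / L) for n in range(L)) for m in range(L)]

def idft_sparse(Y):
    """Inverse DFT that skips the zero bins of a band-limited spectrum."""
    L = len(Y)
    nonzero = [m for m in range(L) if Y[m] != 0]
    return [sum(Y[m] * cmath.exp(2j * math.pi * m * t / L) for m in nonzero) / L
            for t in range(L)]

def hann_prototype(w):
    """Hypothetical W(ω): Hann-shaped, positive on (-1/2, 1/2), zero elsewhere."""
    return 0.5 + 0.5 * math.cos(2 * math.pi * w) if abs(w) < 0.5 else 0.0

fs, L, b, f_min = 1024.0, 256, 4, 32.0
K = 0
while f_min * 2 ** ((K + 1) / b) < fs / 2:
    K += 1
zeta = [f_min * 2 ** (k / b) for k in range(K + 1)]
omega = [f_min * (2 ** ((k + 1) / b) - 2 ** ((k - 1) / b)) for k in range(K + 1)]

x = [math.cos(2 * math.pi * 128.0 * n / fs) for n in range(L)]   # 128 Hz test tone
X = dft(x)                                                        # spectrum of the signal

coefficients = []
for z, w in zip(zeta, omega):
    g_k = [hann_prototype((m * fs / L - z) / w) for m in range(L)]   # filter g_k[m]
    filtered = [g * X_m for g, X_m in zip(g_k, X)]                   # window the spectrum
    coefficients.append(idft_sparse(filtered))                       # back to time

energies = [sum(abs(c) ** 2 for c in c_k) for c_k in coefficients]
strongest = max(range(len(zeta)), key=lambda k: energies[k])
print(f"{len(zeta)} channels; strongest response at ζ_k = {zeta[strongest]:.1f} Hz")
```

Each channel's coefficients are complex because the filters cover only positive frequencies. An efficient implementation would instead take a short inverse DFT over just the filter's support, which is the subsampling by a_{k} that the frame inequality permits.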

### References

[1] Holighaus, N., M. Dörfler,
G. A. Velasco, and T. Grill. "A framework for invertible real-time constant-Q transforms."
*IEEE Transactions on Audio, Speech, and Language Processing*. Vol.
21, No. 4, 2013, pp. 775–785.

[2] Velasco, G. A., N. Holighaus,
M. Dörfler, and T. Grill. "Constructing an invertible constant-Q transform with
nonstationary Gabor frames." In *Proceedings of the 14th International Conference
on Digital Audio Effects (DAFx-11)*. Paris, France: 2011.

[3] Schörkhuber, C., A. Klapuri,
N. Holighaus, and M. Dörfler. "A Matlab Toolbox for Efficient Perfect Reconstruction
Time-Frequency Transforms with Log-Frequency Resolution." Submitted to the *AES
53rd International Conference on Semantic Audio*. London, UK:
2014.

[4] Průša, Z., P. L. Søndergaard,
N. Holighaus, C. Wiesmeyr, and P. Balazs. *The Large Time-Frequency Analysis
Toolbox 2.0*. Sound, Music, and Motion, Lecture Notes in Computer Science,
2014, pp. 419–442. https://github.com/ltfat