Fast Implementation of Two-Dimensional APES and
CAPON Spectral Estimators*
ERIK G. LARSSON
Department of Electrical and Computer Engineering, University of Florida, P.O. Box 116130, Gainesville, FL
32611, USA
PETRE STOICA
Department of Systems and Control, Uppsala University, P.O. Box 27, 751 03 Uppsala, Sweden
Received August 14, 2000; Revised February 12, 2001; Accepted March 16, 2001
Abstract. The matched-filterbank spectral estimators APES and CAPON have recently received considerable
attention in a number of applications. Unfortunately, their computational complexity tends to limit their usage in
several cases – a problem that has previously been addressed by different authors. In this paper, we introduce a
novel approach to the computation of the APES and CAPON spectra, which leads to a computational method that
is considerably faster than all existing techniques. The new implementations of APES and CAPON are called fast
APES and fast CAPON, respectively, and are developed for the two-dimensional case, with the one-dimensional
case as a special case. Numerical examples are provided to demonstrate the application of APES to synthetic
aperture radar (SAR) imaging, and to illustrate the reduction in computational complexity provided by our method.
Key Words: Two-dimensional spectral estimation, APES, CAPON, Fast algorithms, Displacement rank
1. Introduction
Amplitude-spectrum estimation has a wide range of applications, including time-series
analysis, synthetic aperture radar (SAR) imaging, nuclear magnetic resonance (NMR)
spectroscopy, magnetic resonance imaging (MRI), and geophysics, to mention a few. The
classical approaches to spectral estimation include the discrete Fourier transform (DFT),
and its variants which are typically based on smoothing the spectral estimate or windowing
the data, see, e.g., [1], [2], [3]. The DFT and its variants can be interpreted as filtering the
data through a bank of frequency dependent (but data-independent) narrowband filters and
estimating the power at the filter outputs [1], [4], [5]. The spectral resolution of the so-obtained
estimators is rather poor, and it has recently been realized that using filter banks with data-
dependent filters can yield a significantly improved resolution. Three such filterbank based
estimators which have recently received considerable attention in a number of applications
are the amplitude spectrum CAPON (ASC) estimator, the power spectrum CAPON (PSC)
estimator [6] and the amplitude and phase estimation (APES) approach [7], [8].
Multidimensional Systems and Signal Processing, 13, 35–53, 2002
© 2002 Kluwer Academic Publishers. Manufactured in The Netherlands.
* This work was partly supported by the Swedish Foundation for Strategic Research (SSF).
Unfortunately, direct implementation of the above estimators is computationally very
demanding. Finding computationally efficient implementations of these estimators is of
considerable importance not only for off-line data analysis applications, but also for
possible hardware implementations where the computational resources may be even
more constrained.
A very efficient procedure for computing the two-dimensional (2D) PSC estimate was
proposed by Jakobsson et al. in [5], [9] (in the one-dimensional (1D) case this procedure
reduces to that introduced in [10]). Their implementation is based on an approximation
where the outer-product sample covariance matrix is replaced by a structured block-
Toeplitz matrix obtained by fitting an autoregressive model to the data. We note here also
the work in [11] (see also [5]) where the same approximation technique was applied to the
computation of the ASC spectrum. Unfortunately, this approximation technique cannot be
used for the computation of the APES spectrum. The implementation of APES has
previously been discussed in [12], and to some extent, in [13]. The technique in [12]
improves significantly over the ‘‘intuitive’’ ways of implementing APES suggested in
[13], and it was the fastest available method for the computation of the APES spectrum.
However, its computational complexity is still very high, and when working with large
data sets, one normally has to resort to splitting techniques such as those proposed in [14].
This paper presents a novel way of implementing APES and ASC that is significantly
faster than the available methods. As a by-product, we also obtain an efficient imple-
mentation of the PSC estimator. Our computational technique is mainly based on results
for fast computation of convolutions, products of trigonometric polynomials and Cholesky
factors. We treat only the 2D case – the 1D case follows as a special case.
2. APES, CAPON and Some Notation
Consider the problem of estimating the amplitude spectrum of a complex-valued two-dimensional signal $\{X_{n,\bar{n}}\}_{n=0,\bar{n}=0}^{N-1,\bar{N}-1}$, where $N \times \bar{N}$ is the dimension of the data matrix. We require $N \geq 1$ and $\bar{N} > 1$, where the special case $N = 1$ will be referred to as the 1D case.
In order to describe the APES and CAPON estimators, we start by introducing the user parameters $M$ and $\bar{M}$, which are usually referred to as filter lengths. Increasing $M$ and $\bar{M}$ typically increases the spectral resolution at the cost of reducing the statistical stability of the spectral estimates [13]. Let $L \triangleq N - M + 1$ and $\bar{L} \triangleq \bar{N} - \bar{M} + 1$ (note that in the 1D case we have $M = N = L = 1$), and arrange the samples $X_{n,\bar{n}}$ in a matrix as
$$Y \triangleq [\,y_{0,0} \;\cdots\; y_{L-1,0} \;\; y_{0,1} \;\cdots\; y_{L-1,\bar{L}-1}\,] \in \mathbb{C}^{M\bar{M} \times L\bar{L}} \tag{1}$$
where
$$y_{l,\bar{l}} \triangleq \mathrm{vec}\!\left(\begin{bmatrix} X_{l,\bar{l}} & \cdots & X_{l,\bar{l}+\bar{M}-1} \\ \vdots & \ddots & \vdots \\ X_{l+M-1,\bar{l}} & \cdots & X_{l+M-1,\bar{l}+\bar{M}-1} \end{bmatrix}\right) \tag{2}$$
and where $\mathrm{vec}(\cdot)$ denotes the operation of stacking the columns of a matrix on top of each other.
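As a concrete illustration of (1) and (2), the snapshot matrix $Y$ can be assembled directly from the data matrix. The following is a minimal NumPy sketch; the function name and the column-major vec/ordering conventions are our own assumptions, chosen to be mutually consistent rather than taken verbatim from the paper:

```python
import numpy as np

def build_snapshot_matrix(X, M, Mbar):
    """Form Y in (1): each column is vec() of an M-by-Mbar patch of X,
    with l varying fastest over the columns, as in (1)."""
    N, Nbar = X.shape
    L, Lbar = N - M + 1, Nbar - Mbar + 1
    Y = np.empty((M * Mbar, L * Lbar), dtype=complex)
    col = 0
    for lbar in range(Lbar):
        for l in range(L):
            patch = X[l:l + M, lbar:lbar + Mbar]
            Y[:, col] = patch.flatten(order="F")  # vec() stacks columns
            col += 1
    return Y
```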
Define the forward sample covariance matrix
$$\hat{R}_f \triangleq \frac{1}{L\bar{L}} \sum_{l=0}^{L-1} \sum_{\bar{l}=0}^{\bar{L}-1} y_{l,\bar{l}}\, y_{l,\bar{l}}^{*} = \frac{1}{L\bar{L}}\, YY^{*} \in \mathbb{C}^{M\bar{M} \times M\bar{M}} \tag{3}$$
and the forward-backward sample covariance matrix
$$\hat{R}_{fb} \triangleq \frac{1}{2}\left(\hat{R}_f + J_{M\bar{M}}\, \hat{R}_f^{T}\, J_{M\bar{M}}\right) \tag{4}$$
where $J_P$ denotes a $P \times P$ reversal matrix, i.e., a matrix with ones on its anti-diagonal and zeros everywhere else, and $(\cdot)^{*}$ denotes the conjugate transpose. The forward-backward covariance matrix $\hat{R}_{fb}$ is commonly preferred over $\hat{R}_f$, since its use yields superior spectral estimates [15], [13]. Hence we consider hereafter only $\hat{R}_{fb}$, and for simplicity we denote it by $\hat{R}$.
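In NumPy, (3) and (4) can be sketched as follows. This is a plain reference implementation for illustration (the function name is ours); the fast algorithms discussed later avoid forming $\hat{R}$ explicitly:

```python
import numpy as np

def sample_covariances(Y):
    """Forward (3) and forward-backward (4) sample covariance matrices."""
    MMbar, LLbar = Y.shape
    Rf = (Y @ Y.conj().T) / LLbar
    J = np.flipud(np.eye(MMbar))       # reversal (exchange) matrix
    Rfb = 0.5 * (Rf + J @ Rf.T @ J)    # plain transpose, as in (4)
    return Rf, Rfb
```

A useful sanity check on the result is that $\hat{R}_{fb}$ is both Hermitian and persymmetric by construction.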
Introduce, for arbitrary integers $P$ and $\bar{P}$, the 2D Fourier vector
$$a_{P,\bar{P}}(\omega,\bar{\omega}) \triangleq [\,1 \;\; e^{i\omega} \;\cdots\; e^{i(P-1)\omega}\,]^{T} \otimes [\,1 \;\; e^{i\bar{\omega}} \;\cdots\; e^{i(\bar{P}-1)\bar{\omega}}\,]^{T} \tag{5}$$
where $\otimes$ denotes the Kronecker product [16]. Define the 2D Fourier transforms
$$\bar{g}(\omega,\bar{\omega}) \triangleq \frac{1}{L\bar{L}}\, Y a_{L,\bar{L}}(-\omega,-\bar{\omega}) = \frac{1}{L\bar{L}} \sum_{l=0}^{L-1} \sum_{\bar{l}=0}^{\bar{L}-1} y_{l,\bar{l}}\, e^{-i(\omega l + \bar{\omega}\bar{l})}$$
$$\tilde{g}(\omega,\bar{\omega}) \triangleq \frac{1}{L\bar{L}}\, \tilde{Y} a_{L,\bar{L}}(-\omega,-\bar{\omega}) \tag{6}$$
where $Y$ is defined in (1) and (2), and
$$\tilde{Y} \triangleq J_{M\bar{M}}\, Y^{c}\, J_{L\bar{L}} \tag{7}$$
Here $(\cdot)^{c}$ denotes the complex conjugate.
We are now ready to introduce the APES and CAPON spectral estimators. The ASC is defined by
$$\hat{\alpha}_{ASC}(\omega,\bar{\omega}) \triangleq \frac{a_{M,\bar{M}}^{*}(\omega,\bar{\omega})\, \hat{R}^{-1}\, \bar{g}(\omega,\bar{\omega})}{a_{M,\bar{M}}^{*}(\omega,\bar{\omega})\, \hat{R}^{-1}\, a_{M,\bar{M}}(\omega,\bar{\omega})} \tag{8}$$
and the PSC is given by
$$\hat{\phi}_{PSC}(\omega,\bar{\omega}) \triangleq \frac{1}{a_{M,\bar{M}}^{*}(\omega,\bar{\omega})\, \hat{R}^{-1}\, a_{M,\bar{M}}(\omega,\bar{\omega})} \tag{9}$$
The APES estimator is defined by
$$\hat{\alpha}_{APES}(\omega,\bar{\omega}) \triangleq \frac{a_{M,\bar{M}}^{*}(\omega,\bar{\omega})\, \hat{Q}^{-1}(\omega,\bar{\omega})\, \bar{g}(\omega,\bar{\omega})}{a_{M,\bar{M}}^{*}(\omega,\bar{\omega})\, \hat{Q}^{-1}(\omega,\bar{\omega})\, a_{M,\bar{M}}(\omega,\bar{\omega})} \tag{10}$$
where
$$\hat{Q}(\omega,\bar{\omega}) \triangleq \hat{R} - \frac{1}{2}\,[\,\bar{g}(\omega,\bar{\omega}) \;\; \tilde{g}(\omega,\bar{\omega})\,] \begin{bmatrix} \bar{g}^{*}(\omega,\bar{\omega}) \\ \tilde{g}^{*}(\omega,\bar{\omega}) \end{bmatrix} \tag{11}$$
Typically the estimated spectrum is evaluated on a fine uniform frequency grid, and in this paper we let $K \times \bar{K}$ denote the number of points of this grid. Derivations of the
above estimators as well as discussions of their properties can be found in, for
instance, [7], [13], [15], [5]. In this paper we focus on the implementation aspects of
the estimators.
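For concreteness, here is a direct (deliberately slow) NumPy evaluation of the ASC (8) and PSC (9) at a single frequency pair, assembled straight from the definitions (2)–(6). It exists only to make the formulas executable; the function name and the internal vec/steering-vector orderings are our own assumptions, kept mutually consistent:

```python
import numpy as np

def asc_psc(X, M, Mbar, w, wbar):
    """Direct evaluation of ASC (8) and PSC (9) at one frequency (w, wbar).
    Illustrative reference implementation, not the fast algorithm."""
    N, Nbar = X.shape
    L, Lbar = N - M + 1, Nbar - Mbar + 1
    # steering vector a_{M,Mbar}(w, wbar), matched to column-major vec()
    m = np.arange(M)[:, None]
    mbar = np.arange(Mbar)[None, :]
    a = np.exp(1j * (w * m + wbar * mbar)).flatten(order="F")
    # snapshots y_{l,lbar} and the 2D Fourier transform gbar of (6)
    cols, phases = [], []
    for lbar in range(Lbar):
        for l in range(L):
            cols.append(X[l:l + M, lbar:lbar + Mbar].flatten(order="F"))
            phases.append(np.exp(-1j * (w * l + wbar * lbar)))
    Y = np.stack(cols, axis=1)
    gbar = (Y @ np.asarray(phases)) / (L * Lbar)
    # forward-backward sample covariance, cf. (3)-(4)
    Rf = (Y @ Y.conj().T) / (L * Lbar)
    J = np.flipud(np.eye(M * Mbar))
    R = 0.5 * (Rf + J @ Rf.T @ J)
    Ra = np.linalg.solve(R, a)
    Rg = np.linalg.solve(R, gbar)
    denom = a.conj() @ Ra
    return (a.conj() @ Rg) / denom, (1.0 / denom).real
```

Note that this direct route solves a linear system per frequency point, which is exactly the cost the fast implementation of Section 3 is designed to avoid.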
3. Fast Implementations
We first present equivalent expressions for the APES, ASC and PSC spectra that will serve
as starting points for the derivation of our fast implementation algorithms.
Lemma 1: Define the following trigonometric polynomials:
$$p_{11}(\omega,\bar{\omega}) \triangleq a_{M,\bar{M}}^{*}(\omega,\bar{\omega})\, \hat{R}^{-1}\, a_{M,\bar{M}}(\omega,\bar{\omega})$$
$$p_{12}(\omega,\bar{\omega}) \triangleq \frac{1}{L\bar{L}}\, a_{M,\bar{M}}^{*}(\omega,\bar{\omega})\, \hat{R}^{-1}\, Y a_{L,\bar{L}}(-\omega,-\bar{\omega})$$
$$p_{13}(\omega,\bar{\omega}) \triangleq \frac{1}{L\bar{L}}\, a_{M,\bar{M}}^{*}(\omega,\bar{\omega})\, \hat{R}^{-1}\, \tilde{Y} a_{L,\bar{L}}(-\omega,-\bar{\omega})$$
$$p_{22}(\omega,\bar{\omega}) \triangleq \frac{1}{L^2\bar{L}^2}\, a_{L,\bar{L}}^{*}(-\omega,-\bar{\omega})\, Y^{*} \hat{R}^{-1}\, Y a_{L,\bar{L}}(-\omega,-\bar{\omega})$$
$$p_{23}(\omega,\bar{\omega}) \triangleq \frac{1}{L^2\bar{L}^2}\, a_{L,\bar{L}}^{*}(-\omega,-\bar{\omega})\, Y^{*} \hat{R}^{-1}\, \tilde{Y} a_{L,\bar{L}}(-\omega,-\bar{\omega}) \tag{12}$$
and the trigonometric polynomial matrix
$$\Pi(\omega,\bar{\omega}) \triangleq \frac{1}{2} \begin{bmatrix} p_{22}(\omega,\bar{\omega}) - 2 & p_{23}(\omega,\bar{\omega}) \\ p_{23}^{*}(\omega,\bar{\omega}) & p_{22}(\omega,\bar{\omega}) - 2 \end{bmatrix} \tag{13}$$
Then it holds that
$$p_{13}^{*}(\omega,\bar{\omega}) = p_{12}^{*}(\omega,\bar{\omega})\, e^{-i((N-1)\omega + (\bar{N}-1)\bar{\omega})} \tag{14}$$
Furthermore, the ASC, PSC and APES spectra can be expressed as
$$\hat{\alpha}_{ASC}(\omega,\bar{\omega}) = \frac{p_{12}(\omega,\bar{\omega})}{p_{11}(\omega,\bar{\omega})}, \qquad \hat{\phi}_{PSC}(\omega,\bar{\omega}) = \frac{1}{p_{11}(\omega,\bar{\omega})} \tag{15}$$
and
$$\hat{\alpha}_{APES}(\omega,\bar{\omega}) = \frac{p_{12}(\omega,\bar{\omega}) - \frac{1}{2}\,[\,p_{12}(\omega,\bar{\omega}) \;\; p_{13}(\omega,\bar{\omega})\,]\, \Pi^{-1}(\omega,\bar{\omega}) \begin{bmatrix} p_{22}(\omega,\bar{\omega}) \\ p_{23}^{*}(\omega,\bar{\omega}) \end{bmatrix}}{p_{11}(\omega,\bar{\omega}) - \frac{1}{2}\,[\,p_{12}(\omega,\bar{\omega}) \;\; p_{13}(\omega,\bar{\omega})\,]\, \Pi^{-1}(\omega,\bar{\omega}) \begin{bmatrix} p_{12}^{*}(\omega,\bar{\omega}) \\ p_{13}^{*}(\omega,\bar{\omega}) \end{bmatrix}} \tag{16}$$
Proof Equation (14) is shown in [13]. The expressions in (15) follow immediately from
(8), (9) and (12). The proof of equation (16) consists of applying the matrix inversion
lemma (see, e.g., [1]) to (10) and is carried out in [13]. □
Let $\hat{R}^{-1/2}$ denote the inverse of the Cholesky factor of $\hat{R}$, i.e., the (unique) upper triangular matrix with real diagonal elements such that $\hat{R}^{-1/2}(\hat{R}^{-1/2})^{*} = \hat{R}^{-1}$. Furthermore, let $\{r_m\}_{m=1}^{M\bar{M}}$ be the columns of $\hat{R}^{-1/2}$. Then the polynomials in Lemma 1 can be written as:
$$p_{ij}(\omega,\bar{\omega}) = \sum_{m=1}^{M\bar{M}} p_i^{(m)*}(\omega,\bar{\omega})\, p_j^{(m)}(\omega,\bar{\omega}) = \sum_{m=1}^{M\bar{M}} p_{ij}^{(m)}(\omega,\bar{\omega}) \tag{17}$$
where
$$p_1^{(m)}(\omega,\bar{\omega}) \triangleq r_m^{*}\, a_{M,\bar{M}}(\omega,\bar{\omega})$$
$$p_2^{(m)}(\omega,\bar{\omega}) \triangleq \frac{1}{L\bar{L}}\, r_m^{*}\, Y a_{L,\bar{L}}(-\omega,-\bar{\omega})$$
$$p_3^{(m)}(\omega,\bar{\omega}) \triangleq \frac{1}{L\bar{L}}\, r_m^{*}\, \tilde{Y} a_{L,\bar{L}}(-\omega,-\bar{\omega}) \tag{18}$$
and
$$p_{ij}^{(m)}(\omega,\bar{\omega}) \triangleq p_i^{(m)*}(\omega,\bar{\omega})\, p_j^{(m)}(\omega,\bar{\omega}) \tag{19}$$
The technique proposed by Liu et al. in [12] for the computation of 1D and 2D APES and ASC evaluates the polynomials in Lemma 1 by using the fact that once $\hat{R}^{-1/2}$, $\hat{R}^{-1/2}Y$ and $\hat{R}^{-1/2}\tilde{Y}$ have been computed, the trigonometric polynomial vectors $\hat{R}^{-1/2}a_{M,\bar{M}}(\omega,\bar{\omega})$, $\hat{R}^{-1/2}Y a_{L,\bar{L}}(-\omega,-\bar{\omega})$ and $\hat{R}^{-1/2}\tilde{Y} a_{L,\bar{L}}(-\omega,-\bar{\omega})$ can be evaluated using $M\bar{M}$ 2D FFTs of size $K \times \bar{K}$. In [11] (see also [5]), Ekman et al. use the same technique for the evaluation of ASC, with the exception that the Cholesky factor $\hat{R}^{-1/2}$ is approximated with a structured matrix obtained by fitting an autoregressive model to the data. This approximation technique provides efficient means for computing $\hat{R}^{-1/2}$, which is often claimed to be a bottleneck in the computation of the spectrum (at least in the 1D case). However, it has turned out that this technique is not applicable to the computation of the APES spectrum.
The key observation that leads to our fast implementation of APES, ASC and PSC is as follows: instead of computing the values of the $M\bar{M}$ polynomials in (18) at $K \times \bar{K}$ points, we compute the coefficients of the polynomials $p_{ij}(\omega,\bar{\omega})$ in (17) and, upon completion of this computation, each polynomial is evaluated at $K \times \bar{K}$ points using a single 2D FFT. The computation of the polynomial coefficients is performed by accumulating the contributions from $p_{ij}^{(m)}(\omega,\bar{\omega})$ in the sum in (17). This idea is reminiscent of the fast algorithms for PSC computation in [10] (for the 1D case) and [5], [9] (for the 2D case).
To exploit this idea optimally, we combine it with some relevant results on fast
computation involving matrices and trigonometric polynomials. Since these results are
also of independent interest, we state and prove them in three subsequent lemmas.
The first result is merely a specialization of a known result on fast vector-matrix
multiplications and convolutions (see, e.g., [17], [18] for some early works on this topic).
Lemma 2: Let $u$ be an arbitrary vector of length $M\bar{M}$. Then the products $u^{*}Y$ and $u^{*}\tilde{Y}$, where $Y$ and $\tilde{Y}$ are defined in (1) and (7), can be computed in $O(N\bar{N}\log(N\bar{N}))$ floating point operations ($O(\bar{N}\log(\bar{N}))$ in the 1D case). The same is true for products of the type $Yu$ and $\tilde{Y}u$, where $u$ is a vector of length $L\bar{L}$.
Proof See Appendix A. □
The operation count in the lemma should be compared with direct evaluation of the vector-matrix product $u^{*}Y$, which would require $O(M\bar{M}N\bar{N})$ operations ($O(\bar{M}\bar{N})$ in the 1D case). Note that one, but not the only, useful consequence of Lemma 2 is that $\hat{R}$ can be computed in $O(M\bar{M}N\bar{N}\log(N\bar{N}))$ operations. To see this, note that row $k$ of $\hat{R}$ equals $\frac{1}{L\bar{L}}\, u_k^{*}Y^{*}$, where $u_k^{*}$ is the $k$th row of $Y$, and apply Lemma 2 $M\bar{M}$ times (once for each row). This should be compared with direct evaluation of (3) and (4), which requires $O(M^2\bar{M}^2 N\bar{N})$ operations.
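The fast product of Lemma 2 can be sketched in NumPy as follows (the helper name and ordering conventions are our own assumptions). Because column $(l,\bar{l})$ of $Y$ is a vectorized $M \times \bar{M}$ patch of the data matrix, $u^{*}Y$ is the "valid" part of a 2D correlation of the data with the patch-shaped conjugate of $u$, computable with zero-padded FFTs:

```python
import numpy as np

def uY_fast(u, X, M, Mbar):
    """Compute u^* Y without forming Y explicitly: a 'valid'-region 2D
    correlation of X with the patch-shaped conjugate of u via FFTs."""
    N, Nbar = X.shape
    L, Lbar = N - M + 1, Nbar - Mbar + 1
    U = u.conj().reshape(M, Mbar, order="F")        # undo the vec() stacking
    s = (N + M - 1, Nbar + Mbar - 1)
    # correlation with U == convolution with the doubly flipped kernel
    full = np.fft.ifft2(np.fft.fft2(X, s=s) * np.fft.fft2(U[::-1, ::-1], s=s))
    valid = full[M - 1:M - 1 + L, Mbar - 1:Mbar - 1 + Lbar]
    return valid.flatten(order="F")                 # column order: l fastest
```

The cost is dominated by FFTs of roughly data-matrix size, matching the $O(N\bar{N}\log(N\bar{N}))$ count in the lemma.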
The second result shows that certain trigonometric polynomials can be multiplied
efficiently.
Lemma 3: Let
$$p(\omega,\bar{\omega}) = p_{n_p-1,\bar{n}_p-1}\, e^{i((n_p-1)\omega + (\bar{n}_p-1)\bar{\omega})} + \cdots + p_{n_p-1,0}\, e^{i(n_p-1)\omega} + \cdots + p_{0,\bar{n}_p-1}\, e^{i(\bar{n}_p-1)\bar{\omega}} + \cdots + p_{0,0} \tag{20}$$
and
$$q(\omega,\bar{\omega}) = q_{n_q-1,\bar{n}_q-1}\, e^{i((n_q-1)\omega + (\bar{n}_q-1)\bar{\omega})} + \cdots + q_{n_q-1,0}\, e^{i(n_q-1)\omega} + \cdots + q_{0,\bar{n}_q-1}\, e^{i(\bar{n}_q-1)\bar{\omega}} + \cdots + q_{0,0} \tag{21}$$
be two arbitrary 2D trigonometric polynomials. Then the $(n_p + n_q - 1)(\bar{n}_p + \bar{n}_q - 1)$ coefficients of the polynomial
$$r(\omega,\bar{\omega}) = p^{*}(\omega,\bar{\omega})\, q(\omega,\bar{\omega}) = r_{n_q-1,\bar{n}_q-1}\, e^{i((n_q-1)\omega + (\bar{n}_q-1)\bar{\omega})} + \cdots + r_{0,0} + \cdots + r_{-n_p+1,-\bar{n}_p+1}\, e^{i((-n_p+1)\omega + (-\bar{n}_p+1)\bar{\omega})} \tag{22}$$
as well as the coefficients of the polynomials
$$\tilde{r}(\omega,\bar{\omega}) = p^{*}(\omega,\bar{\omega})\, q(-\omega,-\bar{\omega}), \qquad \hat{r}(\omega,\bar{\omega}) = p^{*}(-\omega,-\bar{\omega})\, q(-\omega,-\bar{\omega}) \tag{23}$$
can be computed in $O((n_p+n_q)(\bar{n}_p+\bar{n}_q)\log((n_p+n_q)(\bar{n}_p+\bar{n}_q)))$ operations.
Proof The lemma follows from the standard result that multiplication of polynomials
corresponds to a convolution which can be performed using the fast Fourier transform
(FFT). For completeness, we offer an outline of the proof in Appendix B. □
As a remark, note that direct computation of the polynomial coefficients, for instance by evaluation of (38) in Appendix B, would require $O((n_p+n_q)^2(\bar{n}_p+\bar{n}_q)^2)$ operations.
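The coefficient computation behind Lemma 3 reduces to a 2D correlation of coefficient arrays, which can be sketched with zero-padded FFTs as follows (the function name is ours; we assume the convention that a coefficient array `C[k, kbar]` holds the coefficient of $e^{i(k\omega + \bar{k}\bar{\omega})}$):

```python
import numpy as np

def conj_poly_product(P, Q):
    """Coefficients of r(w, wbar) = p*(w, wbar) q(w, wbar) via FFTs.
    The output's [0, 0] entry corresponds to the lowest exponent pair
    (-(np-1), -(nbar_p-1)), cf. (22)."""
    n_p, nb_p = P.shape
    n_q, nb_q = Q.shape
    s = (n_p + n_q - 1, nb_p + nb_q - 1)
    # p* carries negated exponents: correlate conj(P) against Q
    F = np.fft.fft2(np.conj(P)[::-1, ::-1], s=s) * np.fft.fft2(Q, s=s)
    return np.fft.ifft2(F)
```

A quick way to validate such a routine is to evaluate both sides of (22) at a random frequency pair and compare.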
The third result devises an efficient method for computing the columns of R^�1/2. The
method is based on a special factorization of R^ and the generalized Schur algorithm
[19], [20].
Lemma 4: By using the generalized Schur algorithm presented in Appendix C, the columns $\{r_k\}_{k=1}^{M\bar{M}}$ of $\hat{R}^{-1/2}$ can be computed from the data in $O(MN\bar{N}\log(N\bar{N}) + M^2\bar{M}^2 N)$ floating point operations ($O(\bar{N}\log(\bar{N}) + \bar{M}^2)$ in the 1D case). Under the further assumption (which will be seen to be valid in our application) that each $r_k$ is used directly after its computation, i.e., the whole Cholesky factor is not stored, running the algorithm in Appendix C requires $O(M\bar{M}N)$ bits of memory ($O(\bar{M})$ in the 1D case).
Proof See Appendix C. □
The operation count in the lemma should be compared to computation of $\hat{R}$ via (3) and (4), followed by standard Cholesky factorization and inversion [21], which together require $O(M^2\bar{M}^2 N\bar{N})$ operations ($O(\bar{M}^2\bar{N})$ in the 1D case). In the 1D case, the method of Lemma 4 is usually significantly faster than inversion and classical Cholesky factorization. This is true also for the 2D case, provided that $M$, $\bar{M}$ are relatively large (which in effect is the case of interest in practice). However, for small $M$, $\bar{M}$, conventional Cholesky factorization and inversion can be somewhat faster.
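For reference, the classical (non-fast) route just mentioned can be sketched in a few lines of NumPy; this is the baseline against which Lemma 4 is compared, not the paper's fast Schur-based algorithm:

```python
import numpy as np

def inv_cholesky_factor(R):
    """Classical baseline: the upper-triangular R^{-1/2} with real diagonal
    such that R^{-1/2} (R^{-1/2})^* = R^{-1} (cf. the definition above)."""
    Lc = np.linalg.cholesky(R)           # R = Lc Lc^*, Lc lower triangular
    return np.linalg.inv(Lc).conj().T    # upper triangular, real diagonal
```

Since `np.linalg.cholesky` returns the lower factor $L_c$ with real positive diagonal, $(L_c^{-1})^{*}$ is upper triangular with real diagonal and satisfies the required factorization of $\hat{R}^{-1}$.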
As a further remark, note that storage of the whole Cholesky factor would require $O(M^2\bar{M}^2)$ bits of memory ($O(\bar{M}^2)$ in the 1D case), which is typically much more than the
storage requirement of the algorithm in Appendix C. The low memory requirement
(especially in the 1D case) of the algorithm is of value for both possible hardware
implementations and off-line data analysis applications.
Finally, we stress that the lemma gives the inverse of the exact Cholesky factor, which
should not be confused with the approximations in [5], [9], [11] (the latter are also relatively
computationally efficient but they operate on a structured Toeplitz approximation of R^).
We are now ready to present our fast implementation of the APES and CAPON
spectral estimators.
Fast Implementation of APES, ASC and PSC:
1. Decide, based on the values of $M$, $\bar{M}$, $N$, $\bar{N}$ and the above operation counts, whether to use classical Cholesky factorization and inversion or the fast technique of Lemma 4 to compute $\hat{R}^{-1/2}$ (in the examples in Section 4 we have used only the fast Cholesky factorization). If the fast factorization method is chosen, before using the algorithm in Appendix C, compute the displacement factorizations introduced in Lemmas 5 and 6 in Appendix C. Otherwise, first compute $\hat{R}$ as indicated in the remark after Lemma 2 and thereafter compute $\hat{R}^{-1/2}$ by a direct method [21].
2. Perform the following steps for $m = 1, \ldots, M\bar{M}$:
(a) Obtain the $m$th column $r_m$ of $\hat{R}^{-1/2}$. This is done in one of the following two ways:
- If the classical Cholesky factorization is used, $r_m$ is already precomputed in Step 1 above.
- If the fast Cholesky factorization is used, perform one iteration (i.e., Steps 1–3) of the generalized Schur algorithm described in the proof of Lemma 4 (see Appendix C).
(b) Compute $r_m^{*}Y$ and $r_m^{*}\tilde{Y}$ by using Lemma 2.
(c) Compute the coefficients of the polynomials $p_1^{(m)}(\omega,\bar{\omega})$, $p_2^{(m)}(\omega,\bar{\omega})$, $p_3^{(m)}(\omega,\bar{\omega})$ in (18) by using the results of Steps (a)–(b) above.
(d) Compute the coefficients of the polynomials $p_{ij}^{(m)}(\omega,\bar{\omega})$ in (19) by using Lemma 3. These coefficients are summed up to obtain the coefficients of the polynomials $p_{ij}(\omega,\bar{\omega})$ in (17).
3. Evaluate the polynomials $p_{ij}(\omega,\bar{\omega})$ at $K \times \bar{K}$ points by using a single 2D FFT (per polynomial).
4. Compute the ASC, PSC or APES spectrum according to (15) or (16).
Note that the second step of the method of Lemma 3 (see Appendix B) gives the Fourier transform of each polynomial $p_{ij}^{(m)}(\omega,\bar{\omega})$ (with respect to its coefficient indices). Therefore, to obtain maximal computational efficiency in a practical implementation, we skip the last step in the algorithm in Appendix B and accumulate the Fourier transforms of the polynomial coefficients instead of the coefficients themselves. Hence Step 3 above amounts to computing (for each polynomial $p_{ij}(\omega,\bar{\omega})$) one FFT of a size on the order of $N \times \bar{N}$ and one FFT of size $K \times \bar{K}$. This step can be interpreted as an interpolation that could possibly be made in a different way; however, the bottleneck of the spectrum computation is usually not this final step.
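The single-FFT grid evaluation of Step 3 can be sketched as follows (a hypothetical helper of our own; `off` carries the lowest exponent pair of the polynomial, which is negative for polynomials such as (22)):

```python
import numpy as np

def eval_poly_grid(C, off, K, Kbar):
    """Evaluate p(w, wbar) = sum_{m,mbar} C[m, mbar] *
    exp(i((m + off[0]) w + (mbar + off[1]) wbar)) on the uniform grid
    w_k = 2*pi*k/K, wbar_kb = 2*pi*kb/Kbar via one zero-padded 2D IFFT."""
    vals = np.fft.ifft2(C, s=(K, Kbar)) * (K * Kbar)  # positive-exponent DFT
    k = np.arange(K)[:, None]
    kb = np.arange(Kbar)[None, :]
    # shift by the exponent offset of the lowest-order coefficient
    return vals * np.exp(2j * np.pi * (k * off[0] / K + kb * off[1] / Kbar))
```

The zero-padding to $K \times \bar{K}$ is what interpolates the polynomial onto the fine frequency grid.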
4. Numerical Examples
We present in this section a numerical comparison of the computational complexity of our
implementation of the 1D and 2D APES, ASC and PSC with that of the fastest techniques
available in the literature, as well as an example of the application of APES to SAR imaging.
In the first example we perform a complexity comparison for the 1D case ($N = M = L = 1$), for varying data lengths $\bar{N}$. Since the comparison depends only on the size of the data
sequence, but not on the data themselves, we used a randomly generated data string. The
comparison is carried out by measuring the number of floating point operations used by a
Matlab [22] implementation. The filter length $\bar{M} = \bar{N}/2$ is used, and the spectrum is evaluated on a frequency grid with $\bar{K} = 4096$ points. Figure 1 shows the results. We compare our
implementation of APES with the technique suggested by Liu et al. in [12]. It can be
observed that our algorithm requires around 5%–10% of the floating point operations
needed by the method of [12]. We stress that our algorithm, as well as the method of [12],
computes the exact spectrum (the differences between the spectra computed by our method
and the method of [12] were within the numerical accuracy of our workstation). In Figure 1
we also show a comparison of our implementation of ASC with the (approximate) technique suggested by Ekman et al. in [11]. It is clear that also in this case our technique is
considerably faster, despite the fact that [11] computes only a (fast) approximation of ASC.
In the second example we perform a complexity comparison for the 2D case, which is
perhaps of even more practical and theoretical relevance. A (randomly generated) square
data matrix of v