64 Signal Processing

Michael Stewart
Georgia State University

64.1 Basic Concepts
64.2 Random Signals
64.3 Linear Prediction
64.4 Wiener Filtering
64.5 Adaptive Filtering
64.6 Spectral Estimation
64.7 Direction of Arrival Estimation
References

A signal is a function of time, depending on either a continuous, real time variable or a discrete integer time variable. Signal processing is a collection of diverse mathematical, statistical, and computational techniques for transforming, extracting information from, and modeling signals. Common signal processing tasks include filtering a signal to remove undesired frequency components, extracting a model that describes key features of a signal, estimating the frequency components of a signal, predicting the future behavior of a signal from past data, and finding the direction from which a signal arrives at an array of sensors. Many modern methods for signal processing have deep connections to linear algebra. Rather than attempting to give a complete overview of such a diverse field, we survey a selection of contemporary methods for the processing of discrete time signals with deep connections to matrix theory and numerical linear algebra. These topics include linear prediction, Wiener filtering, spectral estimation, and direction of arrival estimation. Notable omissions include classical methods of filter design, the fast Fourier transform (FFT), and filter structures. (See Chapter 58.)

64.1 Basic Concepts

We begin with basic definitions which are standard and can be found in [PM96].

Definitions:

A signal $x$ is a real sequence with element $k$ of the sequence for $k = 0, \pm 1, \pm 2, \ldots$ given by $x(k)$. The set of all signals is a vector space with the sum $w = x + y$ defined by $w(k) = x(k) + y(k)$ and the scalar product $v = ax$, where $a$ is a real scalar, given by $v(k) = ax(k)$.

A linear time-invariant system is a mapping $L(x) = y$ of an input signal $x$ to an output signal $y$ that satisfies the following properties:

1. Linearity: If $L(x_0) = y_0$ and $L(x_1) = y_1$, then $L(a x_0 + b x_1) = a y_0 + b y_1$ for any $a, b \in \mathbb{R}$.
2. Time-invariance: If $L(x) = y$ and $\hat{x}$ is a shifted version of $x$ given by $\hat{x}(k) = x(k - k_0)$, then $L(\hat{x}) = \hat{y}$ where $\hat{y}(k) = y(k - k_0)$. That is, shifted inputs lead to correspondingly shifted outputs.

The impulse response of a linear time-invariant system is the output $h$ that results from applying as input the unit impulse $\delta$, where $\delta(0) = 1$ and $\delta(k) = 0$ for $k \neq 0$.

The convolution $y$ of $h$ and $x$ is written $y = h * x$ and is defined by the sum
$$y(k) = \sum_{j=-\infty}^{\infty} h(j)\, x(k-j).$$

A signal $x$ is causal if $x(k) = 0$ for $k < 0$. A linear time-invariant system is causal if every causal input gives a causal output. An equivalent definition is that a system is causal if its impulse response sequence is causal.
The z-transform of a signal or impulse response $x$ is
$$X(z) = \sum_{k=-\infty}^{\infty} x(k) z^{-k},$$
where $X(z)$ is defined in the region of convergence of the sum.

The transfer function of a linear system with impulse response $h$ is the z-transform of the impulse response,
$$H(z) = \sum_{k=-\infty}^{\infty} h(k) z^{-k}.$$

The discrete time Fourier transform of $x$ is the function of $\omega$ obtained by evaluating the z-transform on the unit circle,
$$X(e^{i\omega}) = \sum_{k=-\infty}^{\infty} x(k) e^{-i\omega k}.$$
If $x$ is an impulse response of a system, then $X(e^{i\omega})$ is the frequency response of the system. The discrete time Fourier transform is defined only if the region of convergence of the z-transform includes the unit circle.

A filter is minimum phase if the zeros of the transfer function are in the unit circle.

A finite impulse response (FIR) filter is a system that maps an input signal $x$ to an output signal $y$ through the sum
$$y(k) = \sum_{j=0}^{n-1} b(j) x(k-j).$$

A rational infinite impulse response (IIR) filter is a causal system that maps inputs $x$ to outputs $y$ via a relation of the form
$$y(k) = \sum_{j=0}^{n-1} b(j) x(k-j) - \sum_{j=1}^{m} a(j) y(k-j).$$
Causality is an essential part of the definition; without this assumption the above relation does not uniquely define a mapping from inputs to outputs.

A signal $x$ has finite energy if
$$\sum_{k=-\infty}^{\infty} |x(k)|^2 < \infty.$$

A system mapping input $x$ to output $y$ is stable if $|x(k)| < B_x$ for $0 < B_x < \infty$ and for all $k$ implies that there is $0 < B_y < \infty$ such that the output satisfies $|y(k)| < B_y$ for all $k$.

Facts:

1. The impulse response $h$ uniquely determines a linear time-invariant system. The mapping from the input $x$ to the output $y$ is given by $y = h * x$. Convolution is commutative so that if $y = g * (h * x)$, then $y = h * (g * x)$. Thus, the order in which two linear time-invariant filters are applied to a signal does not matter.

2. A causal system can be applied in real time in the sense that the output $y(k)$ can be computed from the convolution $y = h * x$ as soon as $x(k)$ is available.

3. A system defined by its impulse response $h$ is stable if and only if
$$\sum_{k=-\infty}^{\infty} |h(k)| < \infty.$$

4. An expression for the z-transform of a signal (or an impulse response) does not uniquely determine the signal. It is also necessary to know the region of convergence. If $X(z)$ is the z-transform of $x$, then the inverse transform is
$$x(k) = \frac{1}{2\pi i} \oint X(z) z^{k-1}\, dz,$$
where the integral is taken over any contour enclosing the origin and lying in the region where $X(z)$ is analytic. A system is uniquely determined by its transfer function and the region of convergence of the transfer function.

5. An FIR filter has impulse response
$$h(k) = \begin{cases} b(k) & 0 \le k \le n-1, \\ 0 & \text{otherwise.} \end{cases}$$

6. An FIR filter has transfer function
$$H(z) = b(0) + b(1) z^{-1} + \cdots + b(n-1) z^{-(n-1)}.$$

7. The impulse response $h$ of a causal IIR filter is the unique solution to the recurrence
$$h(k) = \begin{cases} 0 & k < 0, \\ b(k) - \sum_{j=1}^{m} a(j) h(k-j) & 0 \le k \le n-1, \\ -\sum_{j=1}^{m} a(j) h(k-j) & k \ge n. \end{cases}$$
This recurrence uniquely determines a causal impulse response $h$ and, hence, uniquely determines the mapping from inputs to outputs.

8. A rational IIR filter has transfer function
$$H(z) = \frac{b(0) + b(1) z^{-1} + \cdots + b(n-1) z^{-(n-1)}}{1 + a(1) z^{-1} + \cdots + a(m) z^{-m}}.$$

9. A linear time-invariant system with input $x$ and impulse response $h$, with z-transform $X(z)$ and transfer function $H(z)$, has output $y$ with z-transform $Y(z) = H(z) X(z)$.

10. A discrete version of the Paley–Wiener theorem states that $x$ is causal and has finite energy if and only if the corresponding z-transform $X(z)$ is analytic in the exterior of the unit circle and
$$\sup_{1 < r < \infty} \int_{-\pi}^{\pi} \left| X(r e^{i\omega}) \right|^2 d\omega < \infty.$$
If $|X(e^{i\omega})|$ is square integrable, then $|X(e^{i\omega})|$ is the magnitude of the Fourier transform of some causal finite energy signal if and only if
$$\int_{-\pi}^{\pi} \left| \ln |X(e^{i\omega})| \right| d\omega < \infty.$$

11. For rational transfer functions, $H(z)$ corresponds to a causal stable system if and only if the poles of $H(z)$ are in the unit circle. Thus, FIR filters are always stable and IIR filters are stable when the poles of the transfer function are in the unit circle.

12. A filter that is minimum phase has a causal stable inverse filter with transfer function $G(z) = 1/H(z)$. That is, filtering a signal $x$ by a filter with transfer function $H(z)$ and then by a filter with transfer function $G(z)$ gives an output with z-transform $G(z) H(z) X(z) = X(z)$. The minimum phase property is of particular significance since the only filters that can be applied in real time and without excessive growth of errors due to noise are those that are causal and stable. In most circumstances only minimum phase filters are invertible in practice.

Examples:

1. The causal impulse response $h$ with $h(k) = a^k$ for $k \ge 0$ and $h(k) = 0$ for $k < 0$ has z-transform $H(z) = 1/(1 - a z^{-1})$ with region of convergence $|z| > |a|$. The anticausal impulse response $\hat{h}$ with $\hat{h}(k) = -a^k$ for $k < 0$ and $\hat{h}(k) = 0$ for $k \ge 0$ has z-transform $1/(1 - a z^{-1})$ with region of convergence $|z| < |a|$.

2. The causal impulse response $h$ with
$$h(k) = \begin{cases} 2^{-k} + 3^{-k} & k \ge 0, \\ 0 & k < 0 \end{cases}$$
has z-transform
$$H(z) = \frac{1}{1 - (2z)^{-1}} + \frac{1}{1 - (3z)^{-1}} = \frac{2 - \frac{5}{6} z^{-1}}{1 - \frac{5}{6} z^{-1} + \frac{1}{6} z^{-2}}$$
with region of convergence $|z| > 1/2$. The impulse response can be realized by a rational IIR filter of the form
$$y(k) = 2 x(k) - \frac{5}{6} x(k-1) + \frac{5}{6} y(k-1) - \frac{1}{6} y(k-2).$$
The zeros of $z^2 - (5/6) z + (1/6)$ are $1/2$ and $1/3$ so that the system is stable.
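To make Example 2 concrete, the following Python sketch (an illustration added to this survey, not part of the original development; the truncation length N is an arbitrary choice) drives the IIR recursion with a unit impulse and checks that the computed impulse response agrees with $2^{-k} + 3^{-k}$.

```python
import numpy as np

def iir_filter(b, a, x):
    """Apply the causal rational IIR filter
    y(k) = sum_j b(j) x(k-j) - sum_{j>=1} a(j) y(k-j)."""
    y = np.zeros(len(x))
    for k in range(len(x)):
        acc = sum(b[j] * x[k - j] for j in range(len(b)) if k - j >= 0)
        acc -= sum(a[j] * y[k - j] for j in range(1, len(a)) if k - j >= 0)
        y[k] = acc
    return y

# Example 2: H(z) = (2 - (5/6) z^{-1}) / (1 - (5/6) z^{-1} + (1/6) z^{-2}).
b = [2.0, -5.0 / 6.0]             # numerator coefficients b(0), b(1)
a = [1.0, -5.0 / 6.0, 1.0 / 6.0]  # denominator coefficients 1, a(1), a(2)

N = 20
delta = np.zeros(N)
delta[0] = 1.0                    # unit impulse input

h = iir_filter(b, a, delta)       # computed impulse response
k = np.arange(N)
h_exact = 2.0**(-k) + 3.0**(-k)   # closed-form impulse response

print(np.max(np.abs(h - h_exact)))  # should be at roundoff level
```

Since the poles $1/2$ and $1/3$ lie inside the unit circle, the recursion is stable and the comparison stays at roundoff level for any truncation length.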
64.2 Random Signals

Definitions of statistical terms can be found in Chapter 52.

Definitions:

A random signal or random process $x$ is a sequence $x(k)$ of real random variables indexed by $k = 0, \pm 1, \pm 2, \ldots$.

A random signal $x$ is wide-sense stationary (for brevity, we will use stationary to mean the same thing) if the mean $\mu_x = E[x(k)]$ and autocorrelation sequence $r_{xx}(k) = E[x(j) x(j-k)]$ do not depend on $j$. We assume that all random signals are wide-sense stationary.

Two stationary random signals $x$ and $y$ are jointly stationary if the cross correlation sequence $r_{yx}(k) = E[y(j) x(j-k)]$ does not depend on $j$. When referring to two stationary signals, we always assume that the signals are jointly stationary as well.

A sequence $\mathbf{x}(k)$ of real random $n \times 1$ vectors indexed by $k = 0, \pm 1, \pm 2, \ldots$ is stationary if $E[\mathbf{x}(k)]$ and $E[\mathbf{x}(k) \mathbf{x}^T(k)]$ do not depend on $k$. The autocorrelation matrix of a sequence of stationary random vectors $\mathbf{x}(k)$ is
$$R_x = E\left[ \mathbf{x}(k) \mathbf{x}^T(k) \right].$$

A zero mean stationary random signal $n$ is a white noise process if it has autocorrelation sequence
$$r_{nn}(j) = E[n(k) n(k-j)] = \begin{cases} \sigma_n^2 & j = 0, \\ 0 & j \neq 0, \end{cases}$$
where $\sigma_n^2$ is the variance of $n$.

A white noise driven autoregressive process (AR process) $x$ is of the form
$$x(k) = n(k) - \sum_{j=1}^{n} a(j) x(k-j),$$
where $n$ is a white noise process and the filtering of $n$ to obtain $x$ is causal.

Given a signal $x$, we denote the z-transform of the autocorrelation sequence of $x$ by $S_x(z)$. The spectral density of a stationary random signal $x$ with autocorrelation sequence $r_{xx}(k)$ is the function of $\omega$ given by evaluating $S_x(z)$ on the unit circle,
$$S_x(e^{i\omega}) = \sum_{j=-\infty}^{\infty} r_{xx}(j) e^{-ij\omega}.$$

A random signal with spectral density $S_x(e^{i\omega})$ is bandlimited if $S_x(e^{i\omega}) = 0$ on a subset of $[-\pi, \pi]$ of nonzero measure.
A random signal with spectral density $S_x(e^{i\omega})$ has finite power if
$$\int_{-\pi}^{\pi} S_x(e^{i\omega}) \, d\omega < \infty.$$

The spectral factorization, when it exists, of a spectral density $S_x(z)$ is a factorization of the form
$$S_x(z) = L(z) L(z^{-1}),$$
where $L(z)$ and $1/L(z)$ are both analytic for $|z| > 1$. Interpreted as transfer functions, $L(z)$ and $1/L(z)$ both describe causal, stable systems. That is, $L(z)$ is causal and stable and causally and stably invertible.

Facts:

1. If $\mathbf{x}(k)$ is a sequence of vectors obtained by sampling a random sequence of vectors, then under suitable ergodicity assumptions
$$R_x = \lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n} \mathbf{x}(k) \mathbf{x}^T(k).$$
In computational practice $R_x$ is often estimated using a truncated sum in which $n$ is finite.

2. The expectation $\langle x, y \rangle = E[xy]$ can be viewed as an inner product on a real Hilbert space of zero mean random variables with $E[|x|^2] < \infty$. This gives many results in optimal filtering a geometric interpretation in terms of orthogonalization and projection on the span of a set of random variables in a way that is perfectly analogous to least squares problems in any other Hilbert space. From this point of view, a white noise process $n$ is a sequence of orthogonal random variables.

3. If a random signal $x$ with autocorrelation sequence $r_x$ is passed through a system with impulse response $h$, then the output $y$ has autocorrelation $r_y = h * r_x * \tilde{h}$ where $\tilde{h}$ is defined by $\tilde{h}(j) = h(-j)$.

4. The spectral density is nonnegative, $S_x(e^{i\omega}) \ge 0$.

5. The spectral density of white noise is $S_n(e^{i\omega}) = \sigma_n^2$.

6. [Pap85] If a random signal $x$ with spectral density $S_x(z)$ passes through a filter with transfer function $H(z)$, then the output $y$ is a stationary signal with
$$S_y(z) = S_x(z) H(z) H(z^{-1}) \quad \text{or} \quad S_y(e^{i\omega}) = S_x(e^{i\omega}) \left| H(e^{i\omega}) \right|^2.$$

7. If $L(z)$ is the spectral factor of $S_x(z)$, then on the unit circle we have $S_x(e^{i\omega}) = |L(e^{i\omega})|^2$.

8. [SK01], [GS84] The spectral factorization exists if and only if the signal has finite power and satisfies
$$\int_{-\pi}^{\pi} \ln S_x(e^{i\omega}) \, d\omega > -\infty.$$
If $S_x(e^{i\omega})$ is zero on a set of nonzero measure, then the condition is not satisfied. Thus, bandlimited signals do not have a spectral factorization.

9. If a spectral factor $L(z)$ exists for a signal $x$ with spectral density $S_x(e^{i\omega})$ and
$$L(z) = \sum_{k=0}^{\infty} l(k) z^{-k} \quad \text{and} \quad M(z) = \frac{1}{L(z)} = \sum_{k=0}^{\infty} m(k) z^{-k},$$
then the filtered signal $n = m * x$ given by
$$n(k) = \sum_{j=0}^{\infty} m(j) x(k-j)$$
is a white noise process with $S_n(e^{i\omega}) = 1$. The original signal can be obtained by causally and stably filtering $n$ with $l$:
$$x(k) = \sum_{j=0}^{\infty} l(j) n(k-j).$$
The signal $n$ is the innovations process for $x$, and the representation of $x$ through filtering $n$ by $l$ is the innovations representation for $x$. It is a model of $x$ as a filter driven by white noise. The innovations representation is an orthogonalization of the sequence of random variables $x$. The innovations representation for $x$ is analogous to the QR factorization of a matrix.

Examples:

1. For a white noise process $n$ with variance $\sigma_n^2 = 1$, define the autoregressive process $x$ by $x(k) = n(k) + a x(k-1)$ where $|a| < 1$. The impulse response and transfer function of the IIR filter generating $x$ are
$$h(k) = \begin{cases} 0 & k < 0, \\ a^k & k \ge 0, \end{cases} \quad \text{and} \quad H(z) = \frac{1}{1 - a z^{-1}}.$$
The autocorrelation sequence for $x$ is the sequence $r_{xx}$ given by
$$r_{xx}(k) = \frac{1}{1 - a^2}\, a^{|k|}.$$
The autocorrelation sequence has z-transform
$$S_x(z) = \frac{1}{(1 - a z^{-1})(1 - a z)}$$
so that the spectral density is
$$S_x(e^{i\omega}) = \frac{1}{1 + a^2 - 2 a \cos(\omega)}.$$
The spectral factor is
$$L(z) = \frac{1}{1 - a z^{-1}}.$$
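Example 1 can be checked numerically. The sketch below (added for illustration; the value of a, the sample size, and the seed are arbitrary choices) simulates the AR(1) process and compares sample autocorrelations, formed as in Fact 1 by truncated time averages, with the closed form $r_{xx}(k) = a^{|k|}/(1-a^2)$.

```python
import numpy as np

rng = np.random.default_rng(0)
a, N = 0.8, 200_000

# Simulate x(k) = n(k) + a x(k-1) driven by unit-variance white noise.
n = rng.standard_normal(N)
x = np.zeros(N)
for k in range(1, N):
    x[k] = n[k] + a * x[k - 1]

def sample_autocorr(x, k):
    # Truncated time average estimate of r_xx(k) = E[x(j) x(j-k)].
    return np.dot(x[k:], x[:len(x) - k]) / len(x)

for k in range(4):
    est = sample_autocorr(x, k)
    exact = a**abs(k) / (1.0 - a**2)
    print(k, est, exact)   # the estimates approach the closed-form values
```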
64.3 Linear Prediction

Definitions:

The linear prediction problem is to find an estimate $\hat{x}$ of a stationary random signal $x$ as a linear combination
$$\hat{x}(k) = \sum_{j=1}^{\infty} a(j) x(k-j),$$
with the $a(j)$ chosen to minimize the mean square prediction error $E_\infty = E[|e(k)|^2]$ where $e(k) = \hat{x}(k) - x(k)$ is the prediction error. Thus, the goal is to find a current estimate $\hat{x}(k)$ of $x(k)$ from the past values $x(k-1), x(k-2), \ldots$.

A random signal is predictable if $E_\infty = 0$. A signal that is not predictable is regular.

The order $n$ linear prediction problem is to find an estimate of a stationary random signal $x$ with a signal $\hat{x}$ of the form $\hat{x}(k) = \sum_{j=1}^{n} a_n(j) x(k-j)$ so as to minimize the mean square error $E_n = E[|e(k)|^2]$ where $e(k) = \hat{x}(k) - x(k)$. A random signal is order $n$ predictable if $E_n = 0$.

The prediction error filter and the order $n$ prediction error filter are the filters with transfer functions
$$E(z) = 1 - \sum_{j=1}^{\infty} a(j) z^{-j} \quad \text{and} \quad E_n(z) = 1 - \sum_{j=1}^{n} a_n(j) z^{-j}.$$

The Wiener–Hopf equations for the optimal prediction coefficients $a(j)$ are
$$r_{xx}(l) = \sum_{j=1}^{\infty} a(j) r_{xx}(l-j)$$
for $l \ge 1$.

Given a random signal $x(k)$ with autocorrelation $r_{xx}(k)$, the optimal prediction coefficients $a_n(j)$ and the mean square prediction error $E_n$ satisfy the Yule–Walker equations
$$\begin{bmatrix} r_{xx}(0) & r_{xx}(1) & \cdots & r_{xx}(n) \\ r_{xx}(1) & r_{xx}(0) & \ddots & \vdots \\ \vdots & \ddots & \ddots & r_{xx}(1) \\ r_{xx}(n) & \cdots & r_{xx}(1) & r_{xx}(0) \end{bmatrix} \begin{bmatrix} 1 \\ -a_n(1) \\ \vdots \\ -a_n(n) \end{bmatrix} = \begin{bmatrix} E_n \\ 0 \\ \vdots \\ 0 \end{bmatrix},$$
or $T_n \mathbf{a}_n = E_n \mathbf{e}_1$.

The Levinson–Durbin algorithm for the solution of the order $n$ linear prediction problem is the order recursion
$$\mathbf{a}_n = \begin{bmatrix} \mathbf{a}_{n-1} \\ 0 \end{bmatrix} + \gamma_n \begin{bmatrix} 0 \\ R \mathbf{a}_{n-1} \end{bmatrix},$$
where $R$ is the permutation that reverses the order of the elements of $\mathbf{a}_{n-1}$. The quantity $\gamma_n$ is computed from the relation
$$\gamma_n = -\frac{r_{xx}(n) - \sum_{j=1}^{n-1} a_{n-1}(j) r_{xx}(n-j)}{E_{n-1}},$$
and the mean square prediction error is computed from
$$E_n = E_{n-1} \left( 1 - |\gamma_n|^2 \right).$$
The algorithm starts with $E_0 = r_{xx}(0)$ and $\mathbf{a}_0 = 1$. If $x(k)$ is order $n$ predictable, then the process terminates with the order $n$ predictor and $E_n = 0$. The parameters $\gamma_k$ are known as reflection coefficients, partial correlation coefficients, or Schur parameters. The complementary parameters are $\delta_k = (1 - |\gamma_k|^2)^{1/2}$.
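The order recursion just described translates directly into a short program. The following Python sketch (added for illustration; the function name levinson_durbin and the test data are not from the original text) implements the Levinson–Durbin algorithm as stated, carrying the prediction error filter vector $[1, -a_m(1), \ldots, -a_m(m)]^T$ through the recursion.

```python
import numpy as np

def levinson_durbin(r):
    """Solve the Yule-Walker equations T_n a_n = E_n e_1 by the
    Levinson-Durbin recursion.  Input: autocorrelations r[0], ..., r[n].
    Returns the prediction coefficients [a_n(1), ..., a_n(n)], the final
    mean square error E_n, and the reflection coefficients gamma_1..gamma_n."""
    r = np.asarray(r, dtype=float)
    n = len(r) - 1
    c = np.array([1.0])          # prediction error filter [1, -a_m(1), ..., -a_m(m)]
    E = r[0]                     # E_0 = r_xx(0)
    gammas = []
    for m in range(1, n + 1):
        a = -c[1:]               # current coefficients a_{m-1}(1..m-1)
        gamma = -(r[m] - np.dot(a, r[m - 1:0:-1])) / E
        gammas.append(gamma)
        # a_m = [a_{m-1}; 0] + gamma_m [0; reversed a_{m-1}]
        c = np.concatenate((c, [0.0])) + gamma * np.concatenate(([0.0], c[::-1]))
        E = E * (1.0 - gamma**2)
    return -c[1:], E, gammas

# Order-1 prediction of the AR(1) process of Section 64.2, Example 1:
a0 = 0.8
r = [1.0 / (1 - a0**2), a0 / (1 - a0**2)]
coeffs, E1, gammas = levinson_durbin(r)
print(coeffs, E1)   # expect coefficients [0.8] and E_1 = 1.0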
Facts:

1. [Pap85] The Wold decomposition theorem: Every stationary random signal can be represented as the sum of a regular and a predictable signal, $x(k) = x_r(k) + x_p(k)$. The two components are orthogonal with $E[x_r(k) x_p(k)] = 0$.

2. [Pap85] The Wiener–Hopf equations can be interpreted as stating that the prediction error $\hat{x}(k) - x(k)$ is orthogonal to $x(k-l)$ for $l \ge 1$ in the inner product $\langle x, y \rangle = E[xy]$.

3. The Toeplitz matrix $T_n$ in the Yule–Walker equations is positive semidefinite.

4. The Levinson–Durbin algorithm is an $O(n^2)$ order recursive method for solving the linear prediction problem. In most cases the numerical results from applying the algorithm are comparable to the numerically stable Cholesky factorization procedure [Cyb80]. An alternative fast method, the Schur algorithm, computes a Cholesky factorization and can be proven to be numerically stable [BBH95].

5. The following are equivalent:
(a) $T_{n-1}$ is positive definite and $T_n$ is positive semidefinite but singular.
(b) $|\gamma_j| < 1$ for $1 \le j \le n-1$ and $|\gamma_n| = 1$.
(c) $E_j > 0$ for $1 \le j \le n-1$ and $E_n = 0$.
(d) The polynomial $z^n E_n(z)$ has all zeros on the unit circle and each polynomial $z^k E_k(z)$ for $k = 1, 2, \ldots, n-1$ has all zeros strictly inside the unit circle.

6. [Pap85] The prediction errors are monotonic: $E_k \ge E_{k+1}$ and $E_k \to E_\infty \ge 0$ as $k \to \infty$.

7. [SK01] Let $a_k(z) = 1 - a_k(1) z - a_k(2) z^2 - \cdots - a_k(k) z^k$ where the $a_k(j)$ are the prediction coefficients for a signal $x$ with spectral density $S_x(z)$. If $S_x(z) = L(z) L(z^{-1})$ is the spectral factorization of $S_x(z)$, then
$$L(z) = \lim_{k \to \infty} \frac{E_k^{1/2}}{a_k(z^{-1})}.$$
A survey, including other representations of the spectral factor and algorithms for computing it, is given in [SK01].

8. [GS84] A finite power random signal $x$ is predictable (i.e., $E_\infty = 0$) if and only if
$$\int_{-\pi}^{\pi} \ln\left( S_x(e^{i\omega}) \right) d\omega = -\infty.$$
If the signal is regular, then we have the Kolmogorov–Szegö error formula
$$E_\infty = \exp\left( \frac{1}{2\pi} \int_{-\pi}^{\pi} \ln S_x(e^{i\omega}) \, d\omega \right) > 0.$$

Examples:

1. Consider the autoregressive process $x$ given by $x(k) = n(k) + a x(k-1)$ where $n$ is a white noise process with $\sigma_n^2 = 1$. The solution to the order 1 linear prediction problem is given by the Yule–Walker system
$$\begin{bmatrix} \frac{1}{1-a^2} & \frac{a}{1-a^2} \\[2pt] \frac{a}{1-a^2} & \frac{1}{1-a^2} \end{bmatrix} \begin{bmatrix} 1 \\ -a_1(1) \end{bmatrix} = \begin{bmatrix} E_1 \\ 0 \end{bmatrix}.$$
The Levinson–Durbin algorithm starts with $E_0 = r_{xx}(0) = 1/(1-a^2)$ and $\mathbf{a}_0 = 1$. Thus,
$$\gamma_1 = -\frac{r_{xx}(1)}{E_0} = -\frac{a/(1-a^2)}{1/(1-a^2)} = -a,$$
so that
$$\mathbf{a}_1 = \begin{bmatrix} 1 \\ -a_1(1) \end{bmatrix} = \begin{bmatrix} \mathbf{a}_0 \\ 0 \end{bmatrix} + \gamma_1 \begin{bmatrix} 0 \\ \mathbf{a}_0 \end{bmatrix} = \begin{bmatrix} 1 \\ -a \end{bmatrix}.$$
Thus, $a_1(1) = a$ and the optimal linear prediction of $x(k)$ from $x(k-1)$ is given by $\hat{x}(k) = a x(k-1)$. The prediction error is given by $E_1 = E[|\hat{x}(k) - x(k)|^2] = E[|n(k)|^2] = 1$.
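As a numerical illustration of the Kolmogorov–Szegö error formula (a sketch added here; the grid size is an arbitrary choice), the following Python lines check that for the AR(1) example $\exp\big(\tfrac{1}{2\pi}\int_{-\pi}^{\pi}\ln S_x(e^{i\omega})\,d\omega\big) = 1$, in agreement with $E_1 = 1$.

```python
import numpy as np

a = 0.8
omega = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
S = 1.0 / (1.0 + a**2 - 2.0 * a * np.cos(omega))  # spectral density of the AR(1) example

# (1/2pi) * integral of ln S over [-pi, pi] is the mean of ln S on a uniform grid.
E_inf = np.exp(np.mean(np.log(S)))
print(E_inf)  # approximately 1.0, agreeing with E_1 = 1 for the AR(1) process
```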
64.4 Wiener Filtering

Linear optimal filtering of stationary signals was introduced by Wiener [Wie49] and Kolmogorov [Kol41]. The material here is covered in books on adaptive filtering [Hay91] and linear estimation [KSH00].

Definitions:

The general discrete Wiener filtering problem is the following: Given stationary signals $x$ and $y$, find a filter with impulse response $w$ such that
$$E_w = E\left[ \left| y(k) - \sum_{j=-\infty}^{\infty} w(j) x(k-j) \right|^2 \right]$$
is minimal. The filter with impulse response $w$ minimizing $E_w$ is the Wiener filter. The goal is to approximate a desired signal $y$ by filtering the signal $x$. Equivalently, we seek the best approximation to $y(k)$ in the span of the $x(j)$ for $-\infty < j < \infty$. Note that we leave open the possibility that the Wiener filter is not causal.

Depending on specific relations that may be imposed on $y$ and $x$, the Wiener filtering problem specializes in several important ways: the Wiener prediction problem, for which $y(k) = x(k+l)$ where $l \ge 1$ and $w$ is assumed causal; the Wiener smoothing problem, for which $y$ is arbitrary and $x$ is a noise corrupted version of $y$ given by $x = y + n$ where $n$ is the noise; and the Wiener deconvolution problem, for which we have an arbitrary desired signal $y$ and $x$ obtained from $y$ by convolution and the addition of noise, $x = h * y + n$. For the Wiener prediction problem, the goal is to predict future values of $x$ from past values. For the Wiener smoothing problem, the goal is to recover an approximation to a signal $y$ from a noise corrupted version $x$ of the signal. For Wiener deconvolution, the goal is to invert a filter in the presence of noise to obtain the signal $y$ from the filtered and noise corrupted version $x$.

The FIR Wiener filtering problem is to choose an FIR filter with impulse response $w(k)$ such that
$$E\left[ \left| y(k) - \sum_{j=0}^{n-1} w(j) x(k-j) \right|^2 \right]$$
is minimal.

The causal part of a signal $x(k)$ is
$$[x(k)]_+ = \begin{cases} x(k) & k \ge 0, \\ 0 & k < 0, \end{cases} \quad \text{or} \quad [X(z)]_+ = \sum_{k=0}^{\infty} x(k) z^{-k}.$$

Facts:

1. The stationarity assumptions ensure that the coefficients $w(j)$ that are used to estimate $y(k)$ from the signal $x$ do not depend on $k$. That is, the coefficients $w(j)$ that minimize $E_w$ also minimize
$$E\left[ \left| y(k-k_0) - \sum_{j=0}^{n-1} w(j) x(k - k_0 - j) \right|^2 \right],$$
so that the Wiener filter is a linear time-invariant system.

2. Let $S_x(z) = L(z) L(z^{-1})$, $r_{yx}(k) = E[y(j) x(j-k)]$, and
$$S_{yx}(e^{i\omega}) = \sum_{k=-\infty}^{\infty} r_{yx}(k) e^{-ik\omega}.$$
If we do not require causality, then the Wiener filter has z-transform
$$W(z) = \frac{S_{yx}(z)}{S_x(z)}.$$
If we do impose causality, then the Wiener filter is
$$W(z) = \frac{1}{L(z)} \left[ \frac{S_{yx}(z)}{L^*(z^{-*})} \right]_+.$$

3. As seen above, the Wiener filter depends on the cross correlations $r_{yx}(k)$, but not on knowledge of the elements of the unknown sequence $y(k)$. Thus, it is possible to estimate the signal $y$ by filtering $x$ without direct knowledge of $y$. If the autocorrelation sequences $r_{xx}(k)$ and $r_{nn}(k)$ are available (or can be estimated), then $r_{yx}(k)$ can often be computed in a straightforward way. For the Wiener prediction problem, $r_{yx}(k) = r_{xx}(l+k)$. If the noise $n$ and signal $y$ are uncorrelated so that $r_{ny}(k) = 0$ for all $k$, then for the Wiener smoothing problem $r_{yx} = r_{xx} - r_{nn}$. For the Wiener deconvolution problem, if $g$ is the inverse filter of $h$ and if $y$ and $n$ are uncorrelated, then $r_{yx} = g * (r_{xx} - r_{nn})$.

4. The coefficients for an FIR Wiener filter satisfy the Wiener–Hopf equation
$$\begin{bmatrix} r_{xx}(0) & r_{xx}(-1) & \cdots & r_{xx}(-m+1) \\ r_{xx}(1) & r_{xx}(0) & \ddots & \vdots \\ \vdots & \ddots & \ddots & r_{xx}(-1) \\ r_{xx}(m-1) & \cdots & r_{xx}(1) & r_{xx}(0) \end{bmatrix} \begin{bmatrix} w(0) \\ w(1) \\ \vdots \\ w(m-1) \end{bmatrix} = \begin{bmatrix} r_{yx}(0) \\ r_{yx}(1) \\ \vdots \\ r_{yx}(m-1) \end{bmatrix}.$$
As with linear prediction, this system of equations can be solved using fast algorithms for Toeplitz systems.

Examples:

1. We consider the first order FIR Wiener prediction problem of predicting $y(k) = x(k+2)$ from the autoregressive process $x$ with $x(k) = n(k) + a x(k-1)$. Thus, we seek $w(0)$ such that $E\left[ |x(k+2) - w(0) x(k)|^2 \right]$ is minimal. Since
$$r_{xx}(0) = \frac{1}{1-a^2} \quad \text{and} \quad r_{yx}(0) = r_{xx}(2) = \frac{a^2}{1-a^2},$$
the Wiener–Hopf equation for $w(0)$ is
$$\frac{1}{1-a^2}\, w(0) = \frac{a^2}{1-a^2},$$
so that $w(0) = a^2$.
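The FIR Wiener–Hopf system of Fact 4 is easy to set up and solve directly. The Python sketch below (added for illustration; the AR(1) signal, the noise variance, and the filter length are arbitrary choices) treats a smoothing problem $x = y + n$ with uncorrelated noise, using $r_{yx} = r_{xx} - r_{nn}$ from Fact 3.

```python
import numpy as np

# Smoothing problem: observe x = y + n, where y is the AR(1) signal of the
# earlier examples and n is white noise with variance sigma2, uncorrelated with y.
a, sigma2, m = 0.8, 0.5, 8

k = np.arange(m)
r_yy = a**k / (1.0 - a**2)            # autocorrelation of the AR(1) signal y
r_nn = np.where(k == 0, sigma2, 0.0)  # autocorrelation of the white noise
r_xx = r_yy + r_nn                    # x = y + n with y and n uncorrelated
r_yx = r_xx - r_nn                    # Fact 3: r_yx = r_xx - r_nn (equals r_yy here)

# Fact 4: the m x m symmetric Toeplitz Wiener-Hopf system for the FIR coefficients.
T = r_xx[np.abs(k[:, None] - k[None, :])]
w = np.linalg.solve(T, r_yx)
print(w)   # FIR Wiener smoothing filter w(0), ..., w(m-1)
```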
64.5 Adaptive Filtering

Definitions:

An FIR adaptive filter is a filter with coefficients that are updated to match the possibly changing statistics of the input $x$ and the desired output $y$. The output is
$$\hat{y}(k) = \sum_{j=0}^{m-1} w_k(j) x(k-j),$$
where the $w_k(j)$ depend on $k$. The goal is similar to that of the Wiener filter, that is, to approximate the sequence $y$. The difference is that instead of computing a fixed set of coefficients $w(j)$ from knowledge of the statistics of stationary signals $x$ and $y$, the coefficients are allowed to vary with $k$ and are computed from actual samples of the sequences $x$ and $y$.

A recursive least squares (RLS) adaptive FIR filter is a rule for updating the coefficients $w_k(j)$. The coefficients are chosen to minimize
$$\min_{\mathbf{w}_k} \left\| D_{\rho,k} \begin{bmatrix} x(l) & x(l-1) & \cdots & x(l-m+1) \\ x(l+1) & x(l) & \cdots & x(l-m+2) \\ \vdots & \vdots & \ddots & \vdots \\ x(k) & x(k-1) & \cdots & x(k-m+1) \end{bmatrix} \begin{bmatrix} w_k(0) \\ w_k(1) \\ \vdots \\ w_k(m-1) \end{bmatrix} - D_{\rho,k} \begin{bmatrix} y(l) \\ y(l+1) \\ \vdots \\ y(k) \end{bmatrix} \right\|_2,$$
where $k \ge l + m - 1$ and $D_{\rho,k} = \mathrm{diag}(\rho^{k-l}, \rho^{k-l-1}, \ldots, \rho^0)$ with $0 < \rho \le 1$. Equivalently, the vector $\mathbf{w}_k$ is chosen to solve $\min_{\mathbf{w}_k} \| D_{\rho,k} T_{l,k} \mathbf{w}_k - D_{\rho,k} \mathbf{y}_k \|$. The parameter $\rho$ is chosen to decrease the influence of older data. This is of use when the signals $x$ and $y$ are only approximately stationary, i.e., their statistical properties are slowly time varying.

A signal $x$ is persistently exciting of order $m$ if there exist $c_1 > 0$, $c_2 \ge c_1$, and $j$ such that
$$c_1 I \le \frac{1}{k-l+1} T_{l,k}^T T_{l,k} \le c_2 I,$$
that is, $c_1 \le \lambda_p \le c_2$ for each eigenvalue $\lambda_p$ of $T_{l,k}^T T_{l,k}/(k-l+1)$, for all $k \ge j$.

Facts:

1. The vector $\mathbf{w}_k$ can be updated using the following recursive formulas:
$$\mathbf{w}_k = \mathbf{w}_{k-1} + \mathbf{g}_k \left( y(k) - \mathbf{w}_{k-1}^T \mathbf{x}_k \right),$$
where
$$\mathbf{x}_k = \begin{bmatrix} x(k) \\ \vdots \\ x(k-m+1) \end{bmatrix}, \qquad \mathbf{g}_k = \frac{\rho^{-2} P_{k-1} \mathbf{x}_k}{1 + \rho^{-2} \mathbf{x}_k^T P_{k-1} \mathbf{x}_k},$$
with
$$P_k = \rho^{-2} P_{k-1} - \rho^{-2} \mathbf{g}_k \mathbf{x}_k^T P_{k-1}.$$

2. With initialization $P_k = (T_{l,k}^T D_{\rho,k}^2 T_{l,k})^{-1}$ and $\mathbf{w}_k = P_k T_{l,k}^T D_{\rho,k}^2 \mathbf{y}_k$, the recurrences compute the solution $\mathbf{w}_k$ to the recursive least squares problem.

3. [GV96] The weight vector $\mathbf{w}_k$ can also be updated as $k$ increases with an $O(n^2)$ cost using a standard QR updating algorithm. Such algorithms have good numerical properties.

4. If $x$ and $y$ are stationary, if $\rho = 1$, and if $x$ is persistently exciting, then $\mathbf{w}_k$ converges to the Wiener filter as $k \to \infty$. Under these assumptions the effect of initializing $P_k$ incorrectly becomes negligible as $k \to \infty$. Thus, convergence does not depend on the correct initialization of $P_k = (T_{l,k}^T T_{l,k})^{-1}$. For simplicity, it is common to choose an initial $P_k$ that is a multiple of the identity. If $x$ and $y$ are not stationary, then $\rho$ is typically chosen to be less than one in order to allow the filter to adapt to changing statistics. If the signals really are stationary, then a choice of $\rho < 1$ sacrifices convergence of the filter.

5. There are a variety of algorithms that exploit the Toeplitz structure of the least squares problem, reducing the cost per update of $\mathbf{w}_k$ to $O(n)$ operations [CK84]. An algorithm that behaves well in finite precision is analyzed in [Reg93].

Examples:

1. Consider applying a recursive least squares adaptive filter (with $m = 2$ and $\rho = 1$) to signals $x$ and $y$ with $x(1) = 1$, $x(2) = 2$, $x(3) = 3$, $x(4) = 4$ and $y(2) = 5$, $y(3) = 6$, $y(4) = 8$. The vector of filter coefficients $\mathbf{w}_3$ satisfies
$$\begin{bmatrix} 2 & 1 \\ 3 & 2 \end{bmatrix} \mathbf{w}_3 = \begin{bmatrix} 5 \\ 6 \end{bmatrix}, \quad \text{so that} \quad \mathbf{w}_3 = \begin{bmatrix} 4 \\ -3 \end{bmatrix}.$$
We also have
$$P_3 = \left( \begin{bmatrix} 2 & 1 \\ 3 & 2 \end{bmatrix}^T \begin{bmatrix} 2 & 1 \\ 3 & 2 \end{bmatrix} \right)^{-1} = \begin{bmatrix} 13 & 8 \\ 8 & 5 \end{bmatrix}^{-1} = \begin{bmatrix} 5 & -8 \\ -8 & 13 \end{bmatrix},$$
so that, with $\mathbf{x}_4 = \begin{bmatrix} 4 & 3 \end{bmatrix}^T$,
$$\mathbf{g}_4 = \frac{P_3 \mathbf{x}_4}{1 + \mathbf{x}_4^T P_3 \mathbf{x}_4} = \frac{1}{6} \begin{bmatrix} -4 \\ 7 \end{bmatrix}.$$
Thus, the next coefficient vector $\mathbf{w}_4$ is
$$\mathbf{w}_4 = \mathbf{w}_3 + \mathbf{g}_4 \left( y(4) - \mathbf{w}_3^T \mathbf{x}_4 \right) = \begin{bmatrix} 10/3 \\ -11/6 \end{bmatrix}.$$
The process can be continued by updating $P_4$, $\mathbf{g}_5$, and $\mathbf{w}_5$.
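A minimal Python sketch of the recursions in Fact 1 (added for illustration, taking $\rho = 1$ and the exact initialization of Fact 2; the helper name rls_update is not from the original text) reproduces the numbers of Example 1.

```python
import numpy as np

def rls_update(w, P, x_k, y_k, rho=1.0):
    """One recursive least squares update of the weight vector w and the
    matrix P, following the recursions of Fact 1."""
    r = rho ** -2
    g = (r * P @ x_k) / (1.0 + r * x_k @ P @ x_k)   # gain vector g_k
    w_new = w + g * (y_k - w @ x_k)                  # w_k = w_{k-1} + g_k (y(k) - w_{k-1}^T x_k)
    P_new = r * P - r * np.outer(g, x_k) @ P         # P_k update
    return w_new, P_new

# Example 1: x(1..4) = 1, 2, 3, 4 and y(2..4) = 5, 6, 8, filter length m = 2.
T = np.array([[2.0, 1.0], [3.0, 2.0]])   # rows [x(k), x(k-1)] for k = 2, 3
y = np.array([5.0, 6.0])
P3 = np.linalg.inv(T.T @ T)              # exact initialization (Fact 2 with rho = 1)
w3 = P3 @ T.T @ y                        # w_3 = [4, -3]

x4 = np.array([4.0, 3.0])                # [x(4), x(3)]
w4, P4 = rls_update(w3, P3, x4, 8.0)
print(w3, w4)                            # w_4 should be [10/3, -11/6]
```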
64.6 Spectral Estimation

Definitions:

The deterministic spectral estimation problem is to estimate the spectral density $S_x(e^{i\omega})$ from a finite number of correlations $r_{xx}(k)$ for $0 \le k \le n$. Note that this definition is not taken to imply that the signal $x$ is deterministic, but that the given data are correlations rather than samples of the random signal.

The stochastic spectral estimation problem is to estimate $S_x(e^{i\omega})$ from samples of the random signal $x(k)$ for $0 \le k \le n$.

The problem of estimating sinusoids in noise is to find the frequencies $\omega_j$ of
$$\hat{x}(k) = \alpha_1 \cos(\omega_1 k + \eta_1) + \alpha_2 \cos(\omega_2 k + \eta_2) + \cdots + \alpha_p \cos(\omega_p k + \eta_p) + n(k) = x(k) + n(k),$$
where $n$ is zero mean white noise with variance $\sigma_n^2$ and $x$ is the exact sinusoidal signal with random frequencies $\omega_j \in [0, 2\pi)$ and phases $\eta_j$. The frequencies and phases are assumed to be uncorrelated with the noise $n$. The available data can be samples of the signal $\hat{x}$ or its autocorrelation sequence.

Facts:

1. [PM96] A traditional method for the deterministic spectral estimation problem is to assume that any $r_{xx}(j)$ not given are equal to zero. When additional information is available in the form of a signal model, it is possible to do better by extending the autocorrelation sequence in a way that is consistent with the model.

2. [Pap85] The method of maximum entropy: The sequence $r_{xx}(k)$ is extended so as to maximize the entropy rate
$$\int_{-\pi}^{\pi} \ln S(e^{i\omega}) \, d\omega,$$
where it is assumed that the random signal $x$ is a sequence of normal random variables. It can be shown that the spectrum that achieves this is the spectrum of an autoregressive signal,
$$S(e^{i\omega}) = \frac{E_n}{\left| 1 - \sum_{j=1}^{n} a_n(j) e^{-i\omega j} \right|^2},$$
where the $a_n(j)$ are the coefficients obtained from the Yule–Walker equations.

3. [Pis73] Pisarenko's method: Given a signal comprised of sinusoids in noise, if $\sigma_n = 0$, then the signal sequence $x$ is order $p$ predictable and the signal and its autocorrelation sequence $r_{xx}$ satisfy
$$x(k) = a(1) x(k-1) + a(2) x(k-2) + \cdots + a(p) x(k-p),$$
$$r_{xx}(k) = a(1) r_{xx}(k-1) + a(2) r_{xx}(k-2) + \cdots + a(p) r_{xx}(k-p),$$
where $\mathbf{a}_p = \begin{bmatrix} 1 & -a_p(1) & \cdots & -a_p(p) \end{bmatrix}^T$ is a null vector of the $(p+1) \times (p+1)$ autocorrelation matrix $T = [t_{ij}] = [r_{xx}(i-j)]$. This is the Yule–Walker system with $n = p$ and $E_p = 0$. When $\sigma_n \neq 0$, we have $\hat{T} = T + \sigma_n^2 I$ where $\hat{T}$ is the autocorrelation matrix for $\hat{x}(k)$. The value of $\sigma_n^2$ is the smallest eigenvalue of $\hat{T}$. Given knowledge of $\sigma_n$, $T$ can be computed from $T = \hat{T} - \sigma_n^2 I$. The prediction coefficients can then be computed from the Yule–Walker equation $T \mathbf{a}_p = 0$. The zeros of the prediction error filter
$$z^p E_p(z) = z^p \left( 1 - \sum_{j=1}^{p} a(j) z^{-j} \right)$$
all lie on the unit circle at $e^{i\omega_j}$ for $j = 1, 2, \ldots, p$, giving the frequencies of the sinusoids.

4. [AGR87] The zeros of a polynomial are often sensitive to small errors in the coefficients. Computing such zeros accurately can be difficult. As an alternative, the zeros of $z^p E_p(z)$ can be found as the eigenvalues of a unitary Hessenberg matrix. Let $\gamma_j$ and $\delta_j$ be the Schur parameters and complementary parameters for the autocorrelation matrix $T$. Then $|\gamma_j| < 1$ for $1 \le j \le p-1$ and $|\gamma_p| = 1$. The matrix $H = G_1(\gamma_1) G_2(\gamma_2) \cdots G_p(\gamma_p)$ with
$$G_j(\gamma_j) = I_{j-1} \oplus \begin{bmatrix} -\gamma_j & \delta_j \\ \delta_j & \overline{\gamma}_j \end{bmatrix} \oplus I_{p-j-1}$$
is unitary and upper Hessenberg. An explicit formula for the elements of $H$ is
$$H = \begin{bmatrix} -\gamma_1 & -\delta_1 \gamma_2 & -\delta_1 \delta_2 \gamma_3 & \cdots & -\delta_1 \cdots \delta_{p-1} \gamma_p \\ \delta_1 & -\overline{\gamma}_1 \gamma_2 & -\overline{\gamma}_1 \delta_2 \gamma_3 & \cdots & -\overline{\gamma}_1 \delta_2 \cdots \delta_{p-1} \gamma_p \\ & \delta_2 & -\overline{\gamma}_2 \gamma_3 & \cdots & \vdots \\ & & \ddots & \ddots & \vdots \\ & & & \delta_{p-1} & -\overline{\gamma}_{p-1} \gamma_p \end{bmatrix}.$$
The eigenvalues of $H$ are the zeros of $z^p E_p(z)$. The eigenvalues of a $p \times p$ unitary Hessenberg matrix can be found with $O(p^2)$ operations. An algorithm exploiting connections with Szegő polynomials was given in [Gra86] and stabilized in [Gra99]. A different, stable algorithm involving matrix pencils was given in [BE91].

Examples:

1. Consider a stationary random signal with known autocorrelations $r_{xx}(0) = 1$ and $r_{xx}(1) = a < 1$. The other autocorrelations are assumed to be unknown. The Yule–Walker system
$$\begin{bmatrix} 1 & a \\ a & 1 \end{bmatrix} \begin{bmatrix} 1 \\ -a_1(1) \end{bmatrix} = \begin{bmatrix} E_1 \\ 0 \end{bmatrix}$$
has solution $a_1(1) = a$ and $E_1 = 1 - a^2$. Thus, the maximum entropy estimate of the spectral density $S_x(e^{i\omega})$ is
$$S(e^{i\omega}) = \frac{1 - a^2}{\left| 1 - a e^{-i\omega} \right|^2}.$$
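Pisarenko's method as described in Fact 3 can be sketched in a few lines of Python (added for illustration; the test case, a single sinusoid in noise with exact rather than estimated autocorrelations, is an assumption made here for simplicity).

```python
import numpy as np

# One real sinusoid in white noise: r(k) = (alpha^2/2) cos(omega k) + sigma2 [k = 0].
alpha, omega, sigma2, p = 1.0, 0.9, 0.3, 2

k = np.arange(p + 1)
r_clean = 0.5 * alpha**2 * np.cos(omega * k)
r_noisy = r_clean + np.where(k == 0, sigma2, 0.0)

# Autocorrelation matrix of the noisy signal, T_hat[i, j] = r_noisy(i - j).
T_hat = r_noisy[np.abs(k[:, None] - k[None, :])]

# Pisarenko: sigma_n^2 is the smallest eigenvalue of T_hat; the corresponding
# eigenvector is the null vector a_p of T = T_hat - sigma_n^2 I.
eigvals, eigvecs = np.linalg.eigh(T_hat)
sigma2_est = eigvals[0]
a_p = eigvecs[:, 0] / eigvecs[0, 0]    # normalize the leading coefficient to 1

# Zeros of the prediction error filter z^p E_p(z) give the frequencies.
roots = np.roots(a_p)                  # coefficients are [1, -a(1), ..., -a(p)]
print(sigma2_est)                      # should recover sigma2 = 0.3
print(np.angle(roots))                 # should contain +/- omega = +/- 0.9
```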
64.7 Direction of Arrival Estimation

Definitions:

The direction of arrival estimation problem can be described as follows. We assume that $n$ plane waves $s_j(k)$, $1 \le j \le n$, arrive at an array of $m$ sensors from angles $\theta_j$, resulting in outputs $x_l(k)$, $1 \le l \le m$, from the sensors. We assume that the sensors lie in a plane and that it is necessary to estimate only a single angle of arrival for each signal. It is assumed that each sensor output is corrupted by noise $n_l(k)$, $1 \le l \le m$.

The sensors experience different delays and attenuations of the signal $s_j(k)$ depending on their position and the direction of arrival. If the signals are narrow-band signals
$$s_j(k) = \mathrm{Re}\left( u_j(k)\, e^{i(\omega_0 k + v_j(k))} \right)$$
for slowly varying $u_j(k)$ and $v_j(k)$, then a signal delayed by $l$ is approximately
$$s_j(k-l) = \mathrm{Re}\left( u_j(k-l)\, e^{i(\omega_0 k + v_j(k-l))} e^{-i\omega_0 l} \right) \approx \mathrm{Re}\left( u_j(k)\, e^{i(\omega_0 k + v_j(k))} e^{-i\omega_0 l} \right).$$
Thus, both delay and attenuation of the signals can be described by multiplication by a complex scalar dependent on $\theta_j$. The observed signals are, therefore,
$$\mathbf{x}(k) = \sum_{j=1}^{n} \mathbf{a}(\theta_j) s_j(k) + \mathbf{n}(k),$$
where
$$\mathbf{x}(k) = \begin{bmatrix} x_1(k) \\ \vdots \\ x_m(k) \end{bmatrix}$$
and $\mathbf{a}(\theta)$ is the complex array response vector. In matrix notation the signal model is
$$\mathbf{x}(k) = A \mathbf{s}(k) + \mathbf{n}(k), \quad \text{where} \quad A = \begin{bmatrix} \mathbf{a}(\theta_1) & \mathbf{a}(\theta_2) & \cdots & \mathbf{a}(\theta_n) \end{bmatrix}.$$
The problem is to estimate the angles $\theta_j$ from a sequence $\mathbf{x}(k)$ of sensor output vectors. We assume that the components of the noise vector $\mathbf{n}(k)$ are uncorrelated, stationary, and zero mean. A more complete description of the problem is given in [RK89].

The array manifold is the set of all $\mathbf{a}(\theta)$ for $\theta$ in $[-\pi, \pi]$.

The signal subspace is the range space of the matrix $A$. The noise subspace is the orthogonal complement of the signal subspace.

We define the spatial correlation matrices $R_x = E[\mathbf{x}\mathbf{x}^*]$, $R_s = E[\mathbf{s}\mathbf{s}^*]$, and $R_n = E[\mathbf{n}\mathbf{n}^*]$, where we assume $R_s$ is positive definite. These definitions apply directly when dealing with stationary random signals. In the deterministic case they can be replaced with time averages.

Facts:

1. If the signal and noise are uncorrelated, then $R_x = A R_s A^* + R_n$.

2. The signal model implies that, except for the effect of noise, the observation vectors $\mathbf{x}(k)$ lie in the signal subspace. Thus, in the absence of noise, any $n$ linearly independent observation vectors $\mathbf{x}(k)$ will give a basis for the signal subspace. The directions $\theta_j$ are determined by the signal subspace so long as the mapping from the set of angles $\theta_j$ to the signal subspace is invertible. We assume that the sensor array is designed to ensure invertibility.

3. The signal subspace can be estimated by solving a generalized eigenvalue problem. The $n$ eigenvectors associated with the $n$ largest generalized eigenvalues of the pencil $R_x - \lambda R_n$ are the basis for the estimated signal subspace. This method can be used to obtain matrices $X_s$ and $X_n$ such that $X_s X_s^*$ is a projection for the signal subspace and $X_n X_n^*$ is a projection for the noise subspace.

4. [Sch79] In order to obtain the angles $\theta_j$, the MUSIC algorithm searches the array manifold to find angles which are close to the signal subspace in the sense that
$$M(\theta) = \frac{\mathbf{a}^*(\theta) \mathbf{a}(\theta)}{\mathbf{a}^*(\theta) X_n X_n^* \mathbf{a}(\theta)}$$
is large. The values of $\theta$ for which $M(\theta)$ has a peak are taken as the estimates of the angles $\theta_j$, $1 \le j \le n$. The basic approach of the MUSIC algorithm can be used for a variety of parameter estimation problems, including the problem of finding sinusoids in noise.

5. [RK89] The ESPRIT algorithm imposes a structure on the array of sensors. It is assumed that the array is divided into two identical subarrays displaced along an $\mathbb{R}^2$ vector $\Delta$. Let $x_j(k)$ be the output of sensor $j$ in the first subarray and $y_j(k)$ be the output of the corresponding sensor in the second subarray. The displacement $\Delta$ causes a delay in the signal $x_j(k)$ relative to $y_j(k)$.
If the signals arriving at the array are again narrow band signals centered at frequency $\omega_0$, then the signal model is
$$\mathbf{x}(k) = \sum_{j=1}^{n} s_j(k)\, \mathbf{a}(\theta_j) + \mathbf{n}_x(k),$$
$$\mathbf{y}(k) = \sum_{j=1}^{n} s_j(k)\, e^{i \omega_0 \|\Delta\|_2 \sin(\theta_j)/c}\, \mathbf{a}(\theta_j) + \mathbf{n}_y(k),$$
where $\theta_j$ is the angle of arrival relative to the subarray displacement vector. The combined ESPRIT signal model including both subarrays is
$$\begin{bmatrix} \mathbf{x}(k) \\ \mathbf{y}(k) \end{bmatrix} = \begin{bmatrix} A \\ A \Phi \end{bmatrix} \mathbf{s}(k) + \begin{bmatrix} \mathbf{n}_x(k) \\ \mathbf{n}_y(k) \end{bmatrix},$$
where the diagonal matrix
$$\Phi = \mathrm{diag}\left( e^{i\omega_0 \|\Delta\|_2 \sin(\theta_1)/c}, \ldots, e^{i\omega_0 \|\Delta\|_2 \sin(\theta_n)/c} \right)$$
characterizes the delay between the subarrays.

6. To compute the $\theta_j$ using the ESPRIT algorithm, we start with the signal subspace associated with the ESPRIT signal model, estimated as before using a generalized eigenvalue problem. We assume that a $2m \times n$ matrix $S$ with columns spanning the signal subspace,
$$S = \begin{bmatrix} S_x \\ S_y \end{bmatrix},$$
has been computed, and compute the $2n \times 2n$ matrix
$$V = \begin{bmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{bmatrix}$$
of right singular vectors of $\begin{bmatrix} S_x & S_y \end{bmatrix}$. Here, $V_{11}$ and $V_{22}$ are $n \times n$. The ESPRIT estimates of the values $e^{i\omega_0 \|\Delta\|_2 \sin(\theta_j)/c}$ are the eigenvalues $\lambda_j$ of $-V_{12} V_{22}^{-1}$, so that the estimate $\hat{\theta}_j$ of $\theta_j$ is
$$\hat{\theta}_j = \sin^{-1}\left( \frac{c \arg(\lambda_j)}{\|\Delta\|_2\, \omega_0} \right)$$
for $j = 1, 2, \ldots, n$.

Examples:

1. We consider a single signal arriving at an array of two sensors with array response vector
$$\mathbf{a}(\theta) = \begin{bmatrix} e^{-i(\pi/2 - \theta)} \\ 1 \end{bmatrix}.$$
If observations of the sensor outputs and solution of the generalized eigenvalue problem $R_x - \lambda R_n$ suggest that
$$X_s X_s^* = \begin{bmatrix} i/\sqrt{3} \\ \sqrt{2}/\sqrt{3} \end{bmatrix} \begin{bmatrix} -i/\sqrt{3} & \sqrt{2}/\sqrt{3} \end{bmatrix}$$
is the projection for the signal subspace, then the projection for the noise subspace is
$$X_n X_n^* = \begin{bmatrix} -\sqrt{2}\, i/\sqrt{3} \\ 1/\sqrt{3} \end{bmatrix} \begin{bmatrix} \sqrt{2}\, i/\sqrt{3} & 1/\sqrt{3} \end{bmatrix}.$$
In applying the MUSIC algorithm, we have
$$M(\theta) = \frac{2}{\tfrac{1}{3} \left| \sqrt{2}\, e^{i\theta} + 1 \right|^2} = \frac{6}{\left| \sqrt{2}\, e^{i\theta} + 1 \right|^2}.$$
This function attains its maximum at $\theta = \pi$. Thus, the MUSIC estimate of the angle from which the signal arrives is $\theta = \pi$.
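The MUSIC search of Fact 4 can be illustrated with a short simulation (a sketch added to this survey; it reuses the two-sensor array response of Example 1, but estimates the noise subspace from an ordinary eigendecomposition of the sample correlation matrix, which suffices here because the simulated noise is white, and the source amplitude, noise level, and sample size are arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(1)
m, n_src, N = 2, 1, 5000
theta0 = np.pi                      # true arrival angle, as in Example 1

def a(theta):
    # Array response vector of the two-sensor array in Example 1.
    return np.array([np.exp(-1j * (np.pi / 2 - theta)), 1.0])

# Simulate x(k) = a(theta0) s(k) + n(k) with a random source and white noise.
s = rng.standard_normal(N) + 1j * rng.standard_normal(N)
noise = 0.3 * (rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N)))
X = np.outer(a(theta0), s) + noise

# Sample spatial correlation matrix (time average, as in the deterministic case).
Rx = X @ X.conj().T / N

# Noise subspace: eigenvectors of Rx belonging to the m - n smallest eigenvalues.
w, V = np.linalg.eigh(Rx)
Xn = V[:, : m - n_src]

# MUSIC spectrum M(theta) from Fact 4; its peak estimates the arrival angle.
thetas = np.linspace(-np.pi, np.pi, 2000)
M = [np.real(a(t).conj() @ a(t)) / np.real(a(t).conj() @ Xn @ Xn.conj().T @ a(t))
     for t in thetas]
print(thetas[int(np.argmax(M))])    # close to theta0 = pi (or -pi, the same angle)
```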
References

[AGR87] G. Ammar, W. Gragg, and L. Reichel, Determination of Pisarenko frequency estimates as eigenvalues of an orthogonal matrix, in Advanced Algorithms and Architectures for Signal Processing II, Proc. SPIE, Vol. 826, Int. Soc. Opt. Eng., pp. 143–145, 1987.
[BBH95] A. Bojanczyk, R.P. Brent, F.R. De Hoog, and D.R. Sweet, On the stability of the Bareiss and related Toeplitz factorization algorithms, SIAM J. Matrix Anal. Appl., 16:40–58, 1995.
[BE91] A. Bunse-Gerstner and L. Elsner, Schur parameter pencils for the solution of the unitary eigenproblem, Lin. Alg. Appl., 154–156:741–778, 1991.
[CK84] J. Cioffi and T. Kailath, Fast, recursive least-squares filters for adaptive filtering, IEEE Trans. Acoust., Speech, Sig. Proc., 32:304–337, 1984.
[Cyb80] G. Cybenko, The numerical stability of the Levinson–Durbin algorithm for Toeplitz systems of equations, SIAM J. Sci. Stat. Comp., 1:303–310, 1980.
[GV96] G.H. Golub and C.F. Van Loan, Matrix Computations, 3rd ed., Johns Hopkins University Press, Baltimore, 1996.
[Gra86] W.B. Gragg, The QR algorithm for unitary Hessenberg matrices, J. Comp. Appl. Math., 16:1–8, 1986.
[Gra93] W.B. Gragg, Positive definite Toeplitz matrices, the Arnoldi process for isometric operators, and Gaussian quadrature on the unit circle, J. Comp. Appl. Math., 46:183–198, 1993.
[Gra99] W.B. Gragg, Stabilization of the UHQR algorithm, in Z. Chen, Y. Li, C. Micchelli, and Y. Xu, Eds., Advances in Computational Mathematics, pp. 139–154, Marcel Dekker, New York, 1999.
[GS84] U. Grenander and G. Szegö, Toeplitz Forms and Their Applications, 2nd ed., Chelsea, London, 1984.
[Hay91] S. Haykin, Adaptive Filter Theory, 2nd ed., Prentice Hall, Upper Saddle River, NJ, 1991.
[KSH00] T. Kailath, A. Sayed, and B. Hassibi, Linear Estimation, Prentice Hall, Upper Saddle River, NJ, 2000.
[Kol41] A.N. Kolmogorov, Interpolation and extrapolation of stationary random sequences, Izv. Akad. Nauk SSSR Ser. Mat., 5:3–14, 1941 (in Russian).
[Pap85] A. Papoulis, Levinson's algorithm, Wold's decomposition, and spectral estimation, SIAM Rev., 27:405–441, 1985.
[Pis73] V.F. Pisarenko, The retrieval of harmonics from a covariance function, Geophys. J. Roy. Astron. Soc., 33:347–366, 1973.
[PM96] J.G. Proakis and D. Manolakis, Digital Signal Processing: Principles, Algorithms and Applications, Prentice Hall, Upper Saddle River, NJ, 1996.
[Reg93] P. Regalia, Numerical stability properties of a QR-based fast least-squares algorithm, IEEE Trans. Sig. Proc., 41:2096–2109, 1993.
[RK89] R. Roy and T. Kailath, ESPRIT — estimation of signal parameters via rotational invariance techniques, IEEE Trans. Acoust., Speech, Sig. Proc., 37:984–995, 1989.
[Sch79] R. Schmidt, Multiple emitter location and signal parameter estimation, IEEE Trans. Anten. Prop., 34:276–280, 1979.
[SK01] A.H. Sayed and T. Kailath, A survey of spectral factorization methods, Num. Lin. Alg. Appl., 8:467–496, 2001.
[Wie49] N. Wiener, Extrapolation, Interpolation, and Smoothing of Stationary Time Series with Engineering Applications, MIT Press, Cambridge, MA, 1949.