64 Signal Processing

Michael Stewart
Georgia State University

64.1 Basic Concepts
64.2 Random Signals
64.3 Linear Prediction
64.4 Wiener Filtering
64.5 Adaptive Filtering
64.6 Spectral Estimation
64.7 Direction of Arrival Estimation
References

A signal is a function of time, depending on either a continuous, real time variable or a discrete integer time variable. Signal processing is a collection of diverse mathematical, statistical, and computational techniques for transforming, extracting information from, and modeling signals. Common signal processing tasks include filtering a signal to remove undesired frequency components, extracting a model that describes key features of a signal, estimating the frequency components of a signal, predicting the future behavior of a signal from past data, and finding the direction from which a signal arrives at an array of sensors. Many modern methods for signal processing have deep connections to linear algebra. Rather than attempting to give a complete overview of such a diverse field, we survey a selection of contemporary methods for the processing of discrete time signals with deep connections to matrix theory and numerical linear algebra. These topics include linear prediction, Wiener filtering, spectral estimation, and direction of arrival estimation. Notable omissions include classical methods of filter design, the fast Fourier transform (FFT), and filter structures. (See Chapter 58.)

64.1 Basic Concepts

We begin with basic definitions which are standard and can be found in [PM96].

Definitions:

A signal $x$ is a real sequence with element $k$ of the sequence for $k = 0, \pm 1, \pm 2, \ldots$ given by $x(k)$. The set of all signals is a vector space with the sum $w = x + y$ defined by $w(k) = x(k) + y(k)$ and the scalar product $v = ax$, where $a$ is a real scalar, given by $v(k) = ax(k)$.

A linear time-invariant system is a mapping $L(x) = y$ of an input signal $x$ to an output signal $y$ that satisfies the following properties:

1. Linearity: If $L(x_0) = y_0$ and $L(x_1) = y_1$, then $L(a x_0 + b x_1) = a y_0 + b y_1$ for any $a, b \in \mathbb{R}$.
2. Time-invariance: If $L(x) = y$ and $\hat{x}$ is a shifted version of $x$ given by $\hat{x}(k) = x(k - k_0)$, then $L(\hat{x}) = \hat{y}$ where $\hat{y}(k) = y(k - k_0)$. That is, shifted inputs lead to correspondingly shifted outputs.

The impulse response of a linear time-invariant system is the output $h$ that results from applying as input the unit impulse $\delta$, where $\delta(0) = 1$ and $\delta(k) = 0$ for $k \neq 0$.

The convolution $y$ of $h$ and $x$ is written $y = h * x$ and is defined by the sum
$$y(k) = \sum_{j=-\infty}^{\infty} h(j)\, x(k-j).$$

A signal $x$ is causal if $x(k) = 0$ for $k < 0$. A linear time-invariant system is causal if every causal input gives a causal output. An equivalent definition is that a system is causal if its impulse response sequence is causal.
The z-transform of a signal or impulse response $x$ is
$$X(z) = \sum_{k=-\infty}^{\infty} x(k) z^{-k},$$
where $X(z)$ is defined in the region of convergence of the sum.

The transfer function of a linear system with impulse response $h$ is the z-transform of the impulse response,
$$H(z) = \sum_{k=-\infty}^{\infty} h(k) z^{-k}.$$

The discrete time Fourier transform of $x$ is the function of $\omega$ obtained by evaluating the z-transform on the unit circle,
$$X(e^{i\omega}) = \sum_{k=-\infty}^{\infty} x(k) e^{-i\omega k}.$$
If $x$ is an impulse response of a system, then $X(e^{i\omega})$ is the frequency response of the system. The discrete time Fourier transform is defined only if the region of convergence of the z-transform includes the unit circle.

A filter is minimum phase if the zeros of the transfer function are in the unit circle.

A finite impulse response (FIR) filter is a system that maps an input signal $x$ to an output signal $y$ through the sum
$$y(k) = \sum_{j=0}^{n-1} b(j) x(k-j).$$

A rational infinite impulse response (IIR) filter is a causal system that maps inputs $x$ to outputs $y$ via a relation of the form
$$y(k) = \sum_{j=0}^{n-1} b(j) x(k-j) - \sum_{j=1}^{m} a(j) y(k-j).$$
Causality is an essential part of the definition; without this assumption the above relation does not uniquely define a mapping from inputs to outputs.

A signal $x$ has finite energy if
$$\sum_{k=-\infty}^{\infty} |x(k)|^2 < \infty.$$

A system mapping input $x$ to output $y$ is stable if $|x(k)| < B_x$ for $0 < B_x < \infty$ and for all $k$ implies that there is $0 < B_y < \infty$ such that the output satisfies $|y(k)| < B_y$ for all $k$.

Facts:

1. The impulse response $h$ uniquely determines a linear time-invariant system. The mapping from the input $x$ to the output $y$ is given by $y = h * x$. Convolution is commutative so that if $y = g * (h * x)$, then $y = h * (g * x)$. Thus, the order in which two linear time-invariant filters are applied to a signal does not matter.

2. A causal system can be applied in real time in the sense that the output $y(k)$ can be computed from the convolution $y = h * x$ as soon as $x(k)$ is available.

3. A system defined by its impulse response $h$ is stable if and only if
$$\sum_{k=-\infty}^{\infty} |h(k)| < \infty.$$

4. An expression for the z-transform of a signal (or an impulse response) does not uniquely determine the signal. It is also necessary to know the region of convergence. If $X(z)$ is the z-transform of $x$, then the inverse transform is
$$x(k) = \frac{1}{2\pi i} \oint X(z) z^{k-1}\, dz,$$
where the integral is taken over any contour enclosing the origin and lying in the region where $X(z)$ is analytic. A system is uniquely determined by its transfer function and the region of convergence of the transfer function.

5. An FIR filter has impulse response
$$h(k) = \begin{cases} b(k) & 0 \le k \le n-1, \\ 0 & \text{otherwise.} \end{cases}$$

6. An FIR filter has transfer function
$$H(z) = b(0) + b(1) z^{-1} + \cdots + b(n-1) z^{-(n-1)}.$$

7. The impulse response $h$ of a causal IIR filter is the unique solution to the recurrence
$$h(k) = \begin{cases} 0 & k < 0, \\ b(k) - \sum_{j=1}^{m} a(j) h(k-j) & 0 \le k \le n-1, \\ -\sum_{j=1}^{m} a(j) h(k-j) & k \ge n. \end{cases}$$
This recurrence uniquely determines a causal impulse response $h$ and, hence, uniquely determines the mapping from inputs to outputs.

8. A rational IIR filter has transfer function
$$H(z) = \frac{b(0) + b(1) z^{-1} + \cdots + b(n-1) z^{-(n-1)}}{1 + a(1) z^{-1} + \cdots + a(m) z^{-m}}.$$

9. A linear time-invariant system with input $x$ and impulse response $h$, with z-transform $X(z)$ and transfer function $H(z)$, has output $y$ with z-transform $Y(z) = H(z) X(z)$.

10. A discrete version of the Paley–Wiener theorem states that $x$ is causal and has finite energy if and only if the corresponding z-transform $X(z)$ is analytic in the exterior of the unit circle and
$$\sup_{1 < r < \infty} \int_{-\pi}^{\pi} \left| X(r e^{i\omega}) \right|^2 d\omega < \infty.$$
If $|X(e^{i\omega})|$ is square integrable, then $|X(e^{i\omega})|$ is the magnitude of the Fourier transform of some causal finite energy signal if and only if
$$\int_{-\pi}^{\pi} \left| \ln |X(e^{i\omega})| \right| d\omega < \infty.$$

11. For rational transfer functions, $H(z)$ corresponds to a causal stable system if and only if the poles of $H(z)$ are in the unit circle. Thus, FIR filters are always stable and IIR filters are stable when the poles of the transfer function are in the unit circle.

12. A filter that is minimum phase has a causal stable inverse filter with transfer function $G(z) = 1/H(z)$. That is, filtering a signal $x$ by a filter with transfer function $H(z)$ and then by a filter with transfer function $G(z)$ gives an output with z-transform $G(z) H(z) X(z) = X(z)$. The minimum phase property is of particular significance since the only filters that can be applied in real time and without excessive growth of errors due to noise are those that are causal and stable. In most circumstances only minimum phase filters are invertible in practice.

Examples:

1. The causal impulse response $h$ with $h(k) = a^k$ for $k \ge 0$ and $h(k) = 0$ for $k < 0$ has z-transform $H(z) = 1/(1 - a z^{-1})$ with region of convergence $|z| > |a|$. The anticausal impulse response $\hat{h}$ with $\hat{h}(k) = -a^k$ for $k < 0$ and $\hat{h}(k) = 0$ for $k \ge 0$ has z-transform $1/(1 - a z^{-1})$ with region of convergence $|z| < |a|$.

2. The causal impulse response $h$ with
$$h(k) = \begin{cases} 2^{-k} + 3^{-k} & k \ge 0, \\ 0 & k < 0 \end{cases}$$
has z-transform
$$H(z) = \frac{1}{1 - (2z)^{-1}} + \frac{1}{1 - (3z)^{-1}} = \frac{2 - \frac{5}{6} z^{-1}}{1 - \frac{5}{6} z^{-1} + \frac{1}{6} z^{-2}}$$
with region of convergence $|z| > 1/2$. The impulse response can be realized by a rational IIR filter of the form
$$y(k) = 2 x(k) - \frac{5}{6} x(k-1) + \frac{5}{6} y(k-1) - \frac{1}{6} y(k-2).$$
The zeros of $z^2 - (5/6) z + (1/6)$ are $1/2$ and $1/3$ so that the system is stable.
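To make Example 2 concrete, the following Python sketch (an illustration added to this survey, not part of the original development; the truncation length N is an arbitrary choice) drives the IIR recursion with a unit impulse and checks that the computed impulse response agrees with $2^{-k} + 3^{-k}$.

```python
import numpy as np

def iir_filter(b, a, x):
    """Apply the causal rational IIR filter
    y(k) = sum_j b(j) x(k-j) - sum_{j>=1} a(j) y(k-j)."""
    y = np.zeros(len(x))
    for k in range(len(x)):
        acc = sum(b[j] * x[k - j] for j in range(len(b)) if k - j >= 0)
        acc -= sum(a[j] * y[k - j] for j in range(1, len(a)) if k - j >= 0)
        y[k] = acc
    return y

# Example 2: H(z) = (2 - (5/6) z^{-1}) / (1 - (5/6) z^{-1} + (1/6) z^{-2}).
b = [2.0, -5.0 / 6.0]             # numerator coefficients b(0), b(1)
a = [1.0, -5.0 / 6.0, 1.0 / 6.0]  # denominator coefficients 1, a(1), a(2)

N = 20
delta = np.zeros(N)
delta[0] = 1.0                    # unit impulse input

h = iir_filter(b, a, delta)       # computed impulse response
k = np.arange(N)
h_exact = 2.0**(-k) + 3.0**(-k)   # closed-form impulse response

print(np.max(np.abs(h - h_exact)))  # should be at roundoff level
```

Since the poles $1/2$ and $1/3$ lie inside the unit circle, the recursion is stable and the comparison stays at roundoff level for any truncation length.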
64.2 Random Signals

Definitions of statistical terms can be found in Chapter 52.

Definitions:

A random signal or random process $x$ is a sequence $x(k)$ of real random variables indexed by $k = 0, \pm 1, \pm 2, \ldots$.

A random signal $x$ is wide-sense stationary (for brevity, we will use stationary to mean the same thing) if the mean $\mu_x = E[x(k)]$ and autocorrelation sequence $r_{xx}(k) = E[x(j) x(j-k)]$ do not depend on $j$. We assume that all random signals are wide-sense stationary.

Two stationary random signals $x$ and $y$ are jointly stationary if the cross correlation sequence $r_{yx}(k) = E[y(j) x(j-k)]$ does not depend on $j$. When referring to two stationary signals, we always assume that the signals are jointly stationary as well.

A sequence $\mathbf{x}(k)$ of real random $n \times 1$ vectors indexed by $k = 0, \pm 1, \pm 2, \ldots$ is stationary if $E[\mathbf{x}(k)]$ and $E[\mathbf{x}(k) \mathbf{x}^T(k)]$ do not depend on $k$. The autocorrelation matrix of a sequence of stationary random vectors $\mathbf{x}(k)$ is
$$R_x = E\left[ \mathbf{x}(k) \mathbf{x}^T(k) \right].$$

A zero mean stationary random signal $n$ is a white noise process if it has autocorrelation sequence
$$r_{nn}(j) = E[n(k) n(k-j)] = \begin{cases} \sigma_n^2 & j = 0, \\ 0 & j \neq 0, \end{cases}$$
where $\sigma_n^2$ is the variance of $n$.

A white noise driven autoregressive process (AR process) $x$ is of the form
$$x(k) = n(k) - \sum_{j=1}^{n} a(j) x(k-j),$$
where $n$ is a white noise process and the filtering of $n$ to obtain $x$ is causal.

Given a signal $x$, we denote the z-transform of the autocorrelation sequence of $x$ by $S_x(z)$. The spectral density of a stationary random signal $x$ with autocorrelation sequence $r_{xx}(k)$ is the function of $\omega$ given by evaluating $S_x(z)$ on the unit circle,
$$S_x(e^{i\omega}) = \sum_{j=-\infty}^{\infty} r_{xx}(j) e^{-ij\omega}.$$

A random signal with spectral density $S_x(e^{i\omega})$ is bandlimited if $S_x(e^{i\omega}) = 0$ on a subset of $[-\pi, \pi]$ of nonzero measure.
A random signal with spectral density $S_x(e^{i\omega})$ has finite power if
$$\int_{-\pi}^{\pi} S_x(e^{i\omega}) \, d\omega < \infty.$$

The spectral factorization, when it exists, of a spectral density $S_x(z)$ is a factorization of the form
$$S_x(z) = L(z) L(z^{-1}),$$
where $L(z)$ and $1/L(z)$ are both analytic for $|z| > 1$. Interpreted as transfer functions, $L(z)$ and $1/L(z)$ both describe causal, stable systems. That is, $L(z)$ is causal and stable and causally and stably invertible.

Facts:

1. If $\mathbf{x}(k)$ is a sequence of vectors obtained by sampling a random sequence of vectors, then under suitable ergodicity assumptions
$$R_x = \lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n} \mathbf{x}(k) \mathbf{x}^T(k).$$
In computational practice $R_x$ is often estimated using a truncated sum in which $n$ is finite.

2. The expectation $\langle x, y \rangle = E[xy]$ can be viewed as an inner product on a real Hilbert space of zero mean random variables with $E[|x|^2] < \infty$. This gives many results in optimal filtering a geometric interpretation in terms of orthogonalization and projection on the span of a set of random variables in a way that is perfectly analogous to least squares problems in any other Hilbert space. From this point of view, a white noise process $n$ is a sequence of orthogonal random variables.

3. If a random signal $x$ with autocorrelation sequence $r_x$ is passed through a system with impulse response $h$, then the output $y$ has autocorrelation $r_y = h * r_x * \tilde{h}$ where $\tilde{h}$ is defined by $\tilde{h}(j) = h(-j)$.

4. The spectral density is nonnegative, $S_x(e^{i\omega}) \ge 0$.

5. The spectral density of white noise is $S_n(e^{i\omega}) = \sigma_n^2$.

6. [Pap85] If a random signal $x$ with spectral density $S_x(z)$ passes through a filter with transfer function $H(z)$, then the output $y$ is a stationary signal with
$$S_y(z) = S_x(z) H(z) H(z^{-1}) \quad \text{or} \quad S_y(e^{i\omega}) = S_x(e^{i\omega}) \left| H(e^{i\omega}) \right|^2.$$

7. If $L(z)$ is the spectral factor of $S_x(z)$, then on the unit circle we have $S_x(e^{i\omega}) = |L(e^{i\omega})|^2$.

8. [SK01], [GS84] The spectral factorization exists if and only if the signal has finite power and satisfies
$$\int_{-\pi}^{\pi} \ln S_x(e^{i\omega}) \, d\omega > -\infty.$$
If $S_x(e^{i\omega})$ is zero on a set of nonzero measure, then the condition is not satisfied. Thus, bandlimited signals do not have a spectral factorization.

9. If a spectral factor $L(z)$ exists for a signal $x$ with spectral density $S_x(e^{i\omega})$ and
$$L(z) = \sum_{k=0}^{\infty} l(k) z^{-k} \quad \text{and} \quad M(z) = \frac{1}{L(z)} = \sum_{k=0}^{\infty} m(k) z^{-k},$$
then the filtered signal $n = m * x$ given by
$$n(k) = \sum_{j=0}^{\infty} m(j) x(k-j)$$
is a white noise process with $S_n(e^{i\omega}) = 1$. The original signal can be obtained by causally and stably filtering $n$ with $l$:
$$x(k) = \sum_{j=0}^{\infty} l(j) n(k-j).$$
The signal $n$ is the innovations process for $x$, and the representation of $x$ through filtering $n$ by $l$ is the innovations representation for $x$. It is a model of $x$ as a filter driven by white noise. The innovations representation is an orthogonalization of the sequence of random variables $x$. The innovations representation for $x$ is analogous to the QR factorization of a matrix.

Examples:

1. For a white noise process $n$ with variance $\sigma_n^2 = 1$, define the autoregressive process $x$ by $x(k) = n(k) + a x(k-1)$ where $|a| < 1$. The impulse response and transfer function of the IIR filter generating $x$ are
$$h(k) = \begin{cases} 0 & k < 0, \\ a^k & k \ge 0, \end{cases} \quad \text{and} \quad H(z) = \frac{1}{1 - a z^{-1}}.$$
The autocorrelation sequence for $x$ is the sequence $r_{xx}$ given by
$$r_{xx}(k) = \frac{1}{1 - a^2}\, a^{|k|}.$$
The autocorrelation sequence has z-transform
$$S_x(z) = \frac{1}{(1 - a z^{-1})(1 - a z)}$$
so that the spectral density is
$$S_x(e^{i\omega}) = \frac{1}{1 + a^2 - 2 a \cos(\omega)}.$$
The spectral factor is
$$L(z) = \frac{1}{1 - a z^{-1}}.$$
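Example 1 can be checked numerically. The sketch below (added for illustration; the value of a, the sample size, and the seed are arbitrary choices) simulates the AR(1) process and compares sample autocorrelations, formed as in Fact 1 by truncated time averages, with the closed form $r_{xx}(k) = a^{|k|}/(1-a^2)$.

```python
import numpy as np

rng = np.random.default_rng(0)
a, N = 0.8, 200_000

# Simulate x(k) = n(k) + a x(k-1) driven by unit-variance white noise.
n = rng.standard_normal(N)
x = np.zeros(N)
for k in range(1, N):
    x[k] = n[k] + a * x[k - 1]

def sample_autocorr(x, k):
    # Truncated time average estimate of r_xx(k) = E[x(j) x(j-k)].
    return np.dot(x[k:], x[:len(x) - k]) / len(x)

for k in range(4):
    est = sample_autocorr(x, k)
    exact = a**abs(k) / (1.0 - a**2)
    print(k, est, exact)   # the estimates approach the closed-form values
```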
64.3 Linear Prediction

Definitions:

The linear prediction problem is to find an estimate $\hat{x}$ of a stationary random signal $x$ as a linear combination
$$\hat{x}(k) = \sum_{j=1}^{\infty} a(j) x(k-j),$$
with the $a(j)$ chosen to minimize the mean square prediction error $E_\infty = E[|e(k)|^2]$ where $e(k) = \hat{x}(k) - x(k)$ is the prediction error. Thus, the goal is to find a current estimate $\hat{x}(k)$ of $x(k)$ from the past values $x(k-1), x(k-2), \ldots$.

A random signal is predictable if $E_\infty = 0$. A signal that is not predictable is regular.

The order $n$ linear prediction problem is to find an estimate of a stationary random signal $x$ with a signal $\hat{x}$ of the form $\hat{x}(k) = \sum_{j=1}^{n} a_n(j) x(k-j)$ so as to minimize the mean square error $E_n = E[|e(k)|^2]$ where $e(k) = \hat{x}(k) - x(k)$. A random signal is order $n$ predictable if $E_n = 0$.

The prediction error filter and the order $n$ prediction error filter are the filters with transfer functions
$$E(z) = 1 - \sum_{j=1}^{\infty} a(j) z^{-j} \quad \text{and} \quad E_n(z) = 1 - \sum_{j=1}^{n} a_n(j) z^{-j}.$$

The Wiener–Hopf equations for the optimal prediction coefficients $a(j)$ are
$$r_{xx}(l) = \sum_{j=1}^{\infty} a(j) r_{xx}(l-j)$$
for $l \ge 1$.

Given a random signal $x(k)$ with autocorrelation $r_{xx}(k)$, the optimal prediction coefficients $a_n(j)$ and the mean square prediction error $E_n$ satisfy the Yule–Walker equations
$$\begin{bmatrix} r_{xx}(0) & r_{xx}(1) & \cdots & r_{xx}(n) \\ r_{xx}(1) & r_{xx}(0) & \ddots & \vdots \\ \vdots & \ddots & \ddots & r_{xx}(1) \\ r_{xx}(n) & \cdots & r_{xx}(1) & r_{xx}(0) \end{bmatrix} \begin{bmatrix} 1 \\ -a_n(1) \\ \vdots \\ -a_n(n) \end{bmatrix} = \begin{bmatrix} E_n \\ 0 \\ \vdots \\ 0 \end{bmatrix},$$
or $T_n \mathbf{a}_n = E_n \mathbf{e}_1$.

The Levinson–Durbin algorithm for the solution of the order $n$ linear prediction problem is the order recursion
$$\mathbf{a}_n = \begin{bmatrix} \mathbf{a}_{n-1} \\ 0 \end{bmatrix} + \gamma_n \begin{bmatrix} 0 \\ R \mathbf{a}_{n-1} \end{bmatrix},$$
where $R$ is the permutation that reverses the order of the elements of $\mathbf{a}_{n-1}$. The quantity $\gamma_n$ is computed from the relation
$$\gamma_n = -\frac{r_{xx}(n) - \sum_{j=1}^{n-1} a_{n-1}(j) r_{xx}(n-j)}{E_{n-1}},$$
and the mean square prediction error is computed from
$$E_n = E_{n-1} \left( 1 - |\gamma_n|^2 \right).$$
The algorithm starts with $E_0 = r_{xx}(0)$ and $\mathbf{a}_0 = 1$. If $x(k)$ is order $n$ predictable, then the process terminates with the order $n$ predictor and $E_n = 0$. The parameters $\gamma_k$ are known as reflection coefficients, partial correlation coefficients, or Schur parameters. The complementary parameters are $\delta_k = (1 - |\gamma_k|^2)^{1/2}$.
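The order recursion just described translates directly into a short program. The following Python sketch (added for illustration; the function name levinson_durbin and the test data are not from the original text) implements the Levinson–Durbin algorithm as stated, carrying the prediction error filter vector $[1, -a_m(1), \ldots, -a_m(m)]^T$ through the recursion.

```python
import numpy as np

def levinson_durbin(r):
    """Solve the Yule-Walker equations T_n a_n = E_n e_1 by the
    Levinson-Durbin recursion.  Input: autocorrelations r[0], ..., r[n].
    Returns the prediction coefficients [a_n(1), ..., a_n(n)], the final
    mean square error E_n, and the reflection coefficients gamma_1..gamma_n."""
    r = np.asarray(r, dtype=float)
    n = len(r) - 1
    c = np.array([1.0])          # prediction error filter [1, -a_m(1), ..., -a_m(m)]
    E = r[0]                     # E_0 = r_xx(0)
    gammas = []
    for m in range(1, n + 1):
        a = -c[1:]               # current coefficients a_{m-1}(1..m-1)
        gamma = -(r[m] - np.dot(a, r[m - 1:0:-1])) / E
        gammas.append(gamma)
        # a_m = [a_{m-1}; 0] + gamma_m [0; reversed a_{m-1}]
        c = np.concatenate((c, [0.0])) + gamma * np.concatenate(([0.0], c[::-1]))
        E = E * (1.0 - gamma**2)
    return -c[1:], E, gammas

# Order-1 prediction of the AR(1) process of Section 64.2, Example 1:
a0 = 0.8
r = [1.0 / (1 - a0**2), a0 / (1 - a0**2)]
coeffs, E1, gammas = levinson_durbin(r)
print(coeffs, E1)   # expect coefficients [0.8] and E_1 = 1.0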
Facts:

1. [Pap85] The Wold decomposition theorem: Every stationary random signal can be represented as the sum of a regular and a predictable signal, $x(k) = x_r(k) + x_p(k)$. The two components are orthogonal with $E[x_r(k) x_p(k)] = 0$.

2. [Pap85] The Wiener–Hopf equations can be interpreted as stating that the prediction error $\hat{x}(k) - x(k)$ is orthogonal to $x(k-l)$ for $l \ge 1$ in the inner product $\langle x, y \rangle = E[xy]$.

3. The Toeplitz matrix $T_n$ in the Yule–Walker equations is positive semidefinite.

4. The Levinson–Durbin algorithm is an $O(n^2)$ order recursive method for solving the linear prediction problem. In most cases the numerical results from applying the algorithm are comparable to the numerically stable Cholesky factorization procedure [Cyb80]. An alternative fast method, the Schur algorithm, computes a Cholesky factorization and can be proven to be numerically stable [BBH95].

5. The following are equivalent:
(a) $T_{n-1}$ is positive definite and $T_n$ is positive semidefinite but singular.
(b) $|\gamma_j| < 1$ for $1 \le j \le n-1$ and $|\gamma_n| = 1$.
(c) $E_j > 0$ for $1 \le j \le n-1$ and $E_n = 0$.
(d) The polynomial $z^n E_n(z)$ has all zeros on the unit circle and each polynomial $z^k E_k(z)$ for $k = 1, 2, \ldots, n-1$ has all zeros strictly inside the unit circle.

6. [Pap85] The prediction errors are monotonic: $E_k \ge E_{k+1}$ and $E_k \to E_\infty \ge 0$ as $k \to \infty$.

7. [SK01] Let $a_k(z) = 1 - a_k(1) z - a_k(2) z^2 - \cdots - a_k(k) z^k$ where the $a_k(j)$ are the prediction coefficients for a signal $x$ with spectral density $S_x(z)$. If $S_x(z) = L(z) L(z^{-1})$ is the spectral factorization of $S_x(z)$, then
$$L(z) = \lim_{k \to \infty} \frac{E_k^{1/2}}{a_k(z^{-1})}.$$
A survey, including other representations of the spectral factor and algorithms for computing it, is given in [SK01].

8. [GS84] A finite power random signal $x$ is predictable (i.e., $E_\infty = 0$) if and only if
$$\int_{-\pi}^{\pi} \ln\left( S_x(e^{i\omega}) \right) d\omega = -\infty.$$
If the signal is regular, then we have the Kolmogorov–Szegö error formula
$$E_\infty = \exp\left( \frac{1}{2\pi} \int_{-\pi}^{\pi} \ln S_x(e^{i\omega}) \, d\omega \right) > 0.$$

Examples:

1. Consider the autoregressive process $x$ given by $x(k) = n(k) + a x(k-1)$ where $n$ is a white noise process with $\sigma_n^2 = 1$. The solution to the order 1 linear prediction problem is given by the Yule–Walker system
$$\begin{bmatrix} \frac{1}{1-a^2} & \frac{a}{1-a^2} \\[2pt] \frac{a}{1-a^2} & \frac{1}{1-a^2} \end{bmatrix} \begin{bmatrix} 1 \\ -a_1(1) \end{bmatrix} = \begin{bmatrix} E_1 \\ 0 \end{bmatrix}.$$
The Levinson–Durbin algorithm starts with $E_0 = r_{xx}(0) = 1/(1-a^2)$ and $\mathbf{a}_0 = 1$. Thus,
$$\gamma_1 = -\frac{r_{xx}(1)}{E_0} = -\frac{a/(1-a^2)}{1/(1-a^2)} = -a,$$
so that
$$\mathbf{a}_1 = \begin{bmatrix} 1 \\ -a_1(1) \end{bmatrix} = \begin{bmatrix} \mathbf{a}_0 \\ 0 \end{bmatrix} + \gamma_1 \begin{bmatrix} 0 \\ \mathbf{a}_0 \end{bmatrix} = \begin{bmatrix} 1 \\ -a \end{bmatrix}.$$
Thus, $a_1(1) = a$ and the optimal linear prediction of $x(k)$ from $x(k-1)$ is given by $\hat{x}(k) = a x(k-1)$. The prediction error is given by $E_1 = E[|\hat{x}(k) - x(k)|^2] = E[|n(k)|^2] = 1$.
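As a numerical illustration of the Kolmogorov–Szegö error formula (a sketch added here; the grid size is an arbitrary choice), the following Python lines check that for the AR(1) example $\exp\big(\tfrac{1}{2\pi}\int_{-\pi}^{\pi}\ln S_x(e^{i\omega})\,d\omega\big) = 1$, in agreement with $E_1 = 1$.

```python
import numpy as np

a = 0.8
omega = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
S = 1.0 / (1.0 + a**2 - 2.0 * a * np.cos(omega))  # spectral density of the AR(1) example

# (1/2pi) * integral of ln S over [-pi, pi] is the mean of ln S on a uniform grid.
E_inf = np.exp(np.mean(np.log(S)))
print(E_inf)  # approximately 1.0, agreeing with E_1 = 1 for the AR(1) process
```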
64.4 Wiener Filtering

Linear optimal filtering of stationary signals was introduced by Wiener [Wie49] and Kolmogorov [Kol41]. The material here is covered in books on adaptive filtering [Hay91] and linear estimation [KSH00].

Definitions:

The general discrete Wiener filtering problem is the following: Given stationary signals $x$ and $y$, find a filter with impulse response $w$ such that
$$E_w = E\left[ \left| y(k) - \sum_{j=-\infty}^{\infty} w(j) x(k-j) \right|^2 \right]$$
is minimal. The filter with impulse response $w$ minimizing $E_w$ is the Wiener filter. The goal is to approximate a desired signal $y$ by filtering the signal $x$. Equivalently, we seek the best approximation to $y(k)$ in the span of the $x(j)$ for $-\infty < j < \infty$. Note that we leave open the possibility that the Wiener filter is not causal.

Depending on specific relations that may be imposed on $y$ and $x$, the Wiener filtering problem specializes in several important ways: the Wiener prediction problem, for which $y(k) = x(k+l)$ where $l \ge 1$ and $w$ is assumed causal; the Wiener smoothing problem, for which $y$ is arbitrary and $x$ is a noise corrupted version of $y$ given by $x = y + n$ where $n$ is the noise; and the Wiener deconvolution problem, for which we have an arbitrary desired signal $y$ and $x$ obtained from $y$ by convolution and the addition of noise, $x = h * y + n$. For the Wiener prediction problem, the goal is to predict future values of $x$ from past values. For the Wiener smoothing problem, the goal is to recover an approximation to a signal $y$ from a noise corrupted version $x$ of the signal. For Wiener deconvolution, the goal is to invert a filter in the presence of noise to obtain the signal $y$ from the filtered and noise corrupted version $x$.

The FIR Wiener filtering problem is to choose an FIR filter with impulse response $w(k)$ such that
$$E\left[ \left| y(k) - \sum_{j=0}^{n-1} w(j) x(k-j) \right|^2 \right]$$
is minimal.

The causal part of a signal $x(k)$ is
$$[x(k)]_+ = \begin{cases} x(k) & k \ge 0, \\ 0 & k < 0, \end{cases} \quad \text{or} \quad [X(z)]_+ = \sum_{k=0}^{\infty} x(k) z^{-k}.$$

Facts:

1. The stationarity assumptions ensure that the coefficients $w(j)$ that are used to estimate $y(k)$ from the signal $x$ do not depend on $k$. That is, the coefficients $w(j)$ that minimize $E_w$ also minimize
$$E\left[ \left| y(k-k_0) - \sum_{j=0}^{n-1} w(j) x(k - k_0 - j) \right|^2 \right],$$
so that the Wiener filter is a linear time-invariant system.

2. Let $S_x(z) = L(z) L(z^{-1})$, $r_{yx}(k) = E[y(j) x(j-k)]$, and
$$S_{yx}(e^{i\omega}) = \sum_{k=-\infty}^{\infty} r_{yx}(k) e^{-ik\omega}.$$
If we do not require causality, then the Wiener filter has z-transform
$$W(z) = \frac{S_{yx}(z)}{S_x(z)}.$$
If we do impose causality, then the Wiener filter is
$$W(z) = \frac{1}{L(z)} \left[ \frac{S_{yx}(z)}{L^*(z^{-*})} \right]_+.$$

3. As seen above, the Wiener filter depends on the cross correlations $r_{yx}(k)$, but not on knowledge of the elements of the unknown sequence $y(k)$. Thus, it is possible to estimate the signal $y$ by filtering $x$ without direct knowledge of $y$. If the autocorrelation sequences $r_{xx}(k)$ and $r_{nn}(k)$ are available (or can be estimated), then $r_{yx}(k)$ can often be computed in a straightforward way. For the Wiener prediction problem, $r_{yx}(k) = r_{xx}(l+k)$. If the noise $n$ and signal $y$ are uncorrelated so that $r_{ny}(k) = 0$ for all $k$, then for the Wiener smoothing problem $r_{yx} = r_{xx} - r_{nn}$. For the Wiener deconvolution problem, if $g$ is the inverse filter of $h$ and if $y$ and $n$ are uncorrelated, then $r_{yx} = g * (r_{xx} - r_{nn})$.

4. The coefficients for an FIR Wiener filter satisfy the Wiener–Hopf equation
$$\begin{bmatrix} r_{xx}(0) & r_{xx}(-1) & \cdots & r_{xx}(-m+1) \\ r_{xx}(1) & r_{xx}(0) & \ddots & \vdots \\ \vdots & \ddots & \ddots & r_{xx}(-1) \\ r_{xx}(m-1) & \cdots & r_{xx}(1) & r_{xx}(0) \end{bmatrix} \begin{bmatrix} w(0) \\ w(1) \\ \vdots \\ w(m-1) \end{bmatrix} = \begin{bmatrix} r_{yx}(0) \\ r_{yx}(1) \\ \vdots \\ r_{yx}(m-1) \end{bmatrix}.$$
As with linear prediction, this system of equations can be solved using fast algorithms for Toeplitz systems.

Examples:

1. We consider the first order FIR Wiener prediction problem of predicting $y(k) = x(k+2)$ from the autoregressive process $x$ with $x(k) = n(k) + a x(k-1)$. Thus, we seek $w(0)$ such that $E\left[ |x(k+2) - w(0) x(k)|^2 \right]$ is minimal. Since
$$r_{xx}(0) = \frac{1}{1-a^2} \quad \text{and} \quad r_{yx}(0) = r_{xx}(2) = \frac{a^2}{1-a^2},$$
the Wiener–Hopf equation for $w(0)$ is
$$\frac{1}{1-a^2}\, w(0) = \frac{a^2}{1-a^2},$$
so that $w(0) = a^2$.
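The FIR Wiener–Hopf system of Fact 4 is easy to set up and solve directly. The Python sketch below (added for illustration; the AR(1) signal, the noise variance, and the filter length are arbitrary choices) treats a smoothing problem $x = y + n$ with uncorrelated noise, using $r_{yx} = r_{xx} - r_{nn}$ from Fact 3.

```python
import numpy as np

# Smoothing problem: observe x = y + n, where y is the AR(1) signal of the
# earlier examples and n is white noise with variance sigma2, uncorrelated with y.
a, sigma2, m = 0.8, 0.5, 8

k = np.arange(m)
r_yy = a**k / (1.0 - a**2)            # autocorrelation of the AR(1) signal y
r_nn = np.where(k == 0, sigma2, 0.0)  # autocorrelation of the white noise
r_xx = r_yy + r_nn                    # x = y + n with y and n uncorrelated
r_yx = r_xx - r_nn                    # Fact 3: r_yx = r_xx - r_nn (equals r_yy here)

# Fact 4: the m x m symmetric Toeplitz Wiener-Hopf system for the FIR coefficients.
T = r_xx[np.abs(k[:, None] - k[None, :])]
w = np.linalg.solve(T, r_yx)
print(w)   # FIR Wiener smoothing filter w(0), ..., w(m-1)
```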
64.5 Adaptive Filtering

Definitions:

An FIR adaptive filter is a filter with coefficients that are updated to match the possibly changing statistics of the input $x$ and the desired output $y$. The output is
$$\hat{y}(k) = \sum_{j=0}^{m-1} w_k(j) x(k-j),$$
where the $w_k(j)$ depend on $k$. The goal is similar to that of the Wiener filter, that is, to approximate the sequence $y$. The difference is that instead of computing a fixed set of coefficients $w(j)$ from knowledge of the statistics of stationary signals $x$ and $y$, the coefficients are allowed to vary with $k$ and are computed from actual samples of the sequences $x$ and $y$.

A recursive least squares (RLS) adaptive FIR filter is a rule for updating the coefficients $w_k(j)$. The coefficients are chosen to minimize
$$\min_{\mathbf{w}_k} \left\| D_{\rho,k} \begin{bmatrix} x(l) & x(l-1) & \cdots & x(l-m+1) \\ x(l+1) & x(l) & \cdots & x(l-m+2) \\ \vdots & \vdots & \ddots & \vdots \\ x(k) & x(k-1) & \cdots & x(k-m+1) \end{bmatrix} \begin{bmatrix} w_k(0) \\ w_k(1) \\ \vdots \\ w_k(m-1) \end{bmatrix} - D_{\rho,k} \begin{bmatrix} y(l) \\ y(l+1) \\ \vdots \\ y(k) \end{bmatrix} \right\|_2,$$
where $k \ge l + m - 1$ and $D_{\rho,k} = \mathrm{diag}(\rho^{k-l}, \rho^{k-l-1}, \ldots, \rho^0)$ with $0 < \rho \le 1$. Equivalently, the vector $\mathbf{w}_k$ is chosen to solve $\min_{\mathbf{w}_k} \| D_{\rho,k} T_{l,k} \mathbf{w}_k - D_{\rho,k} \mathbf{y}_k \|$. The parameter $\rho$ is chosen to decrease the influence of older data. This is of use when the signals $x$ and $y$ are only approximately stationary, i.e., their statistical properties are slowly time varying.

A signal $x$ is persistently exciting of order $m$ if there exist $c_1 > 0$, $c_2 \ge c_1$, and $j$ such that
$$c_1 I \le \frac{1}{k-l+1} T_{l,k}^T T_{l,k} \le c_2 I,$$
that is, $c_1 \le \lambda_p \le c_2$ for each eigenvalue $\lambda_p$ of $T_{l,k}^T T_{l,k}/(k-l+1)$, for all $k \ge j$.

Facts:

1. The vector $\mathbf{w}_k$ can be updated using the following recursive formulas:
$$\mathbf{w}_k = \mathbf{w}_{k-1} + \mathbf{g}_k \left( y(k) - \mathbf{w}_{k-1}^T \mathbf{x}_k \right),$$
where
$$\mathbf{x}_k = \begin{bmatrix} x(k) \\ \vdots \\ x(k-m+1) \end{bmatrix}, \qquad \mathbf{g}_k = \frac{\rho^{-2} P_{k-1} \mathbf{x}_k}{1 + \rho^{-2} \mathbf{x}_k^T P_{k-1} \mathbf{x}_k},$$
with
$$P_k = \rho^{-2} P_{k-1} - \rho^{-2} \mathbf{g}_k \mathbf{x}_k^T P_{k-1}.$$

2. With initialization $P_k = (T_{l,k}^T D_{\rho,k}^2 T_{l,k})^{-1}$ and $\mathbf{w}_k = P_k T_{l,k}^T D_{\rho,k}^2 \mathbf{y}_k$, the recurrences compute the solution $\mathbf{w}_k$ to the recursive least squares problem.

3. [GV96] The weight vector $\mathbf{w}_k$ can also be updated as $k$ increases with an $O(n^2)$ cost using a standard QR updating algorithm. Such algorithms have good numerical properties.

4. If $x$ and $y$ are stationary, if $\rho = 1$, and if $x$ is persistently exciting, then $\mathbf{w}_k$ converges to the Wiener filter as $k \to \infty$. Under these assumptions the effect of initializing $P_k$ incorrectly becomes negligible as $k \to \infty$. Thus, convergence does not depend on the correct initialization of $P_k = (T_{l,k}^T T_{l,k})^{-1}$. For simplicity, it is common to choose an initial $P_k$ that is a multiple of the identity. If $x$ and $y$ are not stationary, then $\rho$ is typically chosen to be less than one in order to allow the filter to adapt to changing statistics. If the signals really are stationary, then a choice of $\rho < 1$ sacrifices convergence of the filter.

5. There are a variety of algorithms that exploit the Toeplitz structure of the least squares problem, reducing the cost per update of $\mathbf{w}_k$ to $O(n)$ operations [CK84]. An algorithm that behaves well in finite precision is analyzed in [Reg93].

Examples:

1. Consider applying a recursive least squares adaptive filter (with $m = 2$ and $\rho = 1$) to signals $x$ and $y$ with $x(1) = 1$, $x(2) = 2$, $x(3) = 3$, $x(4) = 4$ and $y(2) = 5$, $y(3) = 6$, $y(4) = 8$. The vector of filter coefficients $\mathbf{w}_3$ satisfies
$$\begin{bmatrix} 2 & 1 \\ 3 & 2 \end{bmatrix} \mathbf{w}_3 = \begin{bmatrix} 5 \\ 6 \end{bmatrix}, \quad \text{so that} \quad \mathbf{w}_3 = \begin{bmatrix} 4 \\ -3 \end{bmatrix}.$$
We also have
$$P_3 = \left( \begin{bmatrix} 2 & 1 \\ 3 & 2 \end{bmatrix}^T \begin{bmatrix} 2 & 1 \\ 3 & 2 \end{bmatrix} \right)^{-1} = \begin{bmatrix} 13 & 8 \\ 8 & 5 \end{bmatrix}^{-1} = \begin{bmatrix} 5 & -8 \\ -8 & 13 \end{bmatrix},$$
so that, with $\mathbf{x}_4 = \begin{bmatrix} 4 & 3 \end{bmatrix}^T$,
$$\mathbf{g}_4 = \frac{P_3 \mathbf{x}_4}{1 + \mathbf{x}_4^T P_3 \mathbf{x}_4} = \frac{1}{6} \begin{bmatrix} -4 \\ 7 \end{bmatrix}.$$
Thus, the next coefficient vector $\mathbf{w}_4$ is
$$\mathbf{w}_4 = \mathbf{w}_3 + \mathbf{g}_4 \left( y(4) - \mathbf{w}_3^T \mathbf{x}_4 \right) = \begin{bmatrix} 10/3 \\ -11/6 \end{bmatrix}.$$
The process can be continued by updating $P_4$, $\mathbf{g}_5$, and $\mathbf{w}_5$.
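A minimal Python sketch of the recursions in Fact 1 (added for illustration, taking $\rho = 1$ and the exact initialization of Fact 2; the helper name rls_update is not from the original text) reproduces the numbers of Example 1.

```python
import numpy as np

def rls_update(w, P, x_k, y_k, rho=1.0):
    """One recursive least squares update of the weight vector w and the
    matrix P, following the recursions of Fact 1."""
    r = rho ** -2
    g = (r * P @ x_k) / (1.0 + r * x_k @ P @ x_k)   # gain vector g_k
    w_new = w + g * (y_k - w @ x_k)                  # w_k = w_{k-1} + g_k (y(k) - w_{k-1}^T x_k)
    P_new = r * P - r * np.outer(g, x_k) @ P         # P_k update
    return w_new, P_new

# Example 1: x(1..4) = 1, 2, 3, 4 and y(2..4) = 5, 6, 8, filter length m = 2.
T = np.array([[2.0, 1.0], [3.0, 2.0]])   # rows [x(k), x(k-1)] for k = 2, 3
y = np.array([5.0, 6.0])
P3 = np.linalg.inv(T.T @ T)              # exact initialization (Fact 2 with rho = 1)
w3 = P3 @ T.T @ y                        # w_3 = [4, -3]

x4 = np.array([4.0, 3.0])                # [x(4), x(3)]
w4, P4 = rls_update(w3, P3, x4, 8.0)
print(w3, w4)                            # w_4 should be [10/3, -11/6]
```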
64.6 Spectral Estimation

Definitions:

The deterministic spectral estimation problem is to estimate the spectral density $S_x(e^{i\omega})$ from a finite number of correlations $r_{xx}(k)$ for $0 \le k \le n$. Note that this definition is not taken to imply that the signal $x$ is deterministic, but that the given data are correlations rather than samples of the random signal.

The stochastic spectral estimation problem is to estimate $S_x(e^{i\omega})$ from samples of the random signal $x(k)$ for $0 \le k \le n$.

The problem of estimating sinusoids in noise is to find the frequencies $\omega_j$ of
$$\hat{x}(k) = \alpha_1 \cos(\omega_1 k + \eta_1) + \alpha_2 \cos(\omega_2 k + \eta_2) + \cdots + \alpha_p \cos(\omega_p k + \eta_p) + n(k) = x(k) + n(k),$$
where $n$ is zero mean white noise with variance $\sigma_n^2$ and $x$ is the exact sinusoidal signal with random frequencies $\omega_j \in [0, 2\pi)$ and phases $\eta_j$. The frequencies and phases are assumed to be uncorrelated with the noise $n$. The available data can be samples of the signal $\hat{x}$ or its autocorrelation sequence.

Facts:

1. [PM96] A traditional method for the deterministic spectral estimation problem is to assume that any $r_{xx}(j)$ not given are equal to zero. When additional information is available in the form of a signal model, it is possible to do better by extending the autocorrelation sequence in a way that is consistent with the model.

2. [Pap85] The method of maximum entropy: The sequence $r_{xx}(k)$ is extended so as to maximize the entropy rate
$$\int_{-\pi}^{\pi} \ln S(e^{i\omega}) \, d\omega,$$
where it is assumed that the random signal $x$ is a sequence of normal random variables. It can be shown that the spectrum that achieves this is the spectrum of an autoregressive signal,
$$S(e^{i\omega}) = \frac{E_n}{\left| 1 - \sum_{j=1}^{n} a_n(j) e^{-i\omega j} \right|^2},$$
where the $a_n(j)$ are the coefficients obtained from the Yule–Walker equations.

3. [Pis73] Pisarenko's method: Given a signal comprised of sinusoids in noise, if $\sigma_n = 0$, then the signal sequence $x$ is order $p$ predictable and the signal and its autocorrelation sequence $r_{xx}$ satisfy
$$x(k) = a(1) x(k-1) + a(2) x(k-2) + \cdots + a(p) x(k-p),$$
$$r_{xx}(k) = a(1) r_{xx}(k-1) + a(2) r_{xx}(k-2) + \cdots + a(p) r_{xx}(k-p),$$
where $\mathbf{a}_p = \begin{bmatrix} 1 & -a_p(1) & \cdots & -a_p(p) \end{bmatrix}^T$ is a null vector of the $(p+1) \times (p+1)$ autocorrelation matrix $T = [t_{ij}] = [r_{xx}(i-j)]$. This is the Yule–Walker system with $n = p$ and $E_p = 0$. When $\sigma_n \neq 0$, we have $\hat{T} = T + \sigma_n^2 I$ where $\hat{T}$ is the autocorrelation matrix for $\hat{x}(k)$. The value of $\sigma_n^2$ is the smallest eigenvalue of $\hat{T}$. Given knowledge of $\sigma_n$, $T$ can be computed from $T = \hat{T} - \sigma_n^2 I$. The prediction coefficients can then be computed from the Yule–Walker equation $T \mathbf{a}_p = 0$. The zeros of the prediction error filter
$$z^p E_p(z) = z^p \left( 1 - \sum_{j=1}^{p} a(j) z^{-j} \right)$$
all lie on the unit circle at $e^{i\omega_j}$ for $j = 1, 2, \ldots, p$, giving the frequencies of the sinusoids.

4. [AGR87] The zeros of a polynomial are often sensitive to small errors in the coefficients. Computing such zeros accurately can be difficult. As an alternative, the zeros of $z^p E_p(z)$ can be found as the eigenvalues of a unitary Hessenberg matrix. Let $\gamma_j$ and $\delta_j$ be the Schur parameters and complementary parameters for the autocorrelation matrix $T$. Then $|\gamma_j| < 1$ for $1 \le j \le p-1$ and $|\gamma_p| = 1$. The matrix $H = G_1(\gamma_1) G_2(\gamma_2) \cdots G_p(\gamma_p)$ with
$$G_j(\gamma_j) = I_{j-1} \oplus \begin{bmatrix} -\gamma_j & \delta_j \\ \delta_j & \overline{\gamma}_j \end{bmatrix} \oplus I_{p-j-1}$$
is unitary and upper Hessenberg. An explicit formula for the elements of $H$ is
$$H = \begin{bmatrix} -\gamma_1 & -\delta_1 \gamma_2 & -\delta_1 \delta_2 \gamma_3 & \cdots & -\delta_1 \cdots \delta_{p-1} \gamma_p \\ \delta_1 & -\overline{\gamma}_1 \gamma_2 & -\overline{\gamma}_1 \delta_2 \gamma_3 & \cdots & -\overline{\gamma}_1 \delta_2 \cdots \delta_{p-1} \gamma_p \\ & \delta_2 & -\overline{\gamma}_2 \gamma_3 & \cdots & \vdots \\ & & \ddots & \ddots & \vdots \\ & & & \delta_{p-1} & -\overline{\gamma}_{p-1} \gamma_p \end{bmatrix}.$$
The eigenvalues of $H$ are the zeros of $z^p E_p(z)$. The eigenvalues of a $p \times p$ unitary Hessenberg matrix can be found with $O(p^2)$ operations. An algorithm exploiting connections with Szegő polynomials was given in [Gra86] and stabilized in [Gra99]. A different, stable algorithm involving matrix pencils was given in [BE91].

Examples:

1. Consider a stationary random signal with known autocorrelations $r_{xx}(0) = 1$ and $r_{xx}(1) = a < 1$. The other autocorrelations are assumed to be unknown. The Yule–Walker system
$$\begin{bmatrix} 1 & a \\ a & 1 \end{bmatrix} \begin{bmatrix} 1 \\ -a_1(1) \end{bmatrix} = \begin{bmatrix} E_1 \\ 0 \end{bmatrix}$$
has solution $a_1(1) = a$ and $E_1 = 1 - a^2$. Thus, the maximum entropy estimate of the spectral density $S_x(e^{i\omega})$ is
$$S(e^{i\omega}) = \frac{1 - a^2}{\left| 1 - a e^{-i\omega} \right|^2}.$$
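Pisarenko's method as described in Fact 3 can be sketched in a few lines of Python (added for illustration; the test case, a single sinusoid in noise with exact rather than estimated autocorrelations, is an assumption made here for simplicity).

```python
import numpy as np

# One real sinusoid in white noise: r(k) = (alpha^2/2) cos(omega k) + sigma2 [k = 0].
alpha, omega, sigma2, p = 1.0, 0.9, 0.3, 2

k = np.arange(p + 1)
r_clean = 0.5 * alpha**2 * np.cos(omega * k)
r_noisy = r_clean + np.where(k == 0, sigma2, 0.0)

# Autocorrelation matrix of the noisy signal, T_hat[i, j] = r_noisy(i - j).
T_hat = r_noisy[np.abs(k[:, None] - k[None, :])]

# Pisarenko: sigma_n^2 is the smallest eigenvalue of T_hat; the corresponding
# eigenvector is the null vector a_p of T = T_hat - sigma_n^2 I.
eigvals, eigvecs = np.linalg.eigh(T_hat)
sigma2_est = eigvals[0]
a_p = eigvecs[:, 0] / eigvecs[0, 0]    # normalize the leading coefficient to 1

# Zeros of the prediction error filter z^p E_p(z) give the frequencies.
roots = np.roots(a_p)                  # coefficients are [1, -a(1), ..., -a(p)]
print(sigma2_est)                      # should recover sigma2 = 0.3
print(np.angle(roots))                 # should contain +/- omega = +/- 0.9
```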
64.7 Direction of Arrival Estimation

Definitions:

The direction of arrival estimation problem can be described as follows. We assume that $n$ plane waves $s_j(k)$, $1 \le j \le n$, arrive at an array of $m$ sensors from angles $\theta_j$, resulting in outputs $x_l(k)$, $1 \le l \le m$, from the sensors. We assume that the sensors lie in a plane and that it is necessary to estimate only a single angle of arrival for each signal. It is assumed that each sensor output is corrupted by noise $n_l(k)$, $1 \le l \le m$.

The sensors experience different delays and attenuations of the signal $s_j(k)$ depending on their position and the direction of arrival. If the signals are narrow-band signals
$$s_j(k) = \mathrm{Re}\left( u_j(k)\, e^{i(\omega_0 k + v_j(k))} \right)$$
for slowly varying $u_j(k)$ and $v_j(k)$, then a signal delayed by $l$ is approximately
$$s_j(k-l) = \mathrm{Re}\left( u_j(k-l)\, e^{i(\omega_0 k + v_j(k-l))} e^{-i\omega_0 l} \right) \approx \mathrm{Re}\left( u_j(k)\, e^{i(\omega_0 k + v_j(k))} e^{-i\omega_0 l} \right).$$
Thus, both delay and attenuation of the signals can be described by multiplication by a complex scalar dependent on $\theta_j$. The observed signals are, therefore,
$$\mathbf{x}(k) = \sum_{j=1}^{n} \mathbf{a}(\theta_j) s_j(k) + \mathbf{n}(k),$$
where
$$\mathbf{x}(k) = \begin{bmatrix} x_1(k) \\ \vdots \\ x_m(k) \end{bmatrix}$$
and $\mathbf{a}(\theta)$ is the complex array response vector. In matrix notation the signal model is
$$\mathbf{x}(k) = A \mathbf{s}(k) + \mathbf{n}(k), \quad \text{where} \quad A = \begin{bmatrix} \mathbf{a}(\theta_1) & \mathbf{a}(\theta_2) & \cdots & \mathbf{a}(\theta_n) \end{bmatrix}.$$
The problem is to estimate the angles $\theta_j$ from a sequence $\mathbf{x}(k)$ of sensor output vectors. We assume that the components of the noise vector $\mathbf{n}(k)$ are uncorrelated, stationary, and zero mean. A more complete description of the problem is given in [RK89].

The array manifold is the set of all $\mathbf{a}(\theta)$ for $\theta$ in $[-\pi, \pi]$.

The signal subspace is the range space of the matrix $A$. The noise subspace is the orthogonal complement of the signal subspace.

We define the spatial correlation matrices $R_x = E[\mathbf{x}\mathbf{x}^*]$, $R_s = E[\mathbf{s}\mathbf{s}^*]$, and $R_n = E[\mathbf{n}\mathbf{n}^*]$, where we assume $R_s$ is positive definite. These definitions apply directly when dealing with stationary random signals. In the deterministic case they can be replaced with time averages.

Facts:

1. If the signal and noise are uncorrelated, then $R_x = A R_s A^* + R_n$.

2. The signal model implies that, except for the effect of noise, the observation vectors $\mathbf{x}(k)$ lie in the signal subspace. Thus, in the absence of noise, any $n$ linearly independent observation vectors $\mathbf{x}(k)$ will give a basis for the signal subspace. The directions $\theta_j$ are determined by the signal subspace so long as the mapping from the set of angles $\theta_j$ to the signal subspace is invertible. We assume that the sensor array is designed to ensure invertibility.

3. The signal subspace can be estimated by solving a generalized eigenvalue problem. The $n$ eigenvectors associated with the $n$ largest generalized eigenvalues of the pencil $R_x - \lambda R_n$ are the basis for the estimated signal subspace. This method can be used to obtain matrices $X_s$ and $X_n$ such that $X_s X_s^*$ is a projection for the signal subspace and $X_n X_n^*$ is a projection for the noise subspace.

4. [Sch79] In order to obtain the angles $\theta_j$, the MUSIC algorithm searches the array manifold to find angles which are close to the signal subspace in the sense that
$$M(\theta) = \frac{\mathbf{a}^*(\theta) \mathbf{a}(\theta)}{\mathbf{a}^*(\theta) X_n X_n^* \mathbf{a}(\theta)}$$
is large. The values of $\theta$ for which $M(\theta)$ has a peak are taken as the estimates of the angles $\theta_j$, $1 \le j \le n$. The basic approach of the MUSIC algorithm can be used for a variety of parameter estimation problems, including the problem of finding sinusoids in noise.

5. [RK89] The ESPRIT algorithm imposes a structure on the array of sensors. It is assumed that the array is divided into two identical subarrays displaced along an $\mathbb{R}^2$ vector $\Delta$. Let $x_j(k)$ be the output of sensor $j$ in the first subarray and $y_j(k)$ be the output of the corresponding sensor in the second subarray. The displacement $\Delta$ causes a delay in the signal $x_j(k)$ relative to $y_j(k)$.
If the signals arriving at the array are again narrow band signals centered at frequency $\omega_0$, then the signal model is
$$\mathbf{x}(k) = \sum_{j=1}^{n} s_j(k)\, \mathbf{a}(\theta_j) + \mathbf{n}_x(k),$$
$$\mathbf{y}(k) = \sum_{j=1}^{n} s_j(k)\, e^{i \omega_0 \|\Delta\|_2 \sin(\theta_j)/c}\, \mathbf{a}(\theta_j) + \mathbf{n}_y(k),$$
where $\theta_j$ is the angle of arrival relative to the subarray displacement vector. The combined ESPRIT signal model including both subarrays is
$$\begin{bmatrix} \mathbf{x}(k) \\ \mathbf{y}(k) \end{bmatrix} = \begin{bmatrix} A \\ A \Phi \end{bmatrix} \mathbf{s}(k) + \begin{bmatrix} \mathbf{n}_x(k) \\ \mathbf{n}_y(k) \end{bmatrix},$$
where the diagonal matrix
$$\Phi = \mathrm{diag}\left( e^{i\omega_0 \|\Delta\|_2 \sin(\theta_1)/c}, \ldots, e^{i\omega_0 \|\Delta\|_2 \sin(\theta_n)/c} \right)$$
characterizes the delay between the subarrays.

6. To compute the $\theta_j$ using the ESPRIT algorithm, we start with the signal subspace associated with the ESPRIT signal model, estimated as before using a generalized eigenvalue problem. We assume that a $2m \times n$ matrix $S$ with columns spanning the signal subspace,
$$S = \begin{bmatrix} S_x \\ S_y \end{bmatrix},$$
has been computed, and compute the $2n \times 2n$ matrix
$$V = \begin{bmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{bmatrix}$$
of right singular vectors of $\begin{bmatrix} S_x & S_y \end{bmatrix}$. Here, $V_{11}$ and $V_{22}$ are $n \times n$. The ESPRIT estimates of the values $e^{i\omega_0 \|\Delta\|_2 \sin(\theta_j)/c}$ are the eigenvalues $\lambda_j$ of $-V_{12} V_{22}^{-1}$, so that the estimate $\hat{\theta}_j$ of $\theta_j$ is
$$\hat{\theta}_j = \sin^{-1}\left( \frac{c \arg(\lambda_j)}{\|\Delta\|_2\, \omega_0} \right)$$
for $j = 1, 2, \ldots, n$.

Examples:

1. We consider a single signal arriving at an array of two sensors with array response vector
$$\mathbf{a}(\theta) = \begin{bmatrix} e^{-i(\pi/2 - \theta)} \\ 1 \end{bmatrix}.$$
If observations of the sensor outputs and solution of the generalized eigenvalue problem $R_x - \lambda R_n$ suggest that
$$X_s X_s^* = \begin{bmatrix} i/\sqrt{3} \\ \sqrt{2}/\sqrt{3} \end{bmatrix} \begin{bmatrix} -i/\sqrt{3} & \sqrt{2}/\sqrt{3} \end{bmatrix}$$
is the projection for the signal subspace, then the projection for the noise subspace is
$$X_n X_n^* = \begin{bmatrix} -\sqrt{2}\, i/\sqrt{3} \\ 1/\sqrt{3} \end{bmatrix} \begin{bmatrix} \sqrt{2}\, i/\sqrt{3} & 1/\sqrt{3} \end{bmatrix}.$$
In applying the MUSIC algorithm, we have
$$M(\theta) = \frac{2}{\tfrac{1}{3} \left| \sqrt{2}\, e^{i\theta} + 1 \right|^2} = \frac{6}{\left| \sqrt{2}\, e^{i\theta} + 1 \right|^2}.$$
This function attains its maximum at $\theta = \pi$. Thus, the MUSIC estimate of the angle from which the signal arrives is $\theta = \pi$.
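The MUSIC search of Fact 4 can be illustrated with a short simulation (a sketch added to this survey; it reuses the two-sensor array response of Example 1, but estimates the noise subspace from an ordinary eigendecomposition of the sample correlation matrix, which suffices here because the simulated noise is white, and the source amplitude, noise level, and sample size are arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(1)
m, n_src, N = 2, 1, 5000
theta0 = np.pi                      # true arrival angle, as in Example 1

def a(theta):
    # Array response vector of the two-sensor array in Example 1.
    return np.array([np.exp(-1j * (np.pi / 2 - theta)), 1.0])

# Simulate x(k) = a(theta0) s(k) + n(k) with a random source and white noise.
s = rng.standard_normal(N) + 1j * rng.standard_normal(N)
noise = 0.3 * (rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N)))
X = np.outer(a(theta0), s) + noise

# Sample spatial correlation matrix (time average, as in the deterministic case).
Rx = X @ X.conj().T / N

# Noise subspace: eigenvectors of Rx belonging to the m - n smallest eigenvalues.
w, V = np.linalg.eigh(Rx)
Xn = V[:, : m - n_src]

# MUSIC spectrum M(theta) from Fact 4; its peak estimates the arrival angle.
thetas = np.linspace(-np.pi, np.pi, 2000)
M = [np.real(a(t).conj() @ a(t)) / np.real(a(t).conj() @ Xn @ Xn.conj().T @ a(t))
     for t in thetas]
print(thetas[int(np.argmax(M))])    # close to theta0 = pi (or -pi, the same angle)
```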
References

[AGR87] G. Ammar, W. Gragg, and L. Reichel, Determination of Pisarenko frequency estimates as eigenvalues of an orthogonal matrix, in Advanced Algorithms and Architectures for Signal Processing II, Proc. SPIE, Vol. 826, Int. Soc. Opt. Eng., pp. 143–145, 1987.
[BBH95] A. Bojanczyk, R.P. Brent, F.R. De Hoog, and D.R. Sweet, On the stability of the Bareiss and related Toeplitz factorization algorithms, SIAM J. Matrix Anal. Appl., 16:40–58, 1995.
[BE91] A. Bunse-Gerstner and L. Elsner, Schur parameter pencils for the solution of the unitary eigenproblem, Lin. Alg. Appl., 154–156:741–778, 1991.
[CK84] J. Cioffi and T. Kailath, Fast, recursive least-squares filters for adaptive filtering, IEEE Trans. Acoust., Speech, Sig. Proc., 32:304–337, 1984.
[Cyb80] G. Cybenko, The numerical stability of the Levinson–Durbin algorithm for Toeplitz systems of equations, SIAM J. Sci. Stat. Comp., 1:303–310, 1980.
[GV96] G.H. Golub and C.F. Van Loan, Matrix Computations, 3rd ed., Johns Hopkins University Press, Baltimore, 1996.
[Gra86] W.B. Gragg, The QR algorithm for unitary Hessenberg matrices, J. Comp. Appl. Math., 16:1–8, 1986.
[Gra93] W.B. Gragg, Positive definite Toeplitz matrices, the Arnoldi process for isometric operators, and Gaussian quadrature on the unit circle, J. Comp. Appl. Math., 46:183–198, 1993.
[Gra99] W.B. Gragg, Stabilization of the UHQR algorithm, in Z. Chen, Y. Li, C. Micchelli, and Y. Xu, Eds., Advances in Computational Mathematics, pp. 139–154, Marcel Dekker, New York, 1999.
[GS84] U. Grenander and G. Szegö, Toeplitz Forms and Their Applications, 2nd ed., Chelsea, London, 1984.
[Hay91] S. Haykin, Adaptive Filter Theory, 2nd ed., Prentice Hall, Upper Saddle River, NJ, 1991.
[KSH00] T. Kailath, A. Sayed, and B. Hassibi, Linear Estimation, Prentice Hall, Upper Saddle River, NJ, 2000.
[Kol41] A.N. Kolmogorov, Interpolation and extrapolation of stationary random sequences, Izv. Akad. Nauk SSSR Ser. Mat., 5:3–14, 1941 (in Russian).
[Pap85] A. Papoulis, Levinson's algorithm, Wold's decomposition, and spectral estimation, SIAM Rev., 27:405–441, 1985.
[Pis73] V.F. Pisarenko, The retrieval of harmonics from a covariance function, Geophys. J. Roy. Astron. Soc., 33:347–366, 1973.
[PM96] J.G. Proakis and D. Manolakis, Digital Signal Processing: Principles, Algorithms and Applications, Prentice Hall, Upper Saddle River, NJ, 1996.
[Reg93] P. Regalia, Numerical stability properties of a QR-based fast least-squares algorithm, IEEE Trans. Sig. Proc., 41:2096–2109, 1993.
[RK89] R. Roy and T. Kailath, ESPRIT — estimation of signal parameters via rotational invariance techniques, IEEE Trans. Acoust., Speech, Sig. Proc., 37:984–995, 1989.
[Sch79] R. Schmidt, Multiple emitter location and signal parameter estimation, IEEE Trans. Anten. Prop., 34:276–280, 1979.
[SK01] A.H. Sayed and T. Kailath, A survey of spectral factorization methods, Num. Lin. Alg. Appl., 8:467–496, 2001.
[Wie49] N. Wiener, Extrapolation, Interpolation, and Smoothing of Stationary Time Series with Engineering Applications, MIT Press, Cambridge, MA, 1949.