Chapter 5: Inner Product Spaces, Orthogonal Projection, Least Squares, and Singular Value Decomposition
Lixing Han, University of Michigan-Flint
Michael Neumann, University of Connecticut

Contents
5.1 Inner Product Spaces
5.2 Orthogonality
5.3 Adjoints of Linear Operators on Inner Product Spaces
5.4 Orthogonal Projection
5.5 Gram–Schmidt Orthogonalization and QR Factorization
5.6 Singular Value Decomposition
5.7 Pseudo-Inverse
5.8 Least Squares Problems
References

5.1 Inner Product Spaces

Definitions:

Let V be a vector space over the field F, where F = R or F = C. An inner product on V is a function ⟨·, ·⟩: V × V → F such that for all u, v, w ∈ V and a, b ∈ F, the following hold:
- ⟨v, v⟩ ≥ 0, and ⟨v, v⟩ = 0 if and only if v = 0.
- ⟨au + bv, w⟩ = a⟨u, w⟩ + b⟨v, w⟩.
- For F = R: ⟨u, v⟩ = ⟨v, u⟩; for F = C: ⟨u, v⟩ = conjugate of ⟨v, u⟩.

A real (or complex) inner product space is a vector space V over R (or C), together with an inner product defined on it.

In an inner product space V, the norm, or length, of a vector v ∈ V is ‖v‖ = √⟨v, v⟩. A vector v ∈ V is a unit vector if ‖v‖ = 1.

The angle between two nonzero vectors u and v in a real inner product space is the real number θ, 0 ≤ θ ≤ π, such that ⟨u, v⟩ = ‖u‖‖v‖ cos θ. See the Cauchy–Schwarz inequality (Fact 9 below).

Let V be an inner product space. The distance between two vectors u and v is d(u, v) = ‖u − v‖.

A Hermitian matrix A is positive definite if x*Ax > 0 for all nonzero x ∈ C^n. (See Chapter 8 for more information on positive definite matrices.)

Facts:

All the following facts except those with a specific reference can be found in [Rom92, pp. 157–164].

1. The vector space R^n is an inner product space under the standard inner product, or dot product, defined by
   ⟨u, v⟩ = u^T v = Σ_{i=1}^n u_i v_i.
   This inner product space is often called n-dimensional Euclidean space.
2. The vector space C^n is an inner product space under the standard inner product, defined by
   ⟨u, v⟩ = v*u = Σ_{i=1}^n u_i v̄_i.
   This inner product space is often called n-dimensional unitary space.
3. [HJ85, p. 410] In R^n, a function ⟨·, ·⟩: R^n × R^n → R is an inner product if and only if there exists a real symmetric positive definite matrix G such that ⟨u, v⟩ = u^T G v for all u, v ∈ R^n.
4. [HJ85, p. 410] In C^n, a function ⟨·, ·⟩: C^n × C^n → C is an inner product if and only if there exists a Hermitian positive definite matrix H such that ⟨u, v⟩ = v*Hu for all u, v ∈ C^n.
5. Let l^2 be the vector space of all infinite complex sequences v = (v_n) with the property that Σ_{n=1}^∞ |v_n|² < ∞. Then l^2 is an inner product space under the inner product
   ⟨u, v⟩ = Σ_{n=1}^∞ u_n v̄_n.
6. The vector space C[a, b] of all continuous real-valued functions on the closed interval [a, b] is an inner product space under the inner product
   ⟨f, g⟩ = ∫_a^b f(x)g(x) dx.
7. If V is an inner product space and ⟨u, w⟩ = ⟨v, w⟩ for all w ∈ V, then u = v.
8. The inner product on an inner product space V, when restricted to vectors in a subspace S of V, is an inner product on S.
9. Let V be an inner product space. Then the norm function ‖·‖ on V has the following basic properties for all u, v ∈ V:
   - ‖v‖ ≥ 0, and ‖v‖ = 0 if and only if v = 0.
   - ‖av‖ = |a|‖v‖ for all a ∈ F.
   - (The triangle inequality) ‖u + v‖ ≤ ‖u‖ + ‖v‖, with equality if and only if v = au for some a ∈ F.
   - (The Cauchy–Schwarz inequality) |⟨u, v⟩| ≤ ‖u‖‖v‖, with equality if and only if v = au for some a ∈ F.
   - |‖u‖ − ‖v‖| ≤ ‖u − v‖.
   - (The parallelogram law) ‖u + v‖² + ‖u − v‖² = 2‖u‖² + 2‖v‖².
   - (Polarization identities)
     4⟨u, v⟩ = ‖u + v‖² − ‖u − v‖², if F = R;
     4⟨u, v⟩ = ‖u + v‖² − ‖u − v‖² + i‖u + iv‖² − i‖u − iv‖², if F = C.

Examples:

1. Let R^4 be the Euclidean space with the inner product ⟨u, v⟩ = u^T v, and let x = [1, 2, 3, 4]^T ∈ R^4 and y = [3, −1, 0, 2]^T ∈ R^4 be two vectors. Then
   - ⟨x, y⟩ = 9, ‖x‖ = √30, and ‖y‖ = √14.
   - The distance between x and y is d(x, y) = ‖x − y‖ = √26.
   - The angle between x and y is θ = arccos(9/(√30·√14)) = arccos(9/(2√105)) ≈ 1.116 radians.
2. ⟨u, v⟩ = u₁v₁ + 2u₁v₂ + 2u₂v₁ + 6u₂v₂ = u^T G v, where G = [1 2; 2 6], is an inner product on R², as the matrix G is symmetric positive definite.
3. Let C[−1, 1] be the vector space with the inner product ⟨f, g⟩ = ∫_{−1}^{1} f(x)g(x) dx, and let f(x) = 1 and g(x) = x² be two functions in C[−1, 1]. Then ⟨f, g⟩ = ∫_{−1}^{1} x² dx = 2/3, ⟨f, f⟩ = ∫_{−1}^{1} 1 dx = 2, and ⟨g, g⟩ = ∫_{−1}^{1} x⁴ dx = 2/5. The angle between f and g is arccos(√5/3) ≈ 0.730 radians.
4. [Mey00, p. 286] ⟨A, B⟩ = tr(AB*) is an inner product on C^{m×n}.
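The quantities in Examples 1 and 2 are easy to confirm numerically. Below is a minimal sketch using NumPy (the use of NumPy is an assumption of this sketch, not part of the handbook text):

    import numpy as np

    # Example 1: standard inner product on R^4.
    x = np.array([1., 2., 3., 4.])
    y = np.array([3., -1., 0., 2.])
    ip = x @ y                                      # <x, y> = 9
    dist = np.linalg.norm(x - y)                    # d(x, y) = sqrt(26)
    theta = np.arccos(ip / (np.linalg.norm(x) * np.linalg.norm(y)))  # ~1.116 rad

    # Example 2: <u, v> = u^T G v defines an inner product because G is
    # symmetric positive definite (all eigenvalues positive, cf. Fact 3).
    G = np.array([[1., 2.], [2., 6.]])
    assert np.allclose(G, G.T) and np.all(np.linalg.eigvalsh(G) > 0)

    print(ip, dist, theta)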
5.2 Orthogonality

Definitions:

Let V be an inner product space. Two vectors u, v ∈ V are orthogonal if ⟨u, v⟩ = 0, and this is denoted by u ⊥ v.

A subset S of an inner product space V is an orthogonal set if u ⊥ v for all u, v ∈ S such that u ≠ v.

A subset S of an inner product space V is an orthonormal set if S is an orthogonal set and each v ∈ S is a unit vector.

Two subsets S and W of an inner product space V are orthogonal if u ⊥ v for all u ∈ S and v ∈ W, and this is denoted by S ⊥ W.

The orthogonal complement of a subset S of an inner product space V is S^⊥ = {w ∈ V : ⟨w, v⟩ = 0 for all v ∈ S}.

A complete orthonormal set M in an inner product space V is an orthonormal set of vectors in V such that for v ∈ V, v ⊥ M implies that v = 0.

An orthogonal basis for an inner product space V is an orthogonal set that is also a basis for V. An orthonormal basis for V is an orthonormal set that is also a basis for V.

A matrix U is unitary if U*U = I. A real matrix Q is orthogonal if Q^T Q = I.

Facts:

1. [Mey00, p. 298] An orthogonal set of nonzero vectors is linearly independent. An orthonormal set of vectors is linearly independent.
2. [Rom92, p. 164] If S is a subset of an inner product space V, then S^⊥ is a subspace of V. Moreover, if S is a subspace of V, then S ∩ S^⊥ = {0}.
3. [Mey00, p. 409] In an inner product space V, {0}^⊥ = V and V^⊥ = {0}.
4. [Rom92, p. 168] If S is a finite-dimensional subspace of an inner product space V, then for any v ∈ V:
   - There are unique vectors s ∈ S and t ∈ S^⊥ such that v = s + t. This implies V = S ⊕ S^⊥.
   - There is a unique linear operator P such that P(v) = s.
5. [Mey00, p. 404] If S is a subspace of an n-dimensional inner product space V, then
   - (S^⊥)^⊥ = S.
   - dim(S^⊥) = n − dim(S).
6. [Rom92, p. 174] If S is a subspace of an infinite-dimensional inner product space, then S ⊆ (S^⊥)^⊥, but the two sets need not be equal.
7. [Rom92, p. 166] An orthonormal basis is a complete orthonormal set.
8. [Rom92, p. 166] In a finite-dimensional inner product space, a complete orthonormal set is a basis.
9. [Rom92, p. 165] In an infinite-dimensional inner product space, a complete orthonormal set may not be a basis.
10. [Rom92, p. 166] Every finite-dimensional inner product space has an orthonormal basis.
11. [Mey00, p. 299] Let B = {u₁, u₂, . . . , u_n} be an orthonormal basis for V. Every vector v ∈ V can be uniquely expressed as
    v = Σ_{i=1}^n ⟨v, u_i⟩ u_i.
    The expression on the right is called the Fourier expansion of v with respect to B, and the scalars ⟨v, u_i⟩ are called the Fourier coefficients.
12. [Mey00, p. 305] (Pythagorean Theorem) If {v_i}_{i=1}^k is an orthogonal set of vectors in V, then ‖Σ_{i=1}^k v_i‖² = Σ_{i=1}^k ‖v_i‖².
13. [Rom92, p. 167] (Bessel's Inequality) If {u_i}_{i=1}^k is an orthonormal set of vectors in V, then ‖v‖² ≥ Σ_{i=1}^k |⟨v, u_i⟩|².
14. [Mey00, p. 305] (Parseval's Identity) Let B = {u₁, u₂, . . . , u_n} be an orthonormal basis for V. Then for each v ∈ V, ‖v‖² = Σ_{i=1}^n |⟨v, u_i⟩|².
15. [Mey00, p. 405] Let A ∈ F^{m×n}, where F = R or C. Then
    - ker(A)^⊥ = range(A*) and range(A)^⊥ = ker(A*).
    - F^m = range(A) ⊕ range(A)^⊥ = range(A) ⊕ ker(A*).
    - F^n = ker(A) ⊕ ker(A)^⊥ = ker(A) ⊕ range(A*).
16. [Mey00, p. 321] (See also Section 7.1.) The following statements for a real matrix Q ∈ R^{n×n} are equivalent:
    - Q is orthogonal.
    - Q has orthonormal columns.
    - Q has orthonormal rows.
    - QQ^T = I, where I is the identity matrix of order n.
    - For all v ∈ R^n, ‖Qv‖ = ‖v‖.
17. [Mey00, p. 321] (See also Section 7.1.) The following statements for a complex matrix U ∈ C^{n×n} are equivalent:
    - U is unitary.
    - U has orthonormal columns.
    - U has orthonormal rows.
    - UU* = I, where I is the identity matrix of order n.
    - For all v ∈ C^n, ‖Uv‖ = ‖v‖.

Examples:

1. Let C[−1, 1] be the vector space with the inner product ⟨f, g⟩ = ∫_{−1}^{1} f(x)g(x) dx, and let f(x) = 1 and g(x) = x be two functions in C[−1, 1]. Then ⟨f, g⟩ = ∫_{−1}^{1} x dx = 0. Thus, f ⊥ g.
2. The standard basis {e₁, e₂, . . . , e_n} is an orthonormal basis for the unitary space C^n.
3. If {v₁, v₂, . . . , v_n} is an orthogonal basis for C^n and S = span{v₁, v₂, . . . , v_k} (1 ≤ k ≤ n − 1), then S^⊥ = span{v_{k+1}, . . . , v_n}.
4. The vectors v₁ = [2, 2, 1]^T, v₂ = [1, −1, 0]^T, and v₃ = [−1, −1, 4]^T are mutually orthogonal. They can be normalized to u₁ = v₁/‖v₁‖ = [2/3, 2/3, 1/3]^T, u₂ = v₂/‖v₂‖ = [1/√2, −1/√2, 0]^T, and u₃ = v₃/‖v₃‖ = [−√2/6, −√2/6, 2√2/3]^T. The set B = {u₁, u₂, u₃} forms an orthonormal basis for the Euclidean space R³.
   - If v = [v₁, v₂, v₃]^T ∈ R³, then v = ⟨v, u₁⟩u₁ + ⟨v, u₂⟩u₂ + ⟨v, u₃⟩u₃, that is,
     v = ((2v₁ + 2v₂ + v₃)/3) u₁ + ((v₁ − v₂)/√2) u₂ + ((−v₁ − v₂ + 4v₃)/(3√2)) u₃.
   - The matrix Q = [u₁, u₂, u₃] ∈ R^{3×3} is an orthogonal matrix.
5. Let S be the subspace of C³ spanned by the vectors u = [i, 1, 1]^T and v = [1, i, 1]^T. Then the orthogonal complement of S is S^⊥ = {w : w = α[1, 1, −1 + i]^T, where α ∈ C}.
6. Consider the inner product space l^2 from Fact 5 in Section 5.1. Let E = {e_i : i = 1, 2, . . .}, where e_i has a 1 in the ith place and 0s elsewhere. It is clear that E is an orthonormal set. If v = (v_n) ⊥ E, then for each n, v_n = ⟨v, e_n⟩ = 0. This implies v = 0. Therefore, E is a complete orthonormal set. However, E is not a basis for l^2, as S = span{E} ≠ l^2. Further, S^⊥ = {0}. Thus, (S^⊥)^⊥ = l^2 ⊋ S and l^2 ≠ S ⊕ S^⊥.
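Example 4 and Facts 11, 14, and 16 can be checked numerically. The following NumPy sketch (NumPy assumed, not part of the handbook text) normalizes the three vectors, verifies the Fourier expansion, and checks Parseval's identity:

    import numpy as np

    # Example 4: normalize the mutually orthogonal vectors and check Fact 11
    # (Fourier expansion) and Fact 16 (Q with orthonormal columns is orthogonal).
    v1, v2, v3 = np.array([2., 2., 1.]), np.array([1., -1., 0.]), np.array([-1., -1., 4.])
    u1, u2, u3 = (v / np.linalg.norm(v) for v in (v1, v2, v3))
    Q = np.column_stack([u1, u2, u3])

    assert np.allclose(Q.T @ Q, np.eye(3))            # orthonormal columns

    v = np.array([1., 2., 3.])                        # any vector in R^3
    fourier = sum((v @ u) * u for u in (u1, u2, u3))  # Fourier expansion of v
    assert np.allclose(fourier, v)

    # Parseval's identity (Fact 14): ||v||^2 equals the sum of squared coefficients.
    assert np.isclose(np.linalg.norm(v)**2, sum((v @ u)**2 for u in (u1, u2, u3)))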
5.3 Adjoints of Linear Operators on Inner Product Spaces

Let V be a finite-dimensional (real or complex) inner product space and let T be a linear operator on V.

Definitions:

A linear operator T* on V is called the adjoint of T if ⟨T(u), v⟩ = ⟨u, T*(v)⟩ for all u, v ∈ V.

The linear operator T is self-adjoint, or Hermitian, if T = T*; T is unitary if T*T = I_V.

Facts:

The following facts can be found in [HK71].

1. Let f be a linear functional on V. Then there exists a unique v ∈ V such that f(w) = ⟨w, v⟩ for all w ∈ V.
2. The adjoint T* of T exists and is unique.
3. Let B = (u₁, u₂, . . . , u_n) be an ordered orthonormal basis of V and let A = [T]_B. Then a_ij = ⟨T(u_j), u_i⟩, i, j = 1, 2, . . . , n. Moreover, [T*]_B = A*, the Hermitian adjoint of A.
4. (Properties of the adjoint operator)
   (a) (T*)* = T for every linear operator T on V.
   (b) (aT)* = āT* for every linear operator T on V and every a ∈ F.
   (c) (T + T₁)* = T* + T₁* for all linear operators T, T₁ on V.
   (d) (TT₁)* = T₁*T* for all linear operators T, T₁ on V.
5. Let B be an ordered orthonormal basis of V and let A = [T]_B. Then
   (a) T is self-adjoint if and only if A is a Hermitian matrix.
   (b) T is unitary if and only if A is a unitary matrix.

Examples:

1. Consider the space R³ equipped with the standard inner product and let f(w) = 3w₁ − 2w₃. Then with v = [3, 0, −2]^T, f(w) = ⟨w, v⟩.
2. Consider the space R³ equipped with the standard inner product. Let v = [x, y, z]^T and T(v) = [2x + y, y − 3z, x + y + z]^T. Then
   [T] = [2 1 0; 0 1 −3; 1 1 1],  so  [T]* = [2 0 1; 1 1 1; 0 −3 1],  and  T*(v) = [2x + z, x + y + z, −3y + z]^T.
3. Consider the space C^{n×n} equipped with the inner product in Example 4 of Section 5.1. Let A, B ∈ C^{n×n} and let T be the linear operator on C^{n×n} defined by T(X) = AX + XB, X ∈ C^{n×n}. Then T*(X) = A*X + XB*, X ∈ C^{n×n}.
4. Let V be an inner product space and let T be a linear operator on V. For a fixed u ∈ V, f(w) = ⟨T(w), u⟩ is a linear functional. By Fact 1, there is a unique vector v such that f(w) = ⟨w, v⟩. Then T*(u) = v.
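As a numerical companion to Example 2 and Fact 3, the sketch below (NumPy assumed, not part of the handbook text) checks that the defining property ⟨T(u), v⟩ = ⟨u, T*(v)⟩ holds when [T*] is taken to be [T]^T in the standard orthonormal basis of R³:

    import numpy as np

    # Example 2: T(x, y, z) = (2x + y, y - 3z, x + y + z) in the standard
    # (orthonormal) basis of R^3, so [T*] = [T]^T by Fact 3.
    T = np.array([[2., 1., 0.],
                  [0., 1., -3.],
                  [1., 1., 1.]])
    T_adj = T.T

    rng = np.random.default_rng(0)
    u, v = rng.standard_normal(3), rng.standard_normal(3)
    # Defining property of the adjoint: <T u, v> = <u, T* v>.
    assert np.isclose((T @ u) @ v, u @ (T_adj @ v))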
5.4 Orthogonal Projection

Definitions:

Let S be a finite-dimensional subspace of an inner product space V. Then, according to Fact 4 in Section 5.2, each v ∈ V can be written uniquely as v = s + t, where s ∈ S and t ∈ S^⊥. The vector s is called the orthogonal projection of v onto S and is often written as Proj_S v, where the linear operator Proj_S is called the orthogonal projection onto S along S^⊥. When V = C^n or V = R^n with the standard inner product, the linear operator Proj_S is often identified with its standard matrix [Proj_S], and Proj_S is used to denote both the operator and the matrix.

Facts:

1. An orthogonal projection is a projection (as defined in Section 3.6).
2. [Mey00, p. 433] Suppose that P is a projection. The following statements are equivalent:
   - P is an orthogonal projection.
   - P* = P.
   - range(P) ⊥ ker(P).
3. [Mey00, p. 430] If S is a subspace of a finite-dimensional inner product space V, then Proj_{S^⊥} = I − Proj_S.
4. [Mey00, p. 430] Let S be a p-dimensional subspace of the standard inner product space C^n, and let the columns of matrices M ∈ C^{n×p} and N ∈ C^{n×(n−p)} be bases for S and S^⊥, respectively. Then the orthogonal projections onto S and S^⊥ are Proj_S = M(M*M)⁻¹M* and Proj_{S^⊥} = N(N*N)⁻¹N*. If M and N contain orthonormal bases for S and S^⊥, then Proj_S = MM* and Proj_{S^⊥} = NN*.
5. [Lay03, p. 399] If {u₁, . . . , u_p} is an orthonormal basis for a subspace S of C^n, then for any v ∈ C^n, Proj_S v = (u₁*v)u₁ + · · · + (u_p*v)u_p.
6. [TB97, p. 46] Let v ∈ C^n be a nonzero vector. Then
   - Proj_v = vv*/(v*v) is the orthogonal projection onto the line L = span{v}.
   - Proj_{v^⊥} = I − vv*/(v*v) is the orthogonal projection onto L^⊥.
7. [Mey00, p. 435] (The Best Approximation Theorem) Let S be a finite-dimensional subspace of an inner product space V and let b be a vector in V. Then Proj_S b is the unique vector in S that is closest to b in the sense that
   min_{s ∈ S} ‖b − s‖ = ‖b − Proj_S b‖.
   The vector Proj_S b is called the best approximation to b by the elements of S.

Examples:

1. Generally, an orthogonal projection P ∈ C^{n×n} is not a unitary matrix.
2. Let {v₁, v₂, . . . , v_n} be an orthogonal basis for R^n and let S be the subspace of R^n spanned by {v₁, . . . , v_k}, where 1 ≤ k ≤ n − 1. Then w = c₁v₁ + c₂v₂ + · · · + c_n v_n ∈ R^n can be written as w = s + t, where s = c₁v₁ + · · · + c_k v_k ∈ S and t = c_{k+1}v_{k+1} + · · · + c_n v_n ∈ S^⊥.
3. Let u₁ = [2/3, 2/3, 1/3]^T, u₂ = [1/3, −2/3, 2/3]^T, and x = [2, 3, 5]^T. Then {u₁, u₂} is an orthonormal basis for the subspace S = span{u₁, u₂} of R³.
   - The orthogonal projection of x onto S is Proj_S x = (u₁^T x)u₁ + (u₂^T x)u₂ = [4, 2, 3]^T.
   - The orthogonal projection of x onto S^⊥ is y = x − Proj_S x = [−2, 1, 2]^T.
   - The vector in S that is closest to x is Proj_S x = [4, 2, 3]^T.
   - Let M = [u₁, u₂]. Then the orthogonal projection onto S is
     Proj_S = MM^T = (1/9)[5 2 4; 2 8 −2; 4 −2 5].
   - The orthogonal projection of any v ∈ R³ onto S can be computed by Proj_S v = MM^T v. In particular, MM^T x = [4, 2, 3]^T.
4. Let w₁ = [1, 1, 0]^T and w₂ = [1, 0, 1]^T. Consider the subspace W = span{w₁, w₂} of R³ and define the matrix M = [w₁, w₂]. Then M^T M = [2 1; 1 2].
   - The orthogonal projection onto W is
     Proj_W = M(M^T M)⁻¹M^T = (1/3)[2 1 1; 1 2 −1; 1 −1 2].
   - The orthogonal projection of any v ∈ R³ onto W can be computed by Proj_W v. For v = [1, 2, 3]^T, Proj_W v = Proj_W [1, 2, 3]^T = [7/3, 2/3, 5/3]^T.
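Fact 4 and Example 4 translate directly into a few lines of NumPy (assumed here, not part of the handbook text):

    import numpy as np

    # Example 4: projection onto W = span{w1, w2} via Fact 4, Proj_W = M (M^T M)^{-1} M^T.
    M = np.array([[1., 1.],
                  [1., 0.],
                  [0., 1.]])
    P = M @ np.linalg.inv(M.T @ M) @ M.T

    # An orthogonal projection is Hermitian and idempotent (Fact 2).
    assert np.allclose(P, P.T) and np.allclose(P @ P, P)

    v = np.array([1., 2., 3.])
    print(P @ v)   # [7/3, 2/3, 5/3], the best approximation to v from W (Fact 7)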
5.5 Gram–Schmidt Orthogonalization and QR Factorization

Definitions:

Let {a₁, a₂, . . . , a_n} be a basis for a subspace S of an inner product space V. An orthonormal basis {u₁, u₂, . . . , u_n} for S can be constructed using the following Gram–Schmidt orthogonalization process:
   u₁ = a₁/‖a₁‖  and  u_k = (a_k − Σ_{i=1}^{k−1} ⟨a_k, u_i⟩ u_i) / ‖a_k − Σ_{i=1}^{k−1} ⟨a_k, u_i⟩ u_i‖, for k = 2, . . . , n.

A reduced QR factorization of A ∈ C^{m×n} (m ≥ n) is a factorization A = Q̂R̂, where Q̂ ∈ C^{m×n} has orthonormal columns and R̂ ∈ C^{n×n} is an upper triangular matrix.

A QR factorization of A ∈ C^{m×n} (m ≥ n) is a factorization A = QR, where Q ∈ C^{m×m} is a unitary matrix and R ∈ C^{m×n} is an upper triangular matrix with the last m − n rows of R being zero.

Facts:

1. [TB97, p. 51] Each A ∈ C^{m×n} (m ≥ n) has a full QR factorization A = QR. If A ∈ R^{m×n}, then both Q and R may be taken to be real.
2. [TB97, p. 52] Each A ∈ C^{m×n} (m ≥ n) has a reduced QR factorization A = Q̂R̂. If A ∈ R^{m×n}, then both Q̂ and R̂ may be taken to be real.
3. [TB97, p. 52] Each A ∈ C^{m×n} (m ≥ n) of full rank has a unique reduced QR factorization A = Q̂R̂, where Q̂ ∈ C^{m×n} and R̂ ∈ C^{n×n} with real r_ii > 0.
4. [TB97, p. 48] The orthonormal basis {u₁, u₂, . . . , u_n} generated via the Gram–Schmidt orthogonalization process has the property
   span({u₁, u₂, . . . , u_k}) = span({a₁, a₂, . . . , a_k}), for k = 1, 2, . . . , n.
5. [TB97, p. 51] Algorithm 1: Classical Gram–Schmidt Orthogonalization
   input: a basis {a₁, a₂, . . . , a_n} for a subspace S
   output: an orthonormal basis {u₁, u₂, . . . , u_n} for S
   for j = 1 : n
     u_j := a_j
     for i = 1 : j − 1
       r_ij := ⟨a_j, u_i⟩
       u_j := u_j − r_ij u_i
     end
     r_jj := ‖u_j‖
     u_j := u_j / r_jj
   end
6. [TB97, p. 58] Algorithm 2: Modified Gram–Schmidt Orthogonalization
   input: a basis {a₁, a₂, . . . , a_n} for a subspace S
   output: an orthonormal basis {u₁, u₂, . . . , u_n} for S
   w_i := a_i, i = 1 : n
   for i = 1 : n
     r_ii := ‖w_i‖
     u_i := w_i / r_ii
     for j = i + 1 : n
       r_ij := ⟨w_j, u_i⟩
       w_j := w_j − r_ij u_i
     end
   end
   (A runnable sketch of this algorithm appears after the examples below.)
7. [Mey00, p. 315] If exact arithmetic is used, then Algorithms 1 and 2 generate the same orthonormal basis {u₁, u₂, . . . , u_n} and the same r_ij for j ≥ i.
8. [GV96, pp. 230–232] If A = [a₁, a₂, . . . , a_n] ∈ C^{m×n} (m ≥ n) is of full rank n, then the classical or modified Gram–Schmidt process leads to a reduced QR factorization A = Q̂R̂, with Q̂ = [u₁, u₂, . . . , u_n] and R̂_ij = r_ij for j ≥ i, and R̂_ij = 0 for j < i.
9. [GV96, p. 232] The costs of Algorithm 1 and Algorithm 2 are both 2mn² flops when applied to compute a reduced QR factorization of a matrix A ∈ R^{m×n}.
10. [Mey00, p. 317 and p. 349] For the QR factorization, Algorithm 1 and Algorithm 2 are not numerically stable. However, Algorithm 2 often yields better numerical results than Algorithm 1.
11. [Mey00, p. 349] Algorithm 2 is numerically stable when it is used to solve least squares problems.
12. Numerically stable algorithms for computing the QR factorization using Householder reflections and Givens rotations are given in Chapter 38.
13. [TB97, p. 54] (See also Chapter 38.) If A = QR is a QR factorization of the rank-n matrix A ∈ C^{n×n}, then the linear system Ax = b can be solved as follows:
    - Compute the factorization A = QR.
    - Compute the vector c = Q*b.
    - Solve Rx = c by performing back substitution.

Examples:

1. Consider the matrix A = [1 2; 2 0; 0 2].
   - A has a (full) QR factorization A = QR:
     [1 2; 2 0; 0 2] = [1/√5 4/(3√5) −2/3; 2/√5 −2/(3√5) 1/3; 0 √5/3 2/3] [√5 2/√5; 0 6/√5; 0 0].
   - A has a reduced QR factorization A = Q̂R̂:
     [1 2; 2 0; 0 2] = [1/√5 4/(3√5); 2/√5 −2/(3√5); 0 √5/3] [√5 2/√5; 0 6/√5].
2. Consider the matrix A = [3 1 −2; 3 −4 1; 3 −4 −1; 3 1 0]. Using the classical or modified Gram–Schmidt process gives the following reduced QR factorization:
   [3 1 −2; 3 −4 1; 3 −4 −1; 3 1 0] = [1/2 1/2 −1/2; 1/2 −1/2 1/2; 1/2 −1/2 −1/2; 1/2 1/2 1/2] [6 −3 −1; 0 5 −1; 0 0 2].
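The following is a minimal NumPy sketch of Algorithm 2 (modified Gram–Schmidt) for real matrices; the function name modified_gram_schmidt and the use of NumPy are assumptions of the sketch, not part of the handbook:

    import numpy as np

    def modified_gram_schmidt(A):
        """Reduced QR factorization A = Q_hat R_hat of a real, full-rank A
        (m >= n) via Algorithm 2 (modified Gram-Schmidt)."""
        A = np.array(A, dtype=float)
        m, n = A.shape
        Q = A.copy()                    # columns w_i, orthogonalized in place
        R = np.zeros((n, n))
        for i in range(n):
            R[i, i] = np.linalg.norm(Q[:, i])
            Q[:, i] /= R[i, i]
            for j in range(i + 1, n):
                R[i, j] = Q[:, j] @ Q[:, i]
                Q[:, j] -= R[i, j] * Q[:, i]
        return Q, R

    # Example 2: recover Q_hat and R_hat = [[6, -3, -1], [0, 5, -1], [0, 0, 2]].
    A = np.array([[3., 1., -2.], [3., -4., 1.], [3., -4., -1.], [3., 1., 0.]])
    Q, R = modified_gram_schmidt(A)
    assert np.allclose(Q @ R, A) and np.allclose(np.triu(R), R)
    print(R)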
5.6 Singular Value Decomposition

Definitions:

A singular value decomposition (SVD) of a matrix A ∈ C^{m×n} is a factorization A = UΣV*, where Σ = diag(σ₁, σ₂, . . . , σ_p) ∈ R^{m×n}, p = min{m, n}, with σ₁ ≥ σ₂ ≥ . . . ≥ σ_p ≥ 0, and both U = [u₁, u₂, . . . , u_m] ∈ C^{m×m} and V = [v₁, v₂, . . . , v_n] ∈ C^{n×n} are unitary. The diagonal entries of Σ are called the singular values of A. The columns of U are called left singular vectors of A and the columns of V are called right singular vectors of A.

Let A ∈ C^{m×n} with rank r ≤ p = min{m, n}. A reduced singular value decomposition (reduced SVD) of A is a factorization
   A = ÛΣ̂V̂*,  Σ̂ = diag(σ₁, σ₂, . . . , σ_r) ∈ R^{r×r},
where σ₁ ≥ σ₂ ≥ . . . ≥ σ_r > 0 and the columns of Û = [u₁, u₂, . . . , u_r] ∈ C^{m×r} and the columns of V̂ = [v₁, v₂, . . . , v_r] ∈ C^{n×r} are both orthonormal.

(See §8.4 and §3.7 for more information on singular value decomposition.)

Facts:

All the following facts except those with a specific reference can be found in [TB97, pp. 25–37].

1. Every A ∈ C^{m×n} has a singular value decomposition A = UΣV*. If A ∈ R^{m×n}, then U and V may be taken to be real.
2. The singular values of a matrix are uniquely determined.
3. If A ∈ C^{m×n} has a singular value decomposition A = UΣV*, then
   Av_j = σ_j u_j,  A*u_j = σ_j v_j,  u_j* A v_j = σ_j,  for j = 1, 2, . . . , p = min{m, n}.
4. If UΣV* is a singular value decomposition of A, then VΣ^T U* is a singular value decomposition of A*.
5. If A ∈ C^{m×n} has r nonzero singular values, then
   - rank(A) = r.
   - A = Σ_{j=1}^r σ_j u_j v_j*.
   - ker(A) = span{v_{r+1}, . . . , v_n}.
   - range(A) = span{u₁, . . . , u_r}.
6. Any A ∈ C^{m×n} of rank r ≤ p = min{m, n} has a reduced singular value decomposition A = ÛΣ̂V̂*, Σ̂ = diag(σ₁, σ₂, . . . , σ_r) ∈ R^{r×r}, where σ₁ ≥ σ₂ ≥ · · · ≥ σ_r > 0 and the columns of Û = [u₁, u₂, . . . , u_r] ∈ C^{m×r} and the columns of V̂ = [v₁, v₂, . . . , v_r] ∈ C^{n×r} are both orthonormal. If A ∈ R^{m×n}, then Û and V̂ may be taken to be real.
7. If rank(A) = r, then A has r nonzero singular values.
8. The nonzero singular values of A are the square roots of the nonzero eigenvalues of A*A or AA*.
9. [HJ85, p. 414] If UΣV* is a singular value decomposition of A, then the columns of V are eigenvectors of A*A and the columns of U are eigenvectors of AA*.
10. [HJ85, p. 418] Let A ∈ C^{m×n} and p = min{m, n}. Define
    G = [0 A*; A 0] ∈ C^{(m+n)×(m+n)}.
    If the singular values of A are σ₁, . . . , σ_p, then the eigenvalues of G are σ₁, . . . , σ_p, −σ₁, . . . , −σ_p, and an additional |n − m| zeros.
11. If A ∈ C^{n×n} is Hermitian with eigenvalues λ₁, λ₂, . . . , λ_n, then the singular values of A are |λ₁|, |λ₂|, . . . , |λ_n|.
12. For A ∈ C^{n×n}, |det A| = σ₁σ₂ · · · σ_n.
13. [Aut15; Sch07] (Eckart–Young Low Rank Approximation Theorem) Let A = UΣV* be an SVD of A ∈ C^{m×n} and r = rank(A). For k < r, define A_k = Σ_{j=1}^k σ_j u_j v_j*. Then
    - ‖A − A_k‖₂ = min_{rank(B) ≤ k} ‖A − B‖₂ = σ_{k+1};
    - ‖A − A_k‖_F = min_{rank(B) ≤ k} ‖A − B‖_F = √(Σ_{j=k+1}^r σ_j²),
    where ‖M‖₂ = max_{‖x‖₂=1} ‖Mx‖₂ and ‖M‖_F = √(Σ_{i=1}^m Σ_{j=1}^n |m_ij|²) are the 2-norm and Frobenius norm of the matrix M, respectively. (See Chapter 37 for more information on matrix norms.)

Examples:

Consider the matrices A = [1 2; 2 0; 0 2] and B = A^T = [1 2 0; 2 0 2].

1. The eigenvalues of A^T A = [5 2; 2 8] are 9 and 4. So, the singular values of A are 3 and 2.
2. Normalized eigenvectors for A^T A are v₁ = [1/√5, 2/√5]^T and v₂ = [2/√5, −1/√5]^T.
3. u₁ = (1/3)Av₁ = [√5/3, 2/(3√5), 4/(3√5)]^T and u₂ = (1/2)Av₂ = [0, 2/√5, −1/√5]^T. Application of the Gram–Schmidt process to u₁, u₂, and e₁ produces u₃ = [2/3, −1/3, −2/3]^T.
4. A has the singular value decomposition A = UΣV^T, where
   U = (1/(3√5))[5 0 2√5; 2 6 −√5; 4 −3 −2√5],  Σ = [3 0; 0 2; 0 0],  V = (1/√5)[1 2; 2 −1].
5. A has the reduced singular value decomposition A = ÛΣ̂V̂^T, where
   Û = (1/(3√5))[5 0; 2 6; 4 −3],  Σ̂ = [3 0; 0 2],  V̂ = (1/√5)[1 2; 2 −1].
6. B has the singular value decomposition B = U_B Σ_B V_B^T, where
   U_B = V_A = (1/√5)[1 2; 2 −1],  Σ_B = [3 0 0; 0 2 0],  V_B = U_A = (1/(3√5))[5 0 2√5; 2 6 −√5; 4 −3 −2√5].
   (U_A = U and V_A = V for A were given in Example 4.)
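The decomposition in Example 4 and the Eckart–Young theorem (Fact 13) can be verified numerically. A minimal NumPy sketch (NumPy assumed, not part of the handbook text):

    import numpy as np

    A = np.array([[1., 2.], [2., 0.], [0., 2.]])

    U, s, Vh = np.linalg.svd(A)            # full SVD: A = U @ Sigma @ Vh
    assert np.allclose(s, [3., 2.])        # singular values from Example 1

    # Rank-1 Eckart-Young approximation (Fact 13 with k = 1):
    A1 = s[0] * np.outer(U[:, 0], Vh[0, :])
    assert np.isclose(np.linalg.norm(A - A1, 2), s[1])        # 2-norm error = sigma_2 = 2
    assert np.isclose(np.linalg.norm(A - A1, 'fro'), s[1])    # only sigma_2 remains in the tail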
5.7 Pseudo-Inverse

Definitions:

A Moore–Penrose pseudo-inverse of a matrix A ∈ C^{m×n} is a matrix A† ∈ C^{n×m} that satisfies the following four Penrose conditions:
   AA†A = A;  A†AA† = A†;  (AA†)* = AA†;  (A†A)* = A†A.

Facts:

All the following facts except those with a specific reference can be found in [Gra83, pp. 105–141].

1. Every A ∈ C^{m×n} has a unique pseudo-inverse A†. If A ∈ R^{m×n}, then A† is real.
2. [LH95, p. 38] If A ∈ C^{m×n} of rank r ≤ min{m, n} has an SVD A = UΣV*, then its pseudo-inverse is A† = VΣ†U*, where Σ† = diag(1/σ₁, . . . , 1/σ_r, 0, . . . , 0) ∈ R^{n×m}.
3. 0†_{mn} = 0_{nm} and J†_{mn} = (1/(mn))J_{nm}, where 0_{mn} ∈ C^{m×n} is the all-0s matrix and J_{mn} ∈ C^{m×n} is the all-1s matrix.
4. (A†)* = (A*)†; (A†)† = A.
5. If A is a nonsingular square matrix, then A† = A⁻¹.
6. If U has orthonormal columns or orthonormal rows, then U† = U*.
7. If A = A* and A = A², then A† = A.
8. A† = A* if and only if A*A is idempotent.
9. If A = A*, then AA† = A†A.
10. If U ∈ C^{m×n} is of rank n and satisfies U† = U*, then U has orthonormal columns.
11. If U ∈ C^{m×m} and V ∈ C^{n×n} are unitary matrices, then (UAV)† = V*A†U*.
12. If A ∈ C^{m×n} (m ≥ n) has full rank n, then A† = (A*A)⁻¹A*.
13. If A ∈ C^{m×n} (m ≤ n) has full rank m, then A† = A*(AA*)⁻¹.
14. Let A ∈ C^{m×n}. Then
    - A†A, AA†, I_n − A†A, and I_m − AA† are orthogonal projections.
    - rank(A) = rank(A†) = rank(AA†) = rank(A†A).
    - rank(I_n − A†A) = n − rank(A).
    - rank(I_m − AA†) = m − rank(A).
15. If A = A₁ + A₂ + · · · + A_k, A_i*A_j = 0, and A_i A_j* = 0, for all i, j = 1, . . . , k, i ≠ j, then A† = A₁† + A₂† + · · · + A_k†.
16. If A is an m × r matrix of rank r and B is an r × n matrix of rank r, then (AB)† = B†A†.
17. (A*A)† = A†(A*)†; (AA*)† = (A*)†A†.
18. [Gre66] Each one of the following conditions is necessary and sufficient for (AB)† = B†A†:
    - range(BB*A*) ⊆ range(A*) and range(A*AB) ⊆ range(B).
    - A†ABB* and A*ABB† are both Hermitian matrices.
    - A†ABB*A* = BB*A* and BB†A*AB = A*AB.
    - A†ABB*A*ABB† = BB*A*A.
    - A†AB = B(AB)†AB and BB†A* = A*AB(AB)†.
19. Let A ∈ C^{m×n} and b ∈ C^m. Then the system of equations Ax = b is consistent if and only if AA†b = b. Moreover, if Ax = b is consistent, then any solution to the system can be expressed as x = A†b + (I_n − A†A)y for some y ∈ C^n.

Examples:

1. The pseudo-inverse of the matrix A = [1 2; 2 0; 0 2] is A† = (1/18)[2 8 −2; 4 −2 5].
2. (AB)† = B†A† generally does not hold. For example, if
   A = [1 0; 0 0] and B = [1 1; 0 1],
   then
   (AB)† = [1 1; 0 0]† = (1/2)[1 0; 1 0].
   However,
   B†A† = [1 −1; 0 1][1 0; 0 0] = [1 0; 0 0].
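Fact 2 and Example 1 can be reproduced numerically; the sketch below (NumPy assumed, not part of the handbook text) builds A† from the reduced SVD and checks the four Penrose conditions:

    import numpy as np

    A = np.array([[1., 2.], [2., 0.], [0., 2.]])

    # Fact 2: A^dagger = V Sigma^dagger U* built from the (reduced) SVD.
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    A_pinv = Vh.T @ np.diag(1.0 / s) @ U.T

    assert np.allclose(A_pinv, np.linalg.pinv(A))   # matches Example 1: (1/18)[2 8 -2; 4 -2 5]

    # The four Penrose conditions.
    P, Q = A @ A_pinv, A_pinv @ A
    assert np.allclose(A @ A_pinv @ A, A) and np.allclose(A_pinv @ A @ A_pinv, A_pinv)
    assert np.allclose(P, P.T) and np.allclose(Q, Q.T)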
5.8 Least Squares Problems

Definitions:

Given A ∈ F^{m×n} (F = R or C), m ≥ n, and b ∈ F^m, the least squares problem is to find an x₀ ∈ F^n such that ‖b − Ax‖ is minimized:
   ‖b − Ax₀‖ = min_{x ∈ F^n} ‖b − Ax‖.
- Such an x₀ is called a solution to the least squares problem or a least squares solution to the linear system Ax = b.
- The vector r = b − Ax ∈ F^m is called the residual.
- If rank(A) = n, then the least squares problem is called the full rank least squares problem.
- If rank(A) < n, then the least squares problem is called the rank-deficient least squares problem.

The system A*Ax = A*b is called the normal equation for the least squares problem.

(See Chapter 39 for more information on least squares problems.)

Facts:

1. [Mey00, p. 439] Let A ∈ F^{m×n} (F = R or C, m ≥ n) and b ∈ F^m be given. Then the following statements are equivalent:
   - x₀ is a solution for the least squares problem.
   - min_{x ∈ F^n} ‖b − Ax‖ = ‖b − Ax₀‖.
   - Ax₀ = Pb, where P is the orthogonal projection onto range(A).
   - A*r₀ = 0, where r₀ = b − Ax₀.
   - A*Ax₀ = A*b.
   - x₀ = A†b + y₀ for some y₀ ∈ ker(A).
2. [LH95, p. 36] If A ∈ F^{m×n} (F = R or C, m ≥ n) and rank(A) = r ≤ n, then x₀ = A†b is the unique solution of minimum length for the least squares problem.
3. [TB97, p. 81] If A ∈ F^{m×n} (F = R or C, m ≥ n) has full rank, then x₀ = A†b = (A*A)⁻¹A*b is the unique solution for the least squares problem.
4. [TB97, p. 83] Algorithm 3: Solving Full Rank Least Squares via QR Factorization
   input: matrix A ∈ F^{m×n} (F = R or C, m ≥ n) with full rank n and vector b ∈ F^m
   output: solution x₀ for min_{x ∈ F^n} ‖b − Ax‖
   compute the reduced QR factorization A = Q̂R̂;
   compute the vector c = Q̂*b;
   solve R̂x₀ = c using back substitution.
5. [TB97, p. 84] Algorithm 4: Solving Full Rank Least Squares via SVD
   input: matrix A ∈ F^{m×n} (F = R or C, m ≥ n) with full rank n and vector b ∈ F^m
   output: solution x₀ for min_{x ∈ F^n} ‖b − Ax‖
   compute the reduced SVD A = ÛΣ̂V̂* with Σ̂ = diag(σ₁, σ₂, . . . , σ_n);
   compute the vector c = Û*b;
   compute the vector y: y_i = c_i/σ_i, i = 1, 2, . . . , n;
   compute x₀ = V̂y.
6. [TB97, p. 82] Algorithm 5: Solving Full Rank Least Squares via Normal Equations
   input: matrix A ∈ F^{m×n} (F = R or C, m ≥ n) with full rank n and vector b ∈ F^m
   output: solution x₀ for min_{x ∈ F^n} ‖b − Ax‖
   compute the matrix A*A and the vector c = A*b;
   solve the system A*Ax₀ = c via the Cholesky factorization.
Examples:

1. Consider the inconsistent linear system Ax = b, where
   A = [1 2; 2 0; 0 2],  b = [1, 2, 3]^T.
   Then the normal equations are given by A^T Ax = A^T b, where
   A^T A = [5 2; 2 8]  and  A^T b = [5, 8]^T.
   A least squares solution to the system Ax = b can be obtained via solving the normal equations:
   x₀ = (A^T A)⁻¹A^T b = A†b = [2/3, 5/6]^T.
2. We use Algorithm 3 to find a least squares solution of the system Ax = b given in Example 1. The reduced QR factorization A = Q̂R̂ found in Example 1 in Section 5.5 gives
   Q̂^T b = [1/√5 2/√5 0; 4/(3√5) −2/(3√5) √5/3] [1, 2, 3]^T = [√5, √5]^T.
   Now solving R̂x = [√5, √5]^T gives the least squares solution x₀ = [2/3, 5/6]^T.
3. We use Algorithm 4 to solve the same problem given in Example 1. Using the reduced singular value decomposition A = ÛΣ̂V̂^T obtained in Example 5, Section 5.6, we have
   c = Û^T b = (1/(3√5))[5 2 4; 0 6 −3] [1, 2, 3]^T = [7/√5, 1/√5]^T.
   Now we compute y = [y₁, y₂]^T: y₁ = c₁/σ₁ = 7/(3√5) and y₂ = c₂/σ₂ = 1/(2√5).
   Finally, the least squares solution is obtained via
   x₀ = V̂y = (1/√5)[1 2; 2 −1] [7/(3√5), 1/(2√5)]^T = [2/3, 5/6]^T.
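Algorithms 3 and 5 can be reproduced in a few lines of NumPy for the data of Example 1. The sketch below is an assumption of this edition, not part of the handbook text; np.linalg.solve stands in for the triangular back substitution and Cholesky steps:

    import numpy as np

    A = np.array([[1., 2.], [2., 0.], [0., 2.]])
    b = np.array([1., 2., 3.])

    # Algorithm 5: normal equations A^T A x = A^T b.
    x_ne = np.linalg.solve(A.T @ A, A.T @ b)

    # Algorithm 3: reduced QR factorization, then solve the triangular system.
    Q, R = np.linalg.qr(A)                    # reduced QR by default
    x_qr = np.linalg.solve(R, Q.T @ b)

    x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
    assert np.allclose(x_ne, [2/3, 5/6]) and np.allclose(x_qr, x_ref)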
References

[Aut15] L. Autonne. Sur les matrices hypohermitiennes et sur les matrices unitaires. Ann. Univ. Lyon, Nouvelle Série I, Fasc. 38:1–77, 1915.
[Gra83] F. A. Graybill. Matrices with Applications in Statistics. 2nd ed., Wadsworth Intl., Belmont, CA, 1983.
[Gre66] T. N. E. Greville. Notes on the generalized inverse of a matrix product. SIAM Review, 8:518–521, 1966.
[GV96] G. H. Golub and C. F. Van Loan. Matrix Computations. 3rd ed., Johns Hopkins University Press, Baltimore, 1996.
[Hal58] P. Halmos. Finite-Dimensional Vector Spaces. Van Nostrand, New York, 1958.
[HK71] K. Hoffman and R. Kunze. Linear Algebra. 2nd ed., Prentice Hall, Upper Saddle River, NJ, 1971.
[HJ85] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, Cambridge, 1985.
[Lay03] D. Lay. Linear Algebra and Its Applications. 3rd ed., Addison Wesley, Boston, 2003.
[LH95] C. L. Lawson and R. J. Hanson. Solving Least Squares Problems. SIAM, Philadelphia, 1995.
[Mey00] C. Meyer. Matrix Analysis and Applied Linear Algebra. SIAM, Philadelphia, 2000.
[Rom92] S. Roman. Advanced Linear Algebra. Springer-Verlag, New York, 1992.
[Sch07] E. Schmidt. Zur Theorie der linearen und nichtlinearen Integralgleichungen. Math. Annalen, 63:433–476, 1907.
[TB97] L. N. Trefethen and D. Bau. Numerical Linear Algebra. SIAM, Philadelphia, 1997.