39  Least Squares Solution of Linear Systems

Per Christian Hansen, Technical University of Denmark
Hans Bruun Nielsen, Technical University of Denmark
39.1 Basic Concepts
39.2 Least Squares Data Fitting
39.3 Geometric and Algebraic Aspects
39.4 Orthogonal Factorizations
39.5 Least Squares Algorithms
39.6 Sensitivity
39.7 Up- and Downdating of QR Factorization
39.8 Damped Least Squares
39.9 Rank Revealing Decompositions
References
39.1 Basic Concepts
(See Chapter 5 for additional information.)
Definitions:
Given a vector b ∈ R^m and a matrix A ∈ R^{m×n} with m > n, the least squares problem is to find a vector
x0 ∈ R^n that minimizes the Euclidean length of the difference between Ax and b:

    Problem LS: Find x0 satisfying ‖b − Ax0‖2 = min_x ‖b − Ax‖2 .
Such an x0 is called a least squares solution.
For any vector x the vector r = r(x) = b − Ax is the residual vector. The residual of a least squares
solution is denoted by r0 . The least squares problem is consistent if b ∈ range(A).
A basic solution, x0B , is a least squares solution with at least n − rank(A) zero components. The
minimum-norm least squares solution, x0M , is the least squares solution of minimum Euclidean norm.
In a weighted least squares problem, we are also given weights wi ≥ 0 for i = 1, . . . , m, and the
objective is to minimize ‖W(b − Ax)‖2, where W = diag(w1, . . . , wm). This is an important special case
of the generalized least squares problem:

    Problem GLS: Let Ax + Bv = b, where B ∈ R^{m×p} with p ≤ m.
                 Find xG, vG such that ‖v‖2 is minimized.
Note that Bv plays the role of the residual vector.
In the total least squares problem, we allow for errors both in the vector b and in the matrix A:
    Problem TLS: Let (A + E)x + r = b, where E ∈ R^{m×n} with n ≤ m.
                 Find xT such that ‖(E, r)‖F is minimized.

‖·‖F denotes the Frobenius norm.
In this chapter, A_{i,:} and A_{:,j} denote the vectors given by the elements in the ith row and jth column
of matrix A, respectively. Similarly, A_{k:l,:} (or A_{:,k:l}) is the submatrix consisting of rows (or columns) k
through l of A.
In the examples in this chapter, the computation is done with about 16 digits accuracy, but the displayed
results are rounded to fewer digits.
Facts:
(See, e.g., Chapters 1 and 2 in [Bjo96].)
1. If m > n = rank(A), then the least squares solution x0 is analytically equivalent to the solution to
   the normal equations A^T A x = A^T b.
2. If the least squares problem is consistent, then r0 = 0.
3. The least squares solution is unique if m ≥ n and A has full rank.
4. If the system is underdetermined (m < n) or if A is rank deficient, then the solution to Problem LS
is not unique. Also, a basic solution is not unique.
5. The minimum-norm least squares solution x0M is always unique.
6. If m > n = rank(A), then the least squares solution can be written as x0 = A† b, where the matrix
A† is the Moore–Penrose generalized inverse or pseudoinverse of A. (See Section 5.7.) In general,
A† produces the minimum-norm solution: x0M = A† b.
7. If B is nonsingular, then xG minimizes ‖B^{-1}(b − Ax)‖2.
8. If the covariance matrix for b has the Cholesky factorization Cov(b) = C^T C, then xG is the best
   linear unbiased estimate (BLUE) in the general linear model with B = C^T.
9. If Cov(b) has full rank, then xG is the solution to the least squares problem min_x ‖C^{-T}(b − Ax)‖2.
10. In particular, if Cov(b) = σ^2 I, then xG = x0 and Cov(x0) = σ^2 (A^T A)^{-1}.
11. An English translation of the original work on least squares problems by C. F. Gauss is available
in [Gau95].
Examples:
1. Consider Problem LS with m = 3 and n = 2:

       min_x ‖ [1 1; 1 2; 1 3] [x1; x2] − [0.75; 1.13; 1.39] ‖2 .

   The associated normal equations and the least squares solution are

       [3 6; 6 14] [x1; x2] = [3.27; 7.18],        x0 = [0.45; 0.32].

   The residual vector corresponding to x0 is

       r0 = r(x0) = b − Ax0 = [−0.02; 0.04; −0.02],

   and A^T r0 = 0.
2. If we use the weights w1 = 10 and w2 = w3 = 1, the problem is changed to

       min_x ‖ [10 10; 1 2; 1 3] [x1; x2] − [7.5; 1.13; 1.39] ‖2 ,

   whose least squares solution and corresponding residual are

       x0 = [0.41838; 0.33186],        r0 = [−0.00024; 0.04790; −0.02395].

   Note that the first component of r0 is reduced when w1 is increased from 1 to 10.
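Both examples can be reproduced directly in NumPy. The following sketch is not part of the handbook text; it assumes NumPy is available, uses numpy.linalg.lstsq as a generic dense solver, and handles the weighted problem by scaling the rows of A and b.

    import numpy as np

    A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
    b = np.array([0.75, 1.13, 1.39])

    # Example 1: unweighted least squares solution, x0 = (0.45, 0.32)
    x0, *_ = np.linalg.lstsq(A, b, rcond=None)
    r0 = b - A @ x0                      # residual (-0.02, 0.04, -0.02)
    print(x0, r0, A.T @ r0)              # A^T r0 is zero up to rounding

    # Example 2: weights w = (10, 1, 1) applied by row scaling
    w = np.array([10.0, 1.0, 1.0])
    xw, *_ = np.linalg.lstsq(w[:, None] * A, w * b, rcond=None)
    print(xw, b - A @ xw)                # (0.41838, 0.33186); first residual component shrinks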
39.2 Least Squares Data Fitting
Definitions:
Given m data points (ti, yi), i = 1, . . . , m, and n linearly independent functions fj, j = 1, . . . , n (with
m > n), find the linear combination

       F(x, t) = Σ_{j=1}^n xj fj(t)

that minimizes the sum of squared residuals yi − F(x, ti) at the data points:

       min_x Σ_{i=1}^m ( yi − F(x, ti) )^2 .

The coefficients xi are the components of the least squares solution to min_x ‖b − Ax‖2, where the columns
A_{:,j} of A are samples of fj at the ti and the elements of b are the values yi:

       Aij = fj(ti),    bi = yi,    i = 1, . . . , m,    j = 1, . . . , n.

The solution F(x0, t) is said to fit the data in the least squares sense.
Facts:
(See, e.g., Chapter 4 in [Bjo96].)
1. The fit can be made more robust to outliers by solving a weighted least squares problem
   min_x ‖W(b − Ax)‖2 with W = diag(w1, . . . , wm), wi = ψ(ri) = ψ(bi − A_{i,:} x), ψ being a convex
   function. This problem is usually solved by an iteratively reweighted least squares algorithm.
2. In orthogonal distance fitting, instead of minimizing the residuals one minimizes the orthogonal
distances between the fitting function F and the data points. Important examples are fitting of a
circle, an arc, or an ellipse to data points.
Examples:
1. Given f1(t) = 1 and f2(t) = t, find the least squares fit to the data points (1, 0.75), (2, 1.13), and
   (3, 1.39). We get the A, b, and x0 from Example 1 in Section 39.1, and

       F(x0, t) = 0.45 + 0.32 t.

   If the third data point is changed to (3, 13.9), then the least squares solution changes to x0 =
   (−7.890, 6.575)^T, and the least squares fit becomes F(x0, t) = −7.890 + 6.575 t; this illustrates
   the sensitivity to outliers.
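A hedged sketch of the same fit in NumPy (assuming NumPy is available; the variable names are ours): the design matrix is built by sampling f1 and f2 at the abscissas t, and the outlier experiment simply replaces the last ordinate.

    import numpy as np

    t = np.array([1.0, 2.0, 3.0])
    y = np.array([0.75, 1.13, 1.39])

    # Columns are samples of f1(t) = 1 and f2(t) = t
    A = np.column_stack([np.ones_like(t), t])

    x_fit, *_ = np.linalg.lstsq(A, y, rcond=None)
    print(x_fit)                         # (0.45, 0.32): F(x0, t) = 0.45 + 0.32 t

    # Change the third data point to the outlier (3, 13.9)
    y_out = np.array([0.75, 1.13, 13.9])
    x_out, *_ = np.linalg.lstsq(A, y_out, rcond=None)
    print(x_out)                         # roughly (-7.890, 6.575)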
39.3 Geometric and Algebraic Aspects
Definitions:
The columns of A ∈ Rm×n span the range of A, while the nullspace or kernel of A is the set of solutions
to the homogeneous system Ax = 0:
range(A) = {z = Ax | x ∈ Rn },
ker(A) = {x ∈ Rn | Ax = 0}.
The four fundamental subspaces associated with Problem LS are range(A), ker(AT ), ker(A), and
range(AT ). (See Section 2.4 for more information.)
Facts:
The first three facts can be found in [Str88, Sec. 2.4]; the remaining facts are discussed in [Bjo96, Chap. 1].
p denotes the rank of A: p = rank(A).
1. If p = n < m, then the vector 0 of all zeros is the only element in ker( A).
2. The spaces range(A) and ker(AT ) are subspaces of Rm with dimensions p and m− p, respectively.
The two spaces are orthogonal complements, i.e., yT z = 0 for any pair (y ∈ range(A), z ∈ ker(AT )),
and range(A) ⊕ ker(AT ) = Rm .
3. The spaces ker(A) and range(AT ) are subspaces of Rn with dimensions n− p and p, respectively.
The two spaces are orthogonal complements.
4. The least squares residual vector r0 = b − Ax0 is an element in ker(A^T). Combining this with the
   definition of r, we get the so-called augmented system associated with Problem LS:

       [I A; A^T 0] [r; x] = [b; 0].

   If p = n, then the augmented system is nonsingular and the solution components are r0 and x0.
5. The vector Ax0 is the orthogonal projection of b onto range(A).
6. The vector r0 is the orthogonal projection of b onto ker(AT ).
7. If p < n, then the columns in A can be reordered such that AΠ = [Â Ã], where Π is a permutation
   matrix and the submatrix Â has p columns with range(Â) = range(A). The permutation is not unique.
8. The orthogonal projectors onto range( A) and ker(A) are given by AA† and I − A† A, respectively.
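Facts 4-6 and 8 can be checked numerically for the matrix of Example 1 in Section 39.1. The sketch below is illustrative only; it assumes NumPy and builds the projectors from the pseudoinverse.

    import numpy as np

    A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
    b = np.array([0.75, 1.13, 1.39])

    A_pinv = np.linalg.pinv(A)                # Moore-Penrose pseudoinverse A†

    P_range = A @ A_pinv                      # orthogonal projector onto range(A)  (Fact 8)
    P_ker = np.eye(2) - A_pinv @ A            # projector onto ker(A); zero here since rank(A) = 2

    x0 = A_pinv @ b
    r0 = b - A @ x0
    print(np.allclose(P_range @ b, A @ x0))   # Fact 5: A x0 is the projection of b onto range(A)
    print(np.allclose(P_range @ r0, 0))       # Fact 6: r0 is orthogonal to range(A)
    print(np.allclose(A.T @ r0, 0))           # equivalently, r0 lies in ker(A^T)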
Examples:
1. Figure 39.1 illustrates Facts 5 and 6 in the case m = 3, n = p = 2.

   [FIGURE 39.1  The vector b, its orthogonal projection Ax0 onto the plane range(A), and the
   residual r0 = b − Ax0, which is orthogonal to range(A).]
2. For the problem in Example 1 in Section 39.1, both range(A) and ker(A^T) are subspaces of R^3
   given by, respectively,

       range(A) = α [1; 1; 1] + β [1; 2; 3],        ker(A^T) = γ [1; −2; 1],

   with α, β, γ ∈ R.
3. Fact 4 can be used to derive the normal equations:

       r = b − Ax ∈ ker(A^T)    ⇒    A^T (b − Ax) = 0 .

39.4 Orthogonal Factorizations
(See Section 5.5 and Section 38.4 for additional information on orthogonal factorizations.)
Definitions:
The real matrix Q is orthogonal if it is square and satisfies Q^T Q = I.
A QR factorization of a matrix A ∈ R^{m×n} with m ≥ n has the form

       A = QR = Q [R̂; 0] = Q̂ R̂,

where Q ∈ R^{m×m} is orthogonal, R ∈ R^{m×n}, R̂ ∈ R^{n×n} is upper triangular, and Q̂ = Q_{:,1:n}. The form
A = Q̂ R̂ is the so-called "reduced" (or "skinny") QR factorization.
The singular value decomposition (SVD) of A ∈ R^{m×n} has the form

       A = U Σ V^T,

where U ∈ R^{m×m} and V ∈ R^{n×n} are orthogonal matrices and Σ = diag(σ1, . . . , σr) ∈ R^{m×n} has diagonal
elements

       σ1 ≥ σ2 ≥ · · · ≥ σr ≥ 0,        r = min{m, n} .

Letting uj and vj denote the jth column in U and V, respectively, we can write

       A = Σ_{j=1}^r σj uj vj^T .

For k < r the matrix Σ_{j=1}^k σj uj vj^T is called the truncated SVD approximation to A. (See Sections 5.6,
17, and 45 for more information about the singular value decomposition.)
Facts:
Except for Facts 1 and 8 see [Bjo96, Chap. 1]. Also see Section 39.5.
1. [Str88, Chap. 3]. A QR factorization preserves rank: rank(R) = rank(R̂) = rank(A).
2. A QR factorization is not unique, but two factorizations Q1 R̂1 and Q2 R̂2 always satisfy R̂1 = D R̂2,
   where D is diagonal with Dii = ±1.
3. The triangular factor R̂ and the upper triangular Cholesky factor C for the normal equations matrix
   A^T A always satisfy R̂ = D C, where D is diagonal with Dii = ±1.
4. If A has full rank and has the QR factorization A = QR, then x0 can be found by back substitution
   in the upper triangular system R̂ x = Q_{:,1:n}^T b.
5. If Q has not been saved, then we can use forward and back substitution to solve the seminormal
   equations R̂^T R̂ x = A^T b. For reasons of numerical stability, this must be followed by one step of
   iterative refinement; the complete process is called the corrected seminormal equations method.
6. Let rank(A) = p ≤ n ≤ m and A = QR. The columns in Q_{:,1:p} and Q_{:,p+1:m} are orthonormal
   bases of range(A) and ker(A^T), respectively.
7. Let A = U Σ V^T. Then p = rank(A) is equal to the number of strictly positive singular values:
   σ1 ≥ · · · ≥ σp > 0, σ_{p+1} = · · · = σ_{min{m,n}} = 0. The columns in U_{:,1:p} and U_{:,p+1:m} are
   orthonormal bases of range(A) and ker(A^T), respectively, and the columns in V_{:,1:p} and V_{:,p+1:n}
   are orthonormal bases of range(A^T) and ker(A), respectively.
8. [GV96, Chap. 12]. The TLS solution can be computed as follows: compute the SVD of the coefficient
   matrix A augmented with the right-hand side b, i.e., [A, b] = Ũ Σ̃ Ṽ^T. If the smallest singular
   value σ̃_{n+1} is simple, and if β = Ṽ_{n+1,n+1} ≠ 0, then

       xT = −β^{-1} Ṽ_{1:n,n+1},    ET = −σ̃_{n+1} Ũ_{:,n+1} Ṽ_{1:n,n+1}^T,    rT = σ̃_{n+1} β Ũ_{:,n+1} .
Examples:
1. For the problem from Example 1 in Section 39.1, we find [A, b] = Ũ Σ̃ Ṽ^T with

       Σ̃ = diag(4.515, 0.6198, 0.0429),    Ũ_{:,3} = [0.4248; −0.8107; 0.4029],    Ṽ_{:,3} = [0.3950; 0.2796; −0.8751].

   Thus,

       xT = [0.4513; 0.3195],    ET = [−0.0072 −0.0051; 0.0137 0.0097; −0.0068 −0.0048],    rT = [−0.015939; 0.030421; −0.015118].

   In Example 1 in Section 39.1 we found x0 = (0.4500, 0.3200)^T. The difference between x0 and xT
   is small because the problem is almost consistent and A is well conditioned; see Section 39.6. The
   elements in rT are about 80% of the elements in r0 given in Example 1 in Section 39.1.
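The TLS computation of Fact 8 is easy to reproduce with any SVD routine. The sketch below assumes NumPy; the signs of the computed singular vectors may differ from the tabulated ones, but xT, ET, and the residual are invariant under those sign changes.

    import numpy as np

    A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
    b = np.array([0.75, 1.13, 1.39])
    n = A.shape[1]

    # SVD of the augmented matrix [A, b]
    U, s, Vt = np.linalg.svd(np.column_stack([A, b]))
    V = Vt.T
    print(s)                                  # approx. (4.515, 0.6198, 0.0429)

    beta = V[n, n]                            # last component of the right singular vector
    x_tls = -V[:n, n] / beta                  # approx. (0.4513, 0.3195)
    E_tls = -s[n] * np.outer(U[:, n], V[:n, n])
    r_tls = b - (A + E_tls) @ x_tls           # residual of the corrected system
    print(x_tls, r_tls)                       # r_tls approx. (-0.0159, 0.0304, -0.0151)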
39.5 Least Squares Algorithms
Definitions:
By least squares algorithms we mean algorithms for computing the least squares solution efficiently and
stably on a computer. The algorithms should take into account the size of the matrix and, if applicable,
also its structure.
Facts:
For real systems the following facts can be found in, e.g., [Bjo96], [Bjo04], and [LH95].
1. The algorithm which is least sensitive to the influence of rounding errors is based on the QR
   factorization of A:
   (a) Compute the reduced QR factorization A = Q̂ R̂.
   (b) Compute the vector β = Q̂^T b (this can be computed during the QR factorization algorithm
       without forming Q̂ explicitly).
   (c) Compute x = R̂^{-1} β via back substitution.
   The use of this algorithm was first suggested in [Gol65].
2. If A is well conditioned, the normal equations can be used instead:
   (a) Compute the normal equation system M = A^T A and d = A^T b.
   (b) Compute the Cholesky factorization M = C^T C and y = C^{-T} d (the vector y can be computed
       during the Cholesky algorithm).
   (c) Compute x = C^{-1} y via back substitution.
3. If A or AT A is a Toeplitz matrix, use an algorithm that utilizes this structure to obtain the computational complexity O(mn).
4. If A is large and sparse, use a sparse QR factorization algorithm that avoids storing the matrix Q.
If solving a system with the same A but a different right-hand side, use the corrected seminormal
equations.
5. Alternatively, if A is large and sparse, it may be preferable to use the augmented system approach
because it may lead to less fill-in. Then a symmetric indefinite solver, such as the LDLT factorization,
must be used, cf. [GV96, Sec. 4.4].
6. If A is large and the matrix-vector multiplications with A and AT can be computed easily, then use
the conjugate gradient algorithm on the normal equations. Several implementations are available;
CGLS is the classical formulation; LSQR is more accurate for ill-conditioned matrices.
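The following sketch contrasts the QR-based algorithm of Fact 1 with the normal equations approach of Fact 2. It assumes NumPy and SciPy are available; the two helper functions are our own names, not part of any library.

    import numpy as np
    from scipy.linalg import solve_triangular

    def lstsq_qr(A, b):
        """Least squares via the reduced QR factorization (Fact 1)."""
        Q1, R1 = np.linalg.qr(A, mode="reduced")     # A = Q1 R1
        return solve_triangular(R1, Q1.T @ b)        # back substitution

    def lstsq_normal_eq(A, b):
        """Least squares via the normal equations and Cholesky (Fact 2)."""
        M, d = A.T @ A, A.T @ b
        C = np.linalg.cholesky(M)                    # M = C C^T with C lower triangular
        y = solve_triangular(C, d, lower=True)       # forward substitution
        return solve_triangular(C.T, y)              # back substitution

    A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
    b = np.array([0.75, 1.13, 1.39])
    print(lstsq_qr(A, b), lstsq_normal_eq(A, b))     # both approx. (0.45, 0.32)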
39.6 Sensitivity
Definitions:
For A ∈ R^{m×n} with p = rank(A) ≤ min(m, n) the condition number for the least squares problem is
given by κ(A) = σ1/σp, where σ1 ≥ σ2 ≥ · · · ≥ σp > 0 are the nonzero singular values of A.
Let x0M and x̃0M denote the minimum-norm least squares solutions to the problems min ‖b − Ax‖2
and min ‖b̃ − Ãx‖2, the latter being a perturbation of the former. Define the quantities

       δA = ‖Ã − A‖2,    δb = ‖b̃ − b‖2,    η = δA/σp = κ(A) δA/‖A‖2 .
Facts:
(See, e.g., [Bjo96, Sec. 1.4] or [LH95, Chap. 9].)
1. If η < 1, then the rank is not changed by the perturbation: rank(Ã) = rank(A).
2. If η < 1, then the relative perturbation in x0M is bounded as

       ‖x̃0M − x0M‖2 / ‖x0M‖2  ≤  κ(A)/(1 − η) [ δA/‖A‖2 + (δb + η‖r0‖2)/(‖A‖2 ‖x0M‖2) ] + η,        r0 = b − A x0M .

   If rank(A) = n, then the last term η is omitted. If the problem is consistent, i.e., r0 = 0, then the
   relative error can be expected to grow linearly with κ(A). For r0 ≠ 0 the contribution η‖r0‖2 and the
   definition of η show that the relative error may grow as κ(A)^2.
3. The condition number for the normal equations matrix is κ(A^T A) = κ(A)^2. Due to the finite
   computer precision, information may be lost when the normal equations are formed; see, e.g.,
   [Bjo96, Sec. 2.2] and Example 2 below.
4. Component-wise perturbation theory applies when component-wise perturbation bounds are
   available for the errors in A and b: if

       |Ã − A| ≤ ω|A|    and    |b̃ − b| ≤ ω|b| ,

   where the absolute values and inequalities are interpreted componentwise, then, to first order,

       |x̃ − x| ≤ ω ( |A†| (|b| + |A| |x|) + |(A^T A)^{-1}| |A^T| |r| )

   and

       ‖x̃ − x‖∞ ≤ ω ( ‖ |A†| (|A| |x| + |b|) ‖∞ + ‖ |(A^T A)^{-1}| |A^T| |r| ‖∞ ) .

   See [Bjo96, Secs. 1.4.5–6] for further references about component-wise perturbation analysis and
   a posteriori error estimation.
Examples:
1. We consider the problem from Example 1 in Section 39.1 and two perturbed versions of it,

       A = [1 1; 1 2; 1 3],    b = [0.75; 1.13; 1.39],    Ã = [0.8 1.1; 0.95 2; 1.1 2.95],    b̃ = [0.79; 1.23; 1.30].

   The matrix has full rank, so the least squares solution is unique (and equal to the minimum-norm
   least squares solution). The vectors

       x0 = [0.4500; 0.3200],    x1 = [0.5967; 0.2550],    x2 = [0.8935; 0.1280]

   are the minimizers of ‖b − Ax‖2, ‖b̃ − Ax‖2, and ‖b̃ − Ãx‖2, respectively.
   The matrix has condition number κ(A) = 6.793, and η = κ(A) ‖Ã − A‖2/‖A‖2 = 0.4230 < 1.
   The relative errors and their upper bounds are

       ‖x1 − x0‖2/‖x0‖2 = 0.291 ≤ κ(A) ‖b̃ − b‖2/(‖A‖2 ‖x0‖2) = 0.423,

       ‖x2 − x0‖2/‖x0‖2 = 0.875 ≤ (1/(1 − η)) [ η + κ(A) (‖b̃ − b‖2 + η‖b − Ax0‖2)/(‖A‖2 ‖x0‖2) ] = 1.575 .
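The quantities in this example are straightforward to recompute; the sketch below (assuming NumPy) evaluates the relative errors and the bounds from Fact 2 for the two perturbed problems.

    import numpy as np

    A  = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
    b  = np.array([0.75, 1.13, 1.39])
    At = np.array([[0.8, 1.1], [0.95, 2.0], [1.1, 2.95]])    # perturbed A
    bt = np.array([0.79, 1.23, 1.30])                        # perturbed b

    x0, *_ = np.linalg.lstsq(A, b, rcond=None)
    x1, *_ = np.linalg.lstsq(A, bt, rcond=None)
    x2, *_ = np.linalg.lstsq(At, bt, rcond=None)

    sigma = np.linalg.svd(A, compute_uv=False)
    kappa = sigma[0] / sigma[-1]                             # approx. 6.793
    normA, normx0 = sigma[0], np.linalg.norm(x0)
    eta = kappa * np.linalg.norm(At - A, 2) / normA          # approx. 0.423

    db, r0 = np.linalg.norm(bt - b), b - A @ x0
    bound1 = kappa * db / (normA * normx0)                                                   # 0.423
    bound2 = (eta + kappa * (db + eta * np.linalg.norm(r0)) / (normA * normx0)) / (1 - eta)  # 1.575
    print(np.linalg.norm(x1 - x0) / normx0, bound1)          # 0.291 <= 0.423
    print(np.linalg.norm(x2 - x0) / normx0, bound2)          # 0.875 <= 1.575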
2. Consider the following matrix A and the corresponding normal equations matrix:

       A = [1 1; δ 0; 0 δ],        A^T A = [1+δ^2 1; 1 1+δ^2] .

   If |δ| ≤ √ε (where ε is the machine precision), then the quantity 1 + δ^2 is represented by 1 on the
   computer and, therefore, the computed A^T A is singular. If we use Householder transformations
   to compute the QR factorization, we get R̂ = [−1 −1; 0 δ√2], so information about δ is preserved.
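The loss of information in the normal equations can be demonstrated numerically. The sketch below assumes NumPy, with δ chosen just below the square root of the double-precision machine precision, and compares the computed A^T A with the QR factor.

    import numpy as np

    delta = 1e-8                        # delta**2 is below the machine precision
    A = np.array([[1.0, 1.0],
                  [delta, 0.0],
                  [0.0, delta]])

    M = A.T @ A                         # 1 + delta**2 rounds to 1
    print(M, np.linalg.matrix_rank(M))  # computed normal equations matrix is singular (rank 1)

    R = np.linalg.qr(A, mode="r")       # Householder QR retains the information about delta
    print(R)                            # approx. [[-1, -1], [0, delta*sqrt(2)]] up to signs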
39.7 Up- and Downdating of QR Factorization
Definitions:
Given A ∈ R^{m×n} with m > n and its QR factorization, as well as a row vector a^T with a ∈ R^n, updating
of the factorization means computing the QR factorization Ã = [A; a^T] = Q̃ R̃ of the augmented matrix
from the QR factors of A. Similarly, downdating means computing the QR factors of A from those of Ã.
Up- and downdating algorithms require only O(mn) flops (compared to the O(mn^2) flops of recomputing
the QR factors).
39-9
Least Squares Solution of Linear Systems
Facts:
The following facts can be found in Chapter 3 of [Bjo96].
1. The matrix R can be updated to R̃ without knowledge of Q by means of a sequence of n Givens
   transformations G1, . . . , Gn.
2. The updating of Q to Q̃ then takes the form

       Q̃ = [Q 0; 0 1] G1^T · · · Gn^T .
3. Downdating of R̃ to R requires the first row Q̃_{1,:} of Q̃. Givens rotations Ğ1, . . . , Ğ_{m−n} are
   determined such that R̆ = Ğ_{m−n} · · · Ğ1 (Q̃_{1,:}^T  R̃) is upper triangular; R is then read off as the
   submatrix R = R̆_{2:n+1,2:n+1}.
4. If Q̃ is not available, then its first row Q̃_{1,:} can be computed by the LINPACK/Saunders algorithm
   or via hyperbolic rotations. If A is available, the corrected seminormal equations provide a more
   accurate algorithm.
5. Up- and downdating algorithms are also available for the cases where a column is appended to or
deleted from A.
6. Up- and downdating of the Cholesky factor under a rank-one modification M̃ = M ± a a^T is
   analytically equivalent to updating R from the QR factorization. In the downdating case the matrix
   M̃ must be positive (semi)definite.
Examples:
1. Let

       A = [1 1; 1 2; 1 3] = QR = [0.5774 −0.7071 −0.4082; 0.5774 0 0.8165; 0.5774 0.7071 −0.4082] [1.732 3.464; 0 1.414; 0 0] .

   If a^T = (1  4), then

       Ã = [1 1; 1 2; 1 3; 1 4] = Q̃ R̃    with    R̃ = [2 5; 0 2.236; 0 0; 0 0] .

   The updated factor R̃ is computed by augmenting R with a^T and applying two left Givens rotations
   G1 and G2 to row pairs (1,4) and (2,4), respectively:

       [R; a^T] = [1.732 3.464; 0 1.414; 0 0; 1 4]
           →(G1)  [2 5; 0 1.414; 0 0; 0 1.732]
           →(G2)  [2 5; 0 2.236; 0 0; 0 0] .
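A minimal sketch of this row-updating step, assuming NumPy; qr_insert_row and givens are our own helper names (SciPy also provides scipy.linalg.qr_insert for the same task). Each rotation combines one row of R with the appended row a^T, exactly as in the example.

    import numpy as np

    def givens(a, b):
        """Return c, s with [[c, s], [-s, c]] applied to (a, b) giving (r, 0)."""
        r = np.hypot(a, b)
        return (1.0, 0.0) if r == 0.0 else (a / r, b / r)

    def qr_insert_row(R, a):
        """Update the triangular factor after appending the row a^T to A (a sketch)."""
        n = R.shape[1]
        M = np.vstack([R, a])                # stack a^T under R, then restore triangularity
        for k in range(n):
            c, s = givens(M[k, k], M[-1, k])
            top, bot = M[k, :].copy(), M[-1, :].copy()
            M[k, :] = c * top + s * bot      # rotate row pair (k, last)
            M[-1, :] = -s * top + c * bot
        return M[:n, :]

    A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
    R = np.linalg.qr(A, mode="r")
    R_new = qr_insert_row(R, np.array([1.0, 4.0]))
    print(R_new)                             # approx. [[2, 5], [0, 2.236]], as in the example
    print(np.allclose(R_new.T @ R_new, A.T @ A + np.outer([1.0, 4.0], [1.0, 4.0])))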
39.8 Damped Least Squares
Definitions:
The damped least squares solution is the solution to the problem (A^T A + αI) x = A^T b, where α > 0
and A and b are real. Damped least squares is also known as ridge regression and Tikhonov (or Phillips)
regularization.
Facts:
(See, e.g., [Han98] for further details.)
1. The two formulations

       min_x { ‖b − Ax‖2^2 + α‖x‖2^2 }    and    min_x ‖ [b; 0] − [A; √α I] x ‖2

   are analytically equivalent.
2. The damping (controlled by the parameter α) reduces the variance of the solution, at the cost of
   introducing bias.
3. If Cov(b) = I, then the covariance matrix for the damped least squares solution xα is

       Cov(xα) = (A^T A + αI)^{-1} A^T A (A^T A + αI)^{-1} = V Σ^2 (Σ^2 + αI)^{-2} V^T,

   where Σ and V are from the SVD of A. Hence,

       ‖Cov(xα)‖2 = max_i σi^2/(σi^2 + α)^2 ≤ (4α)^{-1},

   while ‖Cov(x0)‖2 = ‖(A^T A)^{-1}‖2 = σn^{-2}, which can be much larger.
4. The expected value of xα is

       E(xα) = (A^T A + αI)^{-1} A^T A x0,

   which introduces a bias because E(xα) ≠ E(x0) when α > 0.
5. The damped least squares problem can take the more general form

       min_x { ‖b − Ax‖2^2 + α‖Bx‖2^2 }    ⇔    (A^T A + α B^T B) x = A^T b ,

   where ‖B · ‖2 defines a (semi)norm. The solution to this problem is unique when the nullspaces
   of A and B intersect trivially.
6. [Han98, Chap. 7]. Some algorithms for computing α are the discrepancy principle, generalized
cross validation, and the L-curve criterion.
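The equivalence in Fact 1 suggests a simple way to compute xα with any least squares solver: append √α I to A and zeros to b. The sketch below assumes NumPy; damped_lstsq is our own helper name.

    import numpy as np

    def damped_lstsq(A, b, alpha):
        """Damped least squares via the stacked formulation of Fact 1 (a sketch)."""
        n = A.shape[1]
        A_aug = np.vstack([A, np.sqrt(alpha) * np.eye(n)])
        b_aug = np.concatenate([b, np.zeros(n)])
        x, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
        return x

    A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
    b = np.array([0.75, 1.13, 1.39])
    alpha = 0.1
    x_damped = damped_lstsq(A, b, alpha)
    x_normal = np.linalg.solve(A.T @ A + alpha * np.eye(2), A.T @ b)   # (A^T A + alpha I) x = A^T b
    print(x_damped, x_normal)            # the two formulations agree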
Examples:
1. Let δ = 10^{-5} and consider the matrix and vectors

       A = [1 1; 1 1+δ; 1 1+2δ],    x0 = [1; 1],    b = A x0,    b̃ = b + [0; 0; δ] .

   Obviously x0 is the least squares solution to Ax + r = b with r0 = 0. The minimizer of ‖b̃ − Ax‖2
   is x̃0 = (0.5, 1.5)^T, showing that for this problem the least squares solution is very sensitive to
   perturbations (the condition number is κ(A) = 2.4·10^5). Using α = 10^{-8}, we obtain the damped
   least squares solutions

       xα = [0.999995; 1.000005]    and    x̃α = [0.995; 1.005] .

   Comparing the damped and the undamped least squares solutions x̃α and x̃0 to the perturbed
   least squares problem, we see that x̃α is a better approximation to the unperturbed solution x0
   than x̃0.
39.9 Rank Revealing Decompositions
Definitions:
A rank revealing decomposition is a two-sided orthogonal decomposition of the form

       A = U R V^T = U [R̂; 0] V^T,

where U and V are orthogonal, and R̂ is upper triangular and reveals the (numerical) rank of A in the
size of its diagonal elements.
The numerical rank kτ of A, with respect to the threshold τ, is defined as

       kτ = min rank(A + E)    subject to    ‖E‖2 ≤ τ .
Facts:
(See, e.g., [Bjo96, Sec. 1.7.3–6], [Han98, Sec. 2.2], and [Ste98, Chap. 5].)
1. [GV96, p. 73] The numerical rank kτ is equal to the number of singular values greater than τ, i.e.,
   σ_{kτ} > τ ≥ σ_{kτ+1}.
2. The singular value decomposition is rank revealing with the middle matrix R = Σ. The SVD is
   difficult to update.
3. If A is exactly rank deficient with rank(A) = p, then there always exists a pivoted QR factorization
   AΠ = Q R̃ with R̃ of the form

       R̃ = [R11 R12; 0 0],    R11 ∈ R^{p×p},    rank(R̃) = rank(R11) = p ,

   and a complete orthogonal decomposition of the form A = U R̃ V^T, where

       R̃ = [R11 0; 0 0],    R11 ∈ R^{p×p},    rank(R̃) = rank(R11) = p .

   The pseudoinverse of A is A† = V_{:,1:p} R11^{-1} U_{:,1:p}^T.
4. A basic solution can be computed from the pivoted QR factorization as x0B = R11^{-1} Q_{:,1:p}^T b. The
   minimum-norm least squares solution is given in terms of the complete orthogonal decomposition
   as x0M = V_{:,1:p} R11^{-1} U_{:,1:p}^T b.
5. The rank revealing QR (RRQR) decomposition is a pivoted QR factorization AΠ = Q R̃ such
   that, for i = 1, . . . , n,

       σi/ci ≤ σi(R_{1:i,1:i}) ≤ σi    and    σ_{i+1} ≤ ‖R_{i+1:n,i+1:n}‖2 ≤ ci σ_{i+1} ,

   where σi is the ith singular value of A,

       ci = ( i(n − i) + min(i, n − i) )^{1/2} ,

   and σi(R_{1:i,1:i}) denotes the smallest singular value of R_{1:i,1:i}. The RRQR factorization can be used
   to estimate the numerical rank kτ. The RRQR factorization is not unique.
6. The URV decomposition is a two-sided orthogonal decomposition A = U R̃ V^T such that, for
   k = 1, . . . , n,

       σi c̆k ≤ σi(R_{1:i,1:i}) ≤ σi,    i = 1, . . . , k,

   and

       σi ≤ σ_{i−k}(R_{k+1:n,k+1:n}) ≤ σi/c̆k,    i = k + 1, . . . , n,

   where

       c̆k = ( 1 − ‖R_{1:k,k+1:n}‖2^2 / (σk(R_{1:k,1:k})^2 − ‖R_{k+1:n,k+1:n}‖2^2) )^{1/2} .

   There is also a ULV decomposition with a lower triangular middle matrix; both can be used to
   estimate the numerical rank of A.
7. The RRQR, URV, and ULV decompositions can be updated, at slightly more cost than the QR
factorization.
Examples:
1. The rank of A is revealed by the zero element in the (3,3) position of R:

       A = [1 2 3; 2 3 4; 3 4 5; 4 5 6] = Q R    with    R = [5.477 7.303 9.129; 0 0.816 1.633; 0 0 0] .

   Here the QR factorization is rank revealing (U = Q and V = I).
2. Pivoting must be used to ensure that a QR factorization is rank revealing. The "standard column
   pivoting" often works well in connection with Householder transformations; here the pivot column
   in each stage is chosen to maximize the norm of the leading column of the submatrix A^(k)_{k:m,k:n}
   to be reduced. Example:

       A = [4 2 2; 2 1 2; 0 0 1],        A^(1) = H1 A = [−4.4721 −2.2361 −2.6833; 0 0 0.8944; 0 0 1] .

   The largest column norm in the remaining submatrix A^(1)_{2:3,2:3} is found in the last column, so
   columns 2 and 3 are interchanged by the permutation

       Π = [1 0 0; 0 0 1; 0 1 0],        A^(1) Π = [−4.4721 −2.6833 −2.2361; 0 0.8944 0; 0 1 0],

   and the final triangular factor is

       R̃ = A^(2) = H2 A^(1) Π = [−4.4721 −2.6833 −2.2361; 0 −1.3416 0; 0 0 0] .
3. The standard column pivoting strategy is not guaranteed to reveal the numerical rank; hence, the
development of the RRQR and URV decompositions.
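A short sketch (assuming NumPy and SciPy) of how the numerical rank and a pivoted QR factorization are obtained in practice for the matrix of Example 1; scipy.linalg.qr with pivoting=True performs QR with column pivoting as described in Example 2.

    import numpy as np
    from scipy.linalg import qr

    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 3.0, 4.0],
                  [3.0, 4.0, 5.0],
                  [4.0, 5.0, 6.0]])

    # Numerical rank with respect to a threshold tau (Fact 1)
    tau = 1e-10
    sigma = np.linalg.svd(A, compute_uv=False)
    k_tau = int(np.sum(sigma > tau))
    print(sigma, k_tau)                  # two singular values exceed tau, so k_tau = 2

    # QR factorization with column pivoting, A P = Q R
    Q, R, piv = qr(A, pivoting=True)
    print(np.round(R, 3), piv)           # the (3,3) element of R is zero to machine precision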
References
[Bjo96] Å. Björck. Numerical Methods for Least Squares Problems. SIAM, Philadelphia, 1996.
[Bjo04] Å. Björck. The calculation of linear least squares problems. Acta Numerica (2004):1–53, Cambridge
University Press, Cambridge, 2004.
[Gau95] C.F. Gauss. Theory of the Combination of Observations Least Subject to Errors. (Translated by G.
W. Stewart.) SIAM, Philadelphia, 1995.
[Gol65] G.H. Golub. Numerical methods for solving least squares problems. Numer. Math., 7:206–216,
1965.
[GV96] G.H. Golub and C.F. Van Loan. Matrix Computations, 3rd ed., Johns Hopkins University Press,
Baltimore, MD, 1996.
[Han98] P.C. Hansen. Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion.
SIAM, Philadelphia, 1998.
[LH95] C.L. Lawson and R.J. Hanson. Solving Least Squares Problems. Classics in Applied Mathematics,
SIAM, Philadelphia, 1995.
[Ste98] G.W. Stewart. Matrix Algorithms Volume I: Basic Decompositions. SIAM, Philadelphia, 1998.
[Str88] G. Strang. Linear Algebra and Its Applications, 3rd ed., Saunders College Publishing, Fort Worth,
TX, 1988.