
Section 8.3 Orthogonal Projection

Objectives
  1. Understand the orthogonal decomposition of a vector with respect to a subspace.
  2. Understand the relationship between orthogonal decomposition and orthogonal projection.
  3. Understand the relationship between orthogonal decomposition and the closest vector on / distance to a subspace.
  4. Learn the basic properties of orthogonal projections as linear transformations and as matrix transformations.
  5. Recipes: orthogonal projection onto a line, orthogonal decomposition by solving a system of equations, orthogonal projection via a complicated matrix product.
  6. Pictures: orthogonal decomposition, orthogonal projection.
  7. Vocabulary: orthogonal decomposition, orthogonal projection.

Let W be a subspace of R^n and let x be a vector in R^n. In this section, we will learn to compute the closest vector x_W to x in W. The vector x_W is called the orthogonal projection of x onto W. This is exactly what we will use to almost solve matrix equations, as discussed in the introduction to Chapter 8.

Subsection 8.3.1 Orthogonal Decomposition

We begin by fixing some notation.

Notation

Let W be a subspace of R^n and let x be a vector in R^n. We denote the closest vector to x on W by x_W.

To say that x_W is the closest vector to x on W means that the difference x − x_W is orthogonal to the vectors in W:

[Figure: a vector x, its orthogonal projection x_W onto the subspace W, and the difference x − x_W perpendicular to W.]

In other words, if x_{W⊥} = x − x_W, then we have x = x_W + x_{W⊥}, where x_W is in W and x_{W⊥} is in W⊥. The first order of business is to prove that the closest vector always exists.

Definition

Let W be a subspace of R^n and let x be a vector in R^n. The expression

x = x_W + x_{W⊥}

for x_W in W and x_{W⊥} in W⊥, is called the orthogonal decomposition of x with respect to W, and the closest vector x_W is the orthogonal projection of x onto W.

Since x_W is the closest vector on W to x, the distance from x to the subspace W is the length of the vector from x_W to x, i.e., the length of x_{W⊥}. To restate:

Closest vector and distance

Let W be a subspace of R^n and let x be a vector in R^n.

  • The orthogonal projection x_W is the closest vector to x in W.
  • The distance from x to W is ‖x_{W⊥}‖.

Now we turn to the problem of computing x_W and x_{W⊥}. Of course, since x_{W⊥} = x − x_W, really all we need is to compute x_W. The following theorem gives a method for computing the orthogonal projection onto a column space. To compute the orthogonal projection onto a general subspace, usually it is best to rewrite the subspace as the column space of a matrix, as in this important note in Section 3.3.

Theorem

Let A be an m × n matrix, let W = Col(A), and let x be a vector in R^m. Then the matrix equation A^T Ac = A^T x in the unknown vector c is consistent, and x_W is equal to Ac for any solution c.

Proof

Let x = x_W + x_{W⊥} be the orthogonal decomposition with respect to W. By definition x_W lies in W = Col(A) and so there is a vector c in R^n with Ac = x_W. Choose any such vector c. We know that x − x_W = x − Ac lies in W⊥, which is equal to Nul(A^T) by this important note in Section 8.2. We thus have

0 = A^T(x − Ac) = A^T x − A^T Ac

and so

A^T Ac = A^T x.

This exactly means that A^T Ac = A^T x is consistent. If c is any solution to A^T Ac = A^T x, then by reversing the above logic, we conclude that x_W = Ac.

Example (Orthogonal projection onto a line)

Let L = Span{u} be a line in R^n and let x be a vector in R^n. By the theorem, to find x_L we must solve the matrix equation u^T uc = u^T x, where we regard u as an n × 1 matrix (the column space of this matrix is exactly L!). But u^T u = u · u and u^T x = u · x, so c = (u · x)/(u · u) is a solution of u^T uc = u^T x, and hence x_L = uc = ((u · x)/(u · u)) u.

[Figure: the line L = Span{u}, a vector x, its orthogonal projection x_L = ((u · x)/(u · u)) u onto L, and the perpendicular component x_{L⊥}.]

To reiterate:

Recipe: Orthogonal projection onto a line

If L=Span{u} is a line, then

x_L = ((u · x)/(u · u)) u   and   x_{L⊥} = x − x_L

for any vector x.
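
As a quick numerical illustration of this recipe (an added sketch, not part of the text), here is the computation in Python with NumPy; the vectors u and x below are made-up examples.

```python
import numpy as np

u = np.array([2.0, 1.0])      # direction vector: L = Span{u}
x = np.array([3.0, 4.0])      # vector to be projected

x_L = (u @ x) / (u @ u) * u   # orthogonal projection of x onto L
x_Lperp = x - x_L             # component of x orthogonal to L

print(x_L)            # [4. 2.]
print(x_Lperp)        # [-1.  2.]
print(u @ x_Lperp)    # 0.0: x - x_L is orthogonal to u, as expected
```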

When A is a matrix with more than one column, computing the orthogonal projection of x onto W = Col(A) means solving the matrix equation A^T Ac = A^T x. In other words, we can compute the closest vector by solving a system of linear equations. To be explicit, we state the theorem as a recipe:

Recipe: Compute an orthogonal decomposition

Let W be a subspace of R^m. Here is a method to compute the orthogonal decomposition of a vector x with respect to W:

  1. Rewrite W as the column space of a matrix A. In other words, find a spanning set for W, and let A be the matrix whose columns are those spanning vectors.
  2. Compute the matrix A^T A and the vector A^T x.
  3. Form the augmented matrix for the matrix equation A^T Ac = A^T x in the unknown vector c, and row reduce.
  4. This equation is always consistent; choose one solution c. Then
    x_W = Ac   and   x_{W⊥} = x − x_W.
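
The recipe translates directly into a short computation. Here is a minimal sketch in Python with NumPy (illustrative only; the matrix A and the vector x are made-up examples). It solves A^T Ac = A^T x, forms the decomposition x = x_W + x_{W⊥}, and prints the distance from x to W.

```python
import numpy as np

# W = Col(A): the columns of A are a spanning set for W (made-up example)
A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])
x = np.array([1.0, 2.0, 3.0])

# Steps 2-3: solve the (always consistent) equation A^T A c = A^T x.
# lstsq is used because A^T A need not be invertible for a general spanning set.
c, *_ = np.linalg.lstsq(A.T @ A, A.T @ x, rcond=None)

# Step 4: the orthogonal decomposition x = x_W + x_Wperp
x_W = A @ c
x_Wperp = x - x_W

print(A.T @ x_Wperp)            # ~[0, 0]: x - x_W is orthogonal to every column of A
print(np.linalg.norm(x_Wperp))  # distance from x to W (about 2.31 here)
```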

In the context of the above recipe, if we start with a basis of W, then it turns out that the square matrix A^T A is automatically invertible! (It is always the case that A^T A is square and the equation A^T Ac = A^T x is consistent, but A^T A need not be invertible in general.)

Corollary

Let A be an m × n matrix with linearly independent columns and let W = Col(A). Then the n × n matrix A^T A is invertible, and for all vectors x in R^m, we have x_W = A(A^T A)^{-1} A^T x.

Proof

We will show that Nul(A^T A) = {0}, which implies invertibility by the invertible matrix theorem in Section 6.1. Suppose that A^T Ac = 0. Then A^T Ac = A^T 0, so 0_W = Ac by the theorem. But 0_W = 0 (the orthogonal decomposition of the zero vector is just 0 = 0 + 0), so Ac = 0, and therefore c is in Nul(A). Since the columns of A are linearly independent, we have c = 0, so Nul(A^T A) = {0}, as desired.

Let x be a vector in R^m and let c be a solution of A^T Ac = A^T x. Then c = (A^T A)^{-1} A^T x, so x_W = Ac = A(A^T A)^{-1} A^T x.

The corollary applies in particular to the case where we have a subspace W of R^m, and a basis v_1, v_2, ..., v_n for W. To apply the corollary, we take A to be the m × n matrix with columns v_1, v_2, ..., v_n.
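
As an added illustration of the corollary (a sketch with a made-up basis, not from the text): when the columns of A are linearly independent, A^T A can be inverted and the closed formula applied directly.

```python
import numpy as np

# The columns of A form a basis for W = Col(A) (made-up example)
A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])
x = np.array([1.0, 2.0, 3.0])

AtA = A.T @ A   # square, and invertible because the columns of A are independent
x_W = A @ np.linalg.inv(AtA) @ (A.T @ x)   # x_W = A (A^T A)^{-1} A^T x

print(x_W)   # the same projection obtained by solving A^T A c = A^T x
```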

Subsection 8.3.2 Orthogonal Projection

In this subsection, we change perspective and think of the orthogonal projection x_W as a function of x. This function turns out to be a linear transformation with many nice properties, and is a good example of a linear transformation which is not originally defined as a matrix transformation.

We compute the standard matrix of the orthogonal projection in the same way as for any other transformation: by evaluating on the standard coordinate vectors. In this case, this means projecting the standard coordinate vectors onto the subspace.

In the previous example, we could have used the fact that

{ (1, 0, 1), (1, 1, 0) }

forms a basis for W, so that

T(x) = x_W = ( A(A^T A)^{-1} A^T ) x,   where A is the 3 × 2 matrix with columns (1, 0, 1) and (1, 1, 0),

by the corollary. In this case, we have already expressed T as a matrix transformation with matrix A(A^T A)^{-1} A^T. See this example.
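
To make the example concrete, here is an illustrative computation of that matrix in Python/NumPy, assuming (as reconstructed above) that the basis of W consists of the vectors (1, 0, 1) and (1, 1, 0).

```python
import numpy as np

# Columns of A are the basis vectors of W from the example above
A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])

# Standard matrix of T(x) = x_W
B = A @ np.linalg.inv(A.T @ A) @ A.T
print(np.round(B, 4))
# approximately (1/3) * [[2, 1, 1],
#                        [1, 2, -1],
#                        [1, -1, 2]]
# Its columns are T(e1), T(e2), T(e3): the projections of the
# standard coordinate vectors onto W.
```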

Let W be a subspace of R^n with basis v_1, v_2, ..., v_m, and let A be the matrix with columns v_1, v_2, ..., v_m. Then the standard matrix for T(x) = x_W is

A(A^T A)^{-1} A^T.

We can translate the above properties of orthogonal projections into properties of the associated standard matrix.
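
For example, here is a quick numerical check (an added illustration, reusing the reconstructed basis from the example above) of two such properties: projecting twice is the same as projecting once, and vectors already in W are left unchanged.

```python
import numpy as np

# Projection matrix B = A (A^T A)^{-1} A^T from the example above
A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])
B = A @ np.linalg.inv(A.T @ A) @ A.T

print(np.allclose(B @ B, B))    # True: squaring B gives B back

w = A @ np.array([2.0, -1.0])   # an arbitrary vector in W = Col(A)
print(np.allclose(B @ w, w))    # True: B fixes vectors in W
```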

We emphasize that the properties of projection matrices would be very hard to prove in terms of matrices. By translating all of the statements into statements about linear transformations, they become much more transparent. For example, consider the projection matrix we found in this example. Just by looking at the matrix it is not at all obvious that when you square the matrix you get the same matrix back.