## Thou art Orthogonal

The power of linear algebra is in visualizing the vectors so that
we may exploit the physical geometry of the system to immediately get the answer.
Orthogonality is a highlight of this principle.
So many important concepts (e.g., "closeness", "unrelatedness", etc.) are associated with
this abstract notion, making it extremely useful as well as beautiful.
A prototypical illustration of orthogonality is LSA.

## LSA: Reach for the Unreachable Solution

Let's get the obvious question out of the way: what does that name LSA mean? In its full version, it stands for "using the orthogonality of vectors to give the Least Squared length of the error vector
in your Approximation of the solution".
Why do we even need to resort to approximations? Suppose you want to find a linear function f to spit out 0 on 0, 1.1 on 1.1, and 2 on 1.5.
Tough luck; there is no such f, because you cannot spear three points with a line if they're not on it.
More generally, we want to find a linear function f
under input-output constraints (x_1,y_1), ..., (x_n,y_n). Any such f(x) can be expressed as
f(x) = B + S * x, a line defined by the input x, bias B, and slope S.
Thus, the bottom line is we want to find some values for B and S such that B + S * x_i = y_i for every i = 1, ..., n.
First Leap: Realize that this system can be written in matrix multiplication form.
This is just Ax = b, where x = (B, S) and b = (y_1, ..., y_n).
Let's look at A, which is an n x 2 matrix: its first column is all ones (multiplying B) and its second column holds the inputs x_1, ..., x_n (multiplying S).
For our three points, b = (0, 1.1, 2) does not lie in the plane spanned by the two columns of A, so no x gives Ax = b exactly.
Now that we know we don't have an exact solution, no matter what x we choose, what can we do? We approximate.

## Projection: Follow the Shadow

Ordinarily, when we have two vectors v and w, we want to somehow relate them.
A good way is to "project" one (say v) onto the other, and examine the "shadow" p of v on w.
Second Leap: v decomposes into the part parallel to w (i.e., the projection p)
and the part orthogonal to w, the error e = v - p.
The error e can be seen as the amount v "erred" in modeling w.
Notice that e is the shortest when it makes a normal angle with w, by simple geometry;
this fact lies at the heart of our optimization method.
The parallel part p is some multiple of w, so p = aw for some scalar a.
OK, so far so good, but what is a supposed to be?
Third Leap: Use the following lemma. Since e = v - aw must be orthogonal to w,
we need w'(v - aw) = 0, which gives a = w'v / w'w and hence p = (w'v / w'w) w.
In other words, finding the projection boils down to two dot products.
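As a quick sanity check, the lemma can be sketched in plain Python. (The vectors v and w below are made-up examples of mine, not from the text.)

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(v, w):
    """Projection p of v onto w: p = a*w with a = (w'v) / (w'w)."""
    a = dot(w, v) / dot(w, w)
    return [a * wi for wi in w]

v = [3.0, 4.0]
w = [1.0, 0.0]                            # the x-axis direction
p = project(v, w)                         # shadow of v on w
e = [vi - pi for vi, pi in zip(v, p)]     # the part of v orthogonal to w

print(p)           # [3.0, 0.0]
print(dot(e, w))   # 0.0 -- e really is orthogonal to w
```

The error e = v - p is exactly the "shortest" residual: any other multiple of w leaves a longer gap to v.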
Coming back to our original task, we wish to find a 2x1 vector x such that Ax is as close to b as possible.
Unfortunately, b is outside the column space of A, so there is no x in
the row space of A that will directly yield Ax = b. Instead,
we decompose b into p and e by finding the projection p of b onto the column space of A.
(Aside: The error vector e has to be in the left null space by the Fundamental Theorem, because
the left null space is the orthogonal complement of the column space. The null space is {0} under the assumption
that the two columns of A are independent.) This p is in the column space, so
we will find an approximate x in the row space that will yield Ax = p.
## The Solution

We can't solve Ax = b exactly. But we know the vector Ax closest to b comes from
the x that makes the error vector b - Ax orthogonal to the columns of A,
implying A'(b - Ax) = 0. This gives A'Ax = A'b, and thus x = (A'A)^{-1}A'b
is the bias and slope we want for the function f.
The only condition for A'A to be invertible is that the two columns are independent, so we're safe.
Why is this x the best? For any candidate x, decompose b - Ax into p - Ax and e;
p - Ax, being in the column space, has to be orthogonal to e,
which is in the left null space. This gives us
||b - Ax||^2 = ||p - Ax||^2 + ||e||^2, where we note that the squared length of p - Ax can
only be nonnegative, and therefore ||b - Ax|| is minimized by choosing Ax = p, the projection of
b onto the column space of A.
When we expand A'A and A'b in A'Ax = A'b, we come to the conclusion that
the line defined by x = (B,S) minimizes the sum of the squared vertical distances between the points and the line.
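The recipe can be sketched in plain Python on the three points from the opening example, solving the 2x2 normal equations A'Ax = A'b by Cramer's rule. (The variable names here are my own.)

```python
# Points from the running example: (0, 0), (1.1, 1.1), (1.5, 2).
xs = [0.0, 1.1, 1.5]
ys = [0.0, 1.1, 2.0]

# Entries of A'A and A'b, for A with rows (1, x_i) and x = (B, S).
n = len(xs)
sx = sum(xs)
sxx = sum(x * x for x in xs)
sy = sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))

# Solve the 2x2 system A'A x = A'b by Cramer's rule.
det = n * sxx - sx * sx          # nonzero since the columns of A are independent
B = (sxx * sy - sx * sxy) / det
S = (n * sxy - sx * sy) / det

print(round(B, 4), round(S, 4))  # -0.0608 1.2624

# Check: the error b - Ax is orthogonal to both columns of A.
errs = [y - (B + S * x) for x, y in zip(xs, ys)]
print(abs(sum(errs)) < 1e-12,
      abs(sum(x * e for x, e in zip(xs, errs))) < 1e-12)  # True True
```

Note that these are the same B and S the demonstration below reports.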
## Demonstration

Does it work? Well, for the example in the beginning, we couldn't fit the three points (0,0), (1.1,1.1), and (1.5,2) with a straight line. A one-liner in MATLAB such as x = (A'*A)\(A'*b) yields the bias B = -0.0608 and the slope S = 1.2624 for x = (B,S).
Here is the picture: the fitted line achieves the
minimum "overall" vertical distance from the points to the line.
We can fit many more. With seven points (-5,10), (-3,3), (-1,-1), (3,0), (4,-1), (6,5), and (7,8), x = (3.5973,-0.1074):
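The seven-point fit checks out with the same normal-equation sketch. (The helper fit_line is hypothetical, my own wrapper around the 2x2 Cramer's-rule solution, not from the text.)

```python
def fit_line(pts):
    """Least-squares line y = B + S*x through the given (x, y) points."""
    n = len(pts)
    sx = sum(x for x, _ in pts)
    sxx = sum(x * x for x, _ in pts)
    sy = sum(y for _, y in pts)
    sxy = sum(x * y for x, y in pts)
    det = n * sxx - sx * sx           # A'A must be invertible
    return ((sxx * sy - sx * sxy) / det,   # bias B
            (n * sxy - sx * sy) / det)     # slope S

pts = [(-5, 10), (-3, 3), (-1, -1), (3, 0), (4, -1), (6, 5), (7, 8)]
B, S = fit_line(pts)
print(round(B, 4), round(S, 4))  # 3.5973 -0.1074
```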
Looking at the picture, though, a straight line is a poor model for these points. A quadratic h(x) = B + S1 * x + S2 * x^2 might
do much better in fitting these scattered points. Behold the power of linear algebra: since h is still a linear
combination of the unknowns B, S1, and S2, we can still do it exactly the same way!
The matrix A simply gains a third column holding x_1^2, ..., x_n^2, and we want to find x = (B, S1, S2). The same recipe gives x = (-1.8121, -0.7624, 0.3108)
with this new definition. It is clear from the picture that the quadratic is
doing a superior job of minimizing the vertical error distances:
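As a sketch, the quadratic coefficients can be recovered from the normal equations in plain Python. (The helper polyfit_ls and the Gaussian-elimination details are my own; the text does not prescribe a solver.)

```python
def polyfit_ls(pts, m):
    """Least-squares coefficients (B, S1, ..., S_{m-1}) of a degree-(m-1) polynomial."""
    # Normal equations A'A c = A'b, where row i of A is (1, x_i, ..., x_i^(m-1)):
    # (A'A)[r][c] = sum_i x_i^(r+c), and (A'b)[r] = sum_i y_i * x_i^r.
    M = [[float(sum(x ** (r + c) for x, _ in pts)) for c in range(m)] for r in range(m)]
    v = [float(sum(y * x ** r for x, y in pts)) for r in range(m)]
    # Solve the m x m system by Gaussian elimination with partial pivoting.
    for i in range(m):
        piv = max(range(i, m), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        v[i], v[piv] = v[piv], v[i]
        for r in range(i + 1, m):
            f = M[r][i] / M[i][i]
            for c in range(i, m):
                M[r][c] -= f * M[i][c]
            v[r] -= f * v[i]
    coef = [0.0] * m
    for i in reversed(range(m)):
        coef[i] = (v[i] - sum(M[i][j] * coef[j] for j in range(i + 1, m))) / M[i][i]
    return coef

pts = [(-5, 10), (-3, 3), (-1, -1), (3, 0), (4, -1), (6, 5), (7, 8)]
print([round(t, 4) for t in polyfit_ls(pts, 3)])  # [-1.8121, -0.7624, 0.3108]
```

With m = 2 the same function reproduces the straight-line fits above, which previews the generalization that follows.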
In fact, while we're at it, let's make it fully general: we fit a polynomial of degree m-1 to the points.
When m=2, it is a straight line. When m=3, it is a parabola. The new definition makes A an n x m matrix whose i-th row is (1, x_i, x_i^2, ..., x_i^{m-1}), and x an m x 1
vector of coefficients. We want to find x = (A'A)^{-1}A'b, exactly as before.

## Conclusion

There are even more amazing closed-form solutions for A when it has orthogonal columns (this can be achieved by the Gram-Schmidt process, which is a canonical example of iteratively finding the pieces of a "dimensional skeleton" one by one, securing orthogonality). Perhaps there will be further tutorials on the stupendous magic of linear algebra and its power to visualize and exploit the geometry of vectors, but for now, LSA was a cute enough instance of the magic.