Introduction to linear transformations

Linear algebra, and algebra in general (algebra being one of the trinity of pure mathematics: algebra, analysis and geometry), concerns itself with mathematical objects and their transformations.

Linear algebra actually deals with a specific kind of transformation (which can be viewed actively or passively), called a linear transformation, but let's remain general for now, as our intention is to introduce algebra in its entirety and to understand the motivation for studying linear algebra in particular.

There are two broad categories of mathematical objects that we call transformations: active and passive transformations. An active transformation is basically a function that maps an element (called a "vector") to another -- it involves actually changing the mathematical object in question. On the other hand, a passive transformation is a transformation of the co-ordinate basis: the representation of the object transforms in precisely the opposite way to the basis, so that the object itself remains the same.

A couple of analogies help reinforce this classification:
  • Numerical bases − Consider the number "101" in base 10. If we wanted to convert this into base 2, we would write 1100101. This is a passive transformation -- the actual mathematical object (101) remains the same, while its representation changes. On the other hand, suppose we kept the representation "101" but instead talked about the number that is 101 in base 2 (i.e. 5). This is an active transformation: the representation has stayed the same, but our "basis vectors" (1, 10, 100...) have been transformed to a new basis (1, 2, 4...), and our mathematical object -- always the linear sum of 1 times the third basis "vector" (100 or 4) plus 0 times the second basis vector (10 or 2) plus 1 times the first basis vector (1 or 1) -- has itself transformed. (See the short sketch after this list.)
  • Definitions − Suppose I made the statement "Bananas are yellow". This statement is true. But now if I were to redefine the word "banana" to mean what we usually call (or, to be more precise, called in our previous system of definitions) "apple", then the statement would become false -- the property of truth is not invariant under the transformation. That was an active transformation. On the other hand, if we redefined "apple" to mean "banana" and changed the form of the sentence to read "Apples are yellow", the statement means the same thing as before, and it remains true. This is a passive transformation.
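If you want to see the representation-versus-object distinction in code, here's a minimal sketch of the base-10/base-2 analogy (the use of Python's int and format here is my illustration, not part of the original analogy):

```python
# Passive vs active transformations in the base-10/base-2 analogy.
n = 101                         # the mathematical object "one hundred and one"

# Passive: same object, new representation (write 101 in base 2).
assert format(n, "b") == "1100101"

# Active: keep the string "101", reinterpret it in base 2,
# and the object itself changes -- it becomes five.
assert int("101", 10) == 101
assert int("101", 2) == 5
```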

An example of a co-ordinate (passive) transformation would be $(x,y)\rightarrow(\sqrt{x^2+y^2},\arctan(y/x))$, which converts from Cartesian to polar co-ordinates. Alternatively, one could create an active transformation where $r$ takes on the value of the former $x$-coordinate and $\theta$ takes on the value of the former $y$-coordinate. It would be clumsy and pointless in this case, but it could be done.
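As a minimal sketch of the passive version (the function name is mine, and atan2 is used instead of a bare arctan so the quadrant comes out right):

```python
import math

# Passive transformation: the same point, re-expressed in polar co-ordinates.
def cart_to_polar(x, y):
    return math.hypot(x, y), math.atan2(y, x)

r, theta = cart_to_polar(1.0, 1.0)
print(r, theta)    # roughly 1.414... and 0.785..., i.e. sqrt(2) and pi/4
```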

Both passive and active transformations find a variety of uses in physics -- passive transformations are of central importance in relativity, while active transformations are an important tool in quantum mechanics.

Linear algebra deals with a specific kind of transformation, called a linear transformation. A linear transformation is a transformation that satisfies the property $L(ax+by)=aL(x)+bL(y)$, where $L$ is the linear transformation, $a$ and $b$ are objects called "scalars", and $x$ and $y$ are objects called "vectors". Another way of putting this is that a linear transformation commutes with every linear sum operator.
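If you like to check such properties numerically, here's a rough sketch -- the particular matrix and vectors are arbitrary choices of mine, with a matrix acting by multiplication standing in for the abstract $L$:

```python
import numpy as np

# Numerical spot-check of L(ax + by) = aL(x) + bL(y) for a sample matrix L.
L = np.array([[2.0, 1.0],
              [0.0, 3.0]])
x = np.array([1.0, -2.0])
y = np.array([4.0, 0.5])
a, b = 3.0, -1.5

lhs = L @ (a * x + b * y)
rhs = a * (L @ x) + b * (L @ y)
assert np.allclose(lhs, rhs)
```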

A linear sum operator, of course, is an operator that takes in some number of vectors, scales each by some scalar (the scalars being what define the operator), and adds the scaled results. You will learn that this is equivalent to multiplying the matrix whose columns are these vectors by a vector that represents the operator itself.
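A rough sketch of that equivalence (the vectors and coefficients are arbitrary examples of mine):

```python
import numpy as np

# A linear sum a*v1 + b*v2 + c*v3 equals the matrix whose columns are
# v1, v2, v3 multiplied by the coefficient vector (a, b, c).
v1 = np.array([1.0, 0.0, 2.0])
v2 = np.array([0.0, 1.0, 1.0])
v3 = np.array([3.0, -1.0, 0.0])
coeffs = np.array([2.0, -1.0, 0.5])      # a, b, c

by_hand   = 2.0 * v1 - 1.0 * v2 + 0.5 * v3
by_matrix = np.column_stack([v1, v2, v3]) @ coeffs
assert np.allclose(by_hand, by_matrix)
```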

A linear transformation, then, acts on elements of a set of vectors (called a "vector space") over a field of scalars. Common choices for the vector space are $\mathbb R^n$, $\mathbb C^n$, etc. and common choices for the scalar field are $\mathbb R$, $\mathbb C$, etc. In these cases, the vectors can be represented as tuples of real or complex numbers, and (especially in the case of the reals) are often used to represent various physical quantities with a direction in Newtonian physics.

The formal definition of a vector space is given by the following axioms:
  1. There exists an operation called "addition" on the vector space that satisfies the following properties:
    1. associativity -- $(u+v)+w=u+(v+w)$
    2. commutativity -- $u+v=v+u$
    3. identity -- there exists a vector $0$ such that $v+0=v$
    4. inverse -- there exists a vector $-v$ for every v such that $v+(-v)=0$
  2. There exists an operation called scalar multiplication on the vector space that satisfies the following properties:
    1. compatibility with field multiplication -- $a(bv)=(ab)v$
    2. shared identity element with field multiplication -- $1v=v$ where 1 is also the identity of field multiplication
    3. distributivity over vector addition -- $a(u+v)=au+av$
    4. distributivity over scalar addition -- $(a+b)v=av+bv$
Therefore our vectors need not be anything of the sort you'd encounter in a high-school physics or math class -- they can be functions, SVG files, arrays in computer programming, whatever, and a lot of other mathematical objects (one such object you'll eventually encounter is the tensor).
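As a small sketch of "functions as vectors" (the helper names add and scale are mine, purely for illustration), pointwise addition and scalar multiplication satisfy the axioms above -- here's a spot-check of two of them at a sample point:

```python
# Functions form a vector space under pointwise operations.
def add(f, g):
    return lambda x: f(x) + g(x)

def scale(a, f):
    return lambda x: a * f(x)

f = lambda x: x ** 2
g = lambda x: 3 * x
x0 = 2.0

assert add(f, g)(x0) == add(g, f)(x0)                                 # commutativity
assert scale(2, add(f, g))(x0) == add(scale(2, f), scale(2, g))(x0)   # distributivity
```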

One of the reasons linear algebra is taught early is to introduce how mathematics works, and axioms are at the very foundation of mathematics. To a pure mathematician, axiomatic systems represent entire universes -- a pure mathematician unknowingly adopts a Platonist metaphysics, where our universe, the domain of physics, is just one among all the other possible axiomatic systems, all within the domain of mathematics.

On a fundamental level, this is not a testable or even meaningful claim -- however, it's a useful picture to keep in mind while thinking about mathematics, and a reminder to be precise when you hear people talking about "existence", etc.

An equally interesting picture, though, is that of the applied mathematician. The applied mathematician is concerned with the physical world as it is. He observes the universe and finds that certain rules of logic, based on some starting points, turn up all the time in the real world -- in physics, in computer science, in finance, and practically everything else. For example, kinematics and relativity, quantum mechanics, and computer programming all seem to make use of certain logical ideas -- certain mathematical structures -- in very different ways. The idea I'm talking about here is that of a vector and its linear transformations. So the applied mathematician probes for some set of logical assumptions, called axioms, from which these conclusions can be logically derived. In the case of linear algebra, the axioms are the eight listed above.

So to an applied mathematician, axioms are more like an "interface" between the mathematical theory and its various applications. From the axioms, one can derive all the "theorems" of the mathematical theory, all its features. Rather than re-discovering all these facts about various physical phenomena, if the physicist or engineer or computer scientist or whoever can show that there is a precise correspondence from the things in his field to the objects in the mathematical theory (such as mapping the quantities "velocity", "momentum", "position", etc. to "vectors", producing a sensible idea of addition and dot products, etc.), then all the results (theorems) involving these mathematical objects will also apply to the things in his field. This is why mathematics is often called the art of identifying different things with analogous logical structures.

For the record, this is also why mathematics -- or, to be more accurate, applied mathematics, the mathematics we study -- is so useful in describing things: it's designed that way. "Things" provide the motivation for us to do math, and specifically the kind of math that is useful to describe those things. Think of some other mathematical theories you know of -- calculus, the elementary algebra of the real numbers, of the complex numbers, of the integers, of the natural numbers, of the rational numbers, etc. What physical phenomena do they describe?

The idea of an axiomatic basis -- whether interpreted in the pure-mathematician sense or in the applied-mathematician sense, for there is no real difference except in how humans think of it -- is central to modern mathematics. Richard Feynman called it the "Greek school" of mathematics and science, as opposed to the "Babylonian school", which does not isolate a set of axioms from all the statements of the theory. The Babylonian school tends to be how physics and the other sciences operate, but only under the assumption that the mathematicians will eventually bring rigor to their field (e.g. with mathematical physics).


A set of axioms is seldom unique. There are always multiple possible sets of statements one can choose from which all the other statements of the theory can be derived (e.g. in Euclidean geometry, the Pythagorean theorem can be taken as an axiom, replacing the parallel postulate). It's much like the concept of a defining property -- e.g. the function $\sin(\theta)$ can be defined in terms of the unit circle, where its Taylor expansion is a theorem, or the latter can be taken as the definition, with the unit circle property derived from it as a theorem.

Another example would be the number e. One definition would be the real number approached by the limit of $(1+\frac1n)^n$ as $n\to\infty$, while the other would be as the value of $\exp(1)$ where $\exp(x)$ is the function such that $\frac{d}{dx}\exp(x)=\exp(x)$ and $\exp(0)=1$. Both are defining properties, and can be derived from one another.
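A quick numerical check that the two defining properties point at the same number (math.e is Python's built-in value of $e$):

```python
import math

# (1 + 1/n)^n creeps towards e as n grows.
for n in (10, 1_000, 1_000_000):
    print(n, (1 + 1 / n) ** n)

print("e =", math.e)    # 2.718281828459045
```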

Coming back to the topic of linear algebra, the defining property can also be thought of as saying: a linear transformation is a transformation such that, if it is applied to all points on the plane, all lines (e.g. gridlines) remain lines (they don't curve) and the origin does not move (think about why this description and the algebraic one are equivalent). The first condition is equivalent to stating that the gridlines (any set of evenly-spaced parallel lines) not only remain lines, but also remain evenly spaced and parallel -- otherwise some other line would have to curve (try it out).
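Here's a rough numerical version of that picture -- the matrix and the line are arbitrary examples of mine; the check is that evenly spaced points on a line map to evenly spaced points on a line:

```python
import numpy as np

# Evenly spaced points on a line stay collinear and evenly spaced
# under a linear transformation.
A = np.array([[1.0, 1.0],
              [0.0, 2.0]])

p0 = np.array([1.0, 0.5])                # a point on the line
d = np.array([2.0, 1.0])                 # the line's direction
points = np.array([p0 + t * d for t in range(-3, 4)])

images = points @ A.T                    # apply A to every point
steps = np.diff(images, axis=0)          # consecutive differences
assert np.allclose(steps, steps[0])      # equal steps: still a line, still evenly spaced
```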

In the case that the origin does move (but lines still remain lines), the transformation is called an affine transformation, which is the combination of a linear transformation and a translation, and is the generalisation of $y=mx+c$ to vectors and their transformations.
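A minimal sketch of an affine map, again with arbitrary example values of mine -- note that it fails the origin-fixing test:

```python
import numpy as np

# An affine transformation is a linear map followed by a translation.
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])    # rotation by 90 degrees (linear part)
b = np.array([2.0, -1.0])      # translation

def affine(v):
    return A @ v + b

print(affine(np.zeros(2)))     # [ 2. -1.] -- the origin has moved, so not linear
```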

Something of interest to note here is that there are different ways to generalise the same thing to a more general domain. If one wants to generalise a function or some other mathematical object $f(x)$ from a domain $x\in X$ to some $F(y)$ for $y\in Y$ where $X\subset Y$, one does so based on some property satisfied by $f(x)$, choosing $F(y)$ so that it also satisfies this property.

However, we can also make the generalisation on the basis of some other property -- and this is exactly what's happening here: we can say that an affine transformation is a generalisation of a linear relationship among the real numbers to vectors, or that a scaling plus a translation is (the cases where the first has an effect similar to the latter will be studied later, when we look at eigenvalues and eigenvectors). In the former case, the generalisation is based on the standard properties of a linear transformation, which are also satisfied by $y=mx+c$; in the latter case, it is based on simply scaling by a real number.

This allows one to easily determine visually whether a transformation is linear. For example, it becomes clear that the Cartesian-to-polar transformation above was not a linear transformation, because straight lines generally become curves under it (for instance, the vertical line $x=1$ maps to the curve $r=\sec\theta$ in the $(r,\theta)$-plane).

Here's a random thought about why linearity is nice. Suppose you have a linear relationship between $X$ and $Y$. Now if you jiggle $X$ around a bit, $Y$ jiggles a bit too. The average value of $Y$ during this jiggling period corresponds exactly, on the line, to the average value of $X$ during the jiggling.

On the other hand, if you have a non-linear correlation -- say with a peak in the middle -- this is no longer true. The average x-co-ordinate might give you the peak, but the average value of the y-co-ordinate is certainly not the maximum value of it. You might wonder if we can approximate non-linear things with linear things, such as by zooming close into a curve -- indeed, this is the point of calculus.
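Here's a quick sketch of the jiggling remark (the distributions and functions are arbitrary choices of mine): for a linear relationship the average of $Y$ lands exactly where the average of $X$ does on the line, while for a peaked curve it falls below the peak.

```python
import numpy as np

# Compare mean(f(X)) with f(mean(X)) for a linear and a peaked relationship.
rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=0.3, size=100_000)    # X jiggling around 1.0

linear = lambda t: 2 * t + 1
peaked = lambda t: -(t - 1.0) ** 2                   # peak at t = 1

print(np.mean(linear(x)), linear(np.mean(x)))        # essentially equal
print(np.mean(peaked(x)), peaked(np.mean(x)))        # averaged Y sits below the peak
```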

  1. Prove the equivalence between the two different descriptions of a linear transformation (the algebraic and the geometric) given above.
  2. Hence or otherwise, prove that the following are all linear transformations:
    1. Rotation around the origin
    2. Scaling (along any line passing through the origin, the x- or y- axis, the line y = x, whatever)
    3. Shearing
    4. A composition of two (and, by induction, $n$) linear transformations
  3. Prove that for all linear transformations $A,B$ there exists a unique linear transformation $C$ such that $Cv=ABv$ for all $v\in V$. We then say that $C=AB$, and can define transformation composition in this way for an arbitrary number of compositions by induction.
  4. Is $(AB)C=A(BC)$ for all linear transformations $A,B,C$? Prove your answer.
  5. Is $AB=BA$ for all linear transformations $A,B$? Prove your answer.

Notes about associativity and matrix multiplication (added on 2018.10.31, related to 1103-004)

If you didn't try question 3 above -- or if you didn't try it before you tried question 4 -- go try it, go think about it. It's important.

You'll often see "proofs" of the associativity of matrix multiplication, where they write out the matrix in its full, unglorified form -- perhaps as a row of column vectors or a column of row vectors -- and work out both products $(AB)C$ and $A(BC)$, then say "hey, look, they're the same!"

That's a nonsensical proof. It misses every single insight behind matrix multiplication and why matrix multiplication is important, and pretends that the rules for matrix multiplication were simply "given to us" by some god-emperor who wrote some crappy textbook. That's not useful, and it gives you absolutely no insight as to why matrix products are defined the way they are.

First of all, it's important to recognise that there really are two separate questions: does $A(Bv)=(AB)v$, and does $A(BC)=(AB)C$? From these two you can figure out (by repetition/induction) all other bracket combinations for any number of matrices.

Let's consider the matrix-vector product first. And let's say we haven't yet defined matrix multiplication.

So how on earth could we talk about multiplying $A$ and $B$ first, if we haven't yet defined multiplication on matrices? We can talk about $A(Bv)$ (because nowhere there are we multiplying two matrices), but not $(AB)v$. The idea is that we make this the definition of matrix multiplication -- we say that matrix multiplication is the composition of the transformations associated with the matrices. But for this product to itself be a matrix, we need the composition of the two transformations to be representable by a single matrix.

That is, we need to know that there exists some matrix $C$ such that $A(Bv)=Cv$ for all $v$.

How do we go about proving this? Well, you might have a picture in your head of linear transformations/matrices transforming the entire vector space, rather than an individual vector -- i.e. linear transformations transform all vectors in the "same way" in some sense (more formally, we're re-stating the definition of a linear transformation). And the statement above just means that the composition of two linear transformations is a linear transformation.

We can prove this easily (do it for yourself), and this implies our statement earlier, because we know every linear transformation is represented (in a given basis) by a unique matrix.


In other words: the images of the basis vectors under a transformation are independent (i.e. we need all of them to define the transformation) and also uniquely determine it (i.e. there is only one linear transformation that transforms all the basis vectors in this way), by definition of linearity, dimensionality and basis vectors. So we can determine $C$ from the basis-vector images, i.e. by ensuring that $A(Bu)=Cu$ for all basis vectors $u$; and since any vector can be written as a linear combination of these basis vectors -- i.e. the basis-vector images determine the transformation -- we can show $A(Bv)=Cv$ for all vectors $v$.
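Here's a sketch of that argument in code -- A and B are arbitrary example matrices, and C is built column-by-column from the images of the basis vectors:

```python
import numpy as np

# Build C by sending each basis vector through B and then A,
# then check that C agrees with v -> A(Bv) on an arbitrary vector.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, -1.0],
              [1.0,  0.5]])

basis = np.eye(2)
C = np.column_stack([A @ (B @ e) for e in basis.T])   # columns = images of basis vectors

v = np.array([2.0, -3.0])
assert np.allclose(C @ v, A @ (B @ v))
assert np.allclose(C, A @ B)      # and C is exactly what numpy calls the product A @ B
```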

We move on to the second kind of associativity, $(AB)C=A(BC)$.

We've really just defined everything about matrices so far, including matrix multiplication, in terms of vectors, and all the intuition in our heads is in terms of vectors and vector spaces (not the matrices themselves). So the best way to understand this identity, which contains only matrices, is to recognise both sides as linear transformations, i.e. to explicitly write out the vector being operated on:

$$((AB)C)v=(A(BC))v$$
Recall our definition of matrix multiplication, $(AB)v:=A(Bv)$. Applying this twice on each side, we can rewrite the above as:

$$A(B(Cv))=A(B(Cv))$$
which is obviously true. And since the two sides agree for every vector $v$, the matrices $(AB)C$ and $A(BC)$ are themselves equal.
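And, for reassurance rather than proof, a numerical spot-check of both associativity statements with random matrices (arbitrary sizes and seed of mine):

```python
import numpy as np

# Spot-check A(Bv) = (AB)v and A(BC) = (AB)C numerically.
rng = np.random.default_rng(1)
A, B, C = (rng.normal(size=(3, 3)) for _ in range(3))
v = rng.normal(size=3)

assert np.allclose(A @ (B @ v), (A @ B) @ v)
assert np.allclose(A @ (B @ C), (A @ B) @ C)
```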
