Introduction to symmetry

Extensive and intensive variables
You've probably heard of extensive and intensive variables from thermodynamics -- an "intensive variable" is a variable or property defined at every point in space in a thermodynamic system -- e.g. temperature, pressure, density, etc. An "extensive variable" is just a variable or property defined for an entire region -- e.g. volume, internal energy, number of moles, etc.

To be specific, what we call intensive variables are intensive in space, and what we call extensive variables are extensive over space. I.e. the "point" at which an intensive variable is defined is a point in space, or a position, and the "region" over which an extensive variable is defined is a region of space. On the other hand, volume is intensive over time -- it's defined at a single point in time, not across an entire duration. An example of a variable which is intensive in space and extensive along time would be "the number of particles passing through a given point in space over a period of time".

An easy "test" we're told to use in order to determine if a variable is intensive or extensive is "take a chunk of the region for a homogeneous thermodynamic system and see if the variable scales" -- if it's intensive, it will remain the same, but if it's extensive, it will scale. Why does this test work? Must the extensive variable scale down in the same proportion as the scale-down in volume? Could the variable scale up for a scale-down in volume?

A more formal way of putting all of this would be -- an intensive variable is a function mapping from each element of a set (like a position in space), and an extensive variable is a (definite) integral of some intensive variable (note that a function of this intensive variable would also be an intensive variable), or generally some function of such an integral. This need not have anything to do with thermodynamics.

$$E(a,b)=g\left(\int_a^b i(x)dx\right)$$
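As a rough numerical sketch of this definition (not part of the original argument), the snippet below builds an extensive quantity by integrating an intensive one. The two densities, the midpoint-rule `integrate` helper and the choice of `g` as the identity are all assumptions made for illustration.

```python
def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the definite integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (k + 0.5) * width) for k in range(steps)) * width


def uniform_density(x):
    # Hypothetical intensive variable for a homogeneous system: same value everywhere.
    return 2.5


def ramp_density(x):
    # Hypothetical intensive variable for an inhomogeneous system: grows with position.
    return x


def extensive(i, a, b, g=lambda total: total):
    """E(a, b) = g(integral of i over [a, b]); g defaults to the identity (an assumption)."""
    return g(integrate(i, a, b))


# Homogeneous case: halving the region halves the extensive variable.
uniform_whole = extensive(uniform_density, 0.0, 4.0)
uniform_half = extensive(uniform_density, 0.0, 2.0)

# Inhomogeneous case: halving the region does not halve the extensive variable.
ramp_whole = extensive(ramp_density, 0.0, 4.0)
ramp_half = extensive(ramp_density, 0.0, 2.0)

print(uniform_whole / uniform_half, ramp_whole / ramp_half)
```

With the uniform density the ratio comes out at about 2, matching the "take a chunk" test, while with the ramp density it is about 4. This suggests why the test is stated for homogeneous systems.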
When I first thought of this a couple of years ago, I got a bit puzzled over my definition -- an extensive function could be differentiable more than once over the same parameter, it often has a (countably) infinite number of multiple-derivatives -- by which I mean its derivative, double-derivative, triple-derivative and so on. Would these guys be intensive variables or what? And if the double-derivative, say, is an intensive variable, then how could its integral still be an intensive variable?

To make things a bit more concrete -- take the displacement between the position of a particle at two fixed points in time. Its repeated derivatives in time are velocity, acceleration, jerk and so on. If displacement is an extensive variable in time, then velocity would be an intensive variable in time. But given that acceleration is an intensive variable in time, velocity must be an extensive variable in time.

The solution to this is to distinguish between "position" and "displacement", between "velocity" and "change in velocity", "acceleration" and "change in acceleration" and so on. Then we see that the definite integral of acceleration is the change in velocity, not the velocity itself, and that the definite integral of velocity as defined at every point is displacement, or "change/difference in position", not position itself, etc. On the other hand, velocity itself might be an indefinite integral of acceleration where the arbitrary constant of integration is based on some boundary condition and may depend on some reference frame, etc.
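The distinction can be checked numerically. In this minimal sketch, the acceleration profile, the two initial velocities and the `integrate` helper are hypothetical choices: two observers who disagree on the constant of integration get different velocities but the same change in velocity.

```python
def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the definite integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (k + 0.5) * width) for k in range(steps)) * width


def acceleration(t):
    # Hypothetical acceleration profile, intensive in time.
    return 3.0 * t


def velocity(t, v0):
    """Velocity as an indefinite integral: v0 is the constant fixed by the reference frame."""
    return v0 + integrate(acceleration, 0.0, t)


t_start, t_end = 0.0, 2.0

# Observer A measures the particle at rest at t = 0; observer B moves at +5 relative to A,
# so B measures an initial velocity of -5 (both values are assumptions for illustration).
velocity_a = velocity(t_end, v0=0.0)
velocity_b = velocity(t_end, v0=-5.0)

# The definite integral of acceleration is the change in velocity, which has no constant in it.
change_a = velocity(t_end, v0=0.0) - velocity(t_start, v0=0.0)
change_b = velocity(t_end, v0=-5.0) - velocity(t_start, v0=-5.0)

print(velocity_a, velocity_b)  # the observers disagree on the velocity
print(change_a, change_b)      # but agree on the change in velocity
```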

This insight is important in our study of invariances and symmetries, because we'll see that one column of these quantities is invariant under certain important transformations, whereas the other column is not.

Furthermore, note that if we write the displacement of a particle after some time T as $x(T)=\int_0^T v(t)\,dt$, then the displacement is an extensive variable in t, but an intensive variable in T. Pretty cool.

Introduction to invariance

In the diagram above, suppose someone standing at the bottom-left corner of the table measured the positions of the cyan and purple apples. He records the position of the cyan apple as (0.1, 0.9, 0.2), where the long end of the table (the one near to us, running from left to right) extended is the x-axis, the short end of the table (the one on the left, running from near to far) extended is the y-axis, and the leg of the table near us and to the left, extended, is the z-axis. In the same co-ordinate system, he records the position of the purple apple as (0.8, 0.9, 0.4), measured in metres.

On the other hand, someone standing on the bottom-right corner of the table would measure the positions as (-0.9, 0.9, 0.2) and (-0.2, 0.9, 0.4) respectively from her co-ordinate system.

What we've just seen is a transformation, or a co-ordinate transformation -- specifically, this transformation is called a "translation". And we see that the positions of objects do not remain invariant under this transformation.
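The two sets of measurements can be reproduced with a small sketch. The `translate` helper is hypothetical, and the second observer's offset of 1.0 m along the x-axis is an assumption inferred from the numbers quoted above.

```python
def translate(point, origin):
    """Co-ordinates of `point` as seen from a frame whose origin sits at `origin`."""
    return tuple(p - o for p, o in zip(point, origin))


# Positions recorded by the first observer, in metres.
cyan = (0.1, 0.9, 0.2)
purple = (0.8, 0.9, 0.4)

# Assumption: the second observer stands 1.0 m further along the x-axis.
second_origin = (1.0, 0.0, 0.0)

cyan_prime = translate(cyan, second_origin)
purple_prime = translate(purple, second_origin)

# Rounded for display, since floating-point subtraction is not exact.
print([round(c, 3) for c in cyan_prime])
print([round(c, 3) for c in purple_prime])
```

Only the x-co-ordinates change here, because the assumed offset between the two observers lies purely along the x-axis.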


Hopefully, you should have a good background in linear algebra and affine transformations to understand the kinds of transformations you can have, but I'll give a quick run-through of other possible co-ordinate transformations --
  1. You could "rotate" your reference frame: if you say that the observer stands facing the x-direction, the y-direction is at a right angle counter-clockwise from it, and the z-direction is at a right angle to each of these axes as determined by the right-hand rule, then the observer could face another direction to change these axes, thereby transforming what the observer views as the co-ordinates of every object in his reference frame.
  2. You could "skew" your reference frame: the axes do not need to be orthogonal to one another. 
  3. You could scale your reference frame -- the distances the observer measures are in multiples of his own metre-stick, and using a shorter metre-stick would increase the recorded co-ordinates (and decrease the inverse of the recorded co-ordinates -- something to take note of when learning tensors). (Interesting to note the parallel between "units", as in unit kilograms, etc., and "unit vectors": both have magnitude 1, so changing units is equivalent to choosing a different basis with the basis vectors scaled down.)
  4. You might be viewing it while being at some other point on a fourth dimension -- e.g. checking at what time a fly sits on the apple, while starting your clock at a different time. This would be a translation in time.
And a bunch of other things or combinations of these transformations, etc.

"But wait!" you say, "they might not agree on the positions on a superficial level, but they can measure each others' positions and subtract that from the measured position of the object, and then they'd agree!"

That's true. But then we're no longer talking about position -- we're talking about the difference in position between two objects, i.e. the displacement or its magnitude, the distance: such as the displacement and distance between S and the cyan apple, the displacement and distance between S' and the cyan apple, the displacement and distance between S and the purple apple, the displacement and distance between the cyan apple and the purple apple.

Loosely speaking, distances are preserved under an affine co-ordinate transformation -- if you count scaling, then you could say the ratios of distances are conserved. This is what we call an invariant. Sometimes, when only certain transformations -- such as translations and rotations -- are important, we can say that the distances themselves are invariant.

Think of displacements -- are they preserved under all affine transformations or just translation? Are distances actually preserved under shears? If not, what is?
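One way to explore these questions is to try a few transformations numerically. The sketch below works in two dimensions for brevity. The third point, the rotation angle and the particular matrices are hypothetical choices, and the check covers only these three points rather than proving anything in general.

```python
import math


def apply(matrix, shift, point):
    """Affine map in 2D: multiply by a 2x2 matrix, then add a shift."""
    (a, b), (c, d) = matrix
    x, y = point
    return (a * x + b * y + shift[0], c * x + d * y + shift[1])


def distance(p, q):
    return math.hypot(q[0] - p[0], q[1] - p[1])


# x-y positions of the two apples, plus a hypothetical third point.
cyan, purple, third = (0.1, 0.9), (0.8, 0.9), (0.1, 0.2)

theta = math.pi / 6  # arbitrary rotation angle (an assumption)
identity = ((1.0, 0.0), (0.0, 1.0))
transforms = {
    "translation": (identity, (-1.0, 0.0)),
    "rotation": (((math.cos(theta), -math.sin(theta)),
                  (math.sin(theta), math.cos(theta))), (0.0, 0.0)),
    "scaling": (((2.0, 0.0), (0.0, 2.0)), (0.0, 0.0)),
    "shear": (((1.0, 1.0), (0.0, 1.0)), (0.0, 0.0)),
}

d1, d2 = distance(cyan, purple), distance(cyan, third)

report = {}
for name, (matrix, shift) in transforms.items():
    c, p, t = (apply(matrix, shift, pt) for pt in (cyan, purple, third))
    new_d1, new_d2 = distance(c, p), distance(c, t)
    distances_kept = (math.isclose(new_d1, d1, abs_tol=1e-9)
                      and math.isclose(new_d2, d2, abs_tol=1e-9))
    ratio_kept = math.isclose(new_d1 / new_d2, d1 / d2, abs_tol=1e-9)
    report[name] = (distances_kept, ratio_kept)

for name, (distances_kept, ratio_kept) in report.items():
    print(f"{name}: distances kept = {distances_kept}, ratio kept = {ratio_kept}")
```

For these particular points, the translation and rotation keep both distances, the uniform scaling keeps only their ratio, and the shear keeps neither.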

Another invariant we can notice is the duration -- even though two observers who start their clocks at different times (i.e. you can get to one of their co-ordinate systems/reference frames through a time-translation from another) disagree on what time a certain event occurs, they agree on the duration between two events -- e.g. the observer who starts his clock first, agrees with the other observer on the duration between the events "other observer starting his clock" and "some event occurs", so he knows what the other observer measures for the time of the event.

Invariants are useful, because we'd like to write our physical laws in terms of them, so all observers can agree on them. The idea of an invariant can, to a large extent, also define an entire physical theory within classes of physical theories, as we will see in relativity.

A symmetry is just the same thing as an invariance, except that the latter is usually used to precisely describe what has to remain what under a certain transformation for a theory to have some given symmetry.

Symmetry on the complex plane

Something that first confused me when I learned of complex numbers was the fact that the quantities i and -i are defined in apparently the same way -- their square is -1.

It is clear that this symmetry -- their squares being equal -- does not imply that the two quantities are equal. This is because i and -i satisfy an additive property between them: i + (-i) = 0, which does not exist between i and i itself, or between -i and -i.

But do they exist at all?

Well, now I know that mathematical structures are said to exist if they are consistent, and that two structures, i and -i satisfying a relation between them exist when you define multiplication on the complex plane in such and such way, and that we define multiplication in such a way because the resulting algebra becomes useful for a variety of applications, etc. etc. But let's just hold on to my confusion for a while and try to intuitively explain why this is so.

It's useful to recognise one of the applications (or "interpretations", if you wish) of complex numbers -- rotations are dilations by a complex number. Consider a vector in $\mathbb{R}^2$ -- we know that this vector can be scaled by any scalar in $\mathbb{R}$. We also know that scaling the vector by $k$ is equivalent to scaling it twice by $\sqrt k$. For instance you could get from $\vec v$ to $2\vec v$ by getting first to $\sqrt2 \vec v$ and then applying the same scaling.

Now what if $k$ were a negative number, say -1? What scaling do we apply twice to the vector $\vec v$ to get to $-\vec v$?

Well, if you know some linear algebra, you could replace the scalar with a matrix (specifically, a matrix which is a scalar multiple of the identity), and say that the square root of this matrix is a rotation matrix. But if we wanted to keep using scalars, we could extend our "$\sqrt{k}$ thing" to negative numbers, so as to set up a duality between i and the counter-clockwise $\frac{\pi}2$ rotation matrix $\left[\begin{matrix}0 & -1 \\ 1 & 0\end{matrix}\right]$, and similarly a duality between -i and the clockwise rotation matrix, the negative of the counter-clockwise rotation matrix.

Try squaring the matrix given above -- you will get the negative of the identity.
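If you'd rather check this by machine, here is a minimal sketch using plain nested lists. The `matmul` helper is a hypothetical hand-written 2x2 product, not a library call.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


ccw = [[0, -1], [1, 0]]  # counter-clockwise rotation by pi/2
cw = [[0, 1], [-1, 0]]   # clockwise rotation by pi/2, the negative of ccw

ccw_squared = matmul(ccw, ccw)
cw_squared = matmul(cw, cw)
ccw_then_cw = matmul(cw, ccw)

print(ccw_squared)  # the negative of the identity
print(cw_squared)   # also the negative of the identity
print(ccw_then_cw)  # the identity: the two rotations undo each other
```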

Just to be clear, in linear algebra we don't consider rotation by $\frac{\pi}2$ radians to be a scaling by i; unit-norm complex numbers are simply the eigenvalues of rotation matrices. When you represent rotations as complex numbers, $\mathbb{R}^2$ turns into the complex plane -- this is an example of a duality, and one could say there's an isomorphism between the two spaces equipped with the operations of addition and real/scalar multiplication. But the point of this is to give ourselves an example of an application of complex numbers to understand the difference between i and -i, as you will soon see.

This way, the negatives of complex numbers are just rotations in the opposite direction -- specifically, i is a counter-clockwise rotation, whereas -i is a clockwise rotation. This is useful, because rotations feel more tangible to us than complex numbers do.
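Python's built-in complex type makes this easy to try out. In the sketch below the sample point `z` is an arbitrary choice, and `rotate_ccw` is a hypothetical helper that applies the counter-clockwise matrix by hand for comparison.

```python
def rotate_ccw(point):
    """Apply the counter-clockwise pi/2 rotation matrix [[0, -1], [1, 0]] to (x, y)."""
    x, y = point
    return (-y, x)


z = complex(3, 1)  # arbitrary sample point (an assumption)

turned_ccw = 1j * z   # multiplying by i
turned_cw = -1j * z   # multiplying by -i

# Multiplying by i agrees with the counter-clockwise rotation matrix.
matrix_result = rotate_ccw((z.real, z.imag))

# i and -i share the same square, yet they are distinct: they sum to zero.
same_square = (1j * 1j == -1) and ((-1j) * (-1j) == -1)
sum_is_zero = (1j + (-1j) == 0)

print(turned_ccw, turned_cw, matrix_result)
print(same_square, sum_is_zero)
```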

If you want to understand what we just did, or are pissed at me for being so hand-wavy and switching between the complex plane and $\mathbb{R}^2$ at will, then you might want to understand why exactly it is that mathematics finds so many applications in other disciplines. The pure mathematician sees axioms as the foundation for a mathematical theory, and holds that all these mathematical theories "exist" in an abstract sense, whereas physics is the study of the universe, the only mathematical structure that we precisely observe. The applied mathematician sees axioms as interfaces, sufficient to guarantee that all theorems in the theory hold, so if you ever need to model some physical object or something from a computer program or whatever, you find out if you can get all the relevant quantities to satisfy some axiomatic system already extensively studied by mathematicians, so that all the results from this mathematical theory apply to it.

The connection between rotation-dilation transformations and complex numbers can be understood as an example of this -- these complex numbers themselves are some abstract ideas that satisfy some relations between them, and these transformations also satisfy these relations, so you can model these transformations as complex numbers. There can be other concrete objects or phenomena that satisfy the same axioms, and complex numbers would find applications there, too. An example would be how vectors are used in kinematics, quantum mechanics, programming, and other fields in very different ways. Mathematicians study linear algebra because many actual objects satisfy the relationships that these abstract structures satisfy.

Real numbers, too, are such abstractions that we can relate to simply because of how widely they're used, e.g. to measure scalar quantities. Same with natural numbers, used for counting.

Try making an analog of complex multiplication for vectors in two dimensions -- you will see that the resulting product is not the dot product or cross product or anything. It can, of course, be represented as a transformation of either vector, which gives us another duality between vectors and transformations/matrices, just like the well-known duality between row vectors and column vectors for the dot product, which tells us we don't need to define a new product for this. Are these two dualities the same? Obviously not.
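Here is one way to attempt the exercise, as a sketch rather than a definitive construction. The function names and the sample vectors are hypothetical, and the product rule is copied from complex multiplication, $(a+bi)(c+di)=(ac-bd)+(ad+bc)i$.

```python
def complex_like_product(u, v):
    """Treat (a, b) and (c, d) as a + bi and c + di and multiply them."""
    a, b = u
    c, d = v
    return (a * c - b * d, a * d + b * c)


def as_matrix(u):
    """The 2x2 matrix that acts on a vector the way multiplication by u does."""
    a, b = u
    return ((a, -b), (b, a))


def apply(matrix, v):
    (p, q), (r, s) = matrix
    return (p * v[0] + q * v[1], r * v[0] + s * v[1])


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def cross_z(u, v):
    """z-component of the cross product of two vectors in the plane."""
    return u[0] * v[1] - u[1] * v[0]


u, v = (1, 2), (3, 4)  # arbitrary sample vectors (an assumption)

product = complex_like_product(u, v)
via_matrix = apply(as_matrix(u), v)

print(product, via_matrix)       # the product equals u's matrix acting on v
print(dot(u, v), cross_z(u, v))  # neither of these is the product above
```

The same product can be obtained by turning either vector into a matrix, since the product is commutative.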

Anyway, the point is, this application helps us understand the difference between i and -i -- i is a counter-clockwise rotation by a right angle, while -i is a clockwise rotation by a right angle. The algebra between i and -i is exactly the same as that between clockwise and counter-clockwise rotations. i and -i exist separately for the same reason that clockwise and counter-clockwise rotations exist, even though there is a symmetry between them.

The general point here is that symmetry does not imply equivalence of the co-ordinate systems or the transformations themselves, which are distinguished from each other by some other algebra (e.g. additive). Having left-right symmetry does not imply that left is the same as right, or that left and right do not exist, or whatever. Having translational symmetry does not mean that all points in space are the same. Quite on the contrary, it means that they are not the same, yet some quantity remains invariant upon a translation.


Invariance under resizing windows
Consider a notepad file with some text in it. To locate a word, you can use two parameters -- the line number and the position of the word within the line, and write them down with a decimal point in between. E.g. "25.46" refers to the 46th word on the 25th line. If you resize the window, though, this position changes -- for instance, it might change to "50.21". Alternatively, you may write the position as the position of the word in the entire text, e.g. word# "1246". This remains the same as long as the text remains the same.

If we want to have the text be seen on different monitor sizes, then we'd want to refer to words by the latter system. That's an invariant. Similarly, we would want to express our physical laws in an invariant fashion.
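The notepad example can be simulated with the standard library's `textwrap` module. In this sketch the sample text, the two window widths and the `locate` helper are all hypothetical, and wrapping at a character width stands in for resizing the window.

```python
import textwrap


def locate(text, width, target):
    """Return (line number, word number within the line, overall word number), all 1-based."""
    overall = 0
    for line_no, line in enumerate(textwrap.wrap(text, width=width), start=1):
        for word_no, word in enumerate(line.split(), start=1):
            overall += 1
            if word == target:
                return line_no, word_no, overall
    raise ValueError(f"{target!r} not found")


# Hypothetical text: sixty distinct words.
text = " ".join(f"word{n}" for n in range(1, 61))

narrow = locate(text, width=40, target="word37")
wide = locate(text, width=80, target="word37")

print(narrow[:2], wide[:2])  # "line.word" position depends on the window width
print(narrow[2], wide[2])    # overall word number does not
```

The overall word number plays the role of the invariant, while the line-and-word pair plays the role of a co-ordinate that changes under the "resize" transformation.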
