A Primer on the Interpretations of Quantum Mechanics, Part 1

The term “Quantum Mechanics” conjures not just all manner of science-fictional strangeness, but also philosophical questions about the nature of reality. Indeed, while the theory itself is well worked out and rigorous[1], the interpretation of that theory has been one of the perennial questions in the philosophy of science for a century now. Yet there remains a lot of confusion over the intersection of quantum physics and philosophy, both among scientists and among the general public. It doesn’t help that popular accounts of Quantum Mechanics sometimes seem to be more intent on blowing the readers’ minds than on providing a logically coherent overview of the issues.

What I want to do here is to get into the issue of the “interpretations of Quantum Mechanics” – setting up what that even means, giving an overview of each major interpretation, and finally telling you what I think about the whole thing. To do that, though, we first need to spend some time talking about the theory of Quantum Mechanics itself. So in this article, I’m going to try to tell you what Quantum Mechanics is and how it works, and in part 2 I’ll get to the interpretations.

I should say at the start that there will be some stuff that might look like strange or daunting math here, but I promise you it is not. Of course, there is some strange and daunting math in quantum physics, but none of it will show up here – there won’t be any actual math in this beyond squares and square roots, and even that will be somewhat tangential to the real material here. The strange-looking stuff you’ll see if you scroll down is just a special notation used in Quantum Mechanics (which I’ll talk about in due course), and I encourage you to approach it without fear and see if it doesn’t make sense in context.

 

What is Quantum Mechanics?

OK, so first of all, what is Quantum Mechanics? Put briefly, it’s a theory of physics that superseded Newtonian (or “classical”) mechanics in the 20th century. Starting around 1900, various experiments started to show that at a sufficiently microscopic scale, the predictions of Newtonian mechanics did not hold up. People like Max Planck, Louis de Broglie, Albert Einstein, and Niels Bohr made important steps toward understanding these observations, leading eventually to the development of Quantum Mechanics as a formal theory by Erwin Schrödinger, Werner Heisenberg, and others, in the 1920s. Quantum Mechanics was formulated as a replacement for Newtonian mechanics, one that should approximate Newtonian mechanics at macroscopic scales (since Newtonian mechanics works pretty well there), but get the predictions right at small scales as well.

I want to digress for just a moment to talk about how this relates to relativity. The failure at small scales was not the only problem for Newtonian mechanics; it also turns out that Newtonian mechanics becomes increasingly inaccurate at high velocities. To address that problem, the Special Theory of Relativity was formulated. Special Relativity and Newtonian mechanics agree pretty well for objects at low velocities, but increasingly disagree as those velocities get close to the speed of light. But Special Relativity on its own doesn’t solve the small scale issue; it still gets things wrong when the objects involved are small enough. Similarly, Quantum Mechanics on its own doesn’t solve the high velocity issue; it still gets things wrong when the objects involved are travelling close to the speed of light. To get things right at both small scales and high velocities, a relativistic version of quantum physics was needed, and this was developed starting in the late 1920s, by Paul Dirac among others – it’s called Quantum Field Theory. So what we have is this: Newtonian mechanics works for large, slow objects; Special Relativity extends it to high velocities; Quantum Mechanics extends it to small scales; and Quantum Field Theory covers both small scales and high velocities.

Our fundamental theories of the universe today are Quantum Field Theories, which include both of those fixes to Newtonian mechanics. So, if Quantum Mechanics per se is not the theory we believe, why am I talking about it? Well, it turns out that all of the interesting issues that come up in QM get ported over pretty directly to QFT; the “interpretations” of QM can, with a few exceptions (which I’ll try to mention when we get to them), be translated into interpretations of QFT. But QFT is significantly more complex, so for the purposes of this kind of discussion, it’s easier to just deal with the non-relativistic version of Quantum Mechanics.

I’m not going to get into the history of Quantum Mechanics – the specific experiments that revealed problems with Newtonian physics, the various ad hoc solutions that were gradually refined and joined together. It’s all interesting stuff, but it’s a bit beside the point here. Instead, I’m going to first try to give a kind of schematic picture of what the theory of Quantum Mechanics looks like and point out a few ways in which it differs both from Newtonian mechanics and from our everyday intuition about the world. Then I’ll talk in a little bit more depth about two particular experimental examples of QM (two-slit interference and spin), because when I move on to the interpretations of QM next time, it will be useful to look at how each one views those examples.

So, in classical mechanics, the universe is thought of as containing some number of objects or particles. A total description of the universe (or of some part of it) at some particular moment in time would take the form of a list of those particles, each one with numerical values for its position, velocity, mass, charge, and whatever other properties it might have. Position and velocity are values that can change over time, while things like mass and charge are fixed properties of each particle. The Newtonian equations (or laws) of motion tell us how position and velocity change over time. So given a complete specification of the positions and velocities of all the particles at some particular time (as well as their masses, charges, and any other fixed properties), we can use the equations to calculate exactly what the positions and velocities of those particles will be at some later time. For that matter, we can also calculate exactly what they must have been at some earlier time.
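To make the classical picture concrete, here is a minimal sketch of that kind of calculation for a single particle. The constant acceleration (a ball in free fall) and the starting numbers are my own illustrative assumptions; the point is only that the same rule runs forward or backward in time and gives one definite answer.

```python
import math

def evolve(position, velocity, dt, acceleration=-9.81):
    """Newtonian motion under constant acceleration (illustrative choice).

    Returns the exact position and velocity after a time step dt.
    A negative dt runs the same law backward in time.
    """
    new_position = position + velocity * dt + 0.5 * acceleration * dt ** 2
    new_velocity = velocity + acceleration * dt
    return new_position, new_velocity

start = (10.0, 2.0)                  # position (m) and velocity (m/s) now
later = evolve(*start, dt=1.5)       # the state 1.5 seconds in the future
recovered = evolve(*later, dt=-1.5)  # run the law backward from that state

print("later:", later)
print("recovered:", recovered)

# Running backward from the later state recovers the starting state.
assert math.isclose(recovered[0], start[0]) and math.isclose(recovered[1], start[1])
```

Nothing in this calculation involves chance: the same starting state always produces the same later state, and the later state pins down the earlier one.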

In Quantum Mechanics, instead of a list of particles with definite values for their properties, we have something called a wave function, which tells us the probability that we will get this or that result if we measure some property of a particle. And, corresponding to the Newtonian equations of motion, we have Schrödinger’s equation, which tells us how that wave function changes with time. As it turns out, the way the wave function behaves, in accordance with Schrödinger’s equation, is much like the way a classical wave behaves, hence the name – the wave function of a particle moving through space behaves much the same way as a wave in the water, or a sound wave in the air. You can think of the height of each “ripple” in the wave as corresponding to the probability of measuring the particle at that location.[2] But note that it’s not just the position of the particle that is encoded in the wave function; the wave function is a mathematical object that describes all “observables” of the particle – that is, anything about the particle that you could measure.

Right away, we have one of the big differences between Classical and Quantum Mechanics. Classical mechanics is completely deterministic. If you’re given the complete configuration of particles at one time, you can calculate exactly what will happen to those particles out to infinity. The only way probabilities ever enter the picture in classical physics is when you don’t have complete information. If you know with 100% certainty that, at a certain time, particle A is exactly at such and such a position, particle B is in such and such another position, and so on, then you can predict with 100% certainty exactly where those particles will be at any other time. If, on the other hand, you only know that there’s an X% chance that the particles are in such and such places, a Y% chance that they’re in such and such other places, and so on, then the only prediction you can make is that there’s an X% chance they’ll wind up in such and such a configuration, a Y% chance they’ll wind up in such and such another configuration, and so on. But that’s the only way chance ever comes into things here. Probabilities in classical physics express nothing other than the degree of one’s knowledge or ignorance about the starting conditions.

In Quantum Mechanics, on the other hand, probability is fundamental. You can know the exact wave function of a particle, but that still will not allow you to predict with 100% certainty what the result of a measurement will be. The best you can do, even given all the information you could possibly have, is to say that there’s an X% chance the particle is here, a Y% chance it’s there, and so on.

But wait, there’s more. In Quantum Mechanics, it turns out that some observables are “incompatible”. Let’s say you have a particle whose wave function is localized to a very small area of space – what that means is that if you were to measure the position of the particle, the likelihood of finding that it’s in that area is very high, and the likelihood of finding that it’s outside of that area is very low. It turns out that if the wave function of the particle is localized in this way in position, then it will be extremely spread out in terms of momentum (which, for non-massless particles, is proportional to velocity). And in fact, the more localized the wave function is in position – the closer you get to being able to say with certainty “the particle is here” – the more spread out it will be in momentum or velocity. This is Heisenberg’s Uncertainty Principle, which is usually quoted, somewhat inexactly, as saying that you can’t know both the position and the momentum of a particle at the same time. But note that it’s really a statement about how the wave function works.
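For a feel of the numbers, here is a small sketch of that trade-off. It assumes the wave function is a Gaussian (bell-shaped) packet, which is the best case: for that shape the position spread times the momentum spread equals exactly half of a constant called ħ (the reduced Planck constant), and for any other shape the product is larger. The function name and the sample widths are my own choices for illustration.

```python
HBAR = 1.054571817e-34  # reduced Planck constant, in joule-seconds

def momentum_spread(position_spread):
    """Momentum spread (kg*m/s) of a Gaussian wave packet whose position
    spread (standard deviation, in metres) is `position_spread`."""
    return HBAR / (2 * position_spread)

# Squeeze the packet into ever smaller regions and watch the momentum spread grow.
for sigma_x in (1e-6, 1e-9, 1e-12):
    sigma_p = momentum_spread(sigma_x)
    print(f"position spread {sigma_x:.0e} m -> momentum spread {sigma_p:.2e} kg*m/s")
```

Each time the position spread shrinks by a factor of a thousand, the momentum spread grows by the same factor.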

Let’s get a little more precise about what the wave function is. To do this, I’m going to need to introduce some notation that I’m afraid may scare people away, so let me reiterate: it’s not nearly as scary as it looks, and there isn’t really any math here. The wave function is typically represented by the Greek letter ψ and we write the possible states of a particle in brackets like |this>. Imagine a particle that we know with 100% certainty to be located at some point A. We would write the wave function for the position of that particle as:

ψ = |A>

Now suppose I gave you this wave function:

ψ = a|A> + b|B> + c|C>

What this means is that, if we were to measure the position of the particle, there is some chance we’d find that it is located at point A, some chance it’s located at point B, and some chance it’s located at point C. The probabilities of each of those results are determined by the numbers a, b, and c. Specifically, the probability of finding the particle at point A is a², the probability of finding it at point B is b², and the probability of finding it at point C is c². So that’s how it works: the wave function is written as the sum of all the different possible states of the particle, |A>, |B>, |C>, etc., and the number next to each of those states is the square root of the probability of finding the particle in that state. When a particle is in a state like that, we say it is in a “superposition” of the states |A>, |B>, |C>, etc.
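Here is that rule as a few lines of code, just to show how little is going on. The specific coefficients are made up for illustration, and the use of abs() is my own addition so that the same function would still work if the coefficients were complex numbers.

```python
import math

def probabilities(amplitudes):
    """Return the measurement probability for each state label.

    `amplitudes` maps a state label (e.g. "A") to the number written next
    to that state in the wave function.
    """
    return {state: abs(amp) ** 2 for state, amp in amplitudes.items()}

# Illustrative coefficients, chosen so that the probabilities sum to 1.
psi = {"A": math.sqrt(0.5), "B": math.sqrt(0.3), "C": math.sqrt(0.2)}

probs = probabilities(psi)
for state, p in probs.items():
    print(f"P({state}) = {p:.2f}")

# A valid wave function is normalized: the probabilities add up to 1.
assert math.isclose(sum(probs.values()), 1.0)
```

With these coefficients the particle has a 50% chance of being found at A, 30% at B, and 20% at C.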

Of course, a particle will typically not be limited to a few discrete locations like that. A real particle could conceivably be at any position in three-dimensional space. So really, to write the position wave function for a particle, you’d need to write down an infinite number of terms. But that’s OK; mathematics can handle that by making it a “function” – that is, a mathematical formula that specifies the probability values (a, b, c, and so on), for any point in space. So if I tell you the wave function for a particle, you can pick any point in space X and the wave function will tell you what little “x” is, and hence what the probability, x², of finding the particle at X is.

Now that we have this way of writing and thinking about the wave function, let’s jump back to the Schrödinger Equation for a moment. Remember, the Schrödinger Equation is the quantum equivalent of Newton’s equation of motion; it tells us how the wave function changes with time. If I tell you the wave function of all the particles in a system at some time t1, then you can plug it into the Schrödinger Equation and calculate what the wave function will be at some later time t2. And an important fact about the Schrödinger Equation is that it is linear. What does that mean? Well, suppose that the state of a system at t1 is ψ = |A> and you plug this into the Schrödinger Equation and find what the state of the system will be at time t2 – let’s call that later state ψ = |A’>. Now suppose, instead, that the system at time t1 is in a different state, ψ = |B> – and let’s call the state that this system evolves into at time t2 (according to the Schrödinger Equation) ψ = |B’>. That’s a long-winded way of saying:

|A> -> |A’>
|B> -> |B’>

The fact that the Schrödinger Equation is linear means that if this is the case, then:

a|A> + b|B> -> a|A’> + b|B’>

In other words, if a system starts off in a superposition of multiple states, then each “branch” of that superposition will evolve just as it would if the system were in that state on its own, and moreover the probability associated with each branch will remain constant. So a superposition of initial states evolves straightforwardly into a superposition of the corresponding later states. (Note that it didn’t have to be this way – you could imagine that there might be some kind of complicated interaction between the different components of the wave function). This will be very important later.
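Linearity is easy to check numerically. In the sketch below, evolve() is a stand-in for "whatever the Schrödinger Equation does between t1 and t2"; the particular rule (a rotation plus a phase, with made-up angles) is purely my own illustrative choice. What matters is that evolving the superposition gives the same answer as evolving each branch separately and then recombining.

```python
import cmath
import math

def evolve(state, theta=0.3, phi=0.7):
    """A stand-in linear evolution rule acting on a two-component state.

    The state is a pair (amplitude of |A>, amplitude of |B>).
    theta and phi are arbitrary illustrative angles.
    """
    c, s = math.cos(theta), math.sin(theta)
    phase = cmath.exp(1j * phi)
    x, y = state
    return (c * x - s * y, phase * (s * x + c * y))

def combine(a, u, b, v):
    """Return the superposition a*u + b*v of two states."""
    return tuple(a * ui + b * vi for ui, vi in zip(u, v))

A = (1, 0)      # the state |A>
B = (0, 1)      # the state |B>
a, b = 0.6, 0.8  # illustrative coefficients: 0.36 + 0.64 = 1

evolved_whole = evolve(combine(a, A, b, B))          # evolve a|A> + b|B>
evolved_parts = combine(a, evolve(A), b, evolve(B))  # a|A'> + b|B'>

same = all(cmath.isclose(p, q, abs_tol=1e-12)
           for p, q in zip(evolved_whole, evolved_parts))
print("superposition evolves branch by branch:", same)
```

The coefficients a and b come through untouched, which is the sense in which each branch carries its probability along with it.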

 

The Double-slit experiment

Let’s introduce one of the classic experiments in Quantum Mechanics, one that illustrates some of its weirdness and will prove useful to come back to later. This experiment has its origins in attempts to figure out the nature of light: is light a wave or a particle? (The answer, of course, from Quantum Mechanics, is “sort of both” – like anything else, a particle of light is represented in QM by a wave function that gives the probability of finding it at any particular place).

The setup is this: We have a coherent source of light (like a laser) pointed toward an opaque plate that has two thin, parallel slits through which light can pass. Beyond the plate, we have a screen that we can observe the light shining on. In fact, let’s make it a screen of photo-sensitive paper that changes color wherever light hits it, so we can run the experiment and have a record of exactly how much light struck the screen where.

[Figure: double-slit experiment setup]

If light behaved like a classical, Newtonian particle, most of the particles would hit the plate and be blocked, but some of them would go through one or the other of the slits, and would proceed to hit the screen, so we’d expect to get two bright spots, one in front of each slit. On the other hand, if light behaved like a classical wave, most of the wave would be blocked by the plate, but some would pass through each of the slits, and we’d expect to get an interference pattern. What’s an interference pattern? Well, imagine you have two waves (think of waves in water, if you like) emanating from the two slits. At some points along the screen, the high point of one wave will arrive at the same time as the high point of the other, and the low points will similarly arrive at the same time. Where this happens, the waves will add to each other, so the resulting peaks will be twice as high. This is called “constructive interference”. At other points, though, the high point of one wave will arrive at the same time as the low point of the other, and vice versa. At these points, the waves will cancel each other out; this is “destructive interference”. In the case of light, if the height of the peaks corresponds to the brightness of the light, then we’d expect to see a series of bright and dark stripes show up on the screen, corresponding to the places where constructive and destructive interference occur.

An experiment essentially like this was performed by Thomas Young in the early 1800s, and he did indeed see the interference pattern described above. This was taken to show conclusively that light is a wave (as opposed to the particulate theory of light proposed by Isaac Newton). But in the early 20th century, other evidence began to show that light does in fact come in discrete particles, and, to cut to the chase a bit, we know today that the photon is the particle that makes up light. So where does the interference pattern come from?

Let’s take our experimental setup and turn the intensity of the laser way down, to the point where we’re shooting just one photon out at a time. Again, most of the time, the photon will hit the plate and be blocked, and we’ll see nothing, but once in a while one will make it through and we’ll see a single spot of light appear on the screen where it hits. If we do this, we’d find that as those spots of light continue to appear on the screen, they would gradually build up the same interference pattern of bright and dark stripes that we saw before.

From a classical point of view, this would be very puzzling. If only one photon at a time is being emitted, it must go through either one slit or the other. But if something is only coming out of one slit at a time, where does the interference come from? There would seem to be nothing for a single photon, going through one slit, to interfere with.

But in Quantum Mechanics, this is just what we would expect. Each time we shoot a photon at the screen, the wave function of the photon travels outward toward the opaque plate like a wave. Remember, this wave function is telling us the probability of finding the photon at a given location if we were to somehow measure its position. The wave passes through the opaque plate only at the places where we have the two slits, and it then continues on toward the screen from those two points, creating an interference pattern just like with the classical wave. Again, there are places where the two parts of the wave add to each other and places where they cancel out – but now, interpreting the wave in terms of probability, the places where the waves add constructively are places where the probability for finding the photon is higher, and those where they add destructively, or cancel out, are places where the probability is lower.

What happens when the wave hits the screen? Well, when the wave hits the screen, that amounts to a measurement of the position of the photon. We can’t predict exactly where the photon will hit and make a dot on the screen, but it has a higher chance of doing so in places where the wave’s amplitude is greater – in other words, it has a higher chance of being found in the places where we expect the bright stripes to end up. And so, if we keep repeating this with more and more photons, eventually we’ll accumulate a lot of hits in the places with constructive interference and fewer hits in the places with destructive interference, thus creating the bright and dark stripes.
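If you want to watch the stripes build up one photon at a time, the following is a rough simulation sketch. All the numbers (wavelength, slit separation, screen distance, bin size) are illustrative assumptions of mine, and I treat each slit as an ideal point source, which ignores the overall fading of the pattern toward the edges that real slits of finite width produce.

```python
import cmath
import math
import random
from collections import Counter

# Illustrative values, not taken from any particular experiment.
WAVELENGTH = 500e-9       # metres (green light)
SLIT_SEPARATION = 50e-6   # metres between the two slits
SCREEN_DISTANCE = 1.0     # metres from the plate to the screen
BIN_WIDTH = 0.0005        # metres; the screen is divided into 0.5 mm bins
BINS = list(range(-40, 41))  # bin indices covering -2 cm to +2 cm

def relative_probability(x):
    """Relative chance of detecting the photon at screen position x (metres).

    The wave function at x is the sum of one contribution per slit, each
    with a phase set by the distance travelled; the probability is the
    squared magnitude of that sum.
    """
    k = 2 * math.pi / WAVELENGTH
    amplitude = 0
    for slit in (-SLIT_SEPARATION / 2, SLIT_SEPARATION / 2):
        path = math.hypot(SCREEN_DISTANCE, x - slit)
        amplitude += cmath.exp(1j * k * path)
    return abs(amplitude) ** 2

def fire_photons(n_photons, rng):
    """Fire photons one at a time; return how many landed in each bin."""
    weights = [relative_probability(i * BIN_WIDTH) for i in BINS]
    return Counter(rng.choices(BINS, weights=weights, k=n_photons))

counts = fire_photons(5000, random.Random(0))

# Crude text picture of the screen from -10 mm to +10 mm.
for i in range(-20, 21, 2):
    print(f"{i * BIN_WIDTH * 1000:+5.1f} mm  {'#' * (counts[i] // 4)}")
```

Each individual photon lands at one random spot, but the accumulated hits pile up where the two contributions add and stay away from where they cancel. With these numbers the bright stripes sit about a centimetre apart.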

So, the quantum mechanical way of thinking of this is that each photon is, itself, more or less a wave of probability, which passes through both slits and interferes with itself before striking the screen and having to “choose” a single point to be located at.

 

Spin

Another way Quantum Mechanics differs from Classical is that in many cases, the properties of a particle are “quantized” (hence the name). The classic example of this is something called spin, and since spin is something that I’ll want to use in some later examples, let’s talk about it now.

Textbooks will tell you that spin refers to the “intrinsic angular momentum” of a particle. What does that mean? Well, just for a moment, picture a particle as a tiny ball and imagine it rotating, like the Earth. It’s rotating at some angular speed, and correspondingly has an angular momentum. And it’s rotating in some particular orientation – it’s rotating about an axis, just like the axis of the Earth’s rotation is a line running through the poles. And furthermore, it’s rotating in a particular direction – clockwise or counterclockwise, if we look at it from the “north pole”. If it’s rotating clockwise, we say that its angular momentum is positive, and if it’s rotating counterclockwise we say that its angular momentum is negative. Note that we can talk about the total angular momentum of an object – the angular momentum around the north-south pole axis – but we can also talk about the angular momentum around some other axis. Let’s pick an axis at forty-five degrees to the north-south pole line. The angular momentum around that axis is about 70% of the total angular momentum of the Earth (the fraction is the cosine of the angle between the two axes). Or let’s pick an axis perpendicular to the north-south pole line – an axis running through the Earth from one side of the equator to the other. The Earth isn’t rotating around that axis at all, so the angular momentum in that direction is zero. All this is purely Classical, first-year physics stuff. (Incidentally, if you don’t quite follow this, it’s OK; I want to explain what spin means, but once we come to the quantum examples, to understand the logic of things it’s not really necessary to know what spin means, but only how it works, which is something we’ll get to in due time).

Now, fundamental particles aren’t really balls; they’re point-like objects. And a lot of physicists will tell you that you mustn’t think of spin as meaning that a particle is literally rotating. But for my money, that’s actually not such a bad way to think of it. Just imagine that we’ve taken that spinning ball and shrunk it down until it’s infinitely small, but without ever stopping it spinning. It still has some intrinsic angular momentum, and that angular momentum is still pointed along some particular direction.

Classically, a ball or particle could have any angular momentum you like. It could be spinning very quickly or very slowly, or anywhere in between. But in Quantum Mechanics, it turns out that angular momentum can only come in discrete values. There is a physical constant called ħ, and angular momentum only comes in units of 1/2 ħ. A particle (or indeed, a system of particles) could have an angular momentum of 0, or 1/2 ħ, or ħ, or 3/2 ħ, or 530,375,173 ħ, but it can never have an angular momentum of 1/3 ħ, or of 530,375,173.2 ħ. In fact, even the angular momentum of the Earth has to be a multiple of 1/2 ħ – but since the value of ħ is so incredibly tiny compared to the angular momentum of macroscopic objects, we don’t notice the quantized nature of angular momentum on a macroscopic scale.

It gets better. Not only are only certain values of spin allowed, but it turns out that every particle of a given type always has the same total spin. All electrons everywhere have a total rotational angular momentum of 1/2 ħ. All photons everywhere have a total rotational angular momentum of ħ. The spin of a particle is thus an intrinsic property, just like its mass, or its charge. And, by the way, since every angular momentum we talk about in quantum physics has an ħ in it, we typically just drop the ħ and say that the electron has a spin of 1/2, and the photon has a spin of 1.

And now we get to one of the really strange parts. Consider an electron. It has a total spin of 1/2, and that spin could be oriented in any direction. Let’s pick an arbitrary direction and call it the x-axis, and measure the spin along that direction. There’s no reason to think the electron’s spin should be oriented in that direction, so classically you’d expect that when we measure the spin around the x-axis we’ll get something somewhere between -1/2 and 1/2 (1/2 if it just so happens to be pointing exactly along the x-axis and rotating clockwise, -1/2 if it’s pointing along the x-axis and rotating counterclockwise, and something in between if it’s pointing in some other direction). But the spin along any given axis is quantized too: for a given particle, the allowed values differ by whole units of ħ, so for an electron the only possibilities are 1/2 and -1/2. When we measure the spin along the x-axis, we’ll always measure it to be one of those two values. In other words, no matter what axis we choose to measure the spin along, we always find that the rotation of the electron is oriented directly along that axis, either pointing up (i.e. spinning clockwise) or pointing down (i.e. spinning counterclockwise). This is deeply weird.

Let’s say we’ve measured the x-axis spin of an electron and the result we get is 1/2 (“spin up” along the x-axis, in the jargon of Quantum Mechanics). Now let’s measure the spin along the perpendicular y-axis. Classically, we’d expect to get a result of 0 – we know the electron’s rotation is pointed directly along the x-axis, so looking at it from a direction perpendicular to that axis, we should see no rotation at all. But in QM, a spin of 0 is not allowed for an electron (for a given particle, spin increments only in units of ħ), so we’ll either measure a spin of 1/2 or -1/2 along the y-direction. Even though we just measured the spin along the x-direction, and found that the electron’s rotation was oriented exactly along the x-axis, and we haven’t done anything to the electron between that measurement and this one, we now find that the rotation is oriented exactly along the y-axis, either up or down.

So it’s just a weird fact of Quantum Mechanics that angular momentum doesn’t add up in the same way it does in Classical Mechanics. The total angular momentum of an electron is always 1/2, but its angular momentum around any arbitrary axis is also always either +1/2 or -1/2.

We’re not quite done with the weirdness about spin, though. It turns out that the spin of a particle along the x-axis and the spin along the y-axis are also “incompatible observables”, just like position and momentum. Let’s pretend to measure some more spins to see what I mean. Suppose we measure the x-spin of an electron and find that it’s spin down (-1/2). If we measure the x-spin a second time (provided that nothing has happened to the electron in the meantime), we’ll again find that its x-spin is down, with 100% certainty. We can repeat the measurement any number of times we like, and we’ll always get the same answer: x-spin down.

Now let’s measure the y-spin. When we do this measurement, there’s a 50% chance that we’ll find y-spin up, and a 50% chance we’ll find y-spin down, which seems kind of reasonable. Let’s say we find the y-spin to be up. Again, we could repeat the y-spin measurement any number of times, and we’d always get the same result, y-spin up.

You might then be ready to sit back and say, “All right, the x-spin of this electron is oriented down and the y-spin is oriented up”. But, just for fun, let’s measure the x-spin again. And here’s the upshot: when we measure the x-spin again, after having measured the y-spin, we have a 50% chance of measuring “up” and a 50% chance of measuring “down”! It’s as if the fact that we measured the y-spin “erased” the electron’s memory of what its x-spin was. And of course, it works the other way around too – if we measure the x-spin, we “erase” the electron’s memory of its y-spin. X-spin and y-spin are incompatible, just like position and momentum – if you know the x-spin, you don’t know the y-spin, and vice versa.

Remember, the probabilities associated with all observables of a particle, including spin, are specified by that particle’s wave function. So, going back to the notation we introduced above, we can write the state of an electron in terms of its x-spin like this:

ψ = a|x-up> + b|x-down>

If an electron is in this state, then when we measure the x-spin, the probability that the result is “up”, or +1/2, is a², and the probability that the spin is “down”, or -1/2, is b². An electron that we’ve already measured to have x-spin up would have a = 1 and b = 0, or in other words just:

ψ = |x-up>

[spoiler title=’An aside on x- and y-spins’ style=’default’ collapse_link=’true’]As an aside, the states in which the electron has a definite value for the y-spin are particular superpositions of x-spin states. The |y-up> state is equal to 1/√2 |x-up> + 1/√2 |x-down> and the |y-down> state is equal to 1/√2 |x-up> - 1/√2 |x-down>. With a little algebra, you can see that the |x-up> and |x-down> states can similarly be written in terms of the |y-up> and |y-down> states. Notice that if you have an electron in either the |y-up> or |y-down> state and you measure the spin in the x-direction, you have exactly a 1/2, or 50%, chance of getting either x-up or x-down. In other words, an electron in a definite y-spin state is completely agnostic as to its x-spin, and vice versa.[/spoiler]
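The sequence of measurements described above (x, then y, then x again) can be played out in a short simulation. This is only a sketch under a bookkeeping assumption of mine: after each measurement, the code sets the electron's state to the one matching the result it just got, which is what reproduces the "repeat the measurement and get the same answer" behaviour. The states are written as pairs of coefficients on |x-up> and |x-down>, using the y-spin states from the aside.

```python
import math
import random

S = 1 / math.sqrt(2)

# Each state is a pair (coefficient on |x-up>, coefficient on |x-down>).
STATES = {
    "x": {"up": (1.0, 0.0), "down": (0.0, 1.0)},
    "y": {"up": (S, S), "down": (S, -S)},
}

def prob(state, outcome_state):
    """Probability of finding `state` in `outcome_state`: the squared overlap
    (all coefficients here are real numbers)."""
    overlap = sum(o * s for o, s in zip(outcome_state, state))
    return overlap ** 2

def measure(state, axis, rng):
    """Measure the spin along `axis`; return (result, state afterwards)."""
    p_up = prob(state, STATES[axis]["up"])
    result = "up" if rng.random() < p_up else "down"
    return result, STATES[axis][result]

def run_once(rng):
    """Start with x-spin down, then measure x, then y, then x again."""
    state = STATES["x"]["down"]
    x_first, state = measure(state, "x", rng)
    y_result, state = measure(state, "y", rng)
    x_again, state = measure(state, "x", rng)
    return x_first, y_result, x_again

rng = random.Random(1)
runs = [run_once(rng) for _ in range(10000)]

always_down_first = all(r[0] == "down" for r in runs)
x_up_after_y = sum(r[2] == "up" for r in runs) / len(runs)

print("first x measurement always 'down':", always_down_first)
print("fraction of x-up results after measuring y:", round(x_up_after_y, 3))
```

The first x measurement always comes out "down", but once the y-spin has been measured in between, the second x measurement comes out "up" roughly half the time.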

 

Entanglement

Let’s move on to one more counter-intuitive feature of Quantum Mechanics. The wave function tells us the probability of measuring a particle to be in a particular state. But it turns out that those probabilities can be correlated between particles. So far, I’ve been talking as if a single particle has its own individual wave function. And that’s fine in the case where the probabilities associated with measuring that particle are independent of the probabilities associated with any other particle. But that’s not always the case; we can’t always separate the wave function of one particle from another, so we often need to write it as the wave function of a system of particles. For example, imagine we have two electrons, which we’ll call electron number 1 and electron number 2. Let’s call the x-spin of electron 1 “x1” and the x-spin of electron 2 “x2”. It’s possible for the state of those two electrons to look like this:

ψ = a|x1-up>|x2-up> + b|x1-down>|x2-down>

What this means is that if we measure the spins of both electrons, there is an a² chance that the result will be “up” for both, and a b² chance that it will be “down” for both. But there is no chance of measuring one of them to be “up” and the other to be “down”. In other words, even though we don’t know the x-spin of either electron, we do know that they have the same x-spin.
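Here is a small sketch of what measuring such a pair looks like in practice. The coefficients a and b are made-up illustrative values, and the dictionary representation (one entry per combination of the two spins) is just my own convenient way of writing the two-electron state down.

```python
import math
import random

# Illustrative coefficients: 70% chance of up/up, 30% chance of down/down.
a, b = math.sqrt(0.7), math.sqrt(0.3)

# The two-electron state: one coefficient per (x1, x2) combination.
# Combinations that do not appear have a coefficient of zero.
entangled = {("up", "up"): a, ("down", "down"): b}

def joint_probabilities(state):
    """Probability of each of the four possible (x1, x2) results."""
    outcomes = [(s1, s2) for s1 in ("up", "down") for s2 in ("up", "down")]
    return {o: abs(state.get(o, 0)) ** 2 for o in outcomes}

def measure_both(state, rng):
    """Measure the x-spin of both electrons; return the pair of results."""
    probs = joint_probabilities(state)
    outcomes = list(probs)
    return rng.choices(outcomes, weights=[probs[o] for o in outcomes])[0]

rng = random.Random(2)
results = [measure_both(entangled, rng) for _ in range(10000)]

always_agree = all(s1 == s2 for s1, s2 in results)
fraction_up = sum(r == ("up", "up") for r in results) / len(results)

print(joint_probabilities(entangled))
print("the two spins always agree:", always_agree)
print("fraction of up/up results:", round(fraction_up, 3))
```

Neither electron's result is predictable on its own, yet the two results match every single time.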

 

The Measurement Problem

Now we’re coming to the crux of things, because there’s one major issue that I’ve so far swept under the rug. I’ve told you that the wave function describes the probability of finding a particle or system to be in some particular state when a measurement is performed. But I’ve also told you that the way the wave function changes over time is described by the Schrödinger Equation. The Schrödinger Equation is completely deterministic; if you plug in a wave function at some time t1, you can use it to calculate exactly what the wave function will be at some other time t2. There’s nothing probabilistic about it. But if that’s the case, where do the probabilities come from? What does it mean to say that the wave function describes the probabilities of getting certain results from a measurement? And if the wave function is a complete description of the universe, how could there possibly be any probability involved in measurement? Shouldn’t the state of the wave function after a measurement be completely predictable based on the state of the wave function before that measurement?

Let’s look at this problem a different way. Consider a device that measures the x-spin of an electron. Such a device can easily be created; all you need is a box with a properly oriented magnetic field inside; if the electron’s x-spin is “up” (+1/2), an electron passing through the box will drift in one direction, and if it’s “down” (-1/2), it will drift in the opposite direction. To make things perfectly straightforward, let’s imagine that there’s a big pointer on the front of the box, and the words “up” and “down” are printed in big letters. If the electron’s x-spin is +1/2, it drifts upward when it enters the box and pushes the pointer so that it points to the word “up”, and if its x-spin is -1/2, it drifts downward and pushes the pointer so that it points to the word “down”.

Although the pointer consists of many particles, we can represent its state with a wave function. Let’s use the symbol |pointer-up> to represent the state in which the pointer is pointing to the word “up”, and similarly |pointer-down> for the state in which it’s pointing to the word “down”. If an electron in the state ψ = |x-up> enters the box, the process that will occur is:

|x-up> -> |x-up>|pointer-up>

And similarly, if the electron is in the state ψ = |x-down>, this will happen:

|x-down> -> |x-down>|pointer-down>

All this simply puts into symbols what is obvious from the definition of a measuring device: when an electron enters the box, the state of the pointer changes according to whether the electron’s x-spin is up or down.

Now suppose we have an electron in this spin-state:

ψ = a|x-up> + b|x-down>

This is supposed to mean that, if we measure the x-spin of the electron, there is an a² chance we’ll get a result of “up” and a b² chance we’ll get a result of “down”.

What happens when this electron enters the box? Remember what we said earlier about the Schrödinger Equation being linear. Since we know what happens to each of the two components of that state, we know what happens to the combined state as well:

a|x-up> + b|x-down> ->
a|x-up>|pointer-up> + b|x-down>|pointer-down>

In other words, according to the Schrödinger Equation, when an electron in a superposition of different x-spin states enters the x-spin measurement device, the measurement device also has to end up in a superposition of two different pointer states. And bear in mind that there is nothing particularly strange or unlikely about this situation. We can easily put an electron in such a state, and we can easily build a measurement device that behaves this way.
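
For readers who like to see the linearity argument mechanically, here is a rough Python sketch of it. The dictionary representation, the function name enter_box, and the particular values of a and b are all assumptions I am making for illustration; the only physics in it is the rule that each component of the state is handled separately and keeps its coefficient.

```python
import math

# What the box does to each pure spin state, as described above.
POINTER_FOR = {"x-up": "pointer-up", "x-down": "pointer-down"}

def enter_box(state):
    """Send an electron state through the box.

    `state` is a hypothetical representation: a dict mapping a tuple of labels
    such as ("x-up",) to its coefficient. Linearity means we apply the box's
    rule to each component on its own and carry the coefficient along unchanged.
    """
    result = {}
    for labels, coefficient in state.items():
        spin = labels[0]
        new_labels = labels + (POINTER_FOR[spin],)
        result[new_labels] = result.get(new_labels, 0.0) + coefficient
    return result

# Illustrative coefficients (an assumption): real numbers with a**2 + b**2 == 1.
a, b = math.sqrt(0.3), math.sqrt(0.7)

electron = {("x-up",): a, ("x-down",): b}   # a|x-up> + b|x-down>
after = enter_box(electron)                 # electron plus pointer
```

Nothing in enter_box ever chooses between the two components, so both of them survive in the result, each paired with its own pointer state.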

But the state we end up in (according to the Schrödinger Equation) is kind of puzzling. We performed a measurement, but did we get a result? When we look at things like pointers, we see them pointing in definite directions. We never see something like the state above; we never see pointers pointing in two directions at once.

Let’s take it one step further. We can presumably also describe the state of the human who is performing the experiment with a wave function. Brushing aside the great complexity of the human brain, we know that there must be different quantum states corresponding to the brain belonging to someone who looks at the measurement box and sees the pointer pointing to “up” and that of one who looks at it and sees the pointer pointing to “down”. Call those states |brain-sees-up> and |brain-sees-down>. It again follows from the linearity of the Schrödinger Equation that if this person allows the electron to enter the box and then looks at the pointer, then the state of the electron/box/brain system will be a superposition of the two results:

a|x-up>|pointer-up>|brain-sees-up> + b|x-down>|pointer-down>|brain-sees-down>

And this just seems nonsensical. We know that when we look at a pointer, we don’t see it pointing in two directions at once. It would appear that the Schrödinger Equation, by itself, would tell us that measurements never actually have results. Yet measurements are precisely the reason we think Quantum Mechanics is a good theory in the first place, and after all the whole point of the wave function is supposed to be to predict, probabilistically, the outcomes of measurements.

This is the “measurement problem”. The orthodox way of dealing with this problem is to introduce an additional postulate, beyond the Schrödinger Equation, for how the wave function changes with time. Called the “collapse postulate”, this rule states that when a measurement with various possible outcomes takes place, the wave function instantaneously changes into a “pure” state corresponding to one of those possible outcomes. Which state does it collapse into? That is taken to be probabilistic, with the probability of each outcome given by the square of its coefficient in the wave function.

In other words, according to the orthodox formulation of Quantum Mechanics, after we perform the measurement discussed above, instead of ending up in that puzzling superposition (where somehow the experimenter sees the pointer pointing in two directions at once), we end up in just one of these two states:

|x-up>|pointer-up>|brain-sees-up>

or

|x-down>|pointer-down>|brain-sees-down>

And while we can’t predict with certainty which of those two states we’ll end up in, we can say that the probability of ending up in the first one is a², and the probability of ending up in the second one is b².

To repeat, then, orthodox Quantum Mechanics says that there are two different rules for how the wave function changes over time:

1. When a measurement is not being performed, it follows the Schrödinger Equation, which is linear and deterministic
2. When a measurement is performed, it instantaneously collapses into a state corresponding to a possible result of that measurement, in a process that is non-linear and probabilistic
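
To make the second rule concrete, here is a small Python sketch of what "collapse" amounts to as a procedure. It is only an illustration under my own assumptions: the dictionary representation, the function name collapse, the values of a and b, and the fixed random seed are all hypothetical choices, and the coefficients are taken to be real.

```python
import math
import random

def collapse(state, rng):
    """Pick one component at random, weighted by the square of its coefficient,
    and return the corresponding pure state (that component with coefficient 1)."""
    components = list(state)
    weights = [state[c] ** 2 for c in components]
    chosen = rng.choices(components, weights=weights, k=1)[0]
    return {chosen: 1.0}

# Illustrative coefficients (an assumption): real numbers with a**2 + b**2 == 1.
a, b = math.sqrt(0.3), math.sqrt(0.7)

SEES_UP = ("x-up", "pointer-up", "brain-sees-up")
SEES_DOWN = ("x-down", "pointer-down", "brain-sees-down")
superposition = {SEES_UP: a, SEES_DOWN: b}

# Repeat the same measurement many times on freshly prepared copies of the state.
rng = random.Random(0)  # seeded only so that the run is repeatable
runs = 10_000
ups = sum(1 for _ in range(runs) if SEES_UP in collapse(superposition, rng))
up_fraction = ups / runs
```

Each individual call returns just one of the two pure states, and over many runs the fraction of "up" results should come out close to a². Notice that the sketch has to be told when to call collapse; nothing in the state itself says when that should happen.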

This is fine as far as it goes; there’s no particular reason to think that there can’t be one rule that applies at one time and another rule that applies at other times. But you should be somewhat unsettled by the fundamental role that the word “measurement” is playing here. And the fact is that orthodox Quantum Mechanics doesn’t offer anything like a precise definition of what actually constitutes a measurement. Consider our example where we measure the x-spin of the electron. At what point do we draw the line and say “all right, now, at this exact instant, a measurement has taken place, and the wave function ‘collapses’”? Is it when the electron first enters the box and starts to drift in one or the other direction? Is it when the pointer on the front of the box moves? Is it when the experimenter looks at the pointer?

In part 2, we’ll start looking at the various ways in which these questions have been answered, and the various proposals for how to solve the measurement problem – that is, the various “interpretations” of Quantum Mechanics. First up will be the “collapse interpretations” that start from where we left off here and try to grapple directly with the problem sketched above. After that, we’ll look at two very different approaches: the Everett-style interpretations, which claim that, counter-intuitive though it may be, “collapse” never actually occurs, and the Schrödinger Equation always holds true (the famous many-worlds interpretation falls under this tradition), and the Bohmian interpretation, which gets around the measurement problem by adding some further structure to the ontology and dynamics of the theory. I hope to get part 2 (which is where we’ll actually get to what I consider the interesting part) posted two weeks from now, though I make no promises.