Artur Ekert1,2 and Alastair Kay1,3
1 University of Oxford, Mathematical Institute, Andrew Wiles Building, Radcliffe Observatory Quarter (550), Woodstock Road, Oxford, OX2 6GG, UK
2 National University of Singapore, Centre for Quantum Technologies, Science Drive 2, Singapore, 117543, Singapore
3 Royal Holloway, University of London, Egham, Surrey, TW20 0EX, UK
The classical theory of computation usually does not refer to physics. Pioneers such as Turing, Church, Post, and Gödel managed to capture the correct classical theory by intuition alone and, as a result, it is often falsely assumed that its foundations are self‐evident and purely abstract. They are not!
Computers are physical objects and computation is a physical process. Consequently, when we improve our knowledge about physical reality, we may also gain new means of improving computation. From this perspective it should not be very surprising that the discovery of quantum mechanics has changed our understanding of the nature of computation.
We tend to view computation as an operation on abstract symbols. Any finite set of symbols is called an alphabet and any finite sequence of symbols from that alphabet is called a string. Here, without any loss of generality, we will use the binary alphabet $\{0,1\}$. We shall denote the set of all possible binary strings of length $n$ as $\{0,1\}^n$. Binary digits can be added, $x \oplus y$, and multiplied, $x \cdot y$, as

$$x \oplus y = (x + y) \bmod 2, \qquad x \cdot y = xy.$$
The addition $\oplus$ is also known as the logical XOR (exclusive OR) and the multiplication $\cdot$ as the conjunction or the logical AND ($\wedge$). Given two binary strings, $a$ and $b$, we can add them bit by bit; for example, $0110$ and $1101$ can be added as $0110 \oplus 1101 = 1011$. Note that for any binary string $a$, $a \oplus a = 00\ldots 0$. We can also view a string of $n$ bits as a vector of $n$ binary components and define the inner product $a \cdot b$ by the standard rule of multiplying corresponding components and summing the results modulo 2; for example, $0110 \cdot 1101 = (0 \cdot 1) \oplus (1 \cdot 1) \oplus (1 \cdot 0) \oplus (0 \cdot 1) = 1$.
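The bitwise addition and the mod-2 inner product are easy to experiment with in a few lines of Python (an illustrative sketch; the function names are our own):

```python
# Bitwise addition (XOR) and the mod-2 inner product of binary strings.

def xor_strings(a: str, b: str) -> str:
    """Add two equal-length binary strings bit by bit (mod 2)."""
    return "".join(str(int(x) ^ int(y)) for x, y in zip(a, b))

def inner_product(a: str, b: str) -> int:
    """Multiply corresponding bits and sum the results mod 2."""
    return sum(int(x) & int(y) for x, y in zip(a, b)) % 2

print(xor_strings("0110", "1101"))    # 1011
print(xor_strings("0110", "0110"))    # 0000: a XOR a is the all-zero string
print(inner_product("0110", "1101"))  # 1
```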
From a mathematical perspective, a computer is an abstract machine that evaluates a function

$$f : \{0,1\}^n \to \{0,1\}^m,$$

that is, given $n$ bits of input it produces $m$ bits of output. Such a function is equivalent to $m$ functions, each with a one-bit output, known as Boolean functions,

$$f : \{0,1\}^n \to \{0,1\}.$$
Thus, one might as well say that computers evaluate Boolean functions. There are $2^{2^n}$ deterministic Boolean functions acting on $n$ bits. We shall start our discussion with the simplest case. Consider the most general computing machine that performs a computation on one bit, that is, it maps $\{0,1\}$ to itself. The action of the machine may be represented by a diagram of transition probabilities $P(y|x)$ connecting each input $x$ to each output $y$.
The function of this machine is such that if we prepare the input with the value $x$ ($x = 0$ or 1) and then measure the output, we obtain the value $y$ ($y = 0$ or 1) with probability $P(y|x)$. More generally, if we prepare the input $x$ with probability $p_x$, then we obtain the output $y$ with probability $p'_y = \sum_x P(y|x)\, p_x$. It is convenient to tabulate the transition probabilities and express the action of the machine in a matrix form,

$$\begin{pmatrix} p'_0 \\ p'_1 \end{pmatrix} = \begin{pmatrix} P(0|0) & P(0|1) \\ P(1|0) & P(1|1) \end{pmatrix} \begin{pmatrix} p_0 \\ p_1 \end{pmatrix}.$$
The state of a physical bit, for example, the input or output, is described by a probability vector, and its evolution by a transition matrix. The transition matrix has nonnegative elements satisfying the standard probability condition $\sum_y P(y|x) = 1$, that is, entries in each column add up to 1, which means that each column can be viewed as a probability vector. Such matrices are called stochastic. For example, here are five stochastic matrices: the first four describe all possible deterministic limits of a one-bit computation (the identity, the negation, and the two constant functions) and the fifth one describes a completely random switch,

$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},\quad \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},\quad \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix},\quad \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix},\quad \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & \tfrac{1}{2} \end{pmatrix}.$$
The identity, negation, and random switch matrices are, in fact, doubly stochastic, meaning that entries in both their rows and columns add up to 1. In general, stochastic matrices may have different numbers of rows and columns. For example, probability vectors can be viewed as single-column stochastic matrices. Any machine evaluating a function $f : \{0,1\}^n \to \{0,1\}^m$ can be described by a stochastic matrix with entries $P(y|x)$, where the input $x \in \{0,1\}^n$ and the output $y \in \{0,1\}^m$. In particular, if the evolution of the machine is deterministic, then the entries take only two values, 0 and 1, and the matrix can also be used to define the function $f$, instead of, for example, a truth table. Let us also mention that any computation can be embedded into a reversible computation. For any function $f$ taking $n$ bits to $m$ bits, there exists an invertible function $\tilde{f}$ taking $n+m$ bits to $n+m$ bits which evaluates $f$ as

$$\tilde{f} : (x, y) \mapsto \bigl(x,\; y \oplus f(x)\bigr).$$
Initially, the input string is a concatenation of two strings: the first $n$ bits represent the string $x$ and the remaining $m$ bits are set to represent an arbitrary string $y$. After the function evaluation the output is a concatenation of two strings: the first $n$ bits still represent $x$ but the remaining $m$ bits are set to the value $y \oplus f(x)$. If you run the computation again, the string $y \oplus f(x)$ will evolve to $y \oplus f(x) \oplus f(x)$, which is equal to the initial $y$. Consequently, and without any loss of generality, we can focus on machines performing reversible computations, where all the matrices describing the evolution are square.
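A minimal sketch of this reversible embedding, with the two-bit AND as a stand-in for the function (our choice, purely for illustration):

```python
# Reversible embedding: extend f (n bits -> m bits) to the invertible map
# (x, y) -> (x, y XOR f(x)) on n+m bits. Applying it twice restores (x, y).

def f(x):
    """Example function, 2 bits -> 1 bit: the logical AND."""
    return x[0] & x[1]

def reversible_f(x, y):
    return x, y ^ f(x)

x, y = (1, 1), 0
x1, y1 = reversible_f(x, y)    # with y = 0 the second register holds f(x)
x2, y2 = reversible_f(x1, y1)  # a second application undoes the first
print(y1)                      # 1
print((x2, y2) == (x, y))      # True
```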
It seems obvious that any machine whose action depends on no other input or stored information and which performs a computation on a single bit is described by some stochastic matrix. In general, we may conjecture that any machine that performs a computation on a physical system with $d$ distinguishable states is described by some $d \times d$ stochastic matrix. The entries of such matrices can be derived from the laws of physics governing the dynamics of the machines. We may not know their specific values but at least we know they exist. This is a very reasonable conjecture, so let us have a closer look at some of its consequences.
Given two independent machines described by stochastic matrices $A$ and $B$, we can make them work together either in parallel or in sequence. The resulting, composed, machines are described by new stochastic matrices denoted as $A \otimes B$ and $BA$, respectively.
The entries in the new stochastic matrices can be calculated following a few simple rules, or axioms if you wish, of classical probability theory. They state that to any event $E$ one can assign a number $P(E)$ between 0 and 1 such that if $E$ represents a certain event, then $P(E) = 1$. Moreover, probabilities are added for mutually exclusive events and multiplied for independent events,

$$P(E_1 \text{ or } E_2) = P(E_1) + P(E_2), \qquad P(E_1 \text{ and } E_2) = P(E_1)\,P(E_2).$$
It is worth stressing that independence and mutual exclusivity are different. Two independent events can both occur in the same trial whereas two mutually exclusive events cannot. In our case, individual machines act independently from each other, be it in parallel or in sequence, and events such as two transitions from the same initial state to different final states cannot both occur at the same time, and are thus mutually exclusive.
When we bring together a system with $n$ states labeled by $i$ and a system with $m$ states labeled by $j$, we form a composite system with $nm$ states labeled by pairs of labels $(i,j)$. For example, if we bring together two bits, we obtain a system with four states that can be labeled as $(0,0)$, $(0,1)$, $(1,0)$, and $(1,1)$. The composite labels such as $(i,j)$ are often written simply as strings $ij$, and it should be clear from the context that $ij$ is a concatenation of two symbols and not a product of two numbers. In the two-bit case we write the composite labels as binary strings $00$, $01$, $10$, and $11$.
Now, suppose you prepare the input of the first system with the value $i$ and the input of the second one with the value $j$ and let $A$ act on the first system and $B$ on the second one. Your chance of observing the output $(k,l)$, that is, the first system in $k$ and the second in $l$, is $A_{ki} B_{lj}$. This is because the actions of the two machines are independent, and thus the probability that the transition $i \to k$ in the first machine and $j \to l$ in the second machine will both happen is the product of the two probabilities, that is, $A_{ki} B_{lj}$. The stochastic matrix that describes transitions in the composed system between $(i,j)$ and $(k,l)$ is written as $A \otimes B$ and has matrix elements $(A \otimes B)_{(k,l),(i,j)} = A_{ki} B_{lj}$. The symbol $\otimes$ stands for the tensor product of two matrices. Our definition of $A \otimes B$ can be applied to any two matrices regardless of their shape and properties. This includes any two vectors, as they can be viewed as matrices with just one column. In particular, if the first system is described by the probability vector $p$ with components $p_i$ and the second one by the probability vector $q$ with components $q_j$, and if the two systems are independent, then the composed system is described by the tensor product vector $p \otimes q$ with components $p_i q_j$. Here is an example of a tensor product of two vectors and a tensor product of a $2 \times 2$ matrix $A$ with entries $a_{ij}$ and any matrix $B$ (the matrix $A \otimes B$ has a characteristic block form),

$$\begin{pmatrix} p_0 \\ p_1 \end{pmatrix} \otimes \begin{pmatrix} q_0 \\ q_1 \end{pmatrix} = \begin{pmatrix} p_0 q_0 \\ p_0 q_1 \\ p_1 q_0 \\ p_1 q_1 \end{pmatrix}, \qquad A \otimes B = \begin{pmatrix} a_{00} B & a_{01} B \\ a_{10} B & a_{11} B \end{pmatrix}.$$
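These tensor products are exactly what NumPy's `kron` computes; the following sketch (our own illustration) composes two one-bit machines in parallel:

```python
import numpy as np

# Parallel composition of independent machines: the Kronecker (tensor)
# product of their stochastic matrices; independent probability vectors
# compose the same way.

p = np.array([0.5, 0.5])        # first bit: completely random
q = np.array([1.0, 0.0])        # second bit: definitely 0
print(np.kron(p, q))            # components p_i * q_j: [0.5 0.  0.5 0. ]

NOT = np.array([[0.0, 1.0], [1.0, 0.0]])
I2 = np.eye(2)
machine = np.kron(NOT, I2)      # NOT on the first bit, identity on the second
print(machine @ np.kron(p, q))  # the output is again a probability vector
```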
It should be stressed, however, that not all probability vectors of composed systems and not all stochastic matrices operating on such systems can be written as tensor products. For example, consider two bits and a controlled-NOT operation defined as follows: flip the bit value of the second bit if the first bit has value 1 and do nothing otherwise. This operation can correlate the two bits, that is, evolve a probability vector that is a tensor product into a probability vector that is not; for example, in the basis $00, 01, 10, 11$,

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} \tfrac{1}{2} \\ 0 \\ \tfrac{1}{2} \\ 0 \end{pmatrix} = \begin{pmatrix} \tfrac{1}{2} \\ 0 \\ 0 \\ \tfrac{1}{2} \end{pmatrix}.$$
Following some manipulation, you will see that neither the controlled‐NOT nor the correlated probability vector at the output admit a tensor product decomposition. As we shall soon see, a similar, but much more subtle, effect called entanglement will be responsible for many counterintuitive features of quantum theory.
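The correlating effect of the controlled-NOT can also be checked numerically (an illustrative sketch of our own; basis ordered 00, 01, 10, 11). A probability vector of two independent bits, reshaped into a 2x2 array, has rank 1; the output below has rank 2, so it admits no tensor-product decomposition:

```python
import numpy as np

# Controlled-NOT as a 4x4 stochastic (here permutation) matrix in the
# basis order 00, 01, 10, 11: it swaps the last two basis states.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

p = np.kron([0.5, 0.5], [1.0, 0.0])  # product input: (1/2, 0, 1/2, 0)
out = CNOT @ p                       # correlated output: (1/2, 0, 0, 1/2)

# A tensor-product vector reshaped to 2x2 has rank 1; this one has rank 2.
print(out)
print(np.linalg.matrix_rank(out.reshape(2, 2)))  # 2
```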
If a machine $A$ is followed by another machine $B$, the resulting machine is described by the matrix product $BA$ (note the order in which we multiply the matrices). This follows from the axiom that asserts that the probabilities of mutually exclusive events add up.
We may argue that any transition between input $i$ and output $k$ in the composite machine can happen in several mutually exclusive ways, namely through the intermediate states $j$. The probability that the input $i$ evolves into the output $k$ via a particular intermediate state $j$ is given by $B_{kj} A_{ji}$. Reading from right to left, we see that $i$ first evolves into $j$ with the probability $A_{ji}$ and subsequently $j$ evolves into $k$ with the probability $B_{kj}$. The probability for this particular transition is the product $B_{kj} A_{ji}$. If we vary $j$, we obtain alternative paths connecting the input $i$ with the output $k$. Thus, we must sum over all values of $j$. Consequently, the stochastic matrix of the new machine is the matrix product $BA$ with entries $(BA)_{ki} = \sum_j B_{kj} A_{ji}$. For example, Eq. 19.13 illustrates our discussion in the case of two machines operating on two bits. The probability of the transition $i \to k$ can be written as

$$P(k|i) = \sum_j B_{kj} A_{ji}.$$
Here we have made a tacit assumption that the matrices $A$ and $B$ act on the same number of bits. If this is not the case, we can always make them act on the same number by introducing the identity operation $\mathbb{1}$ (the operation that simply maps inputs directly to outputs) in parallel to the smaller machine. For example, if $A$ acts on $n$ bits, and $B$ acts only on the first $m < n$ bits, then the stochastic matrix describing the overall evolution is given by

$$(B \otimes \mathbb{1})\, A.$$
In summary, when we compose two independent machines in parallel, we take tensor products, and when in sequence, we take matrix products. The results are new stochastic matrices. The tensor product is associative, $A \otimes (B \otimes C) = (A \otimes B) \otimes C$, so we can extend machines to act on any number of bits. This way, we can construct stochastic matrices of complicated machines composed out of many elementary submachines and view them as computers made out of elementary gates. Does this approach describe all possible computations?
It seems obvious that all possible computations are described by stochastic matrices. Surprisingly, this is not the case – the physical dynamics can be more subtle. In order to see this, let us define two new machines. We will call the first one the square root of NOT, also written as $\sqrt{\text{NOT}}$, because when this particular machine is followed by another identical machine, the output is always the negation of the input. The flow of probabilities in such a machine can be expressed schematically as a diagram connecting each input to each output with some transition probability.
The second machine will be called the square root of SWAP. The SWAP operation interchanges the bit values of two bits, for example, $01 \mapsto 10$. The square root of SWAP ($\sqrt{\text{SWAP}}$) operates on two bits in such a way that two consecutive applications result in the full SWAP. Note that the $\sqrt{\text{SWAP}}$ is the identity when restricted to inputs 00 and 11 and acts as the $\sqrt{\text{NOT}}$ when restricted to inputs 01 and 10. Thus, once we find a stochastic matrix for the $\sqrt{\text{NOT}}$, we will be able to construct a stochastic matrix for the $\sqrt{\text{SWAP}}$.
Suppose the square root of NOT is indeed described by some stochastic matrix

$$M = \begin{pmatrix} p & q \\ 1-p & 1-q \end{pmatrix}, \qquad 0 \le p, q \le 1.$$

The matrix product $MM$ should give a stochastic matrix corresponding to the logical NOT. This leads to a contradiction because, recalling that all entries of a stochastic matrix are nonnegative, the condition

$$(MM)_{00} = p^2 + q(1-p) = 0$$

forces $p = 0$ and $q = 0$, and the resulting $M$ then fails to satisfy $(MM)_{01} = 1$. There is no stochastic matrix $M$ such that $MM$ gives the logical NOT. A similar line of argument shows that there is no stochastic matrix for the square root of SWAP. Thus, the square root of NOT is logically impossible, and so is the square root of SWAP.
It may seem reasonable to argue that since there is no such operation in logic, the $\sqrt{\text{NOT}}$ and the $\sqrt{\text{SWAP}}$ machines cannot exist. But they do exist! Some of them are as simple as half-silvered mirrors.
A symmetric beam‐splitter is a half‐silvered mirror that reflects half the light that impinges upon it, while allowing the remaining half to pass through unaffected. It has two input ports and two output ports. We label the two input ports and the two output ports by “0” and “1” as shown below.
Let us aim a single photon at such a beam‐splitter using one of the input ports, for example, port “0.” What happens? One thing we know is that the photon doesn't split in two: we can place photodetectors wherever we like in the apparatus, fire in a photon, and verify that if any of the photodetectors registers a hit, none of the others do. In particular, if we place a photodetector behind the beam‐splitter in each of the two possible exit beams, the photon is detected with equal probability at either detector, no matter whether the photon was initially fired from input port “0” or “1.” You may conclude that the beam‐splitter is just a random switch. Moreover, it may seem obvious that at the very least, the photon is either in the transmitted beam “0” or in the reflected beam “1” during any one run of this experiment. However, that is not necessarily the case. Let us introduce a second beam‐splitter and let us place two normal mirrors so that both paths intersect at the second beam‐splitter.
If we assume that a beam‐splitter is a random switch, then simple matrix multiplication of stochastic matrices shows that a concatenation of two beam splitters is also a random switch,

$$\begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & \tfrac{1}{2} \end{pmatrix} \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & \tfrac{1}{2} \end{pmatrix} = \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & \tfrac{1}{2} \end{pmatrix}.$$
This makes perfect sense, apart from the fact that if we set up such an experiment, it is not what happens! It turns out that in the arrangement shown above, that is, when the optical paths between the two beam splitters are the same, the photon always strikes detector 1 and never detector 0. Thus, a beam‐splitter acts as the square root of NOT gate.
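Although no stochastic matrix squares to NOT, a matrix of probability *amplitudes* can. A quick numerical check (our own sketch, using reflection amplitude $i/\sqrt{2}$ and transmission amplitude $1/\sqrt{2}$ for a symmetric beam-splitter):

```python
import numpy as np

# Beam-splitter matrix of probability amplitudes: transmission 1/sqrt(2),
# reflection i/sqrt(2). Two beam splitters in sequence give i * NOT,
# i.e. the logical NOT up to a global phase.
B = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)
BB = B @ B

print(np.round(BB, 10))  # i * NOT
print(np.abs(BB) ** 2)   # transition probabilities of the NOT gate
```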
Is there something wrong with our reasoning here? Why does probability theory fail to predict the outcome of this simple experiment? One thing that is wrong is the assumption that the processes that lead the photon from the initial state to the detector 0 are mutually exclusive. In reality, the photon must, in some sense, have traveled both routes at once! Another important issue is the status of probability theory. There is no reason why probability theory, or any other a priori mathematical construct, should make any meaningful statements about outcomes of physical experiments. For this we need a physical theory – knowledge that is created as the result of conjectures, experimentation, and refutations. Enter quantum mechanics!
In order to calculate probabilities that agree with experimental data, we must introduce the concept of probability amplitudes – complex numbers $z$ such that the quantities $|z|^2$ are interpreted as probabilities. Probability amplitudes are added for mutually exclusive events and multiplied for independent events. In particular, the rule of additivity of probability amplitudes replaces the classical axiom of additivity in probability theory, and it is this simple rule that sets the quantum and the classical worlds apart. It states: whenever an event can happen in several alternative ways, we add the probability amplitudes, not the probabilities, of each way considered separately.
This is the essence of quantum mechanics – the rest is just a set of convenient mathematical tools developed for the purpose of the bookkeeping of probability amplitudes.
In order to see this rule in action, consider an event that can happen in two alternative ways with probability amplitudes $z_1$ and $z_2$. It is convenient to write these two complex numbers in terms of their moduli and phase factors: $z_1 = |z_1| e^{i\varphi_1}$ and $z_2 = |z_2| e^{i\varphi_2}$. The probability amplitude of this event is $z = z_1 + z_2$ and the probability of the event is then given by

$$P = |z_1 + z_2|^2 = |z_1|^2 + |z_2|^2 + 2|z_1||z_2|\cos(\varphi_1 - \varphi_2).$$
The very last term on the r.h.s. marks the departure from the classical theory of probability. The probability of any two mutually exclusive events is the sum of the probabilities of the individual events, $|z_1|^2 + |z_2|^2$, modified by what is called the interference term, $2|z_1||z_2|\cos(\varphi_1 - \varphi_2)$. Depending on the relative phase $\varphi_1 - \varphi_2$, the interference term can be either negative (destructive interference) or positive (constructive interference), leading to either suppression or enhancement of the total probability $P$. Note that the important quantity is the relative phase $\varphi_1 - \varphi_2$ rather than the absolute values of $\varphi_1$ and $\varphi_2$. These phases can be very fragile and may fluctuate rapidly due to spurious interactions with the environment. In this case, the interference term averages to 0 and we recover the classical addition of probabilities. This phenomenon is known as decoherence. It is very conspicuous in physical systems made out of many interacting components and is chiefly responsible for our classical description of the world – without interference terms we may as well add probabilities instead of amplitudes. However, there are many beautiful experiments in which we can control the phases of the amplitudes and observe truly amazing quantum phenomena.
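The interference formula is worth playing with numerically. In this sketch (our own) the two path amplitudes have modulus 1/2, as in a balanced two-path interferometer: constructive interference gives probability 1, destructive gives 0, while classical addition of probabilities would always give 1/2:

```python
import numpy as np

# Probability of an event reachable by two interfering paths:
# |z1 + z2|^2 = |z1|^2 + |z2|^2 + 2|z1||z2|cos(phi1 - phi2).

def prob(r1, phi1, r2, phi2):
    z = r1 * np.exp(1j * phi1) + r2 * np.exp(1j * phi2)
    return abs(z) ** 2

r = 0.5                        # two balanced paths, amplitude modulus 1/2
print(prob(r, 0.0, r, 0.0))    # constructive: 1.0
print(prob(r, 0.0, r, np.pi))  # destructive: 0.0
print(r ** 2 + r ** 2)         # classical sum of probabilities: 0.5
```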
One of the simplest quantum devices, and also the simplest quantum computing device, allows control of quantum interference: the Mach–Zehnder interferometer, shown here.
Between the beam splitters, we have inserted slivers of glass with a different thickness for each of the possible paths. The glass slows down propagating photons and introduces slight delays during their journey between the two beam splitters. The slivers of glass are usually referred to as “phase shifters” and their thicknesses, $\varphi_0$ and $\varphi_1$, are measured in units of the photon's wavelength multiplied by $2\pi$. We have labeled the two input and the two output ports of the interferometer as 0 and 1. When we fire a single photon into the input port 0, it can end up in the detector 0 in two alternative ways, as shown below.
In each of the beam splitters, the incoming photon is reflected with a probability amplitude of $\frac{i}{\sqrt{2}}$ and transmitted with a probability amplitude $\frac{1}{\sqrt{2}}$. The effect of the slivers of glass is to multiply the probability amplitude on the lower path (labeled as path 0) by $e^{i\varphi_0}$ and on the upper path (labeled as path 1) by $e^{i\varphi_1}$. The lower path involves two transmissions and the upper path two reflections. The probability amplitude of the two consecutive reflections including the phase shift is $\frac{i}{\sqrt{2}}\, e^{i\varphi_1}\, \frac{i}{\sqrt{2}} = -\frac{1}{2} e^{i\varphi_1}$ and the probability amplitude of the two consecutive transmissions and the phase shift on the lower path is $\frac{1}{\sqrt{2}}\, e^{i\varphi_0}\, \frac{1}{\sqrt{2}} = \frac{1}{2} e^{i\varphi_0}$. By using the rule of additivity of probability amplitudes, we can calculate the probability that the photon ends up in the detector “0,”

$$P_0 = \left| \tfrac{1}{2} e^{i\varphi_0} - \tfrac{1}{2} e^{i\varphi_1} \right|^2 = \sin^2\!\frac{\varphi_1 - \varphi_0}{2}.$$
The same approach allows us to calculate the probability amplitudes, $U_{kj}$, and the probabilities, $P_{kj} = |U_{kj}|^2$, of any $j$-to-$k$ transition ($j, k = 0, 1$), and tabulate them in matrices,

$$U = \frac{1}{2}\begin{pmatrix} e^{i\varphi_0} - e^{i\varphi_1} & i\left(e^{i\varphi_0} + e^{i\varphi_1}\right) \\ i\left(e^{i\varphi_0} + e^{i\varphi_1}\right) & e^{i\varphi_1} - e^{i\varphi_0} \end{pmatrix}, \qquad P = \begin{pmatrix} \sin^2\frac{\varphi}{2} & \cos^2\frac{\varphi}{2} \\ \cos^2\frac{\varphi}{2} & \sin^2\frac{\varphi}{2} \end{pmatrix},$$

where $\varphi = \varphi_1 - \varphi_0$. The entries of the stochastic matrix $P$ are obtained from the corresponding probability amplitudes – we take the squared moduli, $P_{kj} = |U_{kj}|^2$. However, here the fundamental transition matrix is the matrix of amplitudes $U$, not the matrix of probabilities $P$! Once we operate on amplitudes, the rules of the game are identical to those of classical probabilities: we multiply amplitudes of independent events and add amplitudes of “mutually exclusive” events.
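These matrices can be generated directly by multiplying the beam-splitter and phase-shifter matrices of amplitudes, as in this sketch (our own; the phase values are arbitrary):

```python
import numpy as np

# Mach-Zehnder amplitudes: U = B P B, with B the symmetric beam-splitter
# (transmission 1/sqrt(2), reflection i/sqrt(2)) and P the phase shifters
# on the two paths. Transition probabilities are |U|^2.
B = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)
phi0, phi1 = 0.3, 1.1  # arbitrary phase shifts
P = np.diag([np.exp(1j * phi0), np.exp(1j * phi1)])
U = B @ P @ B

probs = np.abs(U) ** 2
print(np.round(probs, 6))  # each column sums to 1
print(np.isclose(probs[0, 0], np.sin((phi1 - phi0) / 2) ** 2))  # True
```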
The matrices that describe transitions in quantum machines are not just any matrices with complex entries. The probabilistic interpretation of amplitudes requires that any matrix $U$ that describes an admissible physical operation is unitary, that is, it satisfies $\sum_k U_{ki}^* U_{kj} = \delta_{ij}$, where $\delta_{ij}$, known as the “Kronecker delta,” is a symbol that is defined to be 0 for $i \neq j$ and 1 for $i = j$. In matrix form, the unitarity condition reads

$$U^\dagger U = U U^\dagger = \mathbb{1}.$$
Recall that the adjoint or Hermitian conjugate $A^\dagger$ of any matrix $A$ with complex entries is obtained by taking the complex conjugate of every element in the matrix and then interchanging rows and columns, $(A^\dagger)_{ij} = A_{ji}^*$.
Probably the most striking observation when we inspect the transition probabilities in quantum interference experiments is that the qubit reacts only to the phase difference, $\varphi_1 - \varphi_0$. The phases $\varphi_0$ and $\varphi_1$ in the Mach–Zehnder interferometer can be set up independently of each other, and the two arms of the interferometer may be miles apart, and yet it is the difference between the two phases that determines which of the two detectors will eventually click. The inescapable conclusion is that somehow the photon must have experienced both of them! It is very counterintuitive, but this is what experiments show. Between the two beam splitters, the photon is in a truly quantum state that is referred to as a quantum superposition of the lower and the upper path. We have labeled these paths as 0 and 1, and thus the two binary values can coexist. Quantum computers can operate on such superpositions of the binary values, a property that sets them apart from their classical counterparts.
If we remove the phase shifters from the interferometer, which is equivalent to setting $\varphi_0 = \varphi_1 = 0$, the formula 19.19 shows that the two beam splitters effect the logical NOT. One can also set up experiments that demonstrate the existence of $\sqrt{\text{SWAP}}$. Addition of probability amplitudes explains the behavior of $\sqrt{\text{NOT}}$, $\sqrt{\text{SWAP}}$, and many other gates, and correctly predicts the probabilities of all the possible outputs no matter how we concatenate the gates. This knowledge was created as the result of conjectures, experimentation, and refutations. Genuine scientific knowledge cannot be certain, nor can it be justified a priori. Instead, it must be conjectured, and then tested by experiment. Hence, reassured by the physical experiments that corroborate this theory, logicians are now entitled to propose new logical operations $\sqrt{\text{NOT}}$ and $\sqrt{\text{SWAP}}$. Why? Because faithful physical models for them exist in nature!
The optical Mach–Zehnder interferometer is just one way of performing a quantum interference experiment – there are many others. Atoms, molecules, nuclear spins, and many other quantum objects can be prepared in two distinct states, internal or external, labeled as 0 and 1, and manipulated so that the transition amplitudes between these states are the same as in a beam‐splitter or in a phase shifter. However, there is no need to learn these technologies to understand quantum interference. You may conveniently forget about any specific technology (hardware) and refer to any quantum object with two distinct states labeled 0 and 1 as a quantum bit or a qubit. The interference of a single qubit can then be visualized as a transition diagram.
This diagram shows all possible transitions between the states 0 and 1, and their corresponding probability amplitudes; it has the same features as the diagram of the Mach–Zehnder interferometer. We can view the action of the interferometer as a sequence of three elementary operations called quantum logic gates: a beam‐splitter $B$, followed by a phase shift $P$, followed by another beam‐splitter $B$. Each quantum gate is described by its matrix of transition amplitudes,

$$B = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix}, \qquad P = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\varphi} \end{pmatrix},$$

where $\varphi = \varphi_1 - \varphi_0$. We usually draw this sequence of operations as a quantum circuit,
Quantum circuit diagrams are read from left to right. The horizontal line represents a quantum wire, which inertly carries a qubit from one quantum operation to another. The wire may describe translation in space, for example, atoms traveling through cavities, or translation in time, for example, a sequence of operations performed on a trapped ion. This is a sequential composition of gates and, as in the case of stochastic matrices, all we have to do is the matrix multiplication,

$$BPB = \frac{1}{2}\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & e^{i\varphi} \end{pmatrix}\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix} = i\, e^{i\varphi/2}\begin{pmatrix} -\sin\frac{\varphi}{2} & \cos\frac{\varphi}{2} \\ \cos\frac{\varphi}{2} & \sin\frac{\varphi}{2} \end{pmatrix}.$$
In one swoop, this takes care of the multiplication and addition of probability amplitudes corresponding to different interfering paths in the interferometer. The phase shift effectively controls the evolution and determines the output.
We should mention here that the phase matrix $P$ contains only the relative phase $\varphi = \varphi_1 - \varphi_0$. This is because $\mathrm{diag}(e^{i\varphi_0}, e^{i\varphi_1})$ can be written as $e^{i\varphi_0}\,\mathrm{diag}(1, e^{i(\varphi_1 - \varphi_0)})$, and we have already seen that it is the relative phase that really matters.
We have already mentioned that a qubit undergoing quantum interference enters a peculiar quantum state, a superposition of 0 and 1, in which it simultaneously represents the two binary values. In the classical world, the state of a physical bit is described by a probability vector; a qubit, in all its possible superpositions, is described by a vector of probability amplitudes, known as a state vector, which evolves as

$$\begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix} \longmapsto \begin{pmatrix} U_{00} & U_{01} \\ U_{10} & U_{11} \end{pmatrix}\begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix}.$$
A sequence of operations in quantum interference evolves the state vector of the qubit as

$$\begin{pmatrix} 1 \\ 0 \end{pmatrix} \xrightarrow{\;B\;} \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i \end{pmatrix} \xrightarrow{\;P\;} \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i e^{i\varphi} \end{pmatrix} \xrightarrow{\;B\;} \begin{pmatrix} -\sin\frac{\varphi}{2} \\ \cos\frac{\varphi}{2} \end{pmatrix},$$

where we have omitted an overall phase factor $i\, e^{i\varphi/2}$ from the output state. Probability vectors can be constructed at any stage by squaring the moduli of the components of the evolving state vector. In particular, at the output, the two binary values 0 and 1 are registered with respective probabilities $\sin^2\frac{\varphi}{2}$ and $\cos^2\frac{\varphi}{2}$.
In general, any isolated physical system with $d$ distinguishable states is described by a state vector with $d$ complex components, and any machine whose action depends on no other input or stored information and which performs a computation on a physical system with $d$ distinguishable states is described by some $d \times d$ matrix of probability amplitudes, that is, a $d \times d$ unitary matrix.
Last but not least, quantum interference may be implemented in a number of different ways. For example, in the lore of quantum computation, a beam‐splitter is often substituted by the very popular Hadamard gate, $H$,

$$H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},$$
and single qubit interference is represented as
The transition probability amplitudes in the circuit are calculated by the matrix multiplication $HPH$,

$$HPH = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & e^{i\varphi} \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = e^{i\varphi/2}\begin{pmatrix} \cos\frac{\varphi}{2} & -i\sin\frac{\varphi}{2} \\ -i\sin\frac{\varphi}{2} & \cos\frac{\varphi}{2} \end{pmatrix}.$$
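A quick numerical check of the Hadamard–phase–Hadamard network (our own sketch; the phase value is arbitrary):

```python
import numpy as np

# H P H: Hadamard, relative phase shift, Hadamard. On input |0>, the
# output 0 is registered with probability cos^2(phi/2).
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
phi = 0.7
P = np.diag([1, np.exp(1j * phi)])
U = H @ P @ H

p0 = abs(U[0, 0]) ** 2
print(np.isclose(p0, np.cos(phi / 2) ** 2))    # True
print(np.allclose(U.conj().T @ U, np.eye(2)))  # U is unitary: True
```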
This simple quantum process contains, in a nutshell, the essential ingredients of quantum computation. The sequence of Hadamard–phase shift–Hadamard will appear over and over again. It reflects a natural progression of quantum computation: first we prepare different computational paths, then we evaluate a function that effectively introduces phase shifts into different computational paths, then we bring the computational paths together at the output.
Although the addition of probability amplitudes is basically all we need to know to practice quantum mechanics, it is very convenient to have good tools and notation for the “bookkeeping” of probability amplitudes. A mathematical setting for the quantum formalism is a vector space with an inner product, often referred to as a Hilbert space. Here we are primarily interested in $\mathbb{C}^n$, the space of column vectors with complex entries. We shall follow the notation introduced by Paul Dirac in the early days of the quantum theory and write column vectors as

$$|v\rangle = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix},$$
and the adjoint vector, the corresponding row vector, as $\langle v|$,

$$\langle v| = \begin{pmatrix} v_1^* & v_2^* & \cdots & v_n^* \end{pmatrix}.$$

In this notation, the scalar product of two vectors, $|u\rangle$ and $|v\rangle$, is written as

$$\langle u | v \rangle = \sum_{i=1}^{n} u_i^* v_i.$$
Much of linear algebra grew out of the need to generalize the basic geometry of vectors in two and three dimensions. The scalar product enables the definition of angles, lengths, and distances. Vectors for which $\langle u | v \rangle = 0$ are perpendicular – in $\mathbb{C}^n$ they are called orthogonal. Any maximal set of pairwise orthogonal unit vectors forms an orthonormal basis and any vector can be expressed as a linear combination of the basis vectors. We have already used the standard orthonormal basis in $\mathbb{C}^n$, denoted as $\{|e_i\rangle\}$, where $|e_i\rangle$ stands for a column vector with 1 in the $i$th entry and 0s elsewhere. For example, the standard basis in $\mathbb{C}^2$ is $\{|0\rangle, |1\rangle\}$, but there are infinitely many other orthonormal bases, for example, $\{|+\rangle, |-\rangle\}$, where $|\pm\rangle = \frac{1}{\sqrt{2}}(|0\rangle \pm |1\rangle)$.
Once we have defined an inner product, we can define the norm, or the length, of $|v\rangle$ as $\| v \| = \sqrt{\langle v | v \rangle}$. Using the norm, we can define the distance between any two vectors $|u\rangle$ and $|v\rangle$ as $\| u - v \|$; we say that $|u\rangle$ is within a distance $\epsilon$ of $|v\rangle$ if $\| u - v \| \le \epsilon$.
Quantum states of individual qubits are rather special. The complex components of the state vector

$$|\psi\rangle = \alpha_0 |0\rangle + \alpha_1 |1\rangle$$

are constrained only by the normalization condition $|\alpha_0|^2 + |\alpha_1|^2 = 1$ and, up to a physically irrelevant overall phase factor, can be conveniently parameterized as $\alpha_0 = \cos\frac{\theta}{2}$ and $\alpha_1 = e^{i\phi}\sin\frac{\theta}{2}$, where $0 \le \theta \le \pi$ and $0 \le \phi < 2\pi$. Thus, we can map all of the single qubit states onto the surface of a sphere, that is, we can interpret $\theta$ as the polar angle and $\phi$ as the azimuthal angle. The sphere is called the Bloch sphere and the unit vector defined by $\theta$ and $\phi$ is called the Bloch vector. The Bloch vector in the Euclidean space $\mathbb{R}^3$ should never be confused with the state vector in the Hilbert space.
Please note that the two basis states $|0\rangle$ and $|1\rangle$ are represented on the Bloch sphere as two antipodal Bloch vectors, with $\theta = 0$ and $\theta = \pi$, respectively. The Bloch sphere at this stage may appear an unnecessary complication, but it will soon become a useful tool that helps to visualize relations between states of individual qubits and single qubit unitary operations.
We shall often refer to quantum systems with $d$ distinguishable states as quantum systems in $\mathbb{C}^d$. The vector space structure of quantum states means that if $|\psi_1\rangle$ and $|\psi_2\rangle$ are two possible quantum states, then so is the properly normalized superposition $\alpha|\psi_1\rangle + \beta|\psi_2\rangle$. This is sometimes referred to as the superposition principle. From a mathematical point of view, it is a trivial remark, but we have already seen that its physical consequences are anything but trivial. It implies, for example, that a single photon can take two different paths in its passage through an interferometer, that an atom can be both in its ground and excited state, and, in general, that a qubit can represent both logical 1 and 0 at the same time.
Any linear operation on vectors is called an operator, and any operator $A$ is completely determined by its action on the basis vectors. It can be written as

$$A = \sum_{i,j} A_{ij}\, |e_i\rangle\langle e_j|,$$

where $|e_i\rangle\langle e_j|$ is the matrix with 1 in the $(i,j)$ entry and 0s elsewhere. The result of $|e_i\rangle\langle e_j|$ acting on a vector $|v\rangle$ is $|e_i\rangle\langle e_j|v\rangle$, that is, the vector $|e_i\rangle$ multiplied by the complex number $\langle e_j|v\rangle$. In the Dirac notation, the matrix elements are written as $A_{ij} = \langle e_i | A | e_j \rangle$, which follows directly from the expression 19.30 when we sandwich $A$ between $\langle e_i|$ and $|e_j\rangle$, and use $\langle e_i | e_j \rangle = \delta_{ij}$. Here we will usually refer to the standard basis in $\mathbb{C}^n$ and make little distinction between operators and matrices; still, one should remember that a matrix representation is basis dependent and the operator $A$ may be represented by different matrices in different bases.
We refer to properties of operators by referring to the properties of their matrices. For example, an operator $U$ is called unitary if its matrix is unitary, that is, $U^\dagger U = U U^\dagger = \mathbb{1}$; an operator $A$ is called Hermitian if $A^\dagger = A$, and it is called normal if $A A^\dagger = A^\dagger A$. Both unitary and Hermitian operators are normal, and all normal operators can be diagonalized by unitary matrices. More precisely, $A$ is normal if and only if there exists a unitary $U$ such that

$$A = U D U^\dagger,$$

where $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ is a diagonal matrix. The diagonal elements $\lambda_i$ are known as the eigenvalues, or the spectrum, of $A$, and the column vectors of $U$, which we can write as $|u_i\rangle$, are the corresponding eigenvectors of $A$, that is, $A|u_i\rangle = \lambda_i |u_i\rangle$ and $\langle u_i | u_j \rangle = \delta_{ij}$. Thus, any normal operator admits the spectral decomposition, $A = \sum_i \lambda_i |u_i\rangle\langle u_i|$. Eigenvalues of Hermitian operators are real, whereas for unitary operators they are complex numbers of unit modulus: $\lambda = e^{i\alpha}$ for some real $\alpha$.
Unitary operators are special – they preserve the scalar product. If $|u'\rangle = U|u\rangle$ and $|v'\rangle = U|v\rangle$, then

$$\langle u' | v' \rangle = \langle u | U^\dagger U | v \rangle = \langle u | v \rangle.$$
This implies that unitary operations preserve the length of state vectors; the probabilities are conserved.
The set of all $n \times n$ unitary matrices with matrix multiplication forms a group denoted as $U(n)$; the unit element is the identity matrix and the inverse of $U$ is obtained by taking the Hermitian conjugate, $U^{-1} = U^\dagger$. The order in which we multiply matrices matters, usually $UV \neq VU$, thus the group is non-Abelian. We have already mentioned that we are allowed to ignore overall phase factors. We can avoid ambiguities of overall phase factors by restricting ourselves to matrices that belong to the special unitary group $SU(n)$, that is, all unitary matrices with determinants equal to unity. Having said this, the convention of writing phase gates in the form $\mathrm{diag}(1, e^{i\varphi})$ rather than $\mathrm{diag}(e^{-i\varphi/2}, e^{i\varphi/2})$ means that our matrices are often not in $SU(n)$, but then we can always fix it by playing with global phase factors.
Any unitary matrix $U$ can be represented as the exponential of some Hermitian matrix $H$ and a real coefficient $\alpha$,

$$U = e^{i\alpha H}.$$
This is analogous to writing complex numbers of unit moduli in the polar form as . The time evolution of a quantum state is a unitary process that is generated by a Hermitian operator called the Hamiltonian, . The Hamiltonian contains a complete specification of all interactions within the system under consideration. In an isolated system, the state vector changes smoothly in time according to the Schrödinger equation
For time‐independent Hamiltonians the formal solution reads
Here ℏ denotes the reduced Planck constant. However, theorists usually choose to work in a system of units where ℏ = 1.
Equation 19.33 acquires a deceptively simple form when squares to the identity, . In this case we obtain
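This identity, exp(iθH) = cos(θ) 1 + i sin(θ) H for any Hermitian H with H² = 1, is easy to check numerically. A short NumPy sketch (the choice H = X and the angle θ = 0.7 are arbitrary illustrations):

```python
import numpy as np

# When a Hermitian matrix H squares to the identity,
# exp(i*theta*H) = cos(theta)*1 + i*sin(theta)*H. Checked here for H = X.
X = np.array([[0, 1], [1, 0]], dtype=complex)    # Pauli X satisfies X @ X = I

def expi(H):                                     # exp(iH) for Hermitian H
    w, v = np.linalg.eigh(H)
    return v @ np.diag(np.exp(1j * w)) @ v.conj().T

theta = 0.7
lhs = expi(theta * X)
rhs = np.cos(theta) * np.eye(2) + 1j * np.sin(theta) * X
assert np.allclose(lhs, rhs)
```

The helper `expi` computes the matrix exponential through the eigendecomposition, which is valid for any Hermitian matrix.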
Among the most popular single‐qubit operations are the PAULI gates, described by the Pauli matrices X, Y, and Z,
Two of the PAULI gates are already very familiar: the Z gate is a special phase gate with phase π, and the X gate is the logical NOT gate, but we have written them again for completeness. The two gates, X and Z, are often referred to as the bit flip and the phase flip, respectively. The Pauli matrices square to the identity
and they satisfy the relations
As well as being useful gates in their own right, the combination of the three Pauli matrices and the identity is useful in providing a decomposition of Hermitian matrices. Any matrix can be written as
where represents the vector of the Pauli matrices . This decomposition allows us to see some special properties of one‐qubit unitary rotations. Any element of can be written as
where is a unit vector with three real components , , and . There is a remarkable connection between unitary matrices that are in and three‐dimensional rotation matrices, which form a group denoted as . In our particular case, applying a unitary operation to a qubit described by a Bloch vector amounts to rotating by the angle about the axis defined by the unit vector .
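The Pauli decomposition of a Hermitian matrix can be checked numerically: the coefficients are a_k = tr(σ_k M)/2, which follows from tr(σ_j σ_k) = 2δ_jk. A NumPy sketch (the matrix M below is an arbitrary Hermitian example):

```python
import numpy as np

# Decompose a 2x2 Hermitian matrix M in the basis {1, X, Y, Z}; the (real)
# coefficients are a_k = tr(sigma_k M) / 2.
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

M = np.array([[2.0, 1 - 2j], [1 + 2j, -1.0]])    # an example Hermitian matrix

coeffs = [np.trace(s @ M) / 2 for s in (I, X, Y, Z)]
reconstructed = sum(c * s for c, s in zip(coeffs, (I, X, Y, Z)))

assert np.allclose(np.imag(coeffs), 0)           # Hermitian => real coefficients
assert np.allclose(reconstructed, M)
```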
The correspondence is established by the formula
where is the Bloch vector rotated by the angle about the axis defined by the unit vector . This gives a very simple geometrical solution of the Schrödinger equation with the Hamiltonian
The vector with components is called the Rabi vector, and its length is often referred to as the Rabi frequency. The Bloch vector simply rotates around the Rabi vector at a rate equal to the Rabi frequency.
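The rotation picture can be verified numerically: conjugating a density matrix by U = exp(−iθ/2 n·σ) rotates its Bloch vector by θ about n, exactly as the Rodrigues rotation formula predicts. A NumPy sketch (the axis, angle, and initial Bloch vector are arbitrary illustrative choices):

```python
import numpy as np

# Conjugation by U = exp(-i theta/2 n.sigma) rotates the Bloch vector by the
# angle theta about the axis n; compare against the Rodrigues formula.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = (X, Y, Z)

def expi(H):                                     # exp(iH) for Hermitian H
    w, v = np.linalg.eigh(H)
    return v @ np.diag(np.exp(1j * w)) @ v.conj().T

n = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)       # rotation axis (unit vector)
theta = 1.2                                      # rotation angle
r = np.array([0.3, -0.5, 0.7])                   # initial Bloch vector

rho = 0.5 * (np.eye(2) + sum(ri * s for ri, s in zip(r, paulis)))
U = expi(-0.5 * theta * sum(ni * s for ni, s in zip(n, paulis)))
r_new = np.real([np.trace(s @ U @ rho @ U.conj().T) for s in paulis])

# Rodrigues rotation of r by theta about n
r_rot = (r * np.cos(theta) + np.cross(n, r) * np.sin(theta)
         + n * np.dot(n, r) * (1 - np.cos(theta)))
assert np.allclose(r_new, r_rot)
```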
The state of the form contains all information about the qubit, but when we measure the bit value, we register either 0 or 1 with probabilities and , respectively. Although the measurement can, in principle, be explained in terms of unitary operations, here we will view it as a special, non‐unitary, quantum gate, defined as
where or . If we choose to measure the bit value of a qubit in state then the result of the measurement is 0 with probability and 1 with probability . The outcome of the measurement, , is written in the icon representing the measurement, and the output state of the measurement gate is . However, do not think that by measuring a given qubit over and over again you could accumulate enough data to estimate the magnitudes of the two probability amplitudes. This does not work because measurements modify quantum states. As you can see on the diagram above, if the measurement result is 0, the post‐measurement state of the qubit is no longer , but , and if the result is 1 the post‐measurement state is . The original state is irretrievably lost. This sudden change of state, with probability and with probability , due to a measurement is often called a “collapse” or a “reduction” of the state. The status of this “reduction” in the formulation of quantum mechanics is still debated.
A convenient mathematical formalism for quantum measurements performed on any quantum objects in is based on projection operators. The expression describes a projection onto . Indeed, the result of acting on any vector is , that is, vector multiplied by . The sum of projections on vectors from any orthonormal basis gives the identity operator, that is,
This is a useful expression, known as the decomposition of the identity. We can use it to expand any vector in any basis as
Any measurable physical property of a quantum system in , which takes values in some set of symbols labeled by , is represented by a set of projectors that satisfy and form the decomposition of the identity .
For example, the standard measurement on a qubit is described by the projectors and that form the decomposition of the identity . Given a qubit in state , the measurement gives outcome , , with probability and leaves the system in state . However, we can also measure other properties using, for example, the projectors and , . The two projectors define another measurement with two possible outcomes labeled by and . In the following, unless specified otherwise, all measurements are assumed to be performed in the standard basis. This is because any measurement can be reduced to the standard measurement by performing some prior unitary transformation. For example, and and , , thus measuring on is equivalent to measuring on ,
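The equivalence of the two measurements can be made concrete. The sketch below (NumPy; the input state is an arbitrary normalized example) checks that the outcome probabilities of the |+⟩, |−⟩ measurement coincide with those of a Hadamard gate followed by the standard measurement:

```python
import numpy as np

# Measuring in the |+>, |-> basis equals applying a Hadamard gate and then
# performing the standard (computational-basis) measurement.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

psi = np.array([0.6, 0.8j])                      # arbitrary normalized qubit state
plus = np.array([1, 1]) / np.sqrt(2)
minus = np.array([1, -1]) / np.sqrt(2)

# Probabilities from the projectors |+><+| and |-><-|
p_plus = abs(np.vdot(plus, psi)) ** 2
p_minus = abs(np.vdot(minus, psi)) ** 2

# Probabilities from H followed by the standard measurement
phi = H @ psi
q0, q1 = abs(phi[0]) ** 2, abs(phi[1]) ** 2

assert np.isclose(p_plus, q0) and np.isclose(p_minus, q1)
```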
In some textbooks, quantum measurement is associated with Hermitian operators with the spectral decomposition , and the different outcomes are described by the real eigenvalues . As a result, it is often falsely claimed that the outcomes of quantum measurements must be labeled by real numbers. We emphasize, however, that we are really just labeling the measurement results, and hence we can associate any symbols we wish with the possible outcomes.
Given that quantum machines are described by their respective unitary matrices, and that the rules of addition and multiplication of amplitudes are the same as those of probabilities, we can construct more complex machines following the familiar composition rules: when two quantum machines, which are described by some unitary matrices and , act in parallel, their action is described by the tensor product , and when the action of is followed by , the resulting unitary matrix is the matrix product . You can check that both tensor and matrix products of unitary matrices give another unitary matrix, that is, both parallel and sequential compositions of quantum devices give another quantum device, as expected. In both cases the order matters, that is, in general and . Since the tensor product is associative, , we can extend quantum operations to any number of qubits.
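These composition rules are straightforward to check numerically. A NumPy sketch (the Hadamard gate H and a phase gate S are used as illustrative examples):

```python
import numpy as np

# Parallel composition of gates is the tensor (Kronecker) product; sequential
# composition is the matrix product. Both preserve unitarity, and in both
# cases the order matters.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.array([[1, 0], [0, 1j]], dtype=complex)   # a phase gate

def is_unitary(U):
    return np.allclose(U.conj().T @ U, np.eye(U.shape[0]))

parallel = np.kron(H, S)                         # H and S acting in parallel
sequential = S @ H                               # H applied first, then S

assert is_unitary(parallel) and is_unitary(sequential)
assert not np.allclose(np.kron(H, S), np.kron(S, H))  # tensor order matters
assert not np.allclose(S @ H, H @ S)                  # matrix order matters
```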
If we bring two qubits together, we form a system with four distinguishable states, which we label as 00, 01, 10, and 11. The circuits below show six unitary operations on the two qubits,
The first four are described, respectively, by unitary matrices that are tensor products , , , and ,
Please note that . The n‐fold tensor product of Hadamard gates, that is, a Hadamard gate applied to each qubit, is referred to as the n‐qubit Hadamard transform. The matrices of the two remaining gates, known as the square root of SWAP and the controlled‐NOT, stand out as they do not admit a tensor product decomposition in terms of single‐qubit operations,
There is a whole family of square root of SWAP matrices. Our choice here, with the phase factor in the central sub‐matrix, is directly related to its most common experimental realization (the Heisenberg interaction). The controlled‐NOT (c‐NOT ) performs the bit flip, that is, logical NOT, on the second (target) qubit if the first (control) qubit represents logical 1 and does nothing if the control qubit represents 0.
Let us take a closer look at how the mathematical formalism introduced in the previous section can be applied to composite systems, that is, systems made out of several subsystems. For this we need to revisit the tensor product operation. We shall start with the simplest possible composite system, namely two qubits.
Two isolated qubits live in a tensor product space denoted as . The tensor product operation is a way of putting vector spaces together to form larger vector spaces. When we bring two qubits together, the first one in state and the second in state , we form a new state often written as , or simply , or even . The elements of are linear combinations of . The scalar product in is defined by the identity
and the space can be spanned by the four vectors of the standard basis , , , and , usually written as , , , and . Thus, if
then the state of a composite system can be written as
where we have used the fact that the tensor product is a linear operation. However, not all vectors in are of the form . To see this, consider the most general quantum state of two qubits
The complex amplitudes are constrained only by the normalization condition . This means that, in general, , where , which implies that not all quantum states of two qubits can be written in the form . Those that can are called separable and those that cannot are called entangled. For example, out of the two states
the first one is separable because it can be written as whereas the second one is entangled because it does not admit such a decomposition. Thus, a system of two qubits as a whole can be prepared in a quantum state that does not allow the attribution of separate state vectors to its parts. Four popular entangled states are the so‐called Bell states,
In order to entangle two (or more) qubits, we need gates that couple two qubits, such as the square root of SWAP or the controlled‐NOT. For example, the circuit
evolves the four inputs, , , into the four Bell states. At present, the most common source of entangled qubits is a quantum optical process called “parametric down conversion.” A photon from a laser beam enters a beta‐barium borate (BBO) crystal and gets absorbed, exciting an atom in the crystal in the process. The atom subsequently decays, emitting two photons whose polarizations are entangled.
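The Bell-state circuit is easy to simulate. The sketch below (NumPy; the convention that the first qubit is the control is an assumption of the example) applies a Hadamard gate to the first qubit and then a controlled‐NOT, turning |00⟩ into (|00⟩ + |11⟩)/√2, and confirms that the result is entangled:

```python
import numpy as np

# H on the first qubit followed by controlled-NOT maps |00> to a Bell state.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
I = np.eye(2, dtype=complex)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

ket00 = np.array([1, 0, 0, 0], dtype=complex)    # |00> in the standard basis
bell = CNOT @ np.kron(H, I) @ ket00

assert np.allclose(bell, np.array([1, 0, 0, 1]) / np.sqrt(2))

# The state is entangled: the 2x2 matrix of its amplitudes has rank 2
# (Schmidt rank > 1), so it cannot be written as a tensor product.
assert np.linalg.matrix_rank(bell.reshape(2, 2)) == 2
```

Reshaping the amplitude vector into a matrix and checking its rank is a general test for separability of a two-qubit pure state.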
We can extend the tensor product operation to any number of qubits. The ‐fold tensor product space is the dimensional complex vector space with the standard computational basis labeled by all binary strings of length . The elements of are all possible linear combinations of the vectors from the computational basis, in particular any quantum states of isolated qubits can be written as
In addition to the single‐qubit Pauli operations, we can also define bit flips and phase flips on selected qubits, and , where the binary string indicates the location of the flip, for example, , . The Hadamard gate squares to the identity and it can turn the action of the gate into and vice versa; the circuit is equivalent to the action of single and conversely . In general, the Hadamard transform can turn into and vice versa
The two circuit identities represent the two relations, and .
Labels, that is, binary strings , can be used to define unitary operations. For example, for any and any constant in we can express and as
and the Hadamard transform on qubits can be defined as
and implemented by applying the Hadamard gate to each of the qubits. This action is easily understood by using the fact that
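The defining property of the Hadamard transform, H⊗n|x⟩ = 2^(−n/2) Σ_y (−1)^(x·y)|y⟩ with x·y the bitwise inner product mod 2, can be verified directly. A NumPy sketch (n = 3 and the basis state |101⟩ are arbitrary choices):

```python
import numpy as np
from itertools import product

# The n-qubit Hadamard transform maps |x> to 2^{-n/2} sum_y (-1)^{x.y} |y>.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
n = 3
Hn = H
for _ in range(n - 1):
    Hn = np.kron(Hn, H)                          # H tensor H tensor H

x = (1, 0, 1)                                    # the basis state |101>
col = Hn[:, int("".join(map(str, x)), 2)]        # column of H^n applied to |x>

expected = np.array([(-1) ** sum(xi * yi for xi, yi in zip(x, y))
                     for y in product((0, 1), repeat=n)]) / 2 ** (n / 2)

assert np.allclose(col, expected)
```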
Projectors for are tensor products of projectors pertaining to individual qubits, for example, . They form the decomposition of the identity and define the standard measurement on qubits. More precisely, given qubits in some quantum state we can measure them qubit by qubit obtaining binary string with the probability . For example, for we may register the binary string
This happens with the probability and the state after the measurement is . But, what if we choose to measure only some of the five qubits, say the first one?
In this case, the measurement is described by the two projectors
They satisfy . The two results of the measurement are 0 and 1 with the probabilities and , respectively. The corresponding state after the measurement is a properly normalized result of the projection and . Translating this into a simple algebra, we write the initial state as
where
and
Thus, the result of the measurement on the first qubit is 0 with the probability or 1 with the probability , and the post‐measurement states are and , respectively.
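Measuring only the first qubit has a simple computational form: reshape the amplitude vector so that the first index is the measured qubit; the outcome probabilities are the squared norms of the two slices, and the post-measurement states are the renormalized slices. A NumPy sketch (the five-qubit state is a random normalized example):

```python
import numpy as np

# Measuring only the first of n qubits via slicing of the amplitude vector.
n = 5
rng = np.random.default_rng(1)
psi = rng.normal(size=2**n) + 1j * rng.normal(size=2**n)
psi /= np.linalg.norm(psi)                       # random normalized state

slices = psi.reshape(2, 2**(n - 1))              # slices[b]: first qubit = b
p0 = np.linalg.norm(slices[0]) ** 2              # probability of outcome 0
p1 = np.linalg.norm(slices[1]) ** 2              # probability of outcome 1
post0 = slices[0] / np.sqrt(p0)                  # remaining qubits, outcome 0

assert np.isclose(p0 + p1, 1)
assert np.isclose(np.linalg.norm(post0), 1)
```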
This shows us that there is more to quantum states of qubits than just being unit vectors in . When we have a composite system, we can ask questions about the relation of the quantum state of the whole to the quantum states of its components. The tensor product structure of the underlying Hilbert space is surprisingly rich and there is much more in as compared to .
The notion of an entangled state will probably make you wonder about the usefulness of state vectors as a good description of quantum states. Indeed, the tensor product structure allows us to describe a separable state of two qubits by using a state vector for each of the qubits. However, when we have an entangled state, this is not the case – we cannot attribute state vectors to individual qubits. We shall not discuss this problem in detail here but let us mention that in order to describe one subsystem of a larger system, we are forced to generalize the concept of a quantum state to the density matrix. All the states that we have met so far have been described by a state vector and are known as pure states. These can be rewritten as a density matrix of . From this definition, we can see how unitary evolution must be represented, . However, the point of introducing density matrices is that they are more general than pure states: they can also represent what are known as mixed states. These can be expressed in the form
where , and cannot be written as a single state . How does this relate to the state of a subsystem? Consider that we have a pure state over two subsystems and ,
but we would like to describe the state of just subsystem A. If we were to measure subsystem B, then we would get the result with probability , leaving subsystem A in the (unnormalized) state
where , and we can make similar statements for all possible measurement results on the subsystem. If we wish to describe subsystem , we do not measure , but we can treat it as if we had measured it but did not know the result; that is, subsystem A is in the state with probability . The density matrix formalism allows us to describe this by
where the definitions of the states automatically encapsulate the probabilities. Mathematically, this procedure is known as taking the partial trace over subsystem ,
where the can be any complete orthonormal basis over subsystem . The reduced density matrix suffices to determine the outcomes of any actions applied only to subsystem , and it is also the first step in quantifying the amount of entanglement shared between two subsystems. Finally, it allows us to describe probabilistic operations easily. For example, if we started with a state , it could experience a unitary rotation with probability , or nothing could happen with probability . If we do not know whether the rotation happened, we must take this possibility into account by describing our new state as
For pure states, which are sufficient to describe ideal computations, state vectors and density matrices are entirely interchangeable, and we tend to stick to state vectors for simplicity.
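The partial trace has a compact numerical form: reshape the density matrix so the A and B indices are separate, then trace over the B indices. A NumPy sketch (the Bell state is used because its reduced state is the maximally mixed state, a standard check):

```python
import numpy as np

# Reduced density matrix of subsystem A, obtained by tracing out subsystem B.
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho = np.outer(bell, bell.conj())                # pure-state density matrix

# Reshape to indices (a, b, a', b') and sum over b = b'
rho_A = np.trace(rho.reshape(2, 2, 2, 2), axis1=1, axis2=3)

# For a maximally entangled state the reduced state is maximally mixed.
assert np.allclose(rho_A, np.eye(2) / 2)
```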
Many interesting quantum operations can be obtained by parallel and sequential compositions of quantum gates. For example, the circuit below, which you should read from left to right, effects the operation , which you should read from right to left.
It shows how to construct the controlled‐NOT with the square root of SWAP (denoted ) and two phase gates ( ) and ( ) that occur often enough to warrant separate symbols
The square root of NOT, all possible phase gates, and the square root of SWAP suffice to construct any unitary matrix acting on any number of qubits. These three operations do not have any classical analogs. Consequently, it should not be very surprising that most machines built out of these operations do not have any classical analogs and some of their properties are counterintuitive.
The square root of NOT, phase gates, and the square root of SWAP are not special: there are many sets of gates that can be combined to form all possible unitary operations. Such a set is said to be universal. Some gates are popular because of their nice mathematical properties and some because of the simplicity of their experimental implementation.
The gates are known as the CLIFFORD gates. The Clifford group contains all unitary operations that can be written as products of tensor products of the Clifford gates, and it plays an important role in the theory of quantum computation and quantum error correction. The Clifford gates are not universal, but it is known that CLIFFORD together with any other single qubit gate, not generated by the gates in CLIFFORD, form a universal set of gates. Typically, the standard discrete set of universal gates is CLIFFORD augmented by the phase gate . It can be used to approximate any unitary operation.
Given controlled‐NOT we can implement any controlled‐ , for some single‐qubit unitary transformation . The controlled‐ gate (c‐ ) applies the identity transformation to the target qubit when the control qubit represents logical 0 and applies the operation when the control qubit represents logical 1. Any unitary operation on a single qubit , up to an overall phase factor, can be written as
for some unitaries and . Thus, any controlled‐ operation can be constructed using controlled‐NOT gates as
and, subsequently, any controlled‐controlled‐ as
where is any unitary matrix satisfying . The choice leads to another useful gate, known as the controlled‐controlled‐NOT gate ( ‐NOT ), or the Toffoli gate. It flips the target only if the two control qubits are both set to 1 and does nothing otherwise. This gate appears frequently in circuits that evaluate Boolean functions and we shall discuss its action in more detail later on.
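The controlled‐controlled‐NOT construction can be verified numerically. The sketch below (NumPy; the qubit-indexing convention is an assumption of the example) builds the Toffoli gate from controlled‐V, controlled‐V†, and controlled‐NOT gates, where V is the square root of NOT (V² = X), and checks the result against the Toffoli matrix:

```python
import numpy as np

# Standard decomposition of the Toffoli gate using V with V @ V = X.
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
V = 0.5 * np.array([[1 + 1j, 1 - 1j],
                    [1 - 1j, 1 + 1j]])           # square root of NOT

P0 = np.diag([1, 0]).astype(complex)             # |0><0|
P1 = np.diag([0, 1]).astype(complex)             # |1><1|

def controlled(c, t, U):
    """Controlled-U on three qubits: control index c, target index t."""
    ops0 = [I, I, I]; ops0[c] = P0
    ops1 = [I, I, I]; ops1[c] = P1; ops1[t] = U
    kron3 = lambda ops: np.kron(np.kron(ops[0], ops[1]), ops[2])
    return kron3(ops0) + kron3(ops1)

circuit = (controlled(0, 2, V)                   # last gate applied
           @ controlled(0, 1, X)
           @ controlled(1, 2, V.conj().T)
           @ controlled(0, 1, X)
           @ controlled(1, 2, V))                # first gate applied

toffoli = np.eye(8, dtype=complex)               # flips |110> <-> |111>
toffoli[6, 6] = toffoli[7, 7] = 0
toffoli[6, 7] = toffoli[7, 6] = 1

assert np.allclose(circuit, toffoli)
```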
Unitary operations on one qubit are described by unitary matrices, on two qubits by matrices, on three qubits by matrices and, in general, on qubits by matrices. The state vector of qubits has complex components that evolve as , that is
Quantum computation is a carefully controlled quantum interference that involves not one, but many qubits. A sequence of quantum operations on qubits, , , …, can be visualized as the circuit
Each horizontal line represents one qubit. A complex quantum interference involving transitions between the basis states is described by the matrix product
However, each unitary operation may have an internal structure – it may be composed out of unitary operations acting in parallel on selected qubits, for example, as in the circuit below
We construct , , …, by taking tensor products of the elementary gates acting in parallel. For example, , , and so on. Thus, the whole circuit is a matrix product of tensor products of elementary gates.
Needless to say, building large quantum circuits requires large supplies of preselected quantum gates. The total number of gates in the circuit is called the size of the circuit. The depth of the circuit is the number of time steps required to complete the computation, assuming that each gate operation takes one time step and that gates acting on distinct bits can operate simultaneously. This is the number of unitary layers made out of tensor products, that is, . The width of the circuit is the maximum number of gates that act in any one time step. How many different quantum gates do we need as our elementary building blocks and how many will be used in our constructions? This seemingly innocuous question is rather deep and leads to the study of the universality of preselected components and of the complexity of quantum circuits.
Any unitary operation on qubits, that is, any unitary matrix, can be constructed as a quantum circuit. Moreover, in order to accomplish this task, all we need is an ample supply of a few types of elementary gates. However, it should be stressed that no matter the choice of elementary building blocks, constructing a unitary operation on qubits is not easy and the size of the circuit, , usually grows very rapidly with . In order to quantify this growth, we shall frequently use the asymptotic notation that suppresses multiplicative constants. Given a positive function , the symbol means bounded from above by for some constant (for sufficiently large ). For example, is . Another common symbol, , means bounded from below by for some constant , and means both and . A circuit is polynomially bounded if its size is for some constant . When we build quantum circuits, we care about the economy of resources; after all, each gate we use may cost us money. Circuits with logarithmic , linear , or polynomial size are considered reasonable, whereas circuits with exponential size, , are viewed as rather expensive. We shall return to this point later on.
The circuits and quantum gates are more than just tools for constructing unitary operations. We want to give them some computational meaning, associate them with algorithms, and quantify the complexity of these algorithms by patterns and numbers of gates in the corresponding circuits. Classical computers evaluate Boolean functions,
Quantum computers embed function evaluation into reversible computation and evaluate them as unitary operations
where , . The corresponding circuit diagram is (for ),
and the unitary matrix can be written as
with the summation over all and .
Any Boolean function can be expressed as a formula containing only binary addition and multiplication. For example, the elementary Boolean operations negation (NOT, ), disjunction (OR, ), and NAND ( ) can be expressed in terms of addition and multiplication as
where . Once we have NOT, AND, and OR, we can write any Boolean function as a disjunction of conjunctions. For example, if a Boolean function takes value 1 only for the string 0110, that is, when , then can be expressed as a conjunction,
where we have written instead of . If takes value 1 only for the strings 0110 and 1000 then can be expressed as a disjunction of the two conjunctions,
In general, any can be written as a disjunction of conjunctions, which is called the disjunctive normal form (DNF). The corresponding formula can be easily deduced from the truth table of – we find all inputs for which , then for each particular string, , such that , we construct a conjunction, , such that if and only if , and then we take the disjunction of all . We have also used COPY because each in the DNF expansion of requires its own copy of to act on.
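The DNF construction from a truth table can be sketched directly in code (Python; the function f below, taking value 1 only on 0110 and 1000, is the example used in the text):

```python
from itertools import product

# Build the disjunctive normal form of a Boolean function from its truth table.
def f(x):                                        # f = 1 only on 0110 and 1000
    return 1 if x in [(0, 1, 1, 0), (1, 0, 0, 0)] else 0

n = 4
terms = []
for a in product((0, 1), repeat=n):
    if f(a) == 1:
        # conjunction that equals 1 exactly on input a:
        # use x_i where a_i = 1 and NOT x_i where a_i = 0
        terms.append(lambda x, a=a: all((xi if ai else 1 - xi)
                                        for xi, ai in zip(x, a)))

def dnf(x):                                      # disjunction of conjunctions
    return 1 if any(t(x) for t in terms) else 0

assert all(dnf(x) == f(x) for x in product((0, 1), repeat=n))
```

The loop implements exactly the recipe above: one conjunction per input string on which f takes the value 1, joined by a disjunction.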
Quantum function evaluation can be then implemented with quantum gates such as the controlled‐NOT
which takes care of the binary addition, , in a reversible manner and the controlled‐controlled‐NOT (Toffoli gate)
which takes care of the binary multiplication , or the logical AND, and effectively all the logical connectives we need for arithmetic.
The controlled‐NOT also takes care of COPY of binary digits,
for . One might suppose that this gate could also be used to copy superpositions such as so that
for any . This is not so! The unitarity of the c‐NOT requires that the gate turns superpositions in the control qubit into entanglement of the control and the target.
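This failure of copying is easy to exhibit numerically. The sketch below (NumPy; the control-qubit convention is an assumption of the example) applies the c‐NOT to the superposition (|0⟩ + |1⟩)/√2 in the control and |0⟩ in the target, and shows that the output is the entangled Bell state, not two copies of the superposition:

```python
import numpy as np

# The c-NOT copies basis states but turns superpositions in the control into
# entanglement of control and target.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

plus = np.array([1, 1]) / np.sqrt(2)             # (|0> + |1>)/sqrt(2)
zero = np.array([1, 0])

out = CNOT @ np.kron(plus, zero)

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)       # (|00> + |11>)/sqrt(2)
clone = np.kron(plus, plus)                      # what perfect copying would give

assert np.allclose(out, bell)
assert not np.allclose(out, clone)
```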
So far, our construction of a quantum circuit that computes tacitly assumes that we know the truth table of . Needless to say, this is not the case in real computation – we compute because we do not know the answer. There is a fundamental difference between constructing circuits using look‐up tables and using simple prescriptions called algorithms, which describe how to construct the function with ‐bits of input from a function with ‐bits of input. For the vast majority of Boolean functions , we do not know any better way to “compute” than to consult the look‐up table. In fact, Claude Shannon used a simple gate‐counting argument to show that almost any Boolean function has a circuit size that is lower bounded by , that is, of . We are interested in the tiny minority of Boolean functions to which this bound does not apply. Their circuits have patterns that we can spot. Any algorithm can be represented by a family of circuits , where the circuit acts on all possible input instances of bits. Any useful algorithm should have such a family specified by an example circuit and a simple rule explaining how to construct the circuit from the circuit . These are called uniform families of circuits. An algorithm is said to be efficient if it has a uniform and polynomial‐size family of circuits.
What makes quantum evaluation of Boolean functions really interesting is its action on a superposition of different inputs . For example,
produces for all in a single run, and one may hope that this is the origin of the power of quantum computation. Unfortunately, things are more complicated. We cannot learn all the values from the entangled state, because any bit‐by‐bit measurement on the first qubits will yield one particular value , and the final qubit will then be found with the value . In order to achieve novel results, different from classical computation, we usually sandwich the quantum function evaluation between other operations, such as the Hadamard transform, and ask questions about some global property of that depends on many values of , for example, periodicity. For example, the Mach–Zehnder interferometer allows us to determine an unknown relative phase . In particular, if is guaranteed to be either 0 or , we only require one photon to determine this, whereas, classically, two photons are required.
It turns out that, in this case, families of quantum circuits are more powerful than their classical counterparts. One such example is the factoring problem – given an ‐bit number , find a list of prime factors of . The smallest known uniform classical family that solves the problem is of size , where is a constant. In contrast, there exists a uniform family of quantum circuits of size that solve the factoring problem. Since the outcomes of measurements are probabilistic, there is some chance of failure of quantum algorithms, and we describe an acceptable probability of a failure by the parameter . Of course, if such a failure occurs, we can readily detect it, because testing whether the product of the potential factors is equal to is easy.
In this chapter, we have explored how computation is a physical process, and the workings of a computer are governed by the laws of physics. This has caused us to replace the classical description of a process, using probabilities and stochastic matrices, with a new description using probability amplitudes and unitary matrices. Many classical mathematical techniques and conclusions translate to this new scenario, such as the composition of gates by tensor products and matrix multiplication, or that all possible operations can be decomposed in terms of a small number of building blocks, the universal gates. These new, quantum, gates subsume the operations of a classical computer, but go well beyond them, allowing the possibility of intrinsically quantum algorithms that are more powerful than their classical counterparts.
If a function takes value 1 only for the strings 0110 and 1000, then it can be written in DNF as
Rewrite this in CNF.
What are the possible solutions?
If a function, , (in CNF) has sets of three variable disjunctions, and variables (this is known as the three‐SAT problem), then how would you expect the method you just applied to scale?
Consider constructing a circuit of nodes. Each node accepts two input bits and outputs a single bit. How many different types of nodes are there?
We are free to choose the inputs to each node from any of the bits in the system, and we are free to choose the output of the computation from any of the nodes. We can do this because a single bit can be input to any number of nodes. For each node in the circuit, how many possible organizations are there? Hence, estimate the number of Boolean circuits of size with input bits. This is denoted by .
Estimate the number of functions computable with at most gates. Hint: Use Stirling's approximation. Compare this to the number of possible circuits.
Let be the operator that exchanges subsystem 1 with subsystem 2. We say that is symmetric if and antisymmetric if . Identify which elements of the tensor product basis are symmetric, antisymmetric, or neither.
Symmetrize the vectors and , that is, find superpositions that are either symmetric or antisymmetric. Show that, together with and , they form a basis of . Ensure that your result is normalized.
If for any real , show that
Show that for any real and for any such that
where is a real number known as the coupling constant. This coupling is known as the Heisenberg, or the exchange, interaction. Show that, in matrix form in the computational basis, the Hamiltonian can be written as
that is, , where is the state swap operator for any two vectors and . Calculate the evolution due to this Hamiltonian after a time . Hint: Use the solutions from the previous example.
Suppose the two qubits stop interacting after . Show that this implements the square root of swap gate on the two qubits.