IFT - Instituto de Física Teórica
Universidade Estadual Paulista

MASTER DISSERTATION IFT–D.008/2014

A Study of the Dynamics of Quantum Correlations

Patrice Audibert Camati

Advisor: Dr. Gastão Inácio Krein

Feb/2014

Acknowledgements

I would first like to thank the National Council for Scientific and Technological Development (CNPq) for having granted me a scholarship over the past two years. I thank my advisor for having accepted me as his student and for having allowed me to pursue my studies in a field that is not his own area of expertise. I must also thank some colleagues of mine who provided me with insightful discussions about physics and taught me a bit of their respective fields. They are: George De Conto Santos, Guilherme Franzmann, Henrique dos Santos Flores, Lucas Lolli Savi and Marcio Woitek Junior.

"The scientist does not study nature because it is useful; he studies it because he delights in it, and he delights in it because it is beautiful. If nature were not beautiful, it would not be worth knowing, and if nature were not worth knowing, life would not be worth living." - HENRI POINCARÉ

"The book of nature is written in the language of mathematics." - GALILEU GALILEI

"If people do not believe that mathematics is simple, it is only because they do not realize how complicated life is." - JOHN VON NEUMANN

"Science is much more than a body of knowledge. It is a way of thinking." - CARL SAGAN

Abstract

In this work, we present an introduction to quantum information theory. Before dealing with quantum information we develop some necessary tools: classical information theory and some important concepts in quantum dynamics. We define measures of quantum correlations, concurrence and quantum discord, and study their dynamical evolution when the coupling between the system and the environment is considered random. We conclude that a random environment imposes a permanent loss of correlations on the system.

Key words: Entanglement; quantum discord; random coupling constant.
Areas: Quantum information theory, quantum correlations.

Resumo

In this work, we present an introduction to quantum information theory. Before dealing with quantum information proper, we develop some preliminary subjects: classical information theory and relevant concepts in the dynamics of quantum systems. We then define measures of quantum correlations, concurrence and quantum discord, and study their dynamical evolution when the coupling of the system with the environment is modeled as random. We conclude that a random environment imposes a permanent loss of correlations on the system.

Key words: Entanglement, quantum discord, random coupling.
Areas: Quantum information theory; quantum correlations.

Table of Contents

1 Introduction
2 Classical Information Theory
  2.1 What is information?
  2.2 Measuring Information
  2.3 Shannon Entropy
  2.4 Mutual Information and Other Entropies
    2.4.1 Relative Entropy
3 Quantum Dynamics
  3.1 Dynamics of Closed Quantum Systems
    3.1.1 Dynamics of Pure and Mixed States
    3.1.2 Dynamics of One Qubit
  3.2 Composite Quantum Systems
  3.3 Dynamics of Open Quantum Systems
  3.4 Measurement Dynamics
  3.5 Generalized Quantum Dynamics
  3.6 Decoherence and Environment-Induced Superselection
4 Quantum Information Theory
  4.1 The von Neumann Entropy
  4.2 Entropy at the Interfaces: Preparation and Measurement
    4.2.1 The Entropy of Projective Measurements
    4.2.2 The Entropy of Preparation
  4.3 Correlations
  4.4 Measuring Correlations
    4.4.1 Correlations in Pure States; Entanglement
    4.4.2 Correlations in Arbitrary Bipartite States; Accounting for Mixed States
  4.5 Applications of Quantum Correlations
    4.5.1 No-Cloning Theorem
    4.5.2 Quantum Teleportation
    4.5.3 Entanglement Swapping
    4.5.4 Quantum Dense Coding
    4.5.5 Quantum Discord as a Resource
5 The Effect of Random Coupling Constants on Quantum Correlations
  5.1 Calculation of the Measures of Quantum Correlations
  5.2 Numerical Results and Discussions
6 Conclusions
A Singular Value Decomposition
B Operator-Sum Decomposition
C Information Theoretic Inequality
Bibliography

Chapter 1
Introduction

Since the advent of quantum mechanics there has been huge progress in technology. Quantum mechanics underpins our modern society in many ways, from all kinds of electronic devices to how some economic sectors are structured. One of the new paradigms of technological development is the idea of constructing a quantum computer: a computer which exploits in a fundamental way the unique quantum behavior. All computers use quantum mechanics in the sense of using quantum mechanical devices such as the transistor. Nevertheless, when it comes to storing and processing information, our current computers use binary language, for example, either with or without electrical current flowing through the transistors. It is precisely this point that quantum computers intend to extend, by exploiting the superposition property of quantum systems. This story of quantum computers brings us to another field greatly developed in the past century, the theory of communication. We will say a few words about information theory in the next chapter. For the moment, let us comment briefly on the chain of events that led to the realization that taking quantum mechanics into account in the way we transmit information can provide more efficient algorithms for some tasks.

Entanglement has been a major concept in modern physics. It is accepted that the first discussion of this property in a scientific article was in the seminal paper written by Einstein, Podolsky and Rosen (also known as the EPR article) in 1935 [1].
They questioned whether the quantum theory was complete or not. As they leaned toward incompleteness, because in their work this followed once the locality of quantum theory is assumed, they suggested that there might exist hidden variables which would make the quantum theory complete. This means that they provided an example that seemed to show that quantum mechanics does not respect locality.

The next important step in this story may be considered the work of John S. Bell, published in 1964 [2]. Bell took the discussion to the experimental level. He derived an inequality (called Bell's inequality) which the supposed (local) hidden variable theory should satisfy. Even though it might have been unclear up to that point, all these discussions arose because, since the beginning, with the EPR article, they were dealing with entangled states, those states that seemed to violate locality. Nobody was able to grasp what was really going on at the time. In 1972 John Clauser and Stuart Freedman carried out the first experimental test showing that Bell's inequality (and also the Clauser-Horne-Shimony-Holt inequality*) is, in fact, violated by quantum mechanics. Since then, there have been a myriad of experimental confirmations of this feature. The violation of Bell-like inequalities by quantum mechanics means that there is something fundamentally different that cannot be substituted by local hidden variable theories, which are the most general notion that preserves both locality and reality (as defined by EPR). It was then that a careful study of such states (those which violate the inequalities) began to grow.

* The CHSH inequality is a Bell-like inequality whose violation by quantum mechanics also means that quantum mechanics cannot be described by local hidden variable theories.

In the early 1980s interest arose in whether it might be possible to use such quantum effects to signal faster than light. Wootters, Zurek and Dieks [3,4] showed that this is not possible; it turns out that it is prevented by the no-cloning theorem, which says that an arbitrary unknown quantum state cannot be perfectly copied (we will see this in more detail later). This was the first result in quantum mechanics with a strikingly distinct character when contrasted with classical information, where bits can be easily copied. In fact, this availability of copying is used everywhere in error-correction schemes in classical communication. After that, the field of quantum information began to grow increasingly fast. Right after the discovery of the no-cloning theorem, Bennett and Brassard [5] discovered the first quantum cryptographic protocol, in 1984, representing the beginning of the interest in using quantum systems for the security of communication. This naturally led to the will to use quantum mechanics to explore other communication concepts, such as codification, data compression, transmission of information and computation.

With the development of these studies entanglement has found its place. Until recently, it was believed that entanglement was a necessary resource for a quantum protocol to be more efficient, if that is even the case, than a similar classical protocol which solves the same task. Some quantum protocols will be discussed later in this work, among them quantum teleportation and superdense coding. Nevertheless, there has been increasing interest in what is called mixed-state computation.
In this kind of computation there is no entanglement, or only a negligible amount of entanglement, so that it is not the resource that promotes the efficiency of the algorithm. These resources are still somewhat fundamentally quantum, although much more research is needed to clarify this issue. One example of a resource other than entanglement is quantum discord, which we are also going to discuss; we will show an efficient quantum algorithm that solves a task for which no classical algorithm is known to do so. These quantum correlations are indeed playing the role of potential resources for speeding up computation, justifying the interest in studying them.

This work has two aims. First, it is structured to provide an introduction to quantum information theory, beginning with a review of the major concepts of classical information theory and quantum mechanics, although some issues are only commented on and a fuller explanation is left aside. Second, it presents an analysis of the behavior of some measures of quantum correlations when the coupling constant between system and environment is random.

This work is organized as follows. In Ch. 2 we introduce classical information theory. We begin with a discussion of the concept of information and, from that, we introduce and study the measure of information of an informational source, the Shannon entropy. This comprises Secs. 2.1, 2.2 and 2.3. Then we develop the other measures of information for two or more sources, the mutual information, conditional entropy and relative entropy, in Sec. 2.4. In Ch. 3 we review the basic formalism and dynamics of quantum systems, Secs. 3.1 and 3.2, and we provide an introduction to more advanced topics such as open system dynamics, Sec. 3.3, and measurement dynamics, in Sec. 3.4. In Sec. 3.5 we present a general description of quantum dynamics through the so-called quantum operations. Finally, in Sec. 3.6, we discuss two related phenomena, decoherence and environment-induced superselection, using the tools developed in the previous sections. With the basic tools of classical information theory and quantum mechanics in hand, we focus on quantum information theory in Ch. 4. We begin by introducing the von Neumann entropy, the Shannon-like measure of information for quantum systems, and discuss the information at the two basic interfaces of communication, the preparation of quantum systems and their measurement, comprising Secs. 4.1 and 4.2. In Sec. 4.3 we discuss correlations in quantum mechanics and then, in Sec. 4.4, we develop some measures of correlations such as the entropy of entanglement, for pure states, and concurrence and quantum discord for mixed states. We end the chapter, in Sec. 4.5, with some notable applications in quantum information theory: the no-cloning theorem, quantum teleportation, entanglement swapping, quantum dense coding and mixed-state quantum computation. In Ch. 5 we study the dynamics of quantum correlations when the interaction between system and environment is governed by a random coupling constant. There, we discuss three situations: the first where the coupling constants are actually constant, the homogeneous environment, and then two random interactions, one with a uniform distribution and the other with a normal distribution. Finally, we end this work with the conclusions in Ch. 6.
Chapter 2
Classical Information Theory

Information theory is a broad field of science which comprises applied mathematics, electrical engineering and computer science [6]. The landmark work that may be considered the starting point of information theory is the article entitled "A Mathematical Theory of Communication", written by Claude E. Shannon in 1948 [8]. Shannon was a researcher at the Math Center at Bell Laboratories, a division created by the American Telephone and Telegraph (AT&T) company specifically for conducting research and development on communication systems. From a fairly reasonable set of properties that an appropriate measure of information should have, he arrived at a function, which he himself recognized as being similar to the statistical entropy, that properly measures information, choice and uncertainty. This function is the so-called Shannon entropy.

Information theory is not concerned with the meaning of the information sent, but with practical issues such as how to send a signal efficiently and reliably through a noisy channel such that it can be recovered faithfully. The development of this field enabled the invention of the Internet, which certainly is one of the technological achievements that most shaped, and is still powerfully capable of shaping, our modern society.

The aim of this chapter is to present an introduction to classical information theory, going through its fundamental concepts, all of which will be of key importance when we turn to quantum information theory. We will not be concerned with actual applications.

The structure of this chapter is the following. In Sec. 2.1 we will discuss the concept of information, making clear the meaning of the word when we use it. Following this, we will arrive at an objective meaning of information, thus enabling us to quantify it and use it operationally. Encouraged by this meaning, in Section 2.2 we will study a measure of information supported by it. In Section 2.3 we will explore the fundamental concept of information theory, the Shannon entropy. Instead of being concerned with the actual derivation given by Shannon, we will present it as a definition and work with it in some examples to understand its meaning. In Section 2.4 we will begin to explore the correlations between two or more messages. In order to do so we will use other entropies, namely, the joint entropy, the conditional entropy, the mutual information and the relative entropy. These entropies rest upon the Shannon entropy and are as important as the Shannon entropy itself. The quantum analogues of some of these entropies will give us a measure of quantum correlations. The presentation is intended to be pedagogical, as a solid understanding of the concepts of information theory in the classical case is important when one goes to the quantum case. This chapter is heavily based on Ref. [7].

2.1 What is information?

If we wonder about the concept of information for a while, it is not difficult to realize that it has many facets. In fact, it is not such a rare insight to realize that defining any abstract noun as precisely as possible is by no means an easy task. Semantics, the field of linguistics that studies the meaning of words, phrases, et cetera, is a very interesting subject. Studying how we acquire the meaning of words shows us that from our early days we learn based on our experience. Our brain is biologically adapted to learn languages.
We observe the behavior of those around us, listen to those mysterious sounds they make and, by simply being there, our brain is able to connect those sounds with the present situation; even though it is an extremely complex process, after two or three years we are able to produce, identify precisely and understand those sounds and their meanings [9]. Based on this phenomenon, the present author believes the best way to define a concept, mainly an abstract one, is heuristically. We will go through a couple of situations where the idea of information is applied and then reflect on its different meanings.

Suppose Alice and Bob want to communicate secretly with each other. They are neighbors and live side by side in such a way that from Alice's bedroom window she can see Bob's bedroom window and vice versa. They have established a secret code using a lantern. If Alice flashes once, it means that her parents have left home. If Alice flashes twice, it means that her parents have decided not to go out anymore. Without minding why they would like to communicate with each other, this situation brings us the opportunity to explore the concept of information. We say that after Alice flashed once or twice, Bob has received information from her. So, one property of information is that it can be transmitted. The code they established must be known a priori to both Alice and Bob. If Bob didn't know about the code, he would not have been able to read the message properly. He simply would not have understood it. The abstract message, either parents at home or not, had to be coded in a physical system, in the present case the flashes of light. Moreover, if it were not a problem, Alice could have just yelled at him. The very same information would have been transmitted through another physical system, namely, sound waves in the air. Thus, the same information can be stored in different physical systems.

Suppose now that Alice and Bob have met in the street near their houses and started a conversation. Alice tells Bob that in a street nearby there was a car which crashed into a post. We naturally say that Alice had this information and gave it to Bob. So, now Bob has this information too. Alice neither forgot nor lost this information in any way; both of them have the information now. Therefore, one additional and very important property of information is that it can be copied.

Consider now that they have not seen each other for, say, a week. When they meet again Alice tells Bob the very same story. Alice didn't forget the information about the accident; instead, she simply has forgotten that she told Bob. But Bob already knew the story and it was not new information for him. There is a subtle and fundamental point here. Suppose that, in this situation, Alice did not tell him the very same story; instead she told a couple of details she hadn't told him before, and which he therefore did not know about. We would say that he acquired a little amount of information, but not as much as he would have if she had told him from the very beginning, in the first encounter, the standard story plus the details. Of course, at the end we would agree that he had the same amount of information about the story. The subtlety here is that we are implicitly measuring the amount of information. One could even say that he had been listening inattentively or uninterestedly until the moment when Alice told him something new, unexpected.
He then might have turned his attention to her to listen carefully to the details which were new to him. This is the key. We measure information by how unexpected, uncertain, unpredictable something is. The more uncertain, unexpected, unpredictable something is, the more information we will gain when we acquire that knowledge. We will make these concepts more precise in the sections to follow.

With this sense, we have a proper lead to an objective (and hence scientifically useful) definition of information, because in order to use it we must have an operational definition: something we can manipulate and calculate mathematically. To summarize this section, we concluded that information is something that can be coded, transmitted, copied and stored in different ways. The scientific concept which we will develop from now on will not be concerned with the meaning of the words or symbols, because this is a subjective property from which we cannot make science. Instead, we will measure information based on unexpectedness, uncertainty and unpredictability.

2.2 Measuring Information

The focus of this section is to develop a measure of the amount of information. We concluded the last section realizing that we measure information qualitatively by relating the knowledge of some event to how unexpected or unpredictable it is. The more unexpected the event is, the more surprised we will be when acquiring that knowledge and, consequently, the more information we gain.

Suppose that Alice plays the lotto and she is searching for the information about the lotto results. When she gets the information she reads it and it says: the winning combination is 5, 38, 4, 22. What is important in this sentence is the set of numbers. After she has compared these numbers with the ones in her tickets, the two important events are: W = she has won or L = she has lost. There is a probability associated with each of these events. From experience we know that the odds of winning are quite small, and the odds of losing are, therefore, substantially high. We may assume, for instance, that

  p(W) = 0.000001,   (2.1)
  p(L) = 0.999999.   (2.2)

Now the question is: how much information does Alice get when actually learning the lotto result? We know from experience that the odds of winning are small, and so does Alice. Therefore, since she has not much expectation of winning, she will not learn much upon realizing that she has, indeed, lost. But in the contrary case, if she wins, it would be quite a surprise for her (as it would be for us). There is a lot of information in the winning event, to the same extent that there is little information in the losing event.

Now we have to find a proper mathematical function to measure the information. This function must obey some conditions. This function, call it I(x), should approach zero for events close to absolute certainty, and infinity for events approaching impossibility. It must also satisfy I(x_1) > I(x_2) if the two events satisfy p(x_1) < p(x_2), meaning that the less likely event is associated with the greater information. An information measure that satisfies all these requirements is the following:

  I(x) = -K \log[p(x)],   (2.3)

where log(x) is the base-two logarithm and K is a dimensionless constant which we will set to 1 for the moment; later we shall see why this is appropriate. In our lotto example we have

  I(W) = -\log[0.000001] = 19.9315686,   (2.4)
  I(L) = -\log[0.999999] \approx 1.4 \times 10^{-6}.   (2.5)
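As a quick numerical check of Eq. (2.3) (with K = 1 and base-two logarithms), the short Python sketch below reproduces the two values just quoted. It is only an illustration added here for convenience, not part of Ref. [7], and the function name info_content is ours.

    import math

    def info_content(p, base=2):
        """Information measure I(x) = -log_base p(x) of an event with probability p, Eq. (2.3) with K = 1."""
        return -math.log(p, base)

    p_win, p_lose = 0.000001, 0.999999
    print(f"I(W) = {info_content(p_win):.7f} bits")   # ~19.9315686 bits
    print(f"I(L) = {info_content(p_lose):.2e} bits")  # ~1.44e-06 bits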
As expected by construction, the winning event has more information because it is more unexpected.

By convention we set the symbol log(x) to stand for the base-two logarithm. We could have used any base for the logarithmic function, though; this would just have changed the scale of the measure or, in other words, its units. This measure is dimensionless and the unit we use is the bit*. But why would we use the bit instead of other units? The smallest nontrivial set of events from which we may speak of information is the set composed of two symbols, or events. For example, a coin. If we toss an unbiased coin there is a probability of one half of getting either heads or tails. Hence p(heads) = p(tails) = 1/2. The information associated with each of these events is I(heads) = I(tails) = 1 bit. Therefore, what a bit means is the exact amount of information necessary to describe the outcome of a coin toss. This is why we set, by convention, the constant K in Eq. (2.3) to 1: to match the scale of the measure with the simplest case. The logarithmic function can again be seen to be appropriate, as it fits our intuition that if we had two coins the amount of information needed to describe the outcome of both coins would be doubled. For instance, we would have p(heads, heads) = p(heads, tails) = p(tails, heads) = p(tails, tails) = 1/4 and therefore I(heads, heads) = I(heads, tails) = I(tails, heads) = I(tails, tails) = 2 bits.

* If we had chosen a base-three logarithm it would be a trit, a base-four one a quad, and so on.

Let us now apply this measure in another situation. Consider an unbiased die. The possible outcomes of rolling a die are X = {1, 2, 3, 4, 5, 6}. The probability of each event is p(x = 1) = ... = p(x = 6) = 1/6. The amount of information in each outcome is, therefore,

  I(X) = -\log(1/6) \approx 2.584 \text{ bits}.   (2.6)

The above result may seem confusing at first sight. How do we interpret a non-integer amount of information? The meaning of this result is that we need a string of at least 3 binary symbols to code all possible outcomes unambiguously. Using a string of length 3 we can code the outcomes as: 1 = 001, 2 = 010, 3 = 011, 4 = 100, 5 = 101 and 6 = 110. Note that we are also left with two possible blocks: 000 and 111. Even though the coding is arbitrary, for any coding of length three there will remain two blocks unused.

There is an alternative, operational interpretation for the amount of information measured in bits. The logical truth values YES and NO can be thought of as a binary alphabet. This alphabet may be interpreted as the answer (outcome) of a YES-or-NO question. How many questions do we need to ask in order to know the outcome of a coin toss? One question may be: is the result 0? If yes, then we already know the result. If no, by complementarity we know that the result was 1†. Therefore, there exists exactly one YES-or-NO question such that, when answered, we know the result unambiguously. The existence of exactly one question is directly associated with the amount of information, which is, in this case, 1 bit.

† We could have asked about the outcome 1 instead, i.e., there may exist more than one appropriate question.

In the die roll we have 6 equiprobable outcomes. How many questions do we need to ask in order to know the outcome of a die roll? The first question which comes to mind is, like before: is the result 3 (or any other)? If the answer is yes, then we know the result.
If it is no, we don't. This is a difference between the coin toss and the die roll. In the coin toss example there are only two possible outcomes, so one such question is enough to know the outcome with certainty. But this is not the case in the die roll, because it does not have only 1 bit of information. Using only questions of the kind "is the outcome of the die roll x?", we would need up to six questions to know the outcome with certainty. Let us try to find a smaller set of questions which unambiguously gives the result when answered. Consider the three following questions:

1. Is the result x even? If YES, then x ∈ {2, 4, 6} = U1; if NO, then x ∈ {1, 3, 5} = U2.
2. Is the result x strictly greater than 3? If YES, then x ∈ {4, 5, 6} = V1; if NO, then x ∈ {1, 2, 3} = V2.
3. Is the result x divisible by 3? If YES, then x ∈ {3, 6} = W1; if NO, then x ∈ {1, 2, 4, 5} = W2.

When these questions are answered they yield precisely the value of x:

• {1} = U2 ∩ V2 ∩ W2, i.e., 1 = NO/NO/NO;
• {2} = U1 ∩ V2 ∩ W2, i.e., 2 = YES/NO/NO;
• {3} = U2 ∩ V2 ∩ W1, i.e., 3 = NO/NO/YES;
• {4} = U1 ∩ V1 ∩ W2, i.e., 4 = YES/YES/NO;
• {5} = U2 ∩ V1 ∩ W2, i.e., 5 = NO/YES/NO;
• {6} = U1 ∩ V1 ∩ W1, i.e., 6 = YES/YES/YES.

Note again that there are two combinations that are not used, YES/NO/YES (U1 ∩ V2 ∩ W1 = ∅) and NO/YES/YES (U2 ∩ V1 ∩ W1 = ∅). So we used six combinations of the binary symbols YES and NO out of eight possible ones. This is why the informational content of the die roll is between 2 and 3 bits.

If the source were given by eight equiprobable symbols Y = {1, 2, 3, 4, 5, 6, 7, 8} = X ∪ {7, 8}, we would have 3 bits of information, because each symbol would have probability 1/8. What does the difference I(Y) - I(X) = 0.416 mean? It is exactly the amount of information which we gain about source Y when it is no longer equiprobable, because its two last symbols now have (from the point of view of the source X) zero probability of occurrence. This excess probability is redistributed among the other six symbols to form a new source of six equiprobable symbols, namely, X. Each of these six symbols has its probability increased by a factor of 8/6, which in terms of information means I(ΔY) = -log(8/6) = -0.416. After this rearrangement the new source has information I(X) = I(Y) + I(ΔY) = 3 - 0.416 = 2.584 bits. This supports the interpretation of information as being related to certainty: if there are two symbols that do not show up anymore, then we are more certain about the possible outcomes (because we have eliminated two possibilities) and therefore less information is associated.

Summarizing this section, we have seen that the simplest alphabet is composed of the binary digits (bits) 0 and 1 (or YES and NO, or whichever). The measure of information I(x) is dimensionless and its common unit is the bit. The amount of information of a given event, as measured by I(x), is a positive real number. In terms of coding the information in bits, the shortest string of bits to code an event, or a message, of information I(x) is given by the smallest n such that n ≥ I(x).

2.3 Shannon Entropy

We have seen how to measure the amount of information of some events. Even though it may seem that this is the central concept in information theory, the fact is that it is not. The central concept is entropy, more precisely the Shannon entropy. In his landmark paper [8], Shannon was interested mainly in two situations.
The first was: given a source of information, what is an efficient way of encoding this information in order to transmit it through a channel? The second was: given a noisy channel (one that may produce errors in the transmission), what is the largest rate of information that can be sent through the channel such that the information can be recovered reliably after the transmission? These questions motivated him to seek an improved measure of information. From a small set of hypotheses that this measure should obey, he deduced mathematically a function which he called entropy. We are not going to discuss the hypotheses made by Shannon or how he arrived at the expression for this entropy. Instead, we are going to define it and apply it to some examples in order to grasp its meaning, because this is what is important when we turn our attention to quantum information theory in Ch. 4.

It turns out that the Shannon entropy (also called the source entropy*) is given by the average of the information measure we have been studying so far, i.e.,

  H(X) = \langle I(x) \rangle = -\sum_{x \in X} p(x) \log p(x).   (2.7)

* The concept of a source is going to be addressed later on. For now it is enough to say that a source is a set of events (or symbols) with a probability distribution over them.

The entropy of a source is, therefore, the average amount of information per source symbol. If the source is equiprobable then the source entropy reduces to the information, i.e.,

  H(X) = -\sum_{x \in X} p(x) \log p(x) = -\sum_{i=1}^{N} \frac{1}{N} \log \frac{1}{N} = \log N.   (2.8)

This means that we have been working with the Shannon entropy all along, but in a special case. If we use the base-two logarithm then the unit of the entropy is bits/symbol. The entropy of a die roll is

  H(X) = \log 6 = 2.584 \text{ bits/symbol},   (2.9)

which is the same as the amount of information found in Eq. (2.6) because of the equiprobability of the events, as stated above.

Let us consider that we throw two dice in a row and we want to study the sum of the two outcomes. The outcomes of this new source are X = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}. Table 2.1 accounts for the probabilities of each of these events.

Table 2.1: Probabilities of the sum of the outcomes of rolling two dice. This table was transcribed from Ref. [7].

  Sum of the dice                              Probability p(x)   -p(x) log p(x)
  2  = 1+1                                        0.027777778       0.14360903
  3  = 1+2 = 2+1                                  0.055555556       0.2316625
  4  = 2+2 = 3+1 = 1+3                            0.083333333       0.29874688
  5  = 4+1 = 1+4 = 3+2 = 2+3                      0.111111111       0.35221389
  6  = 5+1 = 1+5 = 4+2 = 2+4 = 3+3                0.138888889       0.39555513
  7  = 6+1 = 1+6 = 5+2 = 2+5 = 4+3 = 3+4          0.166666667       0.43082708
  8  = 6+2 = 2+6 = 5+3 = 3+5 = 4+4                0.138888889       0.39555513
  9  = 6+3 = 3+6 = 5+4 = 4+5                      0.111111111       0.35221389
  10 = 6+4 = 4+6 = 5+5                            0.083333333       0.29874688
  11 = 6+5 = 5+6                                  0.055555556       0.2316625
  12 = 6+6                                        0.027777778       0.14360903

With the data of the table we can evaluate the source entropy, which is found to be H(X) = 3.274 bits/symbol. This means that, on average, each event can be described with a number of bits between 3 and 4, which is expected since we have 11 symbols to represent and 4 bits ↔ 2^4 = 16 > 11 > 8 = 2^3 ↔ 3 bits. As physically there is no such thing as a fractional binary symbol, we need four bits to code the outcomes of this source. As in the case considered in Sec. 2.2, we will be left with possible coding blocks unused, whichever coding we perform.
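To make the table concrete, here is a minimal Python sketch (an illustration added by us, not taken from Ref. [7]) that builds the probability distribution of the sum of two fair dice and evaluates the Shannon entropy of Eq. (2.7); it reproduces the value H(X) = 3.274 bits/symbol quoted above, as well as log 6 for a single die.

    import math
    from collections import Counter

    def shannon_entropy(probs, base=2):
        """Shannon entropy H(X) = -sum_x p(x) log p(x), Eq. (2.7); zero-probability events contribute nothing."""
        return -sum(p * math.log(p, base) for p in probs if p > 0)

    # Distribution of the sum of two fair dice (the events of Table 2.1).
    counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
    probs = [n / 36 for n in counts.values()]

    print(f"H(two-dice sum) = {shannon_entropy(probs):.3f} bits/symbol")        # ~3.274
    print(f"H(single die)   = {shannon_entropy([1 / 6] * 6):.3f} bits/symbol")  # ~2.58, cf. Eq. (2.9)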
There is a nice operational interpretation of the Shannon entropy which we will address in the following [10], but a small reflection at this point is worthwhile. As we discussed in Sec. 2.1, for all practical purposes we are not interested in the meaning of the messages; we want to study the properties of messages in an objective way. In fact, we have already been doing this when we treated the coin toss and the die roll. Let us think about the kinds of messages we are used to in daily life. Texts in any human language are a complex structure made of sentences, words, letters, syntactic rules, et cetera. Furthermore, exactly because of this complex structure, the probability of a given letter occurring within a string of letters (and spaces) depends on the word it finds itself in. Therefore, in order to measure the information that a new letter would carry if it filled the next position in the string, we would need to consider too many joint and conditional probabilities. Let us forget that these complications exist, study a relatively simple source, and see what we can learn from it.

Suppose we have an alphabet with N letters (or, abstractly, a set with N symbols). Let us consider a machine that prints out letters on a sheet of paper in a sequential manner. The probability p_i of the letter x_i occurring in a given position of the sequence is completely uncorrelated with what has already been written or will be written. This machine is called a signal source without memory and the set {x_i, p_i} is called the signal ensemble* (see Fig. 2.1). From now on, a message is going to be a definite sequence of n letters generated by this kind of machine.

* This set is equivalently represented by a random variable X.

[Figure 2.1: Representation of the signal source with signal ensemble {x_i, p_i}. This figure was copied from Ref. [10].]

Alice uses this printing machine to send messages to Bob. She doesn't send the message as a whole but letter by letter. Therefore, Bob does not know what letter may come next, but he does know the signal ensemble. Remember our discussion in Sec. 2.1: Bob did know the code. Consider that Alice prints out a sequence of n letters. If the alphabet contains N different letters, then there are N^n such sequences. The number of occurrences of a letter x_i in such a sequence is of the order n p_i. With a large enough sequence the frequency of a letter x_i may be taken to be n_i = n p_i. A sequence such as x_i x_i ... x_i is not excluded, but it has probability p_i^n, which is very small since 0 ≤ p_i ≤ 1 and n is large. So, when n is sufficiently large, Bob may assume that he will only receive the sequences which contain the letters x_i with the frequencies n_i. These sequences are called typical sequences. A fairly reasonable question is: how many of these sequences are there? Well, there are n! ways to arrange the n letters. Permutations of the same letter within a sequence do not change it. As there are n_i occurrences of the letter x_i, we need to remove the n_i! possible permutations. Hence the number of typical sequences is

  Z_n = \frac{n!}{n_1! \, n_2! \cdots n_N!}.   (2.10)

Using Stirling's formula, \log(n!) \approx n \log n - n, which is valid when n is large, we obtain for the logarithm of Z_n

  \log Z_n \approx n \log n - n - \sum_{i=1}^{N} (n_i \log n_i - n_i) = -n \sum_{i=1}^{N} p_i \log p_i.   (2.11)

If we divide this by n and take the limit n → ∞ we obtain the Shannon entropy

  H(X) := \lim_{n \to \infty} \frac{1}{n} \log Z_n = -\sum_{i=1}^{N} p_i \log p_i.   (2.12)
Or equivalently,

  Z_n = 2^{nH(X)} \quad \text{(asymptotically)}.   (2.13)

Since many (few) possibilities reflect a large (small) measure of a priori uncertainty for Bob, H(X) from Eq. (2.12) is a measure of the mean a priori uncertainty of a character received by Bob. H is therefore at the same time the average information which Bob receives per transmitted character. This result is a simplified way of stating Shannon's source coding theorem.

As we said previously, the simplest source is composed of the two bits 0 and 1. Let us calculate the Shannon entropy for this source. The Shannon entropy is simply

  H(X) = -p \log p - (1-p) \log(1-p),   (2.14)

where p is the probability of getting the bit 0, for example. This function is also called the binary entropy function and it is plotted in Fig. 2.2. We see that the Shannon entropy has a maximum when the two bits have the same probability of occurrence, namely 1/2, and it has two minima, when one of the bits occurs with certainty. The maximum corresponds to the situation in which we are most uncertain about the next symbol given by the source: exactly because 0 and 1 occur with the same probability, there is no way to discriminate between them. As the probability of one symbol increases, the entropy is reduced because we know that one symbol occurs more frequently than the other. The extreme case is when the source prints out just one symbol. In this case we are completely certain of the next symbol and therefore the entropy reaches zero.

[Figure 2.2: Shannon entropy for a 1-bit source, as a function of the probability p.]

In this section we studied the central concept of information theory, the Shannon entropy. The Shannon entropy is interpreted as the average amount of information per symbol associated with a given source. Next we are going to extend this notion and define entropies that enable the study of correlations between two or more sources.

2.4 Mutual Information and Other Entropies

We have been discussing events, messages, random variables, information and entropy. Both information and entropy are quantities which depend on one message (or one random variable). In this section we shall deal with other entropies, all of which tell us about the correlations between two (or more) messages. We shall answer questions like: How much information do these two messages share in common? How much information do I gain reading this message given that I have read the other? As we shall see in Ch. 4, these entropies play a central role in quantum information theory. We will consider a concrete example and define these entropies as we go through it.

Consider two event sources related to a public company [7]. The first event source concerns three possible conclusions of the quarterly sales report, namely x ∈ X = {good, same, bad} = {x1, x2, x3}, meaning that the results are in excess of the predictions (good), on target (same), or under target (bad). The second event source is the company's stock value, as reflected by the stock exchange, with y ∈ Y = {up, steady, down} = {y1, y2, y3}. Then assume that there exists some form of correlation between the company's result (x ∈ X) and its stock value (y ∈ Y).
A numerical example of joint and conditional probability data, p(x_i, y_j), p(x_i|y_j) and p(y_j|x_i), is shown in Table 2.2.

Table 2.2: Numerical example of the quarterly sales report and the company's stock value sources. This table was transcribed from Ref. [7].

  Joint probabilities p(x_i, y_j):
                   y1 (up)   y2 (steady)   y3 (down)   sum = p(x_i)
    x1 (good)       0.075        0.075        0.010        0.160
    x2 (same)       0.210        0.400        0.140        0.750
    x3 (bad)        0.015        0.025        0.050        0.090
    sum = p(y_j)    0.300        0.500        0.200        1.000

  Conditional probabilities p(y_j|x_i) = p(x_i, y_j)/p(x_i):
                   y1 (up)   y2 (steady)   y3 (down)
    x1 (good)       0.469        0.469        0.063
    x2 (same)       0.280        0.533        0.187
    x3 (bad)        0.167        0.278        0.556

  Conditional probabilities p(x_i|y_j) = p(x_i, y_j)/p(y_j):
                   x1 (good)   x2 (same)   x3 (bad)
    y1 (up)          0.250        0.700      0.050
    y2 (steady)      0.150        0.800      0.050
    y3 (down)        0.050        0.700      0.250

The first group of numerical data, shown at the top of Table 2.2, corresponds to the joint probabilities p(x_i, y_j). Summing the data by rows (i) or by columns (j) yields the probabilities p(x_i) or p(y_j), respectively. The double checksum (bottom and right), which yields unity when summing by row or by column, is also shown for consistency. The two other groups of numerical data in Table 2.2 correspond to the conditional probabilities p(y_j|x_i) and p(x_i|y_j). These are calculated through Bayes' rule, p(x_i|y_j) = p(x_i, y_j)/p(y_j) and p(y_j|x_i) = p(x_i, y_j)/p(x_i). The products p(x_i|y_j) p(y_j) = p(y_j|x_i) p(x_i) = p(x_i, y_j) reproduce the joint probabilities of the first group, which provides a further consistency check.

We define the joint entropy H(X,Y) associated with the joint distribution p(x,y), with x ∈ X and y ∈ Y, as

  H(X,Y) = -\sum_{x \in X} \sum_{y \in Y} p(x,y) \log p(x,y).   (2.15)

It is easily recognized as a natural extension of the Shannon entropy to a joint probability distribution. This entropy represents the average information derived from joint events occurring from the two sources X and Y. The unit of H(X,Y) is bits/symbol.

We then define the conditional entropy H(X|Y) through

  H(X|Y) = -\sum_{x \in X} \sum_{y \in Y} p(x,y) \log p(x|y).   (2.16)

The conditional entropy H(X|Y) corresponds to the average information conveyed by the conditional probability distribution p(x|y). Put simply, H(X|Y) represents the information we learn from source X given the information we have from source Y. Its unit is also bits/symbol. Note that in the joint and conditional entropy definitions the two-dimensional averaging over the event space {X, Y} consistently involves the joint distribution p(x,y). The conditional entropy H(Y|X) is defined in the same way, namely,

  H(Y|X) = -\sum_{x \in X} \sum_{y \in Y} p(x,y) \log p(y|x).   (2.17)

As the conditional probabilities p(x|y) and p(y|x) are in general different, the two conditional entropies will also be different.

If the two sources are statistically independent, then p(x,y) = p(x)p(y). Consequently, this implies that

  H(X,Y) = -\sum_{x \in X} \sum_{y \in Y} p(x,y) \log p(x,y) = -\sum_{x \in X} p(x) \log p(x) - \sum_{y \in Y} p(y) \log p(y) = H(X) + H(Y).   (2.18)
Moreover, p(x|y) = p(x), and then

  H(X|Y) = -\sum_{x \in X} \sum_{y \in Y} p(x,y) \log p(x|y) = -\sum_{x \in X} p(x) \log p(x) \left( \sum_{y \in Y} p(y) \right) = H(X)   (2.19)

and

  H(Y|X) = -\sum_{x \in X} \sum_{y \in Y} p(x,y) \log p(y|x) = -\sum_{y \in Y} p(y) \log p(y) \left( \sum_{x \in X} p(x) \right) = H(Y).   (2.20)

These results match our intuition that, if the two sources are statistically independent, the knowledge of one of them does not provide any advance knowledge about the other.

In the general case, where the sources are not necessarily statistically independent, we find a relation between the joint and conditional entropies given by

  H(X,Y) = -\sum_{x \in X} \sum_{y \in Y} p(x,y) \log p(x,y) = -\sum_{x \in X} \sum_{y \in Y} p(x,y) \log p(x|y) - \sum_{y \in Y} p(y) \log p(y) = H(X|Y) + H(Y),   (2.21)

or equivalently,

  H(X,Y) = -\sum_{x \in X} \sum_{y \in Y} p(x,y) \log p(x,y) = -\sum_{x \in X} \sum_{y \in Y} p(x,y) \log p(y|x) - \sum_{x \in X} p(x) \log p(x) = H(Y|X) + H(X).   (2.22)

These expressions may be easier to memorize under the forms

  H(X|Y) = H(X,Y) - H(Y),   (2.23)
  H(Y|X) = H(X,Y) - H(X),   (2.24)

which state that, given a source X (Y), any advance knowledge from the other source Y (X) reduces the joint entropy H(X,Y) by the net amount H(Y) (H(X)), respectively. It will be shown in Sec. 2.4.1 that all these quantities are nonnegative. In other words, the prior information one may gain from a given source comes at the expense of the information available from the other source, unless the two are statistically independent.

We can illustrate the above properties through our stock-exchange probability data from Table 2.2. We want, here, to determine how the average information from the company's sales, H(X), is affected by that concerning the stocks, H(Y), and vice versa. The computations of H(X), H(Y), H(X,Y), H(X|Y) and H(Y|X) are detailed in Table 2.3. It is seen from the table that the results and stock entropies compute to H(X) = 1.046 and H(Y) = 1.485, respectively (for easier reading, we omit here the bit/symbol units).

Table 2.3: Entropies for the stock-exchange example. This table was transcribed from Ref. [7].

  Source entropies, with u_i = p(x_i) and v_j = p(y_j):
                   u_i    -u_i log u_i                   v_j    -v_j log v_j
    x1 (good)     0.160      0.423       y1 (up)        0.300      0.521
    x2 (same)     0.750      0.311       y2 (steady)    0.500      0.500
    x3 (bad)      0.090      0.313       y3 (down)      0.200      0.464
    H(X) = 1.0469             H(Y) = 1.4855             H(X) + H(Y) = 2.5324

  Joint-entropy terms -u_ij log u_ij, with u_ij = p(x_i, y_j):
            y1       y2       y3
    x1     0.280    0.280    0.066
    x2     0.473    0.529    0.397
    x3     0.091    0.133    0.216
    column sums: 0.844, 0.942, 0.680;   H(X,Y) = 2.4657

  Conditional-entropy terms -u_ij log v_{j|i}, with v_{j|i} = p(y_j|x_i):
            y1       y2       y3
    x1     0.082    0.082    0.040
    x2     0.386    0.363    0.339
    x3     0.039    0.046    0.042
    column sums: 0.506, 0.491, 0.421;   H(Y|X) = 1.4188

  Conditional-entropy terms -u_ij log w_{i|j}, with w_{i|j} = p(x_i|y_j):
            x1       x2       x3
    y1     0.150    0.108    0.065
    y2     0.205    0.129    0.108
    y3     0.043    0.072    0.100
    column sums: 0.398, 0.309, 0.273;   H(X|Y) = 0.9802

  Check: H(X,Y) = H(Y|X) + H(X) = 1.4188 + 1.0469 = 2.4657 = H(X|Y) + H(Y) = 0.9802 + 1.4855.

The joint entropy is found to be H(X,Y) = 2.466, which is lower than the sum H(X) + H(Y) = 2.532.
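The numbers in Table 2.3 follow directly from the joint distribution of Table 2.2. The following Python sketch (our own illustration, not part of Ref. [7]) recomputes H(X), H(Y) and H(X,Y) from Eqs. (2.7) and (2.15), obtains the conditional entropies via Eqs. (2.23) and (2.24), and prints their difference, which anticipates the mutual information introduced next.

    import math

    def H(probs, base=2):
        """Entropy -sum p log p of a list of probabilities; zero entries are skipped."""
        return -sum(p * math.log(p, base) for p in probs if p > 0)

    # Joint probabilities p(x_i, y_j) from Table 2.2; rows are x = good/same/bad, columns y = up/steady/down.
    p_xy = [[0.075, 0.075, 0.010],
            [0.210, 0.400, 0.140],
            [0.015, 0.025, 0.050]]

    p_x = [sum(row) for row in p_xy]        # marginals p(x_i)
    p_y = [sum(col) for col in zip(*p_xy)]  # marginals p(y_j)

    H_X, H_Y = H(p_x), H(p_y)
    H_XY = H([p for row in p_xy for p in row])  # joint entropy, Eq. (2.15)
    H_X_given_Y = H_XY - H_Y                    # Eq. (2.23)
    H_Y_given_X = H_XY - H_X                    # Eq. (2.24)

    print(f"H(X) = {H_X:.4f}, H(Y) = {H_Y:.4f}, H(X,Y) = {H_XY:.4f}")   # 1.0469, 1.4855, 2.4657
    print(f"H(X|Y) = {H_X_given_Y:.4f}, H(Y|X) = {H_Y_given_X:.4f}")    # 0.9802, 1.4188
    print(f"H(X) - H(X|Y) = {H_X - H_X_given_Y:.4f}")                   # 0.0667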
This proves that the two sources are not independent, namely, that they have some information in common. We find that the conditional entropies satisfy

  H(Y|X) = 1.418 < H(Y) = 1.485,   (2.25)
  H(X|Y) = 0.980 < H(X) = 1.046,   (2.26)

or equivalently, using four decimal places for accuracy (see Table 2.3),

  H(Y) - H(Y|X) = 1.4855 - 1.4188 = 0.0667,   (2.27)
  H(X) - H(X|Y) = 1.0469 - 0.9802 = 0.0667.   (2.28)

These two results mean that prior knowledge of the company's stocks contains an average of 0.0667 bit/symbol of information about the company's quarterly result, and vice versa. As we shall see below, the two differences above are always equal and they are called the mutual information. Simply put, the mutual information is the average information that two sources share in common.

We can define the mutual information of two sources X and Y as the bit/symbol quantity

  H(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \log \frac{p(x,y)}{p(x)p(y)}.   (2.29)

We may note the absence of a minus sign in the above definition, unlike in H(X,Y), H(Y|X) and H(X|Y). Also note the ";" separator, which distinguishes the mutual information from the joint entropy H(X,Y). Mutual information is also often denoted in the literature by I(X;Y) instead of H(X;Y). Since the logarithm argument is unity when the two sources are statistically independent, we immediately observe that the mutual information is equal to zero in this case. This reflects the fact that independent sources do not have any information in common.

It is pertinent, now, to derive some formulas which relate these different entropies. Using the properties of the logarithmic function, Bayes' rule and the relations \sum_x p(x,y) = p(y) and \sum_y p(x,y) = p(x) for the joint probability distribution, the mutual information can be written in the following ways:

  H(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \log \frac{p(x,y)}{p(x)p(y)} = -\sum_{x \in X} p(x) \log p(x) + \sum_{x \in X} \sum_{y \in Y} p(x,y) \log p(x|y) = H(X) - H(X|Y),   (2.30)

  H(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \log \frac{p(x,y)}{p(x)p(y)} = -\sum_{y \in Y} p(y) \log p(y) + \sum_{x \in X} \sum_{y \in Y} p(x,y) \log p(y|x) = H(Y) - H(Y|X)   (2.31)

and

  H(X;Y) = H(X) + H(Y) - H(X,Y),   (2.32)

where in the last formula one can use either Eq. (2.30) or Eq. (2.31) together with Eqs. (2.23) and (2.24). The first two equalities above confirm the observation derived from our previous numerical example. They can be interpreted according to the following statement: mutual information is the reduction of uncertainty in X that we obtain from the knowledge of Y (and vice versa). The last equality, rewritten under the form

  H(X,Y) = H(X) + H(Y) - H(X;Y),   (2.33)

shows that the joint entropy of two sources is generally less than the sum of the source entropies. The difference is the mutual information that the sources have in common, which reduces the net uncertainty or joint entropy. Finally, we note from Eqs. (2.30), (2.31) and (2.32) that the mutual information is symmetric with respect to its arguments, namely, H(X;Y) = H(Y;X), as is expected from its very meaning.

The different entropies introduced up to this point may seem a bit abstract, and their various relations may not seem very practical to memorize. But the situation becomes different after we draw an analogy with the properties of ensembles. Consider, indeed, two ensembles, called A and B. The two ensembles may be united to form a whole, which is denoted F = A ∪ B (A union B). The two
ensembles may or may not have elements in common. The set of common elements is called G = A ∩ B (A intersection B). The same definitions of union and intersection apply to any three ensembles A, B and C. Fig. 2.3 shows the Venn diagram representations of such ensemble combinations. In the case of two ensembles, there exist four subset possibilities, as defined by their elements' properties:

• Elements belonging to A or B: A ∪ B,
• Elements common to A and B: A ∩ B,
• Elements from A and not B: A ∩ ¬B,
• Elements from B and not A: B ∩ ¬A.

[Figure 2.3: Venn diagram representation of two (top) and three (bottom) ensembles. This figure has been taken from Ref. [7].]

In the notations shown in the figure, we see that the symbol ∪ stands for a logical OR, the symbol ∩ stands for a logical AND, and the symbol ¬ stands for a logical NOT. These three symbols, which are also called Boolean operators*, make it possible to perform various computations in the field called Boolean logic. In the case of three ensembles A, B, C, we observe that there exist many more subset possibilities (e.g., A ∩ B ∩ C, A ∩ B ∩ ¬C).

* These are not exactly the symbols which usually represent the Boolean operators, but this is not relevant to us.

The interest of the above visual description with Venn diagrams is the straightforward correspondence with the various entropies that have been introduced. Based on Eqs. (2.18), (2.19), (2.20), (2.23), (2.30), (2.31) and (2.32) we can draw a Venn diagram which respects them and see that the following equivalences hold (compare Fig. 2.3 with Fig. 2.4):

  H(X,Y) ↔ H(X ∪ Y),
  H(X;Y) ↔ H(X ∩ Y),
  H(X|Y) ↔ H(X ∩ ¬Y),
  H(Y|X) ↔ H(¬X ∩ Y).   (2.34)

The first equivalence in Eq. (2.34) means that the joint entropy of two sources is the entropy of the source defined by their combined events. The second equivalence in Eq. (2.34) means that the mutual information of two sources is the entropy of the source containing the events they have in common. The last two equivalences in Eq. (2.34) provide the relationship with the conditional entropy: the conditional entropy of a source X, given the information on a source Y, is given by the contributions of all the events belonging to X but not to Y.

[Figure 2.4: Venn diagram representation for the different entropies. This figure has been taken from Ref. [7].]

Fig. 2.4 illustrates all the above logical equivalences through Venn diagrams, using up to three sources. Considering the two-source case, we can immediately visualize from Fig. 2.4 to which subsets the differences H(X) - H(X;Y) and H(Y) - H(X;Y) actually correspond. Given the identities listed in Eq. (2.34), we can call these two subsets H(X|Y) and H(Y|X), respectively, which proves the previous point. We also observe from the Venn diagram that H(X|Y) ≤ H(X) and H(Y|X) ≤ H(Y), with equality if the sources are independent. The above property can be summarized by the statement that conditioning reduces entropy. A formal demonstration, using the concept of relative entropy, is provided later.

The three-source case, as illustrated in Fig. 2.4, is somewhat trickier because it generates more complex entropy definitions with three arguments X, Y and Z.
Conceptually, defining joint or conditional entropies and mutual information with three (or more) sources is not that difficult. Considering the joint probability p (x, y, z) for the three sources, we can indeed generalize the previous two-source definitions according to the following: H (X, Y, Z) = − ∑ x∈X ∑ y∈Y ∑ z∈Z p (x, y, z) log p (x, y, z) , (2.35) H (X;Y ;Z) = + ∑ x∈X ∑ y∈Y ∑ z∈Z p (x, y, z) log p (x, y, z) p (x) p (y) p (z) , (2.36) H (Z|X, Y ) = − ∑ x∈X ∑ y∈Y ∑ z∈Z p (x, y, z) log p (z|x, y) , (2.37) H (X, Y |Z) = − ∑ x∈X ∑ y∈Y ∑ z∈Z p (x, y, z) log p (x, y|z) . (2.38) These four definitions correspond to the joint entropy of the three sources X, Y , Z (Eq. (2.35)), the mutual information of the three sources X, Y , Z (Eq.(2.36)), the Chapter 2. Classical Information Theory 27 entropy of source Z given the known entropy of X, Y (Eq. (2.37)), and the joint entropy of X, Y given the known entropy of Z (Eq. (2.38)). The last two definitions are seen to involve conditional probabilities of higher orders, namely, p (z|x, y) and p (x, y|z), which are easily determined from the generalization of Bayes’ rule †. Other entropies of the type H (X;Y |Z) and H (X|Y ;Z) are more tricky to determine from the above definitions. But we can resort in all confidence to the equivalence relations and the corresponding two-source or three-source Venn diagrams shown in Fig. 2.4. Indeed, a straightforward observation of the diagrams leads to the following correspondences: H (X;Y |Z) = H (X;Y )−H (Z) , (2.39) H (X|Y ;Z) = H (X)−H (Y ;Z) . (2.40) Finally, the Venn diagrams (with the help of Eq. (2.34)) make it possible to establish the following properties for H (X, Y |Z) and H (X|Y, Z). The first is H (X, Y |Z) = H (X|Z) +H (Y |X,Z) , (2.41) which is easy to memorize if a condition |Z is applied to both sides of the definition of joint entropy, H (X, Y ) = H (X) +H (Y |X). The second is H (X, Y |Z) = H (Y |Z) +H (X|Y, Z) , (2.42) comes from the permutation in Eq. (2.41) of the sources X, Y , since the joint entropy H (X, Y ) is symmetrical with respect to its arguments. The lesson learned from using Venn diagrams is that there is, in fact, little to memorize, as long as we are allowed to make drawings. The only general rule to remember is: H (U |Z) is equal to the entropy H (U) defined by the source U (for instance, U = X, Y or U = X;Y ) minus the entropy H (Z) defined by the source Z, the reverse being true for H (Z|U). But the use of Venn diagrams require us not to forget the unique correspondence between the ensemble or Boolean operators (∪ ∩ ¬) and the separators (, ; |) in the entropy-function arguments. †As we have p (x, y, z) = p (z|x, y) p (x) p (y)→ p (z|x, y) = p (x, y, z) / [p (x) p (y)] and p (x, y, z) = p (x, y|z) p (z)→ p (x, y|z) = p (x, y, z) /p (z) . Chapter 2. Classical Information Theory 28 2.4.1 Relative Entropy Relative entropy is our last word on the different kinds of (classical) entropy in this work. It is related to the concept of distance between two probability distribu- tions. It is not strictly a distance because it is not symmetrical, for instance‡. The actual usefulness of this entropy can only be appreciated when one studies advanced topics in information theory. For us it is going to be useful as a preparation for the quantum relative entropy in Ch. 4 and to prove some inequalities about the other entropies. We shall introduce the relative entropy between two probability distribution functions, which is also called the Kullback-Leibler (KL) divergence. 
Consider two probability distribution functions, p (x) and q (x), where the argument x belongs to a single source X. The relative entropy is denoted by D [p (x) ||q (x)] and is defined as follows: D [p (x) ||q (x)] = 〈 log p (x) q (x) 〉 p = ∑ x∈X p (x) log p (x) q (x) . (2.43) To show that the relative entropy is nonnegative we will need the following inequality for logarithmic functions log (x) ln (2) = ln (x) ≤ x − 1, or in a more convenient way − log (x) ≥ 1− x ln 2 . (2.44) Hence it follows H (p||q) = − ∑ x∈X p (x) log q (x) p (x) ≥ 1 ln 2 ∑ x∈X p (x) ( 1− q (x) p (x) ) = 1 ln 2 ∑ x∈X (p (x)− q (x)) = 0. (2.45) This inequality is called Gibbs’ inequality. The equality only holds when the two distributions are the same, i.e., p (x) = q (x). An important particular case occurs when the distribution q (x) is a uniform distribution. If the source X has N events, the uniform probability distribution is ‡It also does not satisfy the triangle inequality. But it does satisfy the non-negativity as we shall show. Chapter 2. Classical Information Theory 29 thus given by q (x) = 1/N . Replacing this definition in Eq. (2.43) yields D [p (x) ||q (x)] = ∑ x∈X p (x) log p (x) 1/N = logN ∑ x∈X p (x) + ∑ x∈X p (x) log p (x) = logN −H (X) . (2.46) Since the distance D [p||q] is always nonnegative, it follows from the above that H (X) ≤ logN . This result shows that the entropy of a source X with N elements has logN for its upper bound, which (in the absence of any other constraint) rep- resents the entropy maximum. Assume next that p and q are joint distributions of two variables x, y. Similarly to the definition in Eq. (2.43), the relative entropy between the two joint distributions is D [p (x, y) ||q (x, y)] = 〈 log p (x, y) q (x, y) 〉 p = ∑ x∈X ∑ y∈Y p (x, y) log p (x, y) q (x, y) . (2.47) The relative entropy is also related to the mutual information. Indeed, recalling the definition of mutual information (cf. Eq. (2.29)) we get H (X;Y ) = ∑ x∈X ∑ y∈Y p (x, y) log p (x, y) p (x) p (y) = D [p (x, y) ||p (x) p (y)] , (2.48) which shows that the mutual information between two sources X, Y is the relative entropy (or KL divergence) between the joint distribution p (x, y) and the distribu- tion product p (x) p (y). Since the relative entropy is always nonnegative, it follows that mutual information is always nonnegative. Indeed, recalling the Eqs. (2.30) and (2.31) together with the Gibbs’ inequality yields H (X;Y ) = H (X)−H (X|Y ) = H (Y )−H (Y |X) ≥ 0, (2.49) which thus implies the two inequalities H (X|Y ) ≤H (X) , (2.50) H (Y |X) ≤H (Y ) . (2.51) The above result can be summarized under the fundamental conclusion, which has already been established: conditioning reduces entropy. Thus, given two sources X and Y , the information we obtain from source X, given the prior knowledge of the information from Y , is less than or equal to that available from X alone, meaning that entropy has been reduced by the fact of conditioning. The strict inequality Chapter 2. Classical Information Theory 30 applies in the case where the two sources have nonzero mutual information. If the two sources are disjoint, or made of independent events, then the equality applies, and conditioning from Y has no effect on the information of X. Even though we are not going to reach the actual interpretation of the relative entropy we are going to state it [11]: The relative entropyD [p (x) ||q (x)] is a measure of the information lost when the probability distribution q is used to approximate the distribution p. 
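The properties derived above can be illustrated with a few lines of code. The following sketch (Python/NumPy, purely illustrative distributions p and q chosen by us) checks Gibbs' inequality (2.45), the uniform-distribution identity (2.46), and the identification of the mutual information with a relative entropy (2.48).

\begin{verbatim}
import numpy as np

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def D(p, q):
    """Relative entropy (KL divergence) D[p||q] in bits, Eq. (2.43)."""
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

# Two hypothetical distributions over N = 4 events.
p = np.array([0.5, 0.25, 0.125, 0.125])
q = np.array([0.25, 0.25, 0.25, 0.25])          # uniform distribution

assert D(p, q) >= 0                             # Gibbs' inequality, Eq. (2.45)
assert np.isclose(D(p, q), np.log2(4) - H(p))   # Eq. (2.46): D = log N - H(X)

# Mutual information as a relative entropy, Eq. (2.48).
p_xy = np.array([[0.40, 0.10],
                 [0.15, 0.35]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
I = H(p_x) + H(p_y) - H(p_xy)
assert np.isclose(I, D(p_xy.ravel(), np.outer(p_x, p_y).ravel()))
\end{verbatim}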
Chapter 3 Quantum Dynamics Quantum mechanics is one of the greatest theoretical developments ever made by human beings. It dictates how our subworld behaves. By subworld we mean the world beneath our world, the world of invisible (to the naked eye) entities, the world which goes beyond our senses. The study of this invisible subworld has brought almost every kind of technology we use today and any which might be in ongoing process of development. Since the beginning of quantum mechanics until nowadays there are some re- searchers who are interested in its interpretation. There are many interpretations that have been put forth, the most famous being the Copenhagen interpretation for- mulated by Bohr and Heisenberg. Among these interpretations there is one that can be viewed as minimal, in the sense that it provides a concrete-based interpretation for the mathematical objects of the theory. This is the statistical (or ensemble) in- terpretation [12] which takes the Born’s rule to its fullest extent. The wavefunction is viewed as a mathematical tool to describe statistically the possible outcomes of an identically prepared ensemble of quantum systems. This interpretation is older than of what we may call today the operational interpretations. An operational interpretation attempts to interpret the properties of quantum systems and math- ematical objects of the theory in terms of their practical results, in the same way as in the statistical interpretation. The statistical interpretation touches basically the interpretational problem of the wavefunction but, on the other hand, does not say a word with respect to other fundamental concepts of the theory such as entan- glement. From this foundational point of view, quantum information theory sheds light on new ways to understand such concepts. Quantum entanglement is seen as a resource which enables some efficient algorithms to solve tasks in a quantum 31 Chapter 3. Quantum Dynamics 32 computer. In Ref. [13] one may find a recent poll carried out in a 2011 conference on quan- tum foundations presenting the viewpoint of the participants about foundational issues in quantum theory. In this poll, it can be seen how quantum informational concepts merged with quantum mechanics. Some questions asked to the partici- pants that manifest this are: “What about quantum information?”, whose most voted answer was “It’s a breath of fresh air for quantum foundations”; “When will we have a working and useful quantum computer?”, whose most voted answer was “In 10 to 25 years”; “What interpretation of quantum states do you prefer?”, where the epistemic/informational interpretation is one that received considerable votes; and finally “What is your favorite interpretation of quantum mechanics”, with the information-based/information-theoretical as the second most voted, behind only by the traditional Copenhagen interpretation. The aim of this chapter is to provide a unified way to describe all possible quan- tum dynamics; from time evolution to measurement. In Sec. 3.1, we will present the dynamics of closed systems. We will study the pure and mixed states and then as an example the dynamics of a qubit. In Sec. 3.2, we will set up the mathe- matical language to describe systems of many particles (subsystems), introducing fundamental concepts such as entanglement and partial trace. In Sec. 3.3, we will study, in a fairly general way, the dynamics of open quantum systems. 
We will study the properties of general dynamical maps and derive a differential equation for the density operator [10,14]. In Sec. 3.4, we will study the generalized description of measurement dynamics. Following this, in Sec. 3.5, we will conclude our purpose showing an unified picture of quantum dynamics in terms of quantum operations. As an application of the formalism developed in the previous sections, we will ad- dress the phenomenon of decoherence in Sec. 3.6 and see that it is intimately related with another phenomenon called environment-induced superselection. Most of this chapter was based on Ref. [10]. 3.1 Dynamics of Closed Quantum Systems In order to discuss the dynamics of closed quantum systems we have to know what a closed system is. For the moment we consider that the universe is everything that Chapter 3. Quantum Dynamics 33 exists ∗ and we can divide it into two distinct parts, the system and the environment. The system is that part of the universe in which we are interested to describe. For example, a collection of atoms or molecules. The environment is all that is in the universe and which doesn’t fit into the system. Being established these concepts, a closed system is a system that, for all practical purposes, we can neglect all spoiler effects of the environment over the system. So for example, we are interested in studying the kinematics of a marble rolling freely † in a flat table, say, to verify Newton’s laws. In this example the system would be the marble. The environment, consisted of the table and everything else, does affect the system dynamics, namely, the table exerts a normal force which balances the weight of the marble. But this effect is necessary to maintain the marble’s movement constrained to the plane, which is the region where we are interest to study the movement. Therefore this influence does not spoil the experiment. On the other hand, the friction with the local atmosphere or with the table could spoil our attempt to verify Newton’s law, as they would affect the movement in the plane. If these effects can be made negligible by an appropriate experimental setup then the marble is considered a closed system and we will verify Newton’s laws. Let us begin to study the quantum dynamics of such a systems. 3.1.1 Dynamics of Pure and Mixed States When we begin to learn the mathematical formulation of quantum mechanics we put the Schrödinger equation as a postulate. The Schrödinger equation is a differential equation which gives the dynamics of a state vector |ψ0〉. The equation is ih̄ d dt |ψ (t)〉 = H (t) |ψ (t)〉 , |ψ (0)〉 = |ψ0〉 . (3.1) For the sake of completeness let us establish a common language and notation at this point. The symbol i stands for the imaginary unit and h̄ for the Planck’s constant. The state vector |ψ (t)〉 stands for the mathematical representation, within the theory, of a pure state. A state vector is a vector in some Hilbert space H, ∗We are not concerned with philosophical problems which may be raised with these concepts but just setting up a common conceptual ground to understand the concept of closed systems. †Freely means that we could set up an experiment such that the dynamical friction coefficient between the marble and the table is negligible. Chapter 3. Quantum Dynamics 34 also hereafter referred to as state space‡. The object H (t) is the Hamiltonian of the system. It provides the structure of the dynamical evolution of a quantum system. The Hamiltonian is an operator over the state space H. 
It lives in another Hilbert space denoted by L (H), the space of linear operators over H. All state vectors must be normalized in order to preserve the probabilistic characteristics of quantum mechanics. The normalization condition is mathematically stated as 〈ψ, ψ〉 = 〈ψ |ψ〉 = 1, where 〈· | ·〉 is the inner product of the Hilbert space H. As we shall see later on the Schrödinger equation preserves the normalization condition as the time goes by. From the Schrödinger equation we can deduce an algebraic equation for the time evolution of the state vector. To deduce this equation we first rewrite the Schrödinger equation in its integral form, |ψ (t)〉 = |ψ0〉+ 1 ih̄ ∫ t 0 dt′H (t′) |ψ (t′)〉 . (3.2) Now we substitute this equation on itself, leading us to |ψ (t)〉 = |ψ0〉+ 1 ih̄ ∫ t 0 dt1H (t1) |ψ0〉 + ( 1 ih̄ )2 ∫ t 0 dt1H (t1) ∫ t1 0 dt2H (t2) |ψ (t2)〉 . (3.3) Observe the time ordering t ≥ t1 ≥ t2 ≥ 0 in the second integral. The integrals∫ t 0 dt1 ∫ t1 0 dt2 can be written as 1 2 ∫ t 0 dt1 ∫ t 0 dt2. With this relation the above equation can be rewritten as |ψ (t)〉 = [ 1 + ( − i h̄ )∫ t 0 dt1H (t1) + ( − i h̄ )2 1 2! ∫ t 0 dt1 ∫ t 0 dt2H (t1)H (t2) ] |ψ0〉 +O ( 1 h̄3 ) . (3.4) Performing some more steps one can convince itself that we can write Eq. (3.2) algebraically as |ψ (t)〉 = U (t) |ψ0〉 , (3.5) where U (t) is the time evolution operator U (t) = T← { exp ( − i h̄ ∫ t 0 dt′H (t′) )} . (3.6) ‡We will tacitly assume throughout this work that the state space is finite dimensional. Chapter 3. Quantum Dynamics 35 The exponential function of an operator must be interpreted, by definition, as its Taylor’s series. The symbol T← {. . . } denotes the antichronological-ordering, i.e., the operators inside it must be ordered in the following way: T← {H (t1)H (t2) · · ·H (tn)} = H (t1)H (t2) · · ·H (tn) for t1 ≥ t2 ≥ · · · ≥ tn. In general, the Hamiltonian of the system does not depend explicitly on time. When this is the case the expression for the time evolution operator reduces to the usual form U (t) = exp ( −itH h̄ ) . (3.7) One can see easily that the time evolution operator is unitary, i.e., U (t)U † (t) = U † (t)U (t) = I, where I is the identity operator in L (H). Physically the time evolu- tion operator being unitary means that the norm of the state vector doesn’t change with time. We can see this simply calculating the inner product of the evolved state vector: 〈ψ (t) |ψ (t)〉 = 〈ψ0|U † (t)U (t) |ψ0〉 = 〈ψ0 |ψ0〉. As this algebraic ap- proach for the time evolution of the state vector is completely equivalent to that of the Schrödinger equation it follows that the Schrödinger equation also preserves the normalization condition. We refer to this evolution to be a continuous unitary evolution (or dynamics). This contrasts with the other possible evolution to which a state vector may be subjected, the measurement. Within a dynamical point of view, the measurement is a discontinuous evolution in which the state vector, at a particular instant of time, suddenly changes to another. Moreover, the measurement dynamics is non-unitary because once a state has changed there is no way to know which original state it was, there is no inverse. This time evolution is referred in the literature as discontinuous non-unitary evolution (dynamics) or measurement dynamics of the state vector. How can it exist, in such a fundamental theory like quantum mechanics, two diametrically opposed dynamics? One that is continuous and unitary and another which is discontinuous and non-unitary§? 
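Before addressing that question, the unitary branch of the dynamics, Eqs. (3.5)–(3.7), is easy to check numerically. The sketch below (Python with NumPy and SciPy, in units where ħ = 1, and with a hypothetical time-independent qubit Hamiltonian of our choosing) builds the evolution operator as a matrix exponential and verifies that it is unitary and preserves the norm of the state vector.

\begin{verbatim}
import numpy as np
from scipy.linalg import expm

hbar = 1.0                                   # work in units where hbar = 1
sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)

H = 0.5 * sigma_x                            # hypothetical qubit Hamiltonian
t = 1.3

U = expm(-1j * H * t / hbar)                 # time evolution operator, Eq. (3.7)

psi0 = np.array([1.0, 0.0], dtype=complex)   # initial state |0>
psi_t = U @ psi0                             # evolved state, Eq. (3.5)

# Unitarity and norm preservation.
assert np.allclose(U.conj().T @ U, np.eye(2))
assert np.isclose(np.vdot(psi_t, psi_t).real, 1.0)
\end{verbatim}

This check concerns only the continuous, unitary branch of the dynamics; the tension with the discontinuous measurement dynamics raised above remains.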
For some re- searchers this may be enough to claim that there must exist some other quantum theory, or some underlying theory. The usual measurement studied in quantum mechanics is the projective measure- ment or von Neumann measurement. It is mathematically characterized by a set of mutually orthogonal projection operators. Given an observable this set of projection §The non-unitarity is not the problem, in fact. As we shall see, open systems undergo non- unitary evolutions. Chapter 3. Quantum Dynamics 36 operators is constructed with its eigenvectors¶. Consider some observable A and its set of orthogonal eigenvectors {|ak〉}. From them we construct the orthogonal set of projection operators {Pk = |ak〉 〈ak|}. They satisfy the properties of orthogona- lity and idempotency through PkPl = Pkδkl and completeness, ∑ k Pk = I. The probability of obtaining an outcome ak is given by the Born’s rule p (ak) = 〈Pk〉|ψ〉 = 〈ψ| (|ak〉 〈ak|) |ψ〉 = |〈ak |ψ〉|2 . (3.8) After the measurement the system is found to be in the state ˜|ψ′〉 = Pk |ψ〉 , (3.9) which is a non normalized state‖. Normalizing it we obtain |ψ′〉 = 1√ p (ak) Pk |ψ〉 . (3.10) We see that this state vector is normalized after a simple calculation, 〈ψ′ |ψ′〉 = 1 p (ak) 〈ψ|P 2 k |ψ〉 = 1 p (ak) 〈ψ|Pk |ψ〉 = 1 p (ak) 〈Pk〉|ψ〉 = 1 p (ak) p (ak) = 1. (3.11) So far we have studied the two standard dynamics, the unitary and a particular measurement dynamics of the state vector. Let us turn now to another possible representation of a pure state. This representation is going to be of fundamental importance for what follows. It will enable us to treat the dynamics of open systems. Consider a state vector |ψ0〉 ∈ H associated with a pure state. From it we can build a projection operator called the (pure) density operator ρ = |ψ0〉 〈ψ0|. Using Eq. (3.5) it is straightforward to find the time evolution of the density operator to be ρ (t) = U (t) ρ (0)U † (t) . (3.12) Suppose {|ak〉} is an arbitrary orthonormal basis for H. Then the normalization condition is translated to 〈ψ0 |ψ0〉 = 〈ψ0| (∑ k |ak〉 〈ak| ) |ψ0〉 = ∑ k 〈ak |ψ0〉 〈ψ0 | ak〉 = Tr (ρ) = 1. (3.13) ¶An observable is a self-adjoint operator over H. The possible outcomes one may find measuring this observable are its eigenvalues. ‖The tilde symbol ˜ will always stand for a non normalized state. Chapter 3. Quantum Dynamics 37 As the density operator is a projection operator it is obvious that Tr ( ρ2 ) = 1. (3.14) Let us find what is the equation that gives the dynamics of the density operator. From Eq. (3.12) take the time derivative to obtain d dt ρ (t) = ( d dt U (t) ) ρ (0)U † (t) + U (t) ρ (0) ( d dt U † (t) ) , (3.15) which can be further evaluated to, by means of Eq. (3.7), d dt ρ (t) = ( − i h̄ HU (t) ) ρ (0)U † (t) + U (t) ρ (0) ( U † (t) i h̄ H ) =− i h̄ [H, ρ (t)] . (3.16) This equation is called the Liouville-von Neumann equation. Plus some initial con- dition ρ (0) = ρ0 it gives the time evolution of the density operator, ρ (t). Up to this point, these two different ways of describing a pure state are completely equivalent, as it should be. In this way, a equation with this structure provides a unitary evolu- tion. But this latter representation, namely in terms of the density operators, gives us the opportunity to extend our description of physical systems. Let us pause for a moment and briefly discuss an operational view of quantum mechanics [15]. A quantum system in a given quantum state may be always thought to be the result of a preparation procedure. 
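Before pursuing this operational view, the measurement rules just stated, Eqs. (3.8)–(3.11), together with the defining properties of a pure density operator, Eqs. (3.13)–(3.14), can be verified with a short script (Python/NumPy; the state used below is hypothetical and chosen only for illustration).

\begin{verbatim}
import numpy as np

# Hypothetical qubit state |psi> = cos(0.3)|0> + sin(0.3) e^{i 0.5} |1>.
theta, phi = 0.6, 0.5
psi = np.array([np.cos(theta / 2), np.sin(theta / 2) * np.exp(1j * phi)])

# Projective measurement in the computational basis {|0>, |1>}.
basis = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
for k, a_k in enumerate(basis):
    P_k = np.outer(a_k, a_k.conj())         # projector |a_k><a_k|
    p_k = np.vdot(psi, P_k @ psi).real      # Born's rule, Eq. (3.8)
    psi_post = P_k @ psi / np.sqrt(p_k)     # normalized state, Eq. (3.10)
    assert np.isclose(np.vdot(psi_post, psi_post).real, 1.0)   # Eq. (3.11)
    print(f"p(a_{k}) = {p_k:.4f}")

# Pure density operator and its defining properties, Eqs. (3.13)-(3.14).
rho = np.outer(psi, psi.conj())
assert np.isclose(np.trace(rho).real, 1.0)
assert np.isclose(np.trace(rho @ rho).real, 1.0)
\end{verbatim}

With these rules verified, we return to the operational, preparation-based view of quantum states.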
In other words, any quantum state is, at least in principle, the result of some preparation procedure. A preparation procedure is a sequence of steps performed in the laboratory such that at the end of the process a quantum system emerges and the experiment indeed begins. In this way, we say that the quantum system was prepared in the state, say, ρ. Consider the following realistic situation. Our experimental device prepares a pure state with a reliability of 90%, i.e., in 10% of the preparations it prepares some other pure state. The question is: what is the appropriate object in the theory that describes this particular situation? This preparation can be thought to be as composed by the weighted distribution of the preparations of the two pure states. Each one of these pure states are described by the density operators ρ1 and ρ2, respectively. Therefore the appropriate weighted distribution is given by ρ = p1ρ1 + p2ρ2, (3.17) Chapter 3. Quantum Dynamics 38 where p1 = 0.9 and p2 = 0.1. This means exactly that in 90% of the cases we have one pure state, ρ1, and in 10% the other, ρ2, and we are taking this into account in ρ when describing the statistical predictions of this peculiar preparation. This object still is an operator over H and furthermore its trace is one, Tr (ρ) = p1Tr (ρ1) + p2Tr (ρ2) = p1 + p2 = 1, (3.18) where we used the fact that both ρ1 and ρ2 are pure density operators. Thus, this operator guarantees the normalization condition and it describes a possible physical preparation. In other words, this operator is also a density operator and it is called a mixed density operator. We say that this is a mixed density operator because this preparation involves a statistical mixture between two pure states with probability distribution {pk}. Notwithstanding this argumentation, in 1957 Gleason mathema- tically showed that for any probability measure µ on a state space of dimension at least 3, i.e., any way to define probabilities mathematically, implies the existence of a trace class-one positive semidefinite operator∗∗ ρ (our density operator) such that µ (A) = Tr [ρPA] . (3.19) A is a closed subset of the state space and PA is its orthogonal projection. This is exactly the Born’s rule for the density operators, as we shall see in a moment [16]. This means that the density operator is the most general object compatible with a probability structure over Hilbert spaces. Observe that in this way we eliminate altogether the description involving vectors inH because both observables and states are described by operators overH. All dynamics and predictions are mathematically stated in the space of operators over H. Even though this will not be important for this work, it is, nevertheless, noteworthy that the observables and density operators do not live in the same Hilbert space. This is so because the definition of a Hilbert space involves an inner product and the space of density operators and observables possess different inner products. Considering what we have seen, the most general density operator is given by ρ = ∑ k pkρk, (3.20) ∗∗A positive semidefinite operator is an operator that for any vector of the Hilbert space its average value is non-negative, 〈ρ〉|ψ〉 ≥ 0 ∀ |ψ〉 ∈ H. In other words all its eigenvalues are non- negative and therefore positivity implies self-adjointness. Chapter 3. Quantum Dynamics 39 where pk is the probability of the system to be in the pure state associated with ρk †† and moreover ∑ k pk = 1 to maintain the normalization condition. 
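As a concrete, purely illustrative example of such a preparation, the 90%/10% device described above can be written out in a few lines (Python/NumPy). The two pure preparations below, |0⟩ and |+⟩ = (|0⟩ + |1⟩)/√2, are our own hypothetical choices.

\begin{verbatim}
import numpy as np

# Two hypothetical pure preparations of a qubit.
psi1 = np.array([1.0, 0.0])                   # |0>
psi2 = np.array([1.0, 1.0]) / np.sqrt(2)      # |+>
rho1 = np.outer(psi1, psi1.conj())
rho2 = np.outer(psi2, psi2.conj())

# Preparation device with 90% reliability, Eq. (3.17).
p1, p2 = 0.9, 0.1
rho = p1 * rho1 + p2 * rho2

assert np.isclose(np.trace(rho).real, 1.0)    # normalization, Eq. (3.18)
eigvals = np.linalg.eigvalsh(rho)
assert np.all(eigvals >= -1e-12)              # positive semidefinite operator
print(np.trace(rho @ rho).real)               # < 1: the preparation is mixed
\end{verbatim}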
In other words, quantum mechanics says that any preparation procedure leads to a state described by a density operator of the above form. The natural question which may be asked here is: is there a mathematical criterion to distinguish a pure from a mixed density operator? The answer is yes. As the density operator is self-adjoint we can always diagonalize it and rewrite it as ρ = ∑ k p ′ k |k〉 〈k|, where p′k is a new probability distribution. Take the trace of the squared density operator to give us Tr ( ρ2 ) =Tr (∑ k,l p′kp ′ l |k〉 〈k | l〉 〈l| ) = ∑ k,l pkplδklTr (|k〉 〈l|) = ∑ k p2 kTr (|k〉 〈k|) = ∑ k p2 k ≤ 1. (3.21) We used the fact that all the probabilities are smaller than one. If the density oper- ator is associated with a pure state, the inequality is indeed an equality, Tr (ρ2) = 1, as we have already concluded previously (cf. Eq. (3.14)). If this is not the case, then necessarily the strict inequality holds, i.e., for mixed density operators Tr (ρ2) < 1. In fact this quantity has a name. It is the purity of the density operator. Just to formalize, the purity is defined as ξ = Tr ( ρ2 ) (3.22) and from this we can assert thatξ = 1 , for pure density operators and ξ < 1 , for mixed density operators. (3.23) We shall see why this quantity is important when we study how to measure the entanglement between two pure states. The purity measures the degree of mixedness of a quantum state. Consider the density operator of a two-level system given by ρ1 = 1 2 |0〉 〈0| + 1 2 |1〉 〈1|. We are equally unacquainted with which state we are dealing, because they have exactly the same probability to be prepared. On the other hand, if the mixed state is given by ρ2 = 1 4 |0〉 〈0|+ 3 4 |1〉 〈1|, then we are more confident that the state which we may be dealing with is, in this case, the state |1〉. In this way we say that the density operator ρ1 is more mixed than ρ2. Furthermore ††The density operators ρk may also be mixed density operators, even though we can always write them in terms of pure density operators. Chapter 3. Quantum Dynamics 40 ρ1 is a maximally mixed density operator because its distribution is uniform and this gives complete ignorance about which pure state we are dealing with. The purity of these states are ξ1 = ( 1 2 )2 + ( 1 2 )2 = 1 2 (3.24) and ξ2 = ( 1 4 )2 + ( 3 4 )2 = 5 8 , (3.25) respectively. The state ρ2 is less mixed because its purity is close to the purity of pure states, ξ1 ≤ ξ2 ≤ 1. For an N -dimensional Hilbert space the maximally mixed state is given by the equiprobable distribution of a basis of pure states and therefore the purity is bounded as 1/N ≤ ξ ≤ 1. The last question to be addressed in this section is related to the mathematical formulation to study the time evolution of these newly introduced mixed density operators. Answering in advance, it will be the same used previously, but let us check this out. We want to find the proper differential equation which describes the time evolu- tion of the most general density operator (3.20). Let ρk (0) be the initial (pure) den- sity operator of each of the states which compose this mixed preparation. Therefore, the initial mixed density operator is ρ (0) = ∑ k pkρk (0). We know from Eq. (3.12) how each of the ρk’s evolve. Thus ρ (t) = ∑ k pk ( U (t) ρk (0)U † (t) ) =U (t) (∑ k pkρk (0) ) U † (t) = U (t) ρ (0)U † (t) . (3.26) As this expression is formally the same as Eq. (3.12), it follows that the derivation for its time evolution is the same. 
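Before writing down the resulting evolution equation, the purity values quoted in this section, ξ₁ = 1/2 and ξ₂ = 5/8, are immediate to reproduce numerically (Python/NumPy; the pure state at the end is a hypothetical example of ours).

\begin{verbatim}
import numpy as np

def purity(rho):
    """xi = Tr(rho^2), Eq. (3.22)."""
    return np.trace(rho @ rho).real

rho1 = np.diag([0.5, 0.5])        # maximally mixed qubit state
rho2 = np.diag([0.25, 0.75])

print(purity(rho1))               # 0.5   = 1/N for N = 2 (maximally mixed)
print(purity(rho2))               # 0.625 = 5/8 (less mixed than rho1)

psi = np.array([np.cos(0.4), np.sin(0.4) * np.exp(0.7j)])
rho_pure = np.outer(psi, psi.conj())
print(purity(rho_pure))           # 1.0 for any pure state
\end{verbatim}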
So, the time evolution of any density operator is given by d dt ρ (t) = − i h̄ [H, ρ (t)] . (3.27) Therefore, this equation describes the most general dynamics of a closed system. The measurement dynamics is expressed in this approach as follows. The pro- Chapter 3. Quantum Dynamics 41 bability to find an outcome ak of an observable A is given by the Born’s rule p (ak) =Tr [ρPk] = ∑ k pkTr [ρk |ak〉 〈ak|] = ∑ k pk 〈ak |ψk〉 〈ψk | ak〉 = ∑ k pk |〈ak |ψk〉|2 = ∑ k pkpk (ak) , (3.28) which reflects that the probability is the weighted sum of the probabilities to measure the outcome ak with respect to the pure states ρk. The normalized post-measurement state is given by ρ′ = 1 p (ak) PkρPk = 1 p (ak) |ak〉 〈ak| ρ |ak〉 〈ak| = 1 p (ak) p (ak) |ak〉 〈ak| = |ak〉 〈ak| . (3.29) In this section we described the dynamics of pure and mixed states, their unitary and measurement dynamics, in terms of projective measurements. We will see other measurements in Sec. (3.4). In the next section we will study the dynamics of the system that is the building block of quantum information, the qubit. 3.1.2 Dynamics of One Qubit In this section we will apply the concepts developed in the previous section to a simple quantum system. The simplest non-trivial quantum mechanical system is a system composed of just one isolated spin-1/2 particle. Its state space is a two- dimensional Hilbert space H2. A single spin-1/2 is just one physical system which has such a state space. A system which can be described by the Hilbert space H2 is called a qubit. In the next chapter, when we address the quantum information theory, we shall see why this system is important. Briefly stated, it plays the role somehow analogous to the binary alphabet in classical information theory. There may be several physical systems which can be said to be a qubit. Some examples are: • Atoms, ions, molecules which under specific conditions of the experimental setup can be described effectively by a two-dimensional Hilbert space. One standard example is that of atoms interacting resonantly with the radiation field (Jaynes-Cummings model [17]); • Spin-1/2 particles; Chapter 3. Quantum Dynamics 42 • Polarization states of single photons; • Quantum dots. With that being said, let us turn to the theoretical description of a qubit. The most general qubit state is |ψ〉 = c0 |0〉+ c1 |1〉 , (3.30) where {|0〉 , |1〉} are usually taken to be the eigenstates of the Pauli matrix σ3, corresponding to the eigenvalues {+1,−1}, respectively. This basis is referred to as the computational basis or the standard basis. Moreover, the coefficients must satisfy the normalization condition |c0|2 + |c1|2 = 1. We saw in the previous section that a more general description of quantum systems is in terms of density operators. Therefore let us study the operators over the state space of a qubit. An operator which acts on H2 is represented by a two by two matrix. Therefore, the space of operator over H2, represented by L (H2), is the space of all two by two matrices. A basis in this space is given by {I, σ1, σ2, σ3}, where I is the identity matrix and the σk are the three Pauli matrices. This space of operators can be a Hilbert space itself if endowed with the Hilbert-Schmidt inner product 〈A,B〉 = Tr ( A†B ) , ∀A,B ∈ L (H2) . (3.31) Regarding this inner product the previous basis can become an orthonormal basis if taken to be { 1√ 2 I, 1√ 2 σk } . 
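That this rescaled Pauli set is indeed orthonormal with respect to the Hilbert–Schmidt inner product (3.31) can be confirmed directly; a minimal check in Python/NumPy (the function name hs is our own shorthand):

\begin{verbatim}
import numpy as np

I  = np.eye(2, dtype=complex)
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
s3 = np.array([[1, 0], [0, -1]], dtype=complex)

def hs(A, B):
    """Hilbert-Schmidt inner product <A, B> = Tr(A^dagger B), Eq. (3.31)."""
    return np.trace(A.conj().T @ B)

basis = [M / np.sqrt(2) for M in (I, s1, s2, s3)]
gram = np.array([[hs(A, B) for B in basis] for A in basis])
assert np.allclose(gram, np.eye(4))   # the rescaled Pauli basis is orthonormal
\end{verbatim}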
A general operator A ∈ L (H2) can, then, be written as A = 〈 1√ 2 I, A 〉 1√ 2 I + 3∑ k=1 〈 1√ 2 σk, A 〉 1√ 2 σk = 1 2 Tr [A] I + 1 2 3∑ k=1 Tr [Aσk]σk. (3.32) The density operators representing one qubit are nothing but operators in L (H2) with unit trace. Using the operator expansion above the most general density oper- ator is given by ρ = 1 2 ( I + 3∑ k=1 Tr [ρσk]σk ) := 1 2 (I + ~r · ~σ) , (3.33) where ~r := Tr [ρ~σ] (3.34) Chapter 3. Quantum Dynamics 43 is a three dimensional real vector. With Eq. (3.33), and the property of Pauli matrices (~σ · ~a) ( ~σ ·~b ) = ( ~a ·~b ) I + i~σ · ( ~a×~b ) , we obtain Tr [ ρ2 ] = 1 4 Tr [I + 2~r · ~σ + (~r · ~σ) (~r · ~σ)] = 1 4 Tr [I + 2~r · ~σ + (~r · ~r) I + i~σ · (~r × ~r)] = 1 4 ( 1 + |r|2 ) Tr [I] + 2~σ · Tr [~σ] = 1 2 ( 1 + |~r|2 ) . (3.35) Consider, for example, a pure density operator ρ = |ψ〉 〈ψ|. As we saw in the previous section, this implies that Tr [ρ2] = 1 and, therefore, the above equation implies |~r| = 1. Now let us pause for a moment to parametrize the state vector of Eq. (3.30) in a more convenient way. The constants c0 and c1 are two complex numbers, which means that the state vector would need four real numbers to be specified uniquely. If we write both complex numbers in their polar representation, c0 = |c0| eiϕ1 and c1 = |c1| eiϕ2 , the state vector may be written as |ψ〉 = |c0| eiϕ0 |0〉+ |c1| eiϕ1 |1〉 = eiϕ0 ( |c0| |0〉+ |c1| eiϕ |1〉 ) , (3.36) where ϕ = ϕ1 − ϕ0, and consider that ϕ ∈ [0, 2π) as the complex exponential is 2π-periodic. As the global phase can be completely forgotten because physically we cannot measure it, we are left with three real parameters. The normalization condition suggests to parametrize the two real numbers |c0| and |c1| as cos (θ/2) and sin (θ/2), respectively, with θ ∈ [0, π]. Thus |ψ〉 = cos ( θ 2 ) |0〉+ sin ( θ 2 ) eiϕ |1〉 . (3.37) In conclusion, we need two real numbers θ and ϕ taking values in the intervals above to represent an arbitrary pure qubit state. This parametrization is the same parametrization of a point in a sphere centered at the origin of a three-dimensional real space. We have seen that an arbitrary qubit state is parametrized as a point in a three-dimensional sphere and, at the same time, represented by a vector in a three- dimensional space with constant norm, i.e., a sphere. Therefore, all pure density operators lie in a sphere of radius one, called the Bloch sphere (see Fig. 3.1). The vector ~r is called the Bloch vector. If ρ were a mixed density operator, then Tr [ρ2] < 1 and consequently |r| < 1. Therefore, a mixed density operator lies inside Chapter 3. Quantum Dynamics 44 the Bloch sphere. Thus, the Bloch sphere is a convex set‡‡ which represents all pure and mixed states of a qubit, the pure states are represented by the boundary of the sphere and the mixed states are represented by all the interior of the sphere. Figure 3.1: The Bloch sphere with the vectors associated to the eigenvalues of the Pauli operators. Also two arbitrary orthonormal qubit states |ψ〉 and |ξ〉, showing that they lie in opposite points on the sphere’s surface. This figure was copied from Ref. [10] Let us visualize the measurement and unitary dynamics of a qubit in a pure state in the Bloch sphere. For an arbitrary pure state, ~r = Tr [ρ~σ] can be written as ~r = Tr [|ψ〉 〈ψ|~σ] = 〈ψ|~σ |ψ〉 . (3.38) Therefore ~r ·~r = 1 = 〈ψ|~r ·~σ |ψ〉 implies that ~r ·~σ |ψ〉 = |ψ〉. So, |ψ〉 is an eigenvector with eigenvalue +1. 
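Before completing the argument begun above (the second eigenvector of the operator r·σ is treated next), the Bloch-vector construction of Eqs. (3.33)–(3.35) can be illustrated numerically. The sketch below (Python/NumPy) uses a hypothetical pure state written in the parametrization of Eq. (3.37); the particular angles are ours and only serve as an example.

\begin{verbatim}
import numpy as np

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [s1, s2, s3]
I = np.eye(2, dtype=complex)

# Hypothetical pure qubit state in the parametrization of Eq. (3.37).
theta, phi = 1.1, 0.4
psi = np.array([np.cos(theta / 2), np.sin(theta / 2) * np.exp(1j * phi)])
rho = np.outer(psi, psi.conj())

r = np.array([np.trace(rho @ s).real for s in paulis])   # Bloch vector, Eq. (3.34)

# Eq. (3.33): the state is recovered from its Bloch vector ...
rho_rec = 0.5 * (I + sum(r[k] * paulis[k] for k in range(3)))
assert np.allclose(rho, rho_rec)

# ... and Eq. (3.35): Tr(rho^2) = (1 + |r|^2)/2, so |r| = 1 for a pure state.
assert np.isclose(np.trace(rho @ rho).real, 0.5 * (1 + r @ r))
assert np.isclose(np.linalg.norm(r), 1.0)
\end{verbatim}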
As ~r · ~σ is a self-adjoint matrix with zero trace, it must have another eigenvector with eigenvalue −1, ~r ·~σ |χ〉 = − |χ〉. Consider the inner product 〈ψ |χ〉 = 〈ψ|~r · ~σ |χ〉 , (3.39) ‡‡Simply stated a convex set is a set such that every two elements in it can be connected by a straight line which is also inside the set. All polygons are convex but a boomerang is not. Chapter 3. Quantum Dynamics 45 using that 〈ψ| = 〈ψ|~r · ~σ. Considering that |χ〉 is the eigenvector of ~r · ~σ with negative eigenvalue, one has 〈ψ |χ〉 = −〈ψ |χ〉 =⇒ 〈ψ |χ〉 = 0. (3.40) Moreover, 〈χ|~r · ~σ |χ〉 = ~rψ · 〈χ|~σ |χ〉 = ~rψ · ~rχ = −1 =⇒ ~rψ = −~rχ. (3.41) Therefore, any two orthogonal vectors on the Bloch sphere are necessarily diamet- rically opposed (cf. Fig. (3.1)). Any observable has two eigenvectors which are opposed in the Bloch sphere as described above. We set the Cartesian coordinates such that the observable eigenvectors are |0〉 and |1〉. An arbitrary qubit state can be expanded in this basis as |ψ〉 = c0 |0〉+ c1 |1〉 . (3.42) The rz projection of the Bloch vector is rz = cos (θ) = cos2 ( θ 2 ) − sin2 ( θ 2 ) = |c0|2 − |c1|2 . (3.43) Since |c0|2 + |c1|2 = 1, it follows that |c0|2 = 1 + rz 2 , (3.44) and |c1|2 = 1− rz 2 . (3.45) Therefore, the probability to find the outcome associated with one of the eigenvectors of the observable in question is proportional to the projection of the Bloch vector associated with the qubit onto the axis defined by the two eigenvectors. To visualize the unitary dynamics in the Bloch sphere, we will need a parametriza- tion for an arbitrary unitary matrix U ∈ SU (2). It can be shown that a parametriza- tion of U is given by U = eiκ e−iλ/2 cos (µ/2) e−iν/2 −e−iλ/2 sin (µ/2) eiν/2 eiλ/2 sin (µ/2) e−iν/2 eiλ/2 cos (µ/2) eiν/2  (3.46) or, as an operator product, U = eiκe− i 2 λσ3e− i 2 µσ2e− i 2 νσ3 . (3.47) Chapter 3. Quantum Dynamics 46 We can define a new unitary operator Û := e−iκU without the global phase. De- composing Û in terms of the basis { 1√ 2 I, 1√ 2 σk } according to Eq. (3.32) we find Û = v0I − i~v · ~σ, (3.48) where v0 = 1 2 Tr [ Û ] and ~v = i 2 Tr [ Û~σ ] . (3.49) Observing Eq. (3.46), we can read off the following properties Û00 = Û∗11, (3.50) Û10 = −Û∗01, (3.51) Û00Û11 − Û01Û10 = 1. (3.52) With these properties one can show that v0, v1, v2, v3 ∈ R. The unitarity condition implies that v2 0 + v2 = 1. (3.53) We can define v0 := cos ( φ 2 ) , ~v := sin ( φ 2 ) ~e, (3.54) with ~e being a unit vector in R3. For a given Û , the quantities φ and ~e are determined by Eq. (3.48). With this parametrization we can write Û = e−i φ 2 ~e·~σ. (3.55) Therefore, every unitary transformation over a qubit can be written uniquely in this form up to a global phase eiκ. Finally, we want to explain the effect of Û on the Bloch vector. Without any loss of generality we can choose ~e to be the direction of ~ez. Hence, Eq. (3.55) reduces to Û = e−i φ 2 |0〉 〈0|+ e+iφ 2 |1〉 〈1| . (3.56) Using the parametrization of Eq. (3.37) we find that Û |ψ〉 = cos ( θ 2 ) |0〉+ sin ( θ 2 ) eiϕ+φ |1〉 . (3.57) Therefore, the effect of a unitary operator U on the Bloch vector of some state vector |ψ〉 is to rotate it by an angle φ over the axis ~e. In this section we have seen the measurement and unitary dynamics of a qubit. In the next section we will begin to consider quantum systems composed of many subsystems. Chapter 3. Quantum Dynamics 47 3.2 Composite Quantum Systems In this section we will set up the mathematical ground to study quantum sys- tems composed of two or more subsystems. 
The idea of subsystems is well posed in classical mechanics where we deal with distinguishable objects. For instance, the solar system can be treated as being composed by nine subsystems, one star (the Sun) and eight planets. As they are macroscopical objects they are easily distin- guishable. But now imagine a fluid and a small ball within it. The ball is easily seen to be a subsystem but the individual molecules whose compound the fluid are not easily distinguishable (by the naked eye) and therefore not easily classifiable as sub- systems. In classical physics there is the idea that in principle we could paint these individual molecules with different colors and therefore be able to distinguish them somewhat easily. As we go more deeply into matter we learn that the elementary particles are not distinguishable∗. Then, how can we treat them as subsystems if we cannot even distinguish them? From the point of view of quantum mechanics, each particle is described by its own state space. Therefore, the state space of a system of particles will be given by an appropriate composition of the Hilbert spaces of the individual particles (also called factor spaces). A closely related issue is the notion of localization. In classical mechanics it is completely natural to talk about subsys- tems because each subsystem is well localized in space. When we measure one of two subsystems which are separated far apart we know that this measurement does not disturb the other subsystem (besides negligible effects such as the gravitational force). But in quantum mechanics this is not the case because quantum correlations come in. We will define two important concepts in this section, entanglement and the partial trace map. As we said in the beginning of this section, the Hilbert space which describes the properties of a system of particles is going to be an appropriate composition of the Hilbert spaces of each individual particles. Two possible constructions are the direct sum and the tensor product of the Hilbert spaces. Suppose that we have two two-level systems; for example, two spins. Each spin is described by a Hilbert space which is two-dimensional. If we construct a Hilbert space which is the direct sum ∗At least two elementary particles relatively close to each other or which had interacted in the past. Of course an electron created on Earth and another created in the Sun may be distinguished. Chapter 3. Quantum Dynamics 48 of the two Hilbert spaces we will end with a space four-dimensional† (2 + 2). If we construct a Hilbert space which is the tensor product of the two Hilbert spaces we will end with a space which is also four-dimensional‡ (2× 2). The conclusion is that two spins are not enough (at least in this superficial discussion) to discriminate between the two possible constructions. So let us consider three spins, instead. In this case if we take the direct sum we obtain a six-dimensional space (2 + 2 + 2) and if we take the tensor product we obtain an eight-dimensional space (2×2×2). Now we have to think what properties we expect the composed space to have. Physically, a system of three spins is expected to have the following degrees of freedom: |up, up, up〉, |up, up, down〉, |up, down, up〉, |down, up, up〉, |up, down, down〉, |down, up, down〉, |down, down, up〉 and |down, down, down〉. Hence 8 degrees of freedom. Therefore the appropriate way to construct the Hilbert space of three spins is taking the tensor product of each of the Hilbert spaces of the individual systems. 
That being said, we conclude that the right way to construct the Hilbert space of a system of particles is to take the tensor product of the Hilbert spaces of each of the individual subsystems.
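A one-line dimension count makes the comparison between the two constructions explicit. In the sketch below (Python/NumPy), the Kronecker product plays the role of the tensor product of the factor spaces; the state labels are the ones used in the counting above.

\begin{verbatim}
import numpy as np

# Single-spin basis states.
up   = np.array([1.0, 0.0])
down = np.array([0.0, 1.0])

# Tensor product of three two-dimensional spaces: dimension 2 x 2 x 2 = 8.
triple = np.kron(np.kron(up, down), up)     # the basis state |up, down, up>
print(triple.shape)                         # (8,)

# A direct sum would only give dimension 2 + 2 + 2 = 6,
# too small to accommodate the eight spin configurations listed above.
print(2 + 2 + 2, 2 * 2 * 2)                 # 6 8
\end{verbatim}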