# Computational Secrecy

**Additional reading:** Sections 2.3 and 2.4 in Boneh-Shoup book.
Chapter 3 up to and including Section 3.3 in Katz-Lindell book.

Recall our cast of characters: Alice and Bob want to communicate
securely over a channel that is monitored by the nosy Eve. In the last
lecture, we have seen the definition of *perfect secrecy* that
guarantees that Eve cannot learn *anything* about their communication
beyond what she already knew. However, this security came at a price.
For every bit of communication, Alice and Bob have to exchange in
advance a bit of a secret key. In fact, the proof of this result gives
rise to the following simple Python program that can break every
encryption scheme that uses, say, a \(128\) bit key, with a \(129\) bit
message:

```
# Gets a ciphertext and two potential plaintexts as input.
# A positive return value means the first is more likely,
# a negative one means the second is more likely,
# and 0 means both have the same likelihood.
#
# We assume we have access to the function Decrypt(key,ciphertext)
def Distinguish(ciphertext, plaintext1, plaintext2):
    bias = 0
    key = [0] * 128  # 128 0's
    while True:
        p = Decrypt(key, ciphertext)
        if p == plaintext1: bias += 1
        if p == plaintext2: bias -= 1
        if sum(key) == 128: break  # the all-ones key is the last one to try
        increment(key)
    return bias

# Increment key when thought of as a number sorted from least significant
# to most significant bit. Assumes not all bits are 1.
def increment(key):
    i = key.index(0)                # position of the lowest 0 bit
    for j in range(i): key[j] = 0   # clear all the 1 bits below it
    key[i] = 1
```

Now, generating, distributing, and protecting huge keys causes immense logistical problems, which is why almost all encryption schemes used in practice do in fact utilize short keys (e.g., \(128\) bits long) with messages that can be much longer (sometimes even terabytes or more of data).

So, why can’t we use the above Python program to break all encryptions
on the Internet and win infamy and fortune? We can in fact, but we’ll
have to wait a *really* long time, since the loop in `Distinguish` will
run \(2^{128}\) times, which will take much more than the lifetime of the
universe to complete, even if we used all the computers on the planet.
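To get a feel for these numbers, here is a rough back-of-the-envelope calculation. The trial rate per machine and the number of machines are assumptions chosen to be generous, not measurements, and the age of the universe is rounded to 14 billion years.

```python
# Rough estimate of how long 2**128 loop iterations would take.
# The two rates below are illustrative assumptions, not measured figures.
KEYS = 2 ** 128
TRIALS_PER_SECOND_PER_MACHINE = 10 ** 9   # assumed: a billion decryptions per second
MACHINES = 10 ** 10                       # assumed: ten billion machines
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
AGE_OF_UNIVERSE_YEARS = 14 * 10 ** 9      # roughly 14 billion years

def years_to_try_all_keys(keys=KEYS):
    """Years needed to try every key at the assumed aggregate rate."""
    trials_per_year = TRIALS_PER_SECOND_PER_MACHINE * MACHINES * SECONDS_PER_YEAR
    return keys / trials_per_year

years = years_to_try_all_keys()
print(f"about {years:.2e} years, "
      f"or {years / AGE_OF_UNIVERSE_YEARS:.0f} times the age of the universe")
```

Under these assumptions the estimate is dominated by the \(2^{128}\) factor: speeding up each machine by a factor of a thousand only removes about ten bits.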

However, the fact that *this* particular program is not a feasible
attack does not mean there does not exist a different attack. But this
still suggests a tantalizing possibility: if we consider a relaxed
version of perfect secrecy that restricts Eve to performing computations
that can be done in this universe (e.g., less than \(2^{256}\) steps
should be safe not just for humans but for all potential alien
civilizations) then can we bypass the impossibility result and allow the
key to be much shorter than the message?

This in fact does seem to be the case, but as we’ve seen, defining
security is a subtle task, and will take some care. As before, the way
we avoid (at least some of) the pitfalls of so many cryptosystems in
history is that we insist on very precisely *defining* what it means for
a scheme to be secure.

Let us defer the discussion of how one defines a function being computable in “less than \(T\) operations” and just say that there is a way to formally do so. Given the perfect secrecy definition we saw last time, a natural attempt at defining computational secrecy would be the following:

An encryption scheme \((E,D)\) has *\(t\) bits of computational secrecy* if
for every two distinct plaintexts \(\{m_0,m_1\} \subseteq {\{0,1\}}^\ell\)
and every strategy of Eve using at most \(2^t\) computational steps, if we
choose at random \(b\in{\{0,1\}}\) and a random key \(k\in{\{0,1\}}^n\),
then the probability that Eve guesses \(m_b\) after seeing \(E_k(m_b)\) is
at most \(1/2\).

It is important to keep track of what is known and unknown to the
adversary Eve. The adversary knows the set \(\{ m_0,m_1 \}\) of
potential messages, and the ciphertext \(y=E_k(m_b)\). The only things
she doesn’t know are whether \(b=0\) or \(b=1\), and the value of the
secret key \(k\). In particular, because \(m_0\) and \(m_1\) are known to
Eve, it does not matter whether we define Eve’s goal in this
“security game” as outputting \(m_b\) or as outputting \(b\).

The definition above seems very natural, but is in fact *impossible*
to achieve if the key is shorter than the message.

Before reading further, you might want to stop and think if you can
*prove* that there is no encryption scheme with, say, \(\sqrt{n}\) bits of
computational secrecy in the sense of this definition, with \(\ell = n+1\)
and where the time to compute the encryption is polynomial.

The reason this first definition can’t be achieved is that if the message
is even one bit longer than the key, we can always have a very efficient
procedure that achieves success probability of about \(1/2 + 2^{-n-1}\) by
guessing the key. That is, we can replace the loop in the Python program
`Distinguish` by choosing the key at random. Since we have some small
chance of guessing correctly, we will get a small advantage over half.
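The key-guessing attack can be checked exactly on a tiny example. The sketch below is an illustration only: the toy cipher (XOR with the key bits followed by their parity bit), the key length `N = 3`, and the two messages are all assumptions made up for this example. Eve guesses a random key, decrypts, answers according to the result if it equals one of the two messages, and otherwise flips a coin.

```python
from fractions import Fraction
from itertools import product

N = 3  # toy key length (assumption); messages have N + 1 bits

def pad(key):
    # Hypothetical toy pad: the key bits followed by their parity bit.
    return key + (sum(key) % 2,)

def encrypt(key, message):
    return tuple(m ^ p for m, p in zip(message, pad(key)))

decrypt = encrypt  # XOR with the same pad undoes the encryption

def success_probability(m0, m1):
    """Exact success probability of the Eve that guesses a random key."""
    keys = list(product((0, 1), repeat=N))
    total = Fraction(0)
    for b, key, guess in product((0, 1), keys, keys):
        c = encrypt(key, (m0, m1)[b])
        p = decrypt(guess, c)
        if p == m0:
            win = Fraction(1 if b == 0 else 0)   # Eve answers "m0"
        elif p == m1:
            win = Fraction(1 if b == 1 else 0)   # Eve answers "m1"
        else:
            win = Fraction(1, 2)                 # Eve answers with a fair coin
        total += win
    # Average over the choice of b, the real key, and the guessed key.
    return total / (2 * len(keys) ** 2)

m0 = (0, 0, 0, 0)
m1 = (0, 0, 0, 1)
prob = success_probability(m0, m1)
print(prob, "versus", Fraction(1, 2) + Fraction(1, 2 ** (N + 1)))
```

The two messages are chosen so that a wrong key guess never decrypts the ciphertext to the other message; for this toy cipher that makes the advantage over \(1/2\) exactly \(2^{-n-1}\).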

To fix this definition, we do not consider guessing with such a tiny advantage as a “true break” of the scheme, and hence this will be the actual definition we use.

An encryption scheme \((E,D)\) has *\(t\) bits of computational secrecy*
if for every two distinct plaintexts
\(\{m_0,m_1\} \subseteq {\{0,1\}}^\ell\) and every strategy of Eve using
at most \(2^t\) computational steps, if we choose at random
\(b\in{\{0,1\}}\) and a random key \(k\in{\{0,1\}}^n\), then the probability
that Eve guesses \(m_b\) after seeing \(E_k(m_b)\) is at most \(1/2+2^{-t}\).

This is a slight simplification of the typical notion of “\(t\) bits
of security”. In the more standard definition we’d say that a scheme
has \(t\) bits of security if for every \(t_1+t_2 \leq t\), an attacker
running in \(2^{t_1}\) time can’t get success probability advantage
more than \(2^{-t_2}\). However these two definitions only differ from
one another by at most a factor of two. This may be important for
practical applications (where the difference between \(64\) and \(32\)
bits of security could be crucial) but won’t matter for our
concerns.

Having learned our lesson, let’s try to see that this strategy does give us the kind of conditions we desired. In particular, let’s verify that this definition implies the analogous condition to perfect secrecy.

If \((E,D)\) has \(t\) bits of computational secrecy as per the definition above, then for every subset \(M \subseteq {\{0,1\}}^\ell\) and every strategy of Eve using at most \(2^t-(100\ell+100)\) computational steps, if we choose at random \(m\in M\) and a random key \(k\in{\{0,1\}}^n\), then the probability that Eve guesses \(m\) after seeing \(E_k(m)\) is at most \(1/|M|+2^{-t+1}\).

Before proving this theorem note that it gives us a pretty strong guarantee. In the exercises we will strengthen it even further showing that no matter what prior information Eve had on the message before, she will never get any non-negligible new information on it. One way to phrase it is that if the sender used a \(256\)-bit secure encryption to encrypt a message, then your chances of getting to learn any additional information about it before the universe collapses are more or less the same as the chances that a fairy will materialize and whisper it in your ear.

Before reading the proof, try to review again the proof of the analogous theorem for perfect secrecy (guessing one of two messages vs. one of many messages), and see if you can generalize it yourself to the computational setting.

The proof is rather similar to the equivalence of guessing one of two messages vs. one of many messages for perfect secrecy. However, in the computational context we need to be careful in keeping track of Eve’s running time. In the proof of that equivalence we showed that if there exists:

- A subset \(M\subseteq {\{0,1\}}^\ell\) of messages

and

- An adversary \(Eve:{\{0,1\}}^o\rightarrow{\{0,1\}}^\ell\) such that

\[ \Pr_{m{\leftarrow_{\tiny R}}M, k{\leftarrow_{\tiny R}}{\{0,1\}}^n}[ Eve(E_k(m))=m ] > 1/|M| \]

Then there exist two messages \(m_0,m_1\) and an adversary \(Eve':{\{0,1\}}^o\rightarrow{\{0,1\}}^\ell\) such that \(\Pr_{b{\leftarrow_{\tiny R}}{\{0,1\}},k{\leftarrow_{\tiny R}}{\{0,1\}}^n}[Eve'(E_k(m_b))=m_b ] > 1/2\).

To adapt this proof to the computational setting and complete the proof of the current theorem it suffices to show that:

- If the probability of \(Eve\) succeeding was \(\tfrac{1}{|M|} + \epsilon\) then the probability of \(Eve'\) succeeding is at least \(\tfrac{1}{2} + \epsilon/2\).
- If \(Eve\) can be computed in \(T\) operations, then \(Eve'\) can be computed in \(T + 100\ell + 100\) operations.

This will imply that if \(Eve\) ran in polynomial time and had polynomial advantage over \(1/|M|\) in guessing a plaintext chosen from \(M\), then \(Eve'\) would run in polynomial time and have polynomial advantage over \(1/2\) in guessing a plaintext chosen from \(\{ m_0,m_1\}\).

The first item can be shown by simply doing the same proof more carefully, keeping track of how the advantage over \(\tfrac{1}{|M|}\) for \(Eve\) translates into an advantage over \(\tfrac{1}{2}\) for \(Eve'\). As the world’s most annoying saying goes, doing this is an excellent exercise for the reader. The second item is obtained by looking at the definition of \(Eve'\) from that proof. On input \(c\), \(Eve'\) computed \(m=Eve(c)\) (which costs \(T\) operations), checked if \(m=m_0\) (which costs, say, at most \(5\ell\) operations), and then outputted either \(1\) or a random bit (which is a constant, say at most \(100\) operations).

### Proof by reduction

The proof of the theorem above is a model for how a great many of the results in this course will look. Generally we will have many theorems of the form:

“If there is a scheme \(S'\) satisfying security definition \(X'\) then there is a scheme \(S\) satisfying security definition \(X\)”

In the context of the theorem above, \(X'\) was “having \(t\) bits of security” (in the context of distinguishing between encryptions of two plaintexts) and \(X\) was the more general notion of hardness of getting a non-trivial advantage over guessing for an encryption of a random \(m\in M\). While in that theorem the encryption scheme \(S\) was the same as \(S'\), this need not always be the case. However, all of the proofs of such statements will have the same global structure: we will assume, towards a contradiction, that there is an efficient adversary strategy \(Eve\) demonstrating that the scheme \(S\) violates the security notion \(X\), and build from \(Eve\) a strategy \(Eve'\) demonstrating that \(S'\) violates \(X'\). This is such an important point that it deserves repeating:

The way you show that if \(S'\) is secure then \(S\) is secure is by giving a transformation from an adversary that breaks \(S\) into an adversary that breaks \(S'\)

For computational secrecy, we will always want that \(Eve'\) will be efficient if \(Eve\) is, and that will usually be the case because \(Eve'\) will simply use \(Eve\) as a black box, which it will not invoke too many times, and in addition will use some polynomial time preprocessing and postprocessing. The more challenging parts of such proofs are typically:

- Coming up with the strategy \(Eve'\).
- Analyzing the probability of success and in particular showing that if \(Eve\) had non-negligible advantage then so will \(Eve'\).

Note that, just like in the context of NP completeness or
uncomputability reductions, security reductions work *backwards*. That
is, we construct the scheme \(S\) based on the scheme \(S'\), but then prove
that we can transform an algorithm breaking \(S\) into an algorithm
breaking \(S'\). Just like in computational complexity, it can sometimes
be hard to keep track of the direction of the reduction. In fact,
cryptographic reductions can be even subtler, since they involve an
interplay of several entities (for example, sender, receiver, and
adversary) and probabilistic choices (e.g., over the message to be sent
and the key).

## The asymptotic approach

For practical security, often every bit of security matters. We want our
keys to be as short as possible and our schemes to be as fast as
possible while satisfying a particular level of security. However, for
understanding the *principles* behind cryptography, keeping track of
those bits can be a distraction, and so just like we do for algorithms,
we will use *asymptotic analysis* (also known as *big Oh notation*) to
sweep many of those details under the carpet.

To a first approximation, there will be only two types of running times we will encounter in this course:

- *Polynomial* running time of the form \(d\cdot n^c\) for some constants \(d,c>0\) (or \(poly(n)=n^{O(1)}\) for short), which we will consider as *efficient*.
- *Exponential* running time of the form \(2^{d\cdot n^{\epsilon}}\) for some constants \(d,\epsilon >0\) (or \(2^{n^{\Omega(1)}}\) for short), which we will consider as *infeasible*.

Some texts reserve the term *exponential* to functions of the form \(2^{\epsilon n}\) for some \(\epsilon > 0\) and call a function such as, say, \(2^{\sqrt{n}}\) *subexponential*. However, we will generally not make this distinction in this course.

Another way to say it is that in this course, if a scheme has any security at all, it will have at least \(n^{\epsilon}\) bits of security where \(n\) is the length of the key and \(\epsilon>0\) is some absolute constant such as \(\epsilon=1/3\).

These are not all the theoretically possible running times. One can have
intermediate functions such as \(n^{\log n}\) though we will generally not
encounter those. To make things clean (and to correspond to standard
terminology), we will say that an algorithm \(A\) is *efficient* if it
runs in time \(poly(n)\) when \(n\) is its input length (which will always
be the same, up to polynomial factors, as the key length). If \(\mu(n)\)
is some probability that depends on the input/key length parameter \(n\),
then we say that \(\mu(n)\) is *negligible* if it’s smaller than every
polynomial. That is, for every \(c,d\) there is some \(N\), such that if
\(n>N\) then \(\mu(n) < 1/(cn)^d\). Note that for every non-constant
polynomials \(p,q\), \(\mu(n)\) is negligible if and only if the function
\(\mu'(n) = p(\mu(q(n)))\) is negligible.

The above definitions could be confusing if you haven’t encountered asymptotic analysis before. Reading the beginning of Chapter 3 (pages 43-51) in the KL book, as well as the mathematical background lecture in my intro to TCS notes can be extremely useful. As a rule of thumb, if every time you see the word “polynomial” you imagine the function \(n^{10}\) and every time you see the word “negligible” you imagine the function \(2^{-\sqrt{n}}\) then you will get the right intuition.

What you need to remember is that negligible is much smaller than any inverse polynomial, while polynomials are closed under multiplication, and so we have the “equations” \(negligible\times polynomial = negligible\) and \(polynomial \times polynomial = polynomial\). As mentioned, in practice people really want to get as close as possible to \(n\) bits of security with an \(n\) bit key, but we would be happy as long as the security grows with the key, so when we say a scheme is “secure” you can think of it having \(\sqrt{n}\) bits of security (though any function growing faster than \(\log n\) would be fine as well).
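The rule of thumb above can be checked numerically. The sketch below compares the negligible function \(2^{-\sqrt{n}}\) with the inverse polynomial \(n^{-10}\); the helper name and the sample values of \(n\) are assumptions chosen for illustration. The comparison is done on base-2 logarithms to avoid floating point underflow.

```python
import math

def negligible_beats_inverse_poly(n, degree):
    """True if 2**(-sqrt(n)) < n**(-degree), compared via base-2 logarithms."""
    # 2**(-sqrt(n)) < n**(-degree)  is equivalent to  sqrt(n) > degree * log2(n)
    return math.sqrt(n) > degree * math.log2(n)

for k in (10, 14, 15, 20):
    n = 2 ** k
    print(f"n = 2^{k}: 2^(-sqrt(n)) < n^(-10)?", negligible_beats_inverse_poly(n, 10))
```

For small \(n\) the inverse polynomial is actually the smaller of the two, which is why the definition of negligible only asks for the inequality to hold for all sufficiently large \(n\).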

From now on, we will require all of our encryption schemes to be
*efficient*, which means that the encryption and decryption algorithms
should run in polynomial time. Security will mean that any efficient
adversary can make at most a negligible gain in the probability of
guessing the message over its a priori probability.

Note that there is a subtle issue here with the order of
quantifiers. For a scheme to be efficient, the algorithms such as
encryption and decryption need to run in some *fixed* polynomial
time such as \(n^2\) or \(n^3\). In contrast we allow the adversary to
run in *any* polynomial time. That is, for every \(c\), if \(n\) is
large enough, then the scheme should be secure against an adversary
that runs in time \(n^c\). This is a general principle in cryptography
that we always allow the adversary potentially much more resources
than those used by the honest users. In practical security we often
assume that the gap between the honest use and the adversary
resources can be *exponential*. For example, a low power embedded
device can encrypt messages that, as far as we know, are
undecipherable even by a nation-state using super-computers and
massive data centers.

Formally, we make the following definition:

An encryption scheme \((E,D)\) is *computationally secret* if for every
two distinct plaintexts \(\{m_0,m_1\} \subseteq {\{0,1\}}^\ell\) and every
efficient (i.e., polynomial time) strategy of Eve, if we choose at
random \(b\in{\{0,1\}}\) and a random key \(k\in{\{0,1\}}^n\), then the
probability that Eve guesses \(m_b\) after seeing \(E_k(m_b)\) is at most
\(1/2+\mu(n)\) for some negligible function \(\mu(\cdot)\).

### Counting number of operations.

One more detail that we’ve so far ignored is what does it mean exactly
for a function to be computable using at most \(T\) operations.
Fortunately, when we don’t really care about the difference between \(T\)
and, say, \(T^2\), then essentially every reasonable definition gives the
same answer. Formally, we can use the notions of Turing machines,
Boolean circuits, or straightline programs to define complexity. For
concreteness, let’s define that a function
\(F:{\{0,1\}}^n\rightarrow{\{0,1\}}^m\) has complexity at most \(T\) if
there is a Boolean circuit that computes \(F\) using at most \(T\) NAND
gates (or equivalently, there is a NAND program computing \(F\) in at most
\(T\) lines). (There is nothing special about NAND, and we can use any
other universal gate set.) We will often also consider *probabilistic*
functions in which case we allow the circuit a RAND gate that outputs a
single random bit (though this in general does not give extra power).
The fact that we only care about asymptotics means you don’t really need
to think of gates, etc. when arguing in cryptography. However, it is
comforting to know that this notion has a precise mathematical
formulation.
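As a small illustration of counting operations in this model, the sketch below evaluates a function built only from NAND gates and counts how many gates were used. The class and function names are hypothetical, and XOR is used only as a convenient example; the four-gate construction shows that XOR on two bits has complexity at most \(4\) in this measure.

```python
class NandCounter:
    """Evaluates NAND gates on bits and counts how many gates were used."""

    def __init__(self):
        self.gates = 0

    def nand(self, a, b):
        self.gates += 1
        return 1 - (a & b)

def xor(circuit, a, b):
    # A standard construction of XOR from four NAND gates.
    u = circuit.nand(a, b)
    v = circuit.nand(a, u)
    w = circuit.nand(b, u)
    return circuit.nand(v, w)

for a in (0, 1):
    for b in (0, 1):
        circuit = NandCounter()
        print(a, b, xor(circuit, a, b), "gates used:", circuit.gates)
```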

## Our first conjecture

We are now ready to make our first conjecture:

The Cipher Conjecture: There exists a computationally secret encryption scheme \((E,D)\) (where \(E,D\) are efficient) with a key of size \(n\) for messages of size \(n+1\).

As will be the case for other conjectures we talk about, the name “The Cipher Conjecture” is not a standard name, but rather one we’ll use in this course. In the literature this conjecture is mostly referred to as the conjecture of existence of *one-way functions*, a notion we will learn about later. These two conjectures a priori seem quite different but have been shown to be equivalent.

A *conjecture* is a well defined mathematical statement which (1) we
believe is true but (2) don’t know yet how to prove. Proving the cipher
conjecture will be a great achievement and would in particular settle
the P vs NP question, which is arguably *the* fundamental question of
computer science. That is, the following theorem is known:

If \(P=NP\) then there does not exist a computationally secret encryption with efficient \(E\) and \(D\) and where the message is longer than the key.

We just sketch the proof, as this is not the focus of this course. If
\(P=NP\) then whenever we have a loop that searches through some domain to
find some string that satisfies a particular property (like the loop in
the `Distinguish` subroutine above that searches over all keys) then
this loop can be sped up *exponentially*.

While it is very widely believed that \(P\neq NP\), at the moment we do
not know how to *prove* this, and so have to settle for accepting the
cipher conjecture as essentially an axiom, though we will see later in
this course that we can show it follows from some seemingly weaker
conjectures.

There are several reasons to believe the cipher conjecture. We now briefly mention some of them:

- *Intuition:* If the cipher conjecture is false then it means that for *every* possible cipher we can make the exponential time attack described above become efficient. It seems “too good to be true” in a similar way that the assumption that P=NP seems too good to be true.
- *Concrete candidates:* As we will see in the next lecture, there are several concrete candidate ciphers using keys shorter than messages for which, despite *tons* of effort, no one knows how to break them. Some of them are widely used and hence governments and other benign or not so benign organizations have every reason to invest huge resources in trying to break them. Despite that, as far as we know (and we know a little more after Edward Snowden’s revelations), there is no significant break known for the most popular ciphers. Moreover, there are other ciphers that can be based on canonical mathematical problems such as factoring large integers or decoding random linear codes that are immensely interesting in their own right, independently of their cryptographic applications.
- *Minimalism:* Clearly if the cipher conjecture is false then we also don’t have a secure encryption with a message, say, twice as long as the key. But it turns out the cipher conjecture is in fact *necessary* for essentially every cryptographic primitive, including not just private key and public key encryptions but also digital signatures, hash functions, pseudorandom generators, and more. That is, if the cipher conjecture is false then to a large extent cryptography does not exist, and so we essentially have to assume this conjecture if we want to do any kind of cryptography.

## Why care about the cipher conjecture?

“Give me a place to stand, and I shall move the world” (Archimedes, circa 250 BC)

Every perfectly secure encryption scheme is clearly also computationally
secret, and so if we required a message of size \(n\) instead of \(n+1\), then
the conjecture would have been trivially satisfied by the one-time pad.
However, having a message longer than the key by just a single bit does
not seem that impressive. Sure, if we used such a scheme with \(128\)-bit
long keys, our communication will be smaller by a factor of \(128/129\)
(or a saving of about \(0.8\%\)) over the one-time pad, but this doesn’t
seem worth the risk of using an unproven conjecture. However, it turns
out that if we assume this rather weak condition, we can actually get a
computationally secret encryption scheme with a message of size \(p(n)\)
for *every* polynomial \(p(\cdot)\). In essence, we can fix a single
\(n\)-bit long key and communicate securely as many bits as we want!

Moreover, this is just the beginning. There is a huge range of other useful cryptographic tools that we can obtain from this seemingly innocent conjecture. (We will see what these tools are, and what some of the reductions between them mean, later in the course.)

We will soon see the first of the many reductions we’ll learn in this course. Together this “web of reductions” forms the scientific core of cryptography, connecting many of the core concepts and enabling us to construct increasingly sophisticated tools based on relatively simple “axioms” such as the cipher conjecture.

## Prelude: Computational Indistinguishability

The task of Eve in breaking an encryption scheme is to *distinguish*
between an encryption of \(m_0\) and an encryption of \(m_1\). It turns out
to be useful to consider this question of when two distributions are
*computationally indistinguishable* more broadly:

Let \(X\) and \(Y\) be two distributions over \({\{0,1\}}^o\). We say that \(X\)
and \(Y\) are \((T,\epsilon)\)*-computationally indistinguishable*, denoted
by \(X \approx_{T,\epsilon} Y\), if for every function \(Eve\) computable
with at most \(T\) operations,

\[ | \Pr[ Eve(X) = 1 ] - \Pr[ Eve(Y) = 1 ] | \leq \epsilon \;. \]

We say that \(X\) and \(Y\) are simply *computationally indistinguishable*,
denoted by \(X\approx Y\), if they are \((T,\epsilon)\) indistinguishable
for every polynomial \(T(o)\) and inverse polynomial \(\epsilon(o)\).

This definition implicitly assumes that \(X\) and \(Y\) are actually
*parameterized* by some number \(n\) (that is polynomially related to
\(o\)) so for every polynomial \(T(o)\) and inverse polynomial
\(\epsilon(o)\) we can take \(n\) to be large enough so that \(X\) and \(Y\)
will be \((T,\epsilon)\) indistinguishable. In all the cases we will
consider, the choice of the parameter \(n\) (which is usually the
length of the key) will be clear from the context.

**Note:** The expression \(\Pr[ Eve(X)=1]\) can also be written as
\({\mathbb{E}}[Eve(X)]\) (since we can assume that whenever \(Eve(x)\) does
not output \(1\) it outputs zero). This notation will be useful for us
sometimes.
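To make the quantity \(| \Pr[ Eve(X) = 1 ] - \Pr[ Eve(Y) = 1 ] |\) concrete, here is a minimal sketch that computes it exactly for small explicit distributions. The representation of a distribution as a dictionary from outcomes to probabilities, the helper names, and the two example distinguishers are all assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product

def accept_probability(eve, dist):
    """Pr[Eve(X) = 1] for a distribution given as {outcome: probability}."""
    return sum(p for x, p in dist.items() if eve(x) == 1)

def advantage(eve, dist_x, dist_y):
    return abs(accept_probability(eve, dist_x) - accept_probability(eve, dist_y))

def uniform(outcomes):
    outcomes = list(outcomes)
    return {x: Fraction(1, len(outcomes)) for x in outcomes}

# X: uniform over all 2-bit strings. Y: uniform over 2-bit strings of even parity.
X = uniform(product((0, 1), repeat=2))
Y = uniform(x for x in product((0, 1), repeat=2) if sum(x) % 2 == 0)

def parity_eve(x):
    return sum(x) % 2

def first_bit_eve(x):
    return x[0]

print("parity test advantage:   ", advantage(parity_eve, X, Y))
print("first-bit test advantage:", advantage(first_bit_eve, X, Y))
```

The same pair of distributions can look identical to one distinguisher and very different to another, which is why the definition quantifies over *every* \(Eve\) within the operation budget.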

We can use computational indistinguishability to phrase the definition of computational secrecy more succinctly:

Let \((E,D)\) be a valid encryption scheme. Then \((E,D)\) is computationally secret if and only if for every two messages \(m_0,m_1 \in \{0,1\}^\ell\), \[ \{ E_k(m_0) \} \approx \{ E_k(m_1) \} \] where each of these two distributions is obtained by sampling a random \(k{\leftarrow_{\tiny R}}{\{0,1\}}^n\).

Working out the proof is an excellent way to make sure you understand the definitions of both computational secrecy and computational indistinguishability, and hence we leave it as an exercise.

One intuition for computational indistinguishability is that it is
related to some notion of *distance*. If two distributions are
computationally indistinguishable, then we can think of them as “very
close” to one another, at least as far as efficient observers are
concerned. Intuitively, if \(X\) is close to \(Y\) and \(Y\) is close to \(Z\)
then \(X\) should be close to \(Z\).

Results of this form are known as “triangle inequalities” since
they can be viewed as generalizations of the statement that for
every three points on the plane \(x,y,z\), the distance from \(x\) to
\(z\) is not larger than the distance from \(x\) to \(y\) plus the
distance from \(y\) to \(z\). In other words, the edge \(\overline{x,z}\)
of the triangle \((x,y,z)\) is not longer than the sum of the lengths
of the other two edges \(\overline{x,y}\) and \(\overline{y,z}\).

Similarly, if four distributions
\(X,X',Y,Y'\) satisfy that \(X\) is close to \(Y\) and \(X'\) is close to \(Y'\),
then you might expect that the distribution \((X,X')\) where we take two
independent samples from \(X\) and \(X'\) respectively, is close to the
distribution \((Y,Y')\) where we take two independent samples from \(Y\) and
\(Y'\) respectively. We will now verify that these intuitions are in fact
correct:

Suppose \(\{ X_1 \} \approx_{T,\epsilon} \{ X_2 \} \approx_{T,\epsilon} \cdots \approx_{T,\epsilon} \{ X_m \}\). Then \(\{ X_1 \} \approx_{T, (m-1)\epsilon} \{ X_m \}\).

Suppose that there exists a \(T\) time \(Eve\) such that \[ |\Pr[ Eve(X_1)=1] - \Pr[ Eve(X_m)=1]| > (m-1)\epsilon \;. \]

Write \[ \Pr[ Eve(X_1)=1] - \Pr[ Eve(X_m)=1] = \sum_{i=1}^{m-1} \left( \Pr[ Eve(X_i)=1] - \Pr[ Eve(X_{i+1})=1] \right) \;. \]

Thus, \[ \sum_{i=1}^{m-1} \left| \Pr[ Eve(X_i)=1] - \Pr[ Eve(X_{i+1})=1] \right| > (m-1)\epsilon \] and hence in particular there must exist some \(i\in\{1,\ldots,m-1\}\) such that \[ \left| \Pr[ Eve(X_i)=1] - \Pr[ Eve(X_{i+1})=1] \right| > \epsilon \] contradicting the assumption that \(\{ X_i \} \approx_{T,\epsilon} \{ X_{i+1} \}\) for all \(i\in\{1,\ldots,m-1\}\).

Suppose that \(X_1,\ldots,X_\ell,Y_1,\ldots,Y_\ell\) are distributions over \({\{0,1\}}^n\) such that \(X_i \approx_{T,\epsilon} Y_i\). Then \((X_1,\ldots,X_\ell) \approx_{T-10\ell n,\ell\epsilon} (Y_1,\ldots,Y_\ell)\).

For every \(i\in\{0,\ldots,\ell\}\) we define \(H_i\) to be the distribution \((X_1,\ldots,X_i,Y_{i+1},\ldots,Y_\ell)\). Clearly \(H_0 = (Y_1,\ldots,Y_\ell)\) and \(H_\ell = (X_1,\ldots,X_\ell)\). We will prove that for every \(i\in\{1,\ldots,\ell\}\), \(H_{i-1} \approx_{T-10\ell n,\epsilon} H_i\), and the proof will then follow from the triangle inequality (can you see why?). Indeed, suppose for the sake of contradiction that there was some \(i\in \{1,\ldots,\ell\}\) and some \(T-10\ell n\)-time \(Eve':{\{0,1\}}^{n\ell}\rightarrow{\{0,1\}}\) such that

\[ \left| {\mathbb{E}}[ Eve'(H_{i-1}) ] - {\mathbb{E}}[ Eve'(H_i) ] \right| > \epsilon\;. \]

In other words \[ \left| {\mathbb{E}}_{X_1,\ldots,X_{i-1},Y_i,\ldots,Y_\ell}[ Eve'(X_1,\ldots,X_{i-1},Y_i,\ldots,Y_\ell) ] - {\mathbb{E}}_{X_1,\ldots,X_i,Y_{i+1},\ldots,Y_\ell}[ Eve'(X_1,\ldots,X_i,Y_{i+1},\ldots,Y_\ell) ] \right| > \epsilon\;. \]

By linearity of expectation we can write the difference of these two expectations as \[ {\mathbb{E}}_{X_1,\ldots,X_{i-1},X_i,Y_i,Y_{i+1},\ldots,Y_\ell}\left[ Eve'(X_1,\ldots,X_{i-1},Y_i,Y_{i+1},\ldots,Y_\ell) - Eve'(X_1,\ldots,X_{i-1},X_i,Y_{i+1},\ldots,Y_\ell) \right] \]

By the *averaging principle* (if the average grade in an exam was at
least \(\alpha\) then *someone* must have gotten at least \(\alpha\), or
in other words, if a real-valued random variable \(Z\) satisfies
\({\mathbb{E}}Z \geq \alpha\) then \(\Pr[Z\geq \alpha]>0\)) this means that there exist some values
\(x_1,\ldots,x_{i-1},y_{i+1},\ldots,y_\ell\) such that \[
\left|{\mathbb{E}}_{X_i,Y_i}\left[ Eve'(x_1,\ldots,x_{i-1},Y_i,y_{i+1},\ldots,y_\ell) - Eve'(x_1,\ldots,x_{i-1},X_i,y_{i+1},\ldots,y_\ell) \right]\right|>\epsilon
\] Now \(X_i\) and \(Y_i\) are simply independent draws from their
respective distributions, and so if we define
\(Eve(z) = Eve'(x_1,\ldots,x_{i-1},z,y_{i+1},\ldots,y_\ell)\) then \(Eve\)
runs in time at most the running time of \(Eve'\) plus \(2\ell n\), which is
at most \(T\), and it satisfies \[
\left| {\mathbb{E}}_{X_i} [ Eve(X_i) ] - {\mathbb{E}}_{Y_i} [ Eve(Y_i) ] \right| > \epsilon
\] contradicting the assumption that \(X_i \approx_{T,\epsilon} Y_i\).

The above proof illustrates a powerful technique known as the *hybrid
argument* whereby we show that two distributions \(C^0\) and \(C^1\) are
close to each other by coming up with a sequence of distributions
\(H_0,\ldots,H_t\) such that \(H_t = C^1, H_0 = C^0\), and we can argue that
\(H_i\) is close to \(H_{i+1}\) for all \(i\). This type of argument repeats
itself time and again in cryptography, and so it is important to get
comfortable with it.
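The hybrid distributions can be built explicitly for a small example. In the sketch below, the two single-sample distributions (a biased bit and a uniform bit), the choice of three samples, and the majority-vote distinguisher are assumptions picked only to illustrate the argument; the helper names are hypothetical. It checks that the gap between the two extreme hybrids is bounded by the sum of the gaps between neighbouring hybrids.

```python
from fractions import Fraction
from itertools import product

def product_dist(dists):
    """Joint distribution of independent samples, one from each dict in dists."""
    joint = {}
    for outcome in product(*[d.items() for d in dists]):
        values = tuple(v for v, _ in outcome)
        prob = Fraction(1)
        for _, p in outcome:
            prob *= p
        joint[values] = joint.get(values, Fraction(0)) + prob
    return joint

def accept_prob(eve, dist):
    return sum(p for x, p in dist.items() if eve(x) == 1)

def hybrids(xs, ys):
    """H_i = (X_1, ..., X_i, Y_{i+1}, ..., Y_l) for i = 0, ..., l."""
    return [product_dist(xs[:i] + ys[i:]) for i in range(len(xs) + 1)]

# Assumed toy distributions over a single bit.
X = {1: Fraction(3, 4), 0: Fraction(1, 4)}
Y = {1: Fraction(1, 2), 0: Fraction(1, 2)}

def majority(sample):
    return 1 if sum(sample) >= 2 else 0

H = hybrids([X] * 3, [Y] * 3)
probs = [accept_prob(majority, h) for h in H]
step_gaps = [abs(probs[i] - probs[i - 1]) for i in range(1, len(probs))]
total_gap = abs(probs[-1] - probs[0])
print("gaps between neighbouring hybrids:", step_gaps)
print("gap between the extreme hybrids:  ", total_gap)
```

Neighbouring hybrids differ in a single coordinate, so each step gap is bounded by the distinguishability of one sample of \(X\) from one sample of \(Y\).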

## The Length Extension Theorem

We now turn to show the *length extension theorem*, stating that if we
have an encryption for \(n+1\)-length messages with \(n\)-length keys, then
we can obtain an encryption with \(p(n)\)-length messages for every
polynomial \(p(n)\). For a warm-up, let’s show the easier fact that
we can transform an encryption such as above into one that has keys of
length \(tn\) and messages of length \(t(n+1)\) for every integer \(t\):

Suppose that \((E',D')\) is a computationally secret encryption scheme with \(n\) bit keys and \(n+1\) bit messages. Then the scheme \((E,D)\) where \(E_{k_1,\ldots,k_t}(m_1,\ldots,m_t)= (E'_{k_1}(m_1),\ldots, E'_{k_t}(m_t))\) and \(D_{k_1,\ldots,k_t}(c_1,\ldots,c_t)= (D'_{k_1}(c_1),\ldots, D'_{k_t}(c_t))\) is a computationally secret scheme with \(tn\) bit keys and \(t(n+1)\) bit messages.

This might seem “obvious” but in cryptography, even obvious facts are sometimes wrong, so it’s important to prove this formally. Luckily, this is a fairly straightforward implication of the fact that computational indistinguishability is preserved under many samples. That is, by the security of \((E',D')\) we know that for every two messages \(m,m' \in {\{0,1\}}^{n+1}\), \(E'_k(m) \approx E'_k(m')\) where \(k\) is chosen from the distribution \(U_n\). Therefore by the indistinguishability of many samples lemma, for every two tuples \(m_1,\ldots,m_t \in {\{0,1\}}^{n+1}\) and \(m'_1,\ldots,m'_t\in {\{0,1\}}^{n+1}\),

\[ (E'_{k_1}(m_1),\ldots,E'_{k_t}(m_t)) \approx (E'_{k_1}(m'_1),\ldots,E'_{k_t}(m'_t)) \]

for random \(k_1,\ldots,k_t\) chosen independently from \(U_n\) which is exactly the condition that \((E,D)\) is computationally secret.
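The block-by-block construction can be sketched as follows. The wrapper `make_parallel_scheme` is a hypothetical name, and `toy_enc` is an insecure placeholder standing in for \(E'\) (it XORs an \((n+1)\)-bit block with the key bits followed by their parity bit); it is there only so that the example runs and says nothing about security.

```python
def make_parallel_scheme(enc_block, dec_block):
    """Build (E, D) that applies (E', D') to each block with its own key."""
    def encrypt(keys, blocks):
        if len(keys) != len(blocks):
            raise ValueError("need exactly one key per message block")
        return [enc_block(k, m) for k, m in zip(keys, blocks)]

    def decrypt(keys, ciphertexts):
        if len(keys) != len(ciphertexts):
            raise ValueError("need exactly one key per ciphertext block")
        return [dec_block(k, c) for k, c in zip(keys, ciphertexts)]

    return encrypt, decrypt

# Insecure placeholder for (E', D'), assumed only for this illustration:
# n-bit key, (n+1)-bit block, XOR with the key followed by its parity bit.
def toy_enc(key, block):
    pad = key + (sum(key) % 2,)
    return tuple(m ^ p for m, p in zip(block, pad))

toy_dec = toy_enc  # XOR with the same pad undoes the encryption

E, D = make_parallel_scheme(toy_enc, toy_dec)
keys = [(0, 1, 1), (1, 0, 0)]            # t = 2 independent keys with n = 3
blocks = [(1, 0, 1, 1), (0, 0, 1, 0)]    # t = 2 blocks of n + 1 = 4 bits
ciphertexts = E(keys, blocks)
print(D(keys, ciphertexts) == blocks)
```

Note that the total key length grows with the number of blocks, which is exactly why this warm-up does not yet give a key shorter than the message by more than \(t\) bits.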

We can now prove the full length extension theorem. Before doing so, we
will need to generalize the notion of an encryption scheme to allow a
*randomized encryption scheme*. That is, we will consider encryption
schemes where the encryption algorithm can “toss coins” in its
computation. There is a crucial difference between key material and such
“ad hoc” randomness. Keys need to be not only chosen at random, but also
shared in advance between the sender and receiver, and stored securely
throughout their lifetime. The “coin tosses” used by a randomized
encryption scheme are generated “on the fly” and are not known to the
receiver, nor do they need to be stored long term by the sender. So,
allowing such randomized encryption does not make a difference for most
applications of encryption schemes. In fact, as we will see later in
this course, randomized encryption is *necessary* for security against
more sophisticated attacks such as chosen plaintext and chosen
ciphertext attacks, as well as for obtaining secure *public key*
encryptions. We will use the notation \(E_k(m;r)\) to denote the output of
the encryption algorithm on key \(k\), message \(m\) and using internal
randomness \(r\). We often suppress the notation for the randomness, and
hence use \(E_k(m)\) to denote the random variable obtained by sampling a
random \(r\) and outputting \(E_k(m;r)\).

We can now show that given an encryption scheme with messages one bit longer than the key, we can obtain a (randomized) encryption scheme with arbitrarily long messages:

Suppose that there exists a computationally secret encryption scheme \((E',D')\) with key length \(n\) and message length \(n+1\). Then for every polynomial \(t(n)\) there exists a (randomized) computationally secret encryption scheme \((E,D)\) with key length \(n\) and message length \(t(n)\).

Let \(t=t(n)\). We are given a cipher \(E'\) which can encrypt \(n+1\)-bit
long messages with an \(n\)-bit long key and we need to encrypt a \(t\)-bit
long message \(m=(m_1,\ldots,m_t) \in {\{0,1\}}^t\). Our idea is simple
(at least in hindsight). Let \(k_0 {\leftarrow_{\tiny R}}{\{0,1\}}^n\) be
our key (which is chosen at random). To encrypt \(m\) using \(k_0\), the
encryption function will choose \(t\) random strings
\(k_1,\ldots, k_t {\leftarrow_{\tiny R}}{\{0,1\}}^n\). We will then
encrypt the \(n+1\)-bit long message \((k_1,m_1)\) with the key \(k_0\) to
obtain the ciphertext \(c_1\), then encrypt the \(n+1\)-bit long message
\((k_2,m_2)\) with the key \(k_1\) to obtain the ciphertext \(c_2\), and so on
and so forth until we encrypt the message \((k_t,m_t)\) with the key
\(k_{t-1}\). We output \((c_1,\ldots,c_t)\) as the final ciphertext.
The keys \(k_1,\ldots,k_t\) are sometimes known as *ephemeral keys*
in the crypto literature, since they are created only for the
purposes of this particular interaction. The astute reader might note
that the key \(k_t\) is actually not used anywhere in the encryption
nor decryption and hence we could encrypt \(n\) more bits of the
message instead in this final round. We used the current description
for the sake of symmetry and simplicity of exposition.

To decrypt \((c_1,\ldots,c_t)\) using the key \(k_0\), first decrypt \(c_1\) to learn \((k_1,m_1)\), then use \(k_1\) to decrypt \(c_2\) to learn \((k_2,m_2)\), and so on until we use \(k_{t-1}\) to decrypt \(c_t\) and learn \((k_t,m_t)\). Finally we can simply output \((m_1,\ldots,m_t)\).
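The chained construction and its decryption can be sketched as below. As before, `toy_encrypt` is a hypothetical placeholder for \(E'\) (a SHAKE-256 based XOR pad chosen only so the example runs, not a scheme taken from these notes), bit strings are lists of 0/1 integers, and the optional `ephemeral` argument is our way of exposing the internal randomness \(r\) in \(E_k(m;r)\).

```python
import hashlib
import secrets

def toy_encrypt(key, msg):
    """Hypothetical stand-in for E': XOR msg with a key-derived pad (heuristic only)."""
    pad = hashlib.shake_256(bytes(key)).digest(len(msg))
    return [b ^ (p & 1) for b, p in zip(msg, pad)]

toy_decrypt = toy_encrypt  # an XOR pad is its own inverse

def random_bits(n):
    return [secrets.randbits(1) for _ in range(n)]

def encrypt(k0, message_bits, ephemeral=None):
    """E_{k0}(m; r): block i is E'_{k_{i-1}}(k_i, m_i) for i = 1..t."""
    n, t = len(k0), len(message_bits)
    if ephemeral is None:
        # Fresh ephemeral keys k_1..k_t, sampled on the fly by the sender.
        ephemeral = [random_bits(n) for _ in range(t)]
    keys = [list(k0)] + [list(k) for k in ephemeral]  # keys[i] plays the role of k_i
    return [
        toy_encrypt(keys[i - 1], keys[i] + [message_bits[i - 1]])
        for i in range(1, t + 1)
    ]

def decrypt(k0, cipher_blocks):
    """Recover m_1..m_t, learning each next key from the previous block."""
    key, message_bits = list(k0), []
    for c in cipher_blocks:
        plain = toy_decrypt(key, c)        # plain = (k_i, m_i), n+1 bits
        key, bit = plain[:-1], plain[-1]   # k_i unlocks the next block
        message_bits.append(bit)
    return message_bits

if __name__ == "__main__":
    k0 = random_bits(16)
    m = random_bits(40)  # message much longer than the 16-bit key
    assert decrypt(k0, encrypt(k0, m)) == m
```

The receiver only ever needs \(k_0\); every later key arrives inside the preceding ciphertext block.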

The above are clearly valid encryption and decryption algorithms, and
hence the real question becomes *is it secure?* The intuition is that
\(c_1\) hides all information about \((k_1,m_1)\) and so in particular the
first bit of the message is encrypted securely, and \(k_1\) still can be
treated as an unknown random string even to an adversary that saw \(c_1\).
Thus, we can think of \(k_1\) as a random secret key for the encryption
\(c_2\), and hence the second bit of the message is encrypted securely,
and so on and so forth.

Our discussion above looks like a reasonable intuitive argument, but to make sure it’s true we need to give an actual proof. Let \(m,m' \in {\{0,1\}}^t\) be two messages. We need to show that \(E_{U_n}(m) \approx E_{U_n}(m')\). The heart of the proof will be the following claim:

**Claim:** Let \(\hat{E}\) be the algorithm that on input a message \(m\)
and key \(k_0\) works like \(E\) except that its \(i^{th}\) block contains
\(E'_{k_{i-1}}(k'_i,m_i)\) where \(k'_i\) is a *random* string in
\({\{0,1\}}^n\), that is chosen *independently* of everything else
including the key \(k_i\). Then, for every message \(m\in{\{0,1\}}^t\)

\[ E_{U_n}(m) \approx \hat{E}_{U_n}(m) \label{lengthextendclaimeq} \;. \]

Note that \(\hat{E}\) is not a valid encryption scheme since it’s not at all clear there is a decryption algorithm for it. It is just a hypothetical tool we use for the proof. Since both \(E\) and \(\hat{E}\) are randomized encryption schemes (with \(E\) using \(tn\) bits of randomness for the ephemeral keys \(k_1,\ldots,k_t\) and \(\hat{E}\) using \((2t-1)n\) bits of randomness for the ephemeral keys \(k_1,\ldots,k_{t-1},k'_1,\ldots,k'_t\)), we can also write \eqref{lengthextendclaimeq} as \[ E_{U_n}(m; U'_{tn}) \approx \hat{E}_{U_n}(m; U'_{(2t-1)n}) \] where we use \(U'_\ell\) to denote a random variable that is chosen uniformly at random from \(\{0,1\}^\ell\) and independently from the choice of \(U_n\) (which is chosen uniformly at random from \(\{0,1\}^n\)).

Once we prove the claim then we are done, since we know that for every pair of messages \(m,m'\), \(E_{U_n}(m) \approx \hat{E}_{U_n}(m)\) and \(E_{U_n}(m') \approx \hat{E}_{U_n}(m')\), but \(\hat{E}_{U_n}(m) \approx \hat{E}_{U_n}(m')\) since \(\hat{E}\) is essentially the same as the \(t\)-times repetition scheme we analyzed above. Thus by the triangle inequality we can conclude that \(E_{U_n}(m) \approx E_{U_n}(m')\) as we desired.
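Spelled out for an arbitrary efficient distinguisher (written \(Eve\) here, with \(E\) and \(\hat{E}\) abbreviating \(E_{U_n}\) and \(\hat{E}_{U_n}\); this shorthand is ours), the triangle inequality step reads roughly as follows:

\[ \left| {\mathbb{E}}[Eve(E(m))] - {\mathbb{E}}[Eve(E(m'))] \right| \leq \left| {\mathbb{E}}[Eve(E(m))] - {\mathbb{E}}[Eve(\hat{E}(m))] \right| + \left| {\mathbb{E}}[Eve(\hat{E}(m))] - {\mathbb{E}}[Eve(\hat{E}(m'))] \right| + \left| {\mathbb{E}}[Eve(\hat{E}(m'))] - {\mathbb{E}}[Eve(E(m'))] \right| \]

So if each of the three gaps on the right is smaller than some \(\epsilon\), the gap on the left is smaller than \(3\epsilon\).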

**Proof of claim:** We prove the claim by the hybrid method. For
\(j\in \{0,\ldots, t\}\), let \(H_j\) be the distribution of ciphertexts
where in the first \(j\) blocks we act like \(\hat{E}\) and in the last
\(t-j\) blocks we act like \(E\). That is, we choose
\(k_0,\ldots,k_t,k'_1,\ldots,k'_t\) independently at random from \(U_n\) and
the \(i^{th}\) block of \(H_j\) is equal to \(E'_{k_{i-1}}(k_i,m_i)\) if \(i>j\)
and is equal to \(E'_{k_{i-1}}(k'_i,m_i)\) if \(i\leq j\). Clearly,
\(H_t = \hat{E}_{U_n}(m)\) and \(H_0 = E_{U_n}(m)\) and so it suffices to
prove that for every \(j \in \{0,\ldots,t-1\}\), \(H_j \approx H_{j+1}\). Indeed, let
\(j \in \{0,\ldots,t-1\}\) and suppose, towards a contradiction,
that there exists an efficient \(Eve'\) such that

\[ \left| {\mathbb{E}}[ Eve'(H_j)] - {\mathbb{E}}[ Eve'(H_{j+1})]\right|\geq \epsilon \;\;(*) \]

where \(\epsilon = \epsilon(n)\) is noticeable. The two hybrids differ only in block \(j+1\), which is encrypted under the key \(k_j\): it equals \(E'_{k_j}(k_{j+1},m_{j+1})\) in \(H_j\) and \(E'_{k_j}(k'_{j+1},m_{j+1})\) in \(H_{j+1}\). By the averaging principle, there exists some fixed choice for \(k'_1,\ldots,k'_t,k_0,\ldots,k_{j-1},k_{j+1},\ldots,k_t\) such that \((*)\) still holds. Note that in this case the only randomness is the choice of \(k_j {\leftarrow_{\tiny R}}U_n\), and moreover the first \(j\) blocks and the last \(t-j-1\) blocks of \(H_j\) and \(H_{j+1}\) are identical, so we can denote them by \(\alpha\) and \(\beta\) respectively and hence write \((*)\) as

\[ \left| {\mathbb{E}}_{k_j}[ Eve'(\alpha,E'_{k_j}(k_{j+1},m_{j+1}),\beta) - Eve'(\alpha,E'_{k_j}(k'_{j+1},m_{j+1}),\beta) ] \right| \geq \epsilon \;\;(**) \]

But now consider the adversary \(Eve\) that is defined as \(Eve(c) = Eve'(\alpha,c,\beta)\). Then \(Eve\) is also efficient and by \((**)\) it can distinguish between \(E'_{U_n}(k_{j+1},m_{j+1})\) and \(E'_{U_n}(k'_{j+1},m_{j+1})\), thus contradicting the security of \((E',D')\). This concludes the proof of the claim and hence the theorem.
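To make the hybrids concrete, here is a small sketch that builds the ciphertext of \(H_j\) from explicitly given keys, following the definition of \(H_j\) in the proof (block \(i\) carries \(k'_i\) if \(i \leq j\) and \(k_i\) otherwise). The function names and the SHAKE-256 based `toy_encrypt` are assumptions made only so the example runs; they stand in for \(E'\) and are not part of the notes.

```python
import hashlib
import secrets

def toy_encrypt(key, msg):
    """Hypothetical stand-in for E': XOR msg with a key-derived pad (heuristic only)."""
    pad = hashlib.shake_256(bytes(key)).digest(len(msg))
    return [b ^ (p & 1) for b, p in zip(msg, pad)]

def hybrid(j, m, k, k_prime):
    """Ciphertext of H_j for fixed keys.

    k       = [k_0, ..., k_t]          (chain keys)
    k_prime = [None, k'_1, ..., k'_t]  (independent keys; index 0 is unused)
    Block i is E'_{k_{i-1}}(k'_i, m_i) if i <= j, and E'_{k_{i-1}}(k_i, m_i) if i > j.
    """
    t = len(m)
    blocks = []
    for i in range(1, t + 1):
        payload_key = k_prime[i] if i <= j else k[i]
        blocks.append(toy_encrypt(k[i - 1], payload_key + [m[i - 1]]))
    return blocks

if __name__ == "__main__":
    n, t = 8, 4
    bits = lambda length: [secrets.randbits(1) for _ in range(length)]
    k = [bits(n) for _ in range(t + 1)]
    k_prime = [None] + [bits(n) for _ in range(t)]
    m = bits(t)
    for j in range(t):
        a, b = hybrid(j, m, k, k_prime), hybrid(j + 1, m, k, k_prime)
        # Neighboring hybrids agree on every block except block j+1 (list index j).
        assert all(a[i] == b[i] for i in range(t) if i != j)
```

With the same keys, `hybrid(0, ...)` is what \(E\) would output and `hybrid(t, ...)` is what \(\hat{E}\) would output, so walking \(j\) from \(0\) to \(t\) changes one block at a time.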

### Appendix: The computational model

For concreteness’ sake, let us give a precise definition of what it means for a function or probabilistic process \(f\) mapping \(\{0,1\}^n\) to \(\{0,1\}^m\) to be computable using \(T\) operations. This is the model of RAND programs as in my introduction to TCS lecture notes, also known as the model of (probabilistic) Boolean circuits.

A *probabilistic straightline program* consists of a sequence of lines,
each one of them one of the following forms:

- `foo = bar NAND baz` where `foo`, `bar`, `baz` are variable identifiers.
- `foo = RAND` where `foo` is a variable identifier.

Given a program \(\pi\), we say that its *size* is the number of lines it
contains. Variables beginning with `x_` and `y_` are considered input
and output variables respectively. We require such variables to have the
forms `x_`\(0\), \(\ldots\), `x_`\(n-1\) for some \(n>0\) and `y_`\(0\), \(\ldots\),
`y_`\(m-1\). The program computes the probabilistic process that maps
\(\{0,1\}^n\) to \(\{0,1\}^m\) in the natural way. If \(F\) is a
(probabilistic or deterministic) map of \(\{0,1\}^n\) to \(\{0,1\}^m\), the
*complexity* of \(F\) is the size of the smallest program \(P\) that
computes it.
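As a rough illustration of this model, the following sketch interprets a probabilistic straightline program given as a list of text lines. The parsing conventions (whitespace-separated tokens, one assignment per line) and the function name `run_program` are our own simplifications, not a specification from the notes. The example program computes XOR of two bits with four NAND lines, so its size is 4.

```python
import random

def run_program(lines, x, rng=random):
    """Evaluate a probabilistic straightline program on the input bits x.

    Each line is either 'foo = bar NAND baz' or 'foo = RAND'.
    Inputs are named x_0, x_1, ... and outputs y_0, y_1, ...
    """
    env = {f"x_{i}": bit for i, bit in enumerate(x)}
    for line in lines:
        target, expr = [part.strip() for part in line.split("=")]
        tokens = expr.split()
        if tokens == ["RAND"]:
            env[target] = rng.randint(0, 1)        # a fresh uniform coin toss
        else:
            a, op, b = tokens
            assert op == "NAND", f"unknown operation in line: {line}"
            env[target] = 1 - (env[a] & env[b])    # NAND of two bits
    m = sum(1 for name in env if name.startswith("y_"))
    return [env[f"y_{i}"] for i in range(m)]

# XOR of two bits using four NAND operations.
XOR_PROGRAM = [
    "u = x_0 NAND x_1",
    "v = x_0 NAND u",
    "w = x_1 NAND u",
    "y_0 = v NAND w",
]

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            assert run_program(XOR_PROGRAM, [a, b]) == [a ^ b]
    print("size of XOR program:", len(XOR_PROGRAM))
```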

If you haven’t taken a class such as CS121 before, you might wonder how
such a simple model captures complicated programs that use loops,
conditionals, and more complex data types than simply a bit in
\(\{0,1\}\), not to mention some special purpose crypto-breaking devices
that might involve tailor-made hardware. It turns out that it does (for
the same reason we can compile complicated programming languages to run
on silicon chips with a very limited instruction set). In fact, as far
as we know, this model can capture even computations that happen in
nature, whether it’s in a bee colony or the human brain (which contains
about \(10^{10}\) neurons, so should in principle be simulatable by a
program whose number of lines is within a few orders of magnitude of
that number). Crucially, for cryptography, we care about such programs not
because we want to actually run them, but because we want to argue about
their *non existence*. If we have a process that cannot be computed
by a straightline program of length shorter than \(2^{128}>10^{38}\) then
it seems safe to say that a computer the size of the human brain (or
even all the human and nonhuman brains on this planet) will not be able
to perform it either.

An interesting potential exception to this principle that every
natural process should be simulatable by a straightline program of
comparable complexity are processes where the quantum mechanical
notions of *interference* and *entanglement* play a significant
role. We will talk about this notion of *quantum computing* towards
the end of the course, though note that much of what we say does not
really change when we add quantum into the picture. As discussed in
my lecture notes, we can still capture these
processes by straightline programs (that now have somewhat more
complex form), and so most of what we’ll do just carries over in the
same way to the quantum realm as long as we are fine with
conjecturing the strong form of the cipher conjecture, namely that
the cipher is infeasible to break even for quantum computers. (All
current evidence points toward this strong form being true as well.)
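To get a feel for the \(2^{128}>10^{38}\) bound mentioned above, the following few lines check the inequality and estimate how long \(2^{128}\) operations would take. The rate of \(10^{18}\) operations per second is purely an assumption chosen for illustration, not a figure from the notes.

```python
# Number of basic operations in a program of 2^128 lines.
ops_needed = 2 ** 128
assert ops_needed > 10 ** 38

# Hypothetical machine speed (an assumption for illustration only).
OPS_PER_SECOND = 10 ** 18
SECONDS_PER_YEAR = 365 * 24 * 3600

years = ops_needed // (OPS_PER_SECOND * SECONDS_PER_YEAR)

if __name__ == "__main__":
    print(f"2^128 = {ops_needed}")
    print(f"roughly {years:.2e} years at {OPS_PER_SECOND:.0e} operations per second")
```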

**Advanced note:** The computational model we use in this class is *non uniform* (corresponding to Boolean circuits) as opposed to *uniform* (corresponding to Turing machines). If this distinction doesn’t mean anything to you, you can ignore it as it won’t play a significant role in what we do next. It basically means that we do allow our programs to have hardwired constants of \(poly(n)\) bits where \(n\) is the input/key length. In fact, to be precise, we will hold ourselves to a higher standard than our adversary, in the sense that we require our algorithms to be efficient in the stronger sense of being computable in uniform probabilistic polynomial time (for some fixed polynomial, often \(O(n)\) or \(O(n^2)\)), while the adversary is allowed to use non uniformity.