Cryptography
Introduction to Cryptography
Security and privacy are core principles in computing, enabling a wide range of applications including online commerce, social networking, wireless communication, and so on. Cryptography, which is concerned with techniques and protocols for secure communication, is fundamental to building systems that provide security and privacy. In this unit, we will examine several cryptographic protocols, which address the following needs:
authentication: proving one’s identity
privacy/confidentiality: ensuring that no one can read the message except the intended receiver
integrity: guaranteeing that the received message has not been altered in any way
Our standard problem setup is that we have two parties who wish to communicate, traditionally named Alice and Bob. However, they are communicating over an insecure channel, and there is an eavesdropper Eve who can observe all their communication, and in some cases, can even modify the data in-flight. How can Alice and Bob communicate while achieving the goals of authentication, privacy, and integrity?
A central tenet in designing a cryptosystem is Kerckhoffs's principle:
A cryptosystem should be secure even if everything about the system, except the key, is public knowledge.
Thus, we want to ensure that Alice and Bob can communicate securely even if Eve knows every detail about what protocol they are using, other than the key, a secret piece of knowledge that is never communicated over the insecure channel.
We refer to the message that Alice and Bob wish to communicate as the plaintext. We want to design a cryptosystem that involves encoding the message in such a way as to prevent Eve from recovering the plaintext, even with access to the ciphertext, the result of encoding the message. There are two levels of security around which we can design a cryptosystem:
Information-theoretic, or unconditional, security: Eve cannot learn the secret message communicated between Alice and Bob, even with unbounded computational power.
Computational, or conditional, security: to learn any information about the secret message, Eve will have to solve a computationally hard problem.
The cryptosystems we examine all rely to some extent on modular arithmetic. Before we proceed further, we review some basic details about modular arithmetic.
Review of Modular Arithmetic
Modular arithmetic is a mathematical system that maps the infinitely many integers to a finite set, those in \(\{0, 1, \dots, n-1\}\) for some positive modulus \(n \in \Z^+\). The core concept in this system is that of congruence: two integers \(a\) and \(b\) are said to be congruent modulo \(n\), written as
\[a \equiv b \pmod{n}\]
when \(a\) and \(b\) differ by an integer multiple of \(n\):
\[a - b = kn \text{ for some } k \in \Z\]
Note that \(a\) and \(b\) need not be in the range \([0, n)\); in fact, if \(a \ne b\), then at least one must be outside this range for \(a\) and \(b\) to be congruent modulo \(n\). More importantly, for any integer \(i \in \Z\), there is exactly one integer \(j \in \{0, 1, \dots, n-1\}\) such that \(i \equiv j \pmod{n}\). This is because the elements in this set are at most \(n-1\) apart, so no two elements differ by a multiple of \(n\). At the same time, by Euclid’s division lemma, we know that there exist unique integers \(q\) and \(r\) such that
\[i = qn + r\]
where \(0 \le r < n\). Thus, each integer \(i \in \Z\) is mapped to exactly one integer \(j \in \{0, 1, \dots, n-1\}\) by the congruence relation modulo \(n\).
Formally, we can define a set of equivalence classes, denoted by \(\Z_n\), corresponding to each element in \(\{0, 1, \dots, n-1\}\):
\[\Z_n = \{\overline{0}, \overline{1}, \dots, \overline{n-1}\}\]
Each class \(\overline{j}\) consists of the integers that are congruent to \(j\) modulo \(n\):
\[\overline{j} = \{i \in \Z : i \equiv j \pmod{n}\}\]
However, the overline notation here is cumbersome, so we usually elide it, making the equivalence classes implicit instead:
\[\Z_n = \{0, 1, \dots, n-1\}\]
We refer to determining the equivalence class of an integer \(i\) modulo \(n\) as reducing it modulo \(n\). If we know what \(i\) is, we need only compute its remainder when divided by \(n\) to reduce it. More commonly, we have a complicated expression for \(i\), consisting of additions, subtractions, multiplications, exponentiations, and so on. Rather than evaluating the expression directly, we can take advantage of properties of modular arithmetic to simplify the task of reducing the expression.
Suppose \(a \equiv a' \pmod{n}\) and \(b \equiv b' \pmod{n}\) for a modulus \(n \ge 1\). Then
\[a + b \equiv a' + b' \pmod{n}\]
and
\[a - b \equiv a' - b' \pmod{n}\]
By definition of congruence, we have \(a - a' = kn\) and \(b - b' = mn\) for some integers \(k\) and \(m\). Then
\[(a + b) - (a' + b') = (a - a') + (b - b') = (k + m)n\]
Since \(a + b\) and \(a' + b'\) differ by an integer (\(k + m\)) multiple of \(n\), we conclude that \(a + b \equiv a' + b' \pmod{n}\). The proof for \(a - b \equiv a' - b' \pmod{n}\) is similar.
Suppose \(a \equiv a' \pmod{n}\) and \(b \equiv b' \pmod{n}\) for a modulus \(n \ge 1\). Then
\[ab \equiv a'b' \pmod{n}\]
By definition of congruence, we have \(a - a' = kn\) and \(b - b' = mn\) for some integers \(k\) and \(m\). Then
\[ab - a'b' = (a' + kn)(b' + mn) - a'b' = (kmn + a'm + b'k)n\]
Since \(ab\) and \(a'b'\) differ by an integer (\(kmn + a'm + b'k\)) multiple of \(n\), we conclude that \(ab \equiv a'b' \pmod{n}\).
Suppose \(a \equiv b \pmod{n}\). Then for any integer \(k \ge 0\),
\[a^k \equiv b^k \pmod{n}\]
We can prove Corollary 244 by observing that \(a^k = a \cdot a \cdot \dots \cdot a\) is the product of \(k\) copies of \(a\) and applying Property 242 along with induction over \(k\). We leave the details as an exercise.
The following is an example that applies the properties above to reduce a complicated expression.
Suppose we wish to find an element \(a \in \Z_7\) such that
Note that the properties above do not give us the ability to reduce any of the exponents modulo 7. (Later, we will see Fermat’s little theorem, which does give us a means of simplifying exponents. We will also see fast modular exponentiation, but we will not use that here.) We can use Corollary 244 once we reduce the base
which we can recursively reduce using the properties above.
Let’s start with \(2^{203}\). Observe that \(2^3 = 8 \equiv 1 \pmod{7}\). Then
\[2^{203} = (2^3)^{67} \cdot 2^2 \equiv 1^{67} \cdot 4 \equiv 4 \pmod{7}\]
Similarly, \(3^3 = 27 \equiv -1 \pmod{7}\), so \(3^6 \equiv 1 \pmod{7}\). This gives us
We also have \(4^3 = 2^6 = (2^3)^2 \equiv 1 \pmod{7}\), so
Combining these using the addition and multiplication properties above, we get
We can now reduce the full expression:
Thus, \(a \equiv 4 \pmod{7}\).
Fast Modular Exponentiation
There are several ways to compute a modular exponentiation \(a^b \pmod{n}\) efficiently, and we take a look at two here.
The first is to apply a top-down, divide-and-conquer strategy. We have:
\[a^b \equiv \begin{cases}1 & \text{if } b = 0\\ \left(a^{b/2}\right)^2 & \text{if } b > 0 \text{ is even}\\ a \cdot \left(a^{\floor{b/2}}\right)^2 & \text{if } b \text{ is odd}\end{cases} \pmod{n}\]
This leads to the following algorithm:
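As a concrete illustration, the following is a minimal Python sketch of this top-down strategy; the function name mod_exp is ours, and Python's built-in three-argument pow computes the same quantity.

```python
def mod_exp(a, b, n):
    """Compute a^b mod n by recursively halving the exponent b."""
    if b == 0:
        return 1 % n                      # a^0 = 1 (the "% n" handles n = 1)
    half = mod_exp(a, b // 2, n)          # a^(floor(b/2)) mod n
    if b % 2 == 0:
        return (half * half) % n          # b even: a^b = (a^(b/2))^2
    return (a * half * half) % n          # b odd:  a^b = a * (a^((b-1)/2))^2

# Sanity check against Python's built-in three-argument pow:
assert mod_exp(2, 203, 7) == pow(2, 203, 7) == 4
```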
This gives us the following recurrence for the number of multiplications and modulo operations:
\[T(b) = T(\floor{b/2}) + \O(1)\]
Applying the Master theorem, we get \(T(b) = \O(\log b)\).
Furthermore, the numbers in this algorithm are always computed modulo \(n\), so they are at most as large as \(n\). Thus, each multiplication and modulo operation works on numbers of size \(\O(\log n)\) bits and can be done efficiently, and the algorithm as a whole is efficient.
An alternative method is to apply a bottom-up strategy. Here, we make use of the binary representation of \(b\),
\[b = b_r 2^r + b_{r-1} 2^{r-1} + \dots + b_1 \cdot 2 + b_0 = \sum_{i=0}^{r} b_i 2^i\]
where \(b_i\) is either 0 or 1. Then
\[a^b = a^{\sum_i b_i 2^i} = \prod_{i \,:\, b_i = 1} a^{2^i}\]
Thus, we can compute \(a^{2^i}\) for each \(0 \le i \le r\), where \(r = \floor{\log b}\). We do so as follows:
The operation \(\floor{b/2^i}\) is a right shift on the binary representation of \(b\), and it can be done in linear time. As with the top-down algorithm, we perform \(\O(\log b)\) multiplication and modulo operations, each on numbers that are \(\O(\log n)\) in size. Thus, the runtime is efficient in the size of the input.
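The following Python sketch implements this bottom-up method; rather than recomputing \(\floor{b/2^i}\) at each step, it shifts \(b\) right by one bit per iteration, which has the same effect. The function name is ours.

```python
def mod_exp_bottom_up(a, b, n):
    """Compute a^b mod n by scanning the bits of b from least to most significant."""
    result = 1 % n
    power = a % n                 # invariant: power == a^(2^i) mod n at iteration i
    while b > 0:
        if b & 1:                 # the i-th bit of b is 1: multiply in a^(2^i)
            result = (result * power) % n
        power = (power * power) % n   # square to get a^(2^(i+1)) mod n
        b >>= 1                   # right shift: move on to the next bit
    return result

assert mod_exp_bottom_up(3, 100, 7) == pow(3, 100, 7)
```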
Division and Modular Inverses
We have seen how to do addition, subtraction, multiplication, and exponentiation in modular arithmetic, as well as several properties that help in reducing a complicated expression using these operations. What about division? Division is not a closed operation over the set of integers, so it is perhaps not surprising that division is not always well-defined in modular arithmetic. However, for some combinations of \(n\) and \(a \in \Z_n\), we can determine a modular inverse that allows us to divide by \(a\). Recall that in standard arithmetic, dividing by a number \(x\) is equivalent to multiplying by \(x^{-1}\), the multiplicative inverse of \(x\). For example:
\[\frac{6}{2} = 6 \cdot 2^{-1} = 6 \cdot \frac{1}{2} = 3\]
In the same way, division by \(a\) is defined modulo \(n\) exactly when an inverse \(a^{-1}\) exists for \(a\) modulo \(n\).
Let \(n\) be a positive integer and \(a\) be an element of \(\Z_n^+\). An inverse of \(a\) modulo \(n\) is an element \(b \in \Z_n^+\) such that
\[ab \equiv 1 \pmod{n}\]
An inverse of \(a\), denoted as \(a^{-1}\), exists modulo \(n\) if and only if \(a\) and \(n\) are coprime, i.e. \(\gcd(a, n) = 1\).
A modular inverse can be efficiently found using the extended Euclidean algorithm, a modification of Euclid’s algorithm.
The complexity bound is the same as for Euclid’s algorithm – \(\O(\log x)\) arithmetic operations.
Given \(x > y \ge 0\), the extended Euclidean algorithm returns a triple \((g, a, b)\) such that
\[ax + by = g\]
and \(g = \gcd(x, y)\).
We demonstrate that \(g = ax + by\) by (strong) induction over \(y\):
Base case: \(y = 0\). The algorithm returns \((g,a,b) = (x,1,0)\), and we have
\[ax + by = 1\cdot x + 0\cdot 0 = x = g\]Inductive step: \(y > 0\). Let \(x = qy + r\), so that \(q = \floor{x/y}\) and \(r = x \bmod y < y\). Given arguments \(y\) and \(r\), the recursion produces \((g,a',b')\). By the inductive hypothesis, we assume that
\[g = a'y + b'r\]The algorithm computes the return values \(a\) and \(b\) as
\[\begin{split}a &= b'\\ b &= a' - b'q\end{split}\]Then
\[\begin{split}ax + by &= b'x + (a' - b'q)y\\ &= b'(qy + r) + (a' - b'q)y\\ &= b'r + a'y\\ &= g\end{split}\]Thus, we conclude that \(g = ax + by\) as required.
Given \(x\) and \(y\), when \(g = \gcd(x, y) = 1\), we have
\[ax + by = 1\]
which implies that
\[ax \equiv 1 \pmod{y} \quad\text{and}\quad by \equiv 1 \pmod{x}\]
by definition of modular arithmetic. Thus, the extended Euclidean algorithm computes \(a\) as the inverse of \(x\) modulo \(y\) and \(b\) as the inverse of \(y\) modulo \(x\), when these inverses exist.
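As a sketch, the following Python functions implement the algorithm under the conventions used above (arguments \(x > y \ge 0\), return value \((g, a, b)\) with \(ax + by = g\)); the names extended_euclid and mod_inverse are ours, and the assertion matches the worked example that follows.

```python
def extended_euclid(x, y):
    """Return (g, a, b) with g = gcd(x, y) and a*x + b*y = g, assuming x > y >= 0."""
    if y == 0:
        return (x, 1, 0)                    # base case: gcd(x, 0) = x = 1*x + 0*0
    g, a, b = extended_euclid(y, x % y)     # recurse on (y, x mod y)
    return (g, b, a - b * (x // y))         # combine as in the inductive step above

def mod_inverse(a, n):
    """Return the inverse of a modulo n; requires gcd(a, n) = 1."""
    g, s, t = extended_euclid(n, a)         # s*n + t*a = g
    if g != 1:
        raise ValueError("a has no inverse modulo n")
    return t % n                            # t*a ≡ 1 (mod n), reduced into Z_n

# Matches the worked example that follows: the inverse of 13 modulo 21 is 13.
assert mod_inverse(13, 21) == 13
```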
We compute the inverse of 13 modulo 21 using the extended Euclidean algorithm. Since 13 and 21 are coprime, such an inverse must exist.
We keep track of the values of \(x\), \(y\), \(g\), \(a\), and \(b\) at each recursive step of the algorithm. First, we trace the algorithm from the initial \(x = 21, y = 13\) down to the base case:
| Step | \(x\) | \(y\) |
|---|---|---|
| 0 | \(21\) | \(13\) |
| 1 | \(13\) | \(8\) |
| 2 | \(8\) | \(5\) |
| 3 | \(5\) | \(3\) |
| 4 | \(3\) | \(2\) |
| 5 | \(2\) | \(1\) |
| 6 | \(1\) | \(0\) |
We can then trace the algorithm back up, computing the return values \(g, a, b\):
| Step | \(x\) | \(y\) | \(g\) | \(a\) | \(b\) |
|---|---|---|---|---|---|
| 6 | \(1\) | \(0\) | \(1\) | \(1\) | \(0\) |
| 5 | \(2\) | \(1\) | \(1\) | \(0\) | \(1-0\cdot\floor{\frac{2}{1}}=1\) |
| 4 | \(3\) | \(2\) | \(1\) | \(1\) | \(0-1\cdot\floor{\frac{3}{2}}=-1\) |
| 3 | \(5\) | \(3\) | \(1\) | \(-1\) | \(1-(-1)\cdot\floor{\frac{5}{3}}=2\) |
| 2 | \(8\) | \(5\) | \(1\) | \(2\) | \(-1-2\cdot\floor{\frac{8}{5}}=-3\) |
| 1 | \(13\) | \(8\) | \(1\) | \(-3\) | \(2-(-3)\cdot\floor{\frac{13}{8}}=5\) |
| 0 | \(21\) | \(13\) | \(1\) | \(5\) | \(-3-5\cdot\floor{\frac{21}{13}}=-8\) |
Thus, we have \(by = -8 \cdot 13 \equiv 1 \pmod{21}\). Translating \(b\) to an element of \(\Z_{21}\), we get \(b = -8 \equiv 13 \pmod{21}\), which means that 13 is its own inverse modulo 21. We can verify this:
\[13 \cdot 13 = 169 = 8 \cdot 21 + 1 \equiv 1 \pmod{21}\]
The extended Euclidean algorithm is a constructive proof that when \(\gcd(a, n) = 1\) for \(n \in \Z^+\) and \(a \in \Z_n^+\), an inverse of \(a\) exists modulo \(n\). Show that no such inverse exists when \(\gcd(a, n) > 1\), completing the proof of Theorem 248.
One-time Pad
We now take a look at the first category of cryptosystems, that of information-theoretic security, which relies on one-time pads.
A one-time pad is an encryption technique that relies on a key with the following properties:
The key is a random string at least as long as the plaintext.
The key is preshared between the communicating parties over some secure channel.
The key is only used once (hence the name one-time pad).
As an example of a scheme that uses a one-time pad, consider a plaintext message
\[m = m_1 m_2 \dots m_n\]
where each symbol \(m_i\) is a lowercase English letter. Let the secret key also be composed of lowercase letters:
\[k = k_1 k_2 \dots k_n\]
Here, the key is the same length as the message. We encrypt the message as
\[c = E_k(m) = c_1 c_2 \dots c_n\]
by taking the sum of each plaintext character and the corresponding key character modulo 26:
\[c_i \equiv m_i + k_i \pmod{26}\]
We map between lowercase characters and integers modulo 26, with 0 taken to be the letter a, 1 to be the letter b, and so on. Then decryption is as follows:
\[m_i \equiv c_i - k_i \pmod{26}\]
Applying both operations results in \(D_k(E_k(m)) = m\), the original plaintext message.
As a concrete example, suppose \(m = \texttt{flower}\) and \(k = \texttt{lafswl}\). Then the ciphertext is
\[c = E_k(m) = \texttt{qltoac}\]
and we have \(D_k(E_k(m)) = \texttt{flower}\).
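The following Python sketch implements this scheme using the letter-to-integer mapping just described; the helper names are ours, and the asserted values are those of the example above.

```python
def to_nums(s):
    """Map lowercase letters to integers: a -> 0, b -> 1, ..., z -> 25."""
    return [ord(ch) - ord('a') for ch in s]

def to_str(nums):
    return ''.join(chr(v + ord('a')) for v in nums)

def otp_encrypt(m, k):
    """c_i = (m_i + k_i) mod 26; the key must be at least as long as the message."""
    return to_str([(mi + ki) % 26 for mi, ki in zip(to_nums(m), to_nums(k))])

def otp_decrypt(c, k):
    """m_i = (c_i - k_i) mod 26."""
    return to_str([(ci - ki) % 26 for ci, ki in zip(to_nums(c), to_nums(k))])

# The example from the text:
assert otp_encrypt("flower", "lafswl") == "qltoac"
assert otp_decrypt("qltoac", "lafswl") == "flower"
# A different key "explains" the same ciphertext equally well:
assert otp_decrypt("qltoac", "lragpy") == "futile"
```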
Observe that Eve has no means of learning any information about \(m\) from the ciphertext \(\texttt{qltoac}\), other than that the message is six letters (and we can avoid even that by padding the message with random additional data). Even if Eve knows that the message is in English, she has no way of determining which six-letter word it is. For instance, the ciphertext \(\texttt{qltoac}\) could have been produced by the plaintext \(\texttt{futile}\) and key \(\texttt{lragpy}\). In fact, for any six-letter ciphertext \(c\) and any six-letter sequence \(s\), there is a key \(k\) such that \(E_k(s) = c\). Without having access to either the plaintext or key directly, Eve cannot tell whether the original message is \(\texttt{flower}\), \(\texttt{futile}\), or some other six-letter sequence.
Thus, a one-time-pad scheme provides information-theoretic security – an adversary cannot recover information about the message that they do not already know. In fact, one-time-pad schemes are the only cryptosystems that provide this level of security. However, there are significant tradeoffs, which are exactly the core requirements of a one-time pad:
The key must be preshared between the communicating parties through some other, secure channel.
The key has to be as long as the message, which limits the amount of information that can be communicated given a preshared key.
The key can only be used once.
These limitations make one-time pads costly to use in practice. Perhaps by relaxing some of these restrictions, we can obtain “good-enough” security at a lower cost? We first take a look at what happens when we reduce the key size – in fact, we will take this to the extreme, reducing our key size to a single symbol. This results in a scheme known as a Caesar cipher.
Let \(m = m_1 m_2 \dots m_n\) be a plaintext message. Let \(s\) be a single symbol to be used as the key. Then in a Caesar cipher, \(m\) is encrypted as
\[c = E_s(m) = c_1 c_2 \dots c_n, \quad\text{where } c_i \equiv m_i + s \pmod{26}\]
and a ciphertext \(c\) is decrypted as
\[D_s(c) = m_1 m_2 \dots m_n, \quad\text{where } m_i \equiv c_i - s \pmod{26}\]
The result is \(D_s(E_s(m)) = m\).
Observe that a Caesar cipher is similar to a one-time pad, except that the key only has one character of randomness as opposed to \(n\); essentially, we have \(k_i = s\) for all \(i\), i.e. a key where all the characters are the same.
Unfortunately, the Caesar cipher suffers from a fatal flaw in that it can be defeated by statistical analysis – in particular, the relative frequencies of letters in the ciphertext allow symbols to be mapped to the underlying message, using a frequency table of how common individual letters are in the language in which the message is written. In English, for instance, the letters e and t are most common. A full graph of frequencies of English letters in “average” English text is as follows:
A Caesar cipher merely does a circular shift of these frequencies, making it straightforward to recover the key as the magnitude of that shift. The longer the message, the more likely the underlying frequencies match the average for the language, and the more easily the scheme is broken.
Suppose the result of applying a Caesar cipher produces the ciphertext
Use frequency analysis to determine both the original message \(m\) and the key \(s\).
Note that another weakness in the Caesar-cipher scheme as described above is that the key is restricted to one of twenty-six possibilities, making it trivial to brute force the mapping. However, the scheme can be tweaked to use a much larger character set, making it harder to brute force but still leaving it open to statistical attacks in the form of frequency analysis.
A scheme that compromises between a one-time pad and Caesar cipher, by making the key larger than a single character but smaller than the message size, still allows information to leak through statistical attacks. Such a scheme is essentially the same as reusing a one-time pad more than once. What can go wrong if we do so?
Suppose we have the following two plaintext messages
\[m = m_1 m_2 \dots m_n \quad\text{and}\quad m' = m'_1 m'_2 \dots m'_n\]
where each character is a lowercase English letter. If we encode them both with the same key \(k = k_1 k_2 \dots k_n\), we obtain the ciphertexts
\[c = E_k(m) \quad\text{and}\quad c' = E_k(m')\]
If both these ciphertexts go out over the insecure channel, Eve can observe them both and compute their difference:
\[c_i - c'_i \equiv (m_i + k_i) - (m'_i + k_i) \equiv m_i - m'_i \pmod{26}\]
Here, all subtractions are done modulo 26. The end result is the character-by-character difference between the two messages (modulo 26). Unless the plaintext messages are random strings, this difference is not random! Again, statistical attacks can be used to obtain information about the original messages.
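A small, self-contained Python check of this cancellation, using the two plaintexts from the earlier example (the helper names are ours):

```python
def encrypt(m, k):
    """One-time-pad encryption: c_i = (m_i + k_i) mod 26, with a = 0, ..., z = 25."""
    return ''.join(chr((ord(x) + ord(y) - 2 * ord('a')) % 26 + ord('a'))
                   for x, y in zip(m, k))

def diff(s1, s2):
    """Character-by-character difference modulo 26."""
    return [(ord(x) - ord(y)) % 26 for x, y in zip(s1, s2)]

key = "lafswl"                      # the same key reused for two messages
c1 = encrypt("flower", key)
c2 = encrypt("futile", key)

# The key cancels out: the ciphertext difference equals the plaintext difference,
# which is exactly the structure Eve can exploit.
assert diff(c1, c2) == diff("flower", "futile")
```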
As a pictorial illustration of how the difference of two messages reveals information, the following is an image encoded with a random key under a one-time pad:
The result appears as just random noise, as we would expect. The following is another image encoded with the same key:
This result too appears as random noise. However it is the same noise, and if we subtract the two ciphertexts, the noise all cancels out:
The end result clearly reveals information about the original images.
In summary, the only way to obtain information-theoretic security is by using a one-time-pad scheme, where the key is at least as long as the message and is truly used only once. A one-time-pad-like scheme that compromises on these two characteristics opens up the scheme to statistical attacks. The core issue is that plaintext messages are not random – they convey information between parties by virtue of not being random. In a one-time pad, the key is the actual source of the randomness in the ciphertext. And if we weaken the scheme by reducing the key size or reusing a key, we lose enough randomness to enable an adversary to learn information about the plaintext messages.
Diffie-Hellman Key Exchange
Many encryption schemes, including a one-time pad, are symmetric, using the same key for both encryption and decryption. Such a scheme requires a preshared secret key that is known to both communicating parties. Ideally, we’d like the two parties to be able to establish a shared key even if all their communication is over an insecure channel. This is known as the key exchange problem.
One solution to the key-exchange problem is the Diffie-Hellman protocol. The central idea is that each party has its own secret key – this is a private key, since it is never shared with anyone. They each use their own private key to generate a public key, which they transmit to each other over the insecure channel. A public key is generated in such a way that recovering the private key would require solving a computationally hard problem, resulting in computational rather than information-theoretic security. Finally, each party uses its own private key and the other party’s public key to obtain a shared secret.
Before we examine the details of the Diffie-Hellman protocol, we discuss some relevant concepts from modular arithmetic. Let \(\Z_q\) refer to the set of integers between 0 and \(q-1\), inclusive:
\[\Z_q = \{0, 1, \dots, q-1\}\]
We then define a generator of \(\Z_p\), where \(p\) is prime, as follows:
An element \(g \in \Z_p\), where \(p\) is prime, is a generator of \(\Z_p\) if for every nonzero element \(x \in \Z_p\) (i.e. every element in \(\Z_p^+\)), there exists a number \(i \in \N\) such that:
\[g^i \equiv x \pmod{p}\]
In other words, \(g\) is an \(i\)th root of \(x\) over \(\Z_p\) for some natural number \(i\).
As an example, \(g = 2\) is a generator of \(\Z_5\):
\[2^1 \equiv 2, \quad 2^2 \equiv 4, \quad 2^3 \equiv 3, \quad 2^4 \equiv 1 \pmod{5}\]
Similarly, \(g = 3\) is a generator of \(\Z_7\):
\[3^1 \equiv 3, \quad 3^2 \equiv 2, \quad 3^3 \equiv 6, \quad 3^4 \equiv 4, \quad 3^5 \equiv 5, \quad 3^6 \equiv 1 \pmod{7}\]
On the other hand, \(g = 2\) is not a generator of \(\Z_7\):
\[2^1 \equiv 2, \quad 2^2 \equiv 4, \quad 2^3 \equiv 1, \quad 2^4 \equiv 2, \quad 2^5 \equiv 4, \quad 2^6 \equiv 1 \pmod{7}\]
We see that the powers of \(g = 2\) are cyclic, without ever generating the elements \(\{3,5,6\} \subseteq \Z_7^+\).
If \(p\) is prime, then \(\Z_p\) has at least one generator.
This theorem was first proved by Gauss. Moreover, \(\Z_p\) has \(\phi(p-1)\) generators when \(p\) is prime, where \(\phi(n)\) is Euler’s totient function, making it relatively easy to find a generator over \(\Z_p\).
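For a small prime \(p\), we can check whether a candidate \(g\) is a generator by simply enumerating its powers, as in the following Python sketch (the function name is ours); this brute-force check is of course infeasible for the large primes used in practice.

```python
def is_generator(g, p):
    """Check by brute force whether g generates Z_p (p a small prime)."""
    powers = {pow(g, i, p) for i in range(1, p)}   # g^1, g^2, ..., g^(p-1) mod p
    return powers == set(range(1, p))              # must hit every nonzero element

# The examples above:
assert is_generator(2, 5)
assert is_generator(3, 7)
assert not is_generator(2, 7)    # the powers of 2 mod 7 only cycle through {1, 2, 4}
```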
We now return to the Diffie-Hellman protocol, which is as follows:
Suppose two parties, henceforth referred to as Alice and Bob, wish to establish a shared secret key \(k\) but can only communicate over an insecure channel. Alice and Bob first establish public parameters \(p\), a very large prime number, and \(g\), a generator of \(\Z_p\). Then:
Alice picks a random \(a \in \Z_p^+\) as her private key, computes \(A \equiv g^a \pmod{p}\) as her public key, and sends \(A\) to Bob over the insecure channel.
Bob picks a random \(b \in \Z_p^+\) as his private key, computes \(B \equiv g^b \pmod{p}\) as his public key, and sends \(B\) to Alice over the insecure channel.
Alice computes
\[k \equiv B^a \equiv (g^b)^a \equiv g^{ab} \pmod{p}\]as the shared secret key.
Bob computes
\[k \equiv A^b \equiv (g^a)^b \equiv g^{ab} \pmod{p}\]as the shared secret key.
This protocol is efficient – suitable values of \(p\) and \(g\) can be obtained from public tables, and Alice and Bob each need to perform two modular exponentiations, which can be done efficiently. But is it secure? By Kerckhoffs's principle, Eve knows how the protocol works, and she observes the values \(p\), \(g\), \(A \equiv g^a \pmod{p}\), and \(B \equiv g^b \pmod{p}\). However, she does not know \(a\) or \(b\), since those values are never communicated and so are known only to Alice and Bob, respectively. Can Eve obtain \(k \equiv g^{ab} \pmod{p}\) from what she knows? The Diffie-Hellman assumption states that she cannot do so efficiently.
There is no efficient algorithm that computes \(g^{ab} \pmod{p}\) given:
a prime \(p\),
a generator \(g\) of \(\Z_p\),
\(g^a \pmod{p}\), and
\(g^b \pmod{p}\).
What if, rather than attempting to recover \(g^{ab} \pmod{p}\) directly, Eve attempts to recover one of the private keys \(a\) or \(b\)? This entails solving the discrete logarithm problem.
Let \(q\) be a modulus and let \(g\) be a generator of \(\Z_q\). The discrete logarithm of \(x \in \Z_q^+\) with respect to the base \(g\) is an integer \(i\) such that
\[g^i \equiv x \pmod{q}\]
The discrete logarithm problem is the task of computing \(i\), given \(q\), \(g\), and \(x\).
If \(q\) is prime and \(g\) is a generator of \(\Z_q\), then every \(x \in \Z_q^+\) has a discrete log \(i\) with respect to base \(g\) such that \(0 \le i < q - 1\) [1]. Thus, a brute-force algorithm to compute the discrete log is to try every possible value in this range. However, such an algorithm requires checking \(\O(q)\) possible values, and this is exponential in the size of the inputs, which take \(\O(\log q)\) bits to represent. The brute-force algorithm is thus inefficient.
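A direct Python rendering of this brute-force search (the function name is ours); it performs \(\O(q)\) multiplications and is therefore only usable for tiny moduli.

```python
def discrete_log_brute_force(g, x, q):
    """Find i with g^i congruent to x (mod q) by trying every exponent in [0, q-1)."""
    value = 1                          # invariant: value == g^i mod q
    for i in range(q - 1):
        if value == x % q:
            return i
        value = (value * g) % q
    raise ValueError("no discrete logarithm found")

# Sanity check using the generator 3 of Z_7 from the earlier example: 3^3 = 27 = 6 (mod 7).
assert discrete_log_brute_force(3, 6, 7) == 3
```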
Do better algorithms exist for computing a discrete log? Indeed they do, including the baby-step giant-step algorithm, which requires \(\O(\sqrt{q})\) multiplications on numbers of size \(\O(\log q)\). However, as we saw in primality testing, this is still not polynomial in the input size \(\O(\log q)\). In fact, the discrete logarithm assumption states that no efficient algorithm exists.
There is no efficient algorithm to compute \(i\) given:
\(q\),
a generator \(g\) of \(\Z_q\), and
\(g^i \pmod{q}\),
where \(0 \le i < q - 1\).
Neither the Diffie-Hellman nor the discrete logarithm assumption has been proven. However, they have both held up in practice, as there is no known algorithm to defeat either assumption on classical computers. As we will briefly discuss later, the assumptions do not hold on quantum computers, but this is not yet a problem in practice as no scalable quantum computer exists.
RSA
The Diffie-Hellman protocol uses private and public keys for key exchange, resulting in a shared key known to both communicating parties, which can then be used in a symmetric encryption scheme. However, the concept of a public-key cryptosystem can also be used to formulate an asymmetric encryption protocol, which uses different keys for encryption and decryption. The RSA (Rivest-Shamir-Adleman) cryptosystem gives rise to one such protocol that is widely used in practice.
As with Diffie-Hellman, the RSA system relies on facts about modular arithmetic. Core to the working of RSA is Fermat’s little theorem and its variants.
Let \(p\) be a prime number. Let \(a\) be any element of \(\Z_p^+\), where
\[\Z_p^+ = \{1, 2, \dots, p-1\}\]
Then \(a^{p-1} \equiv 1 \pmod{p}\).
Let \(p\) be a prime number. Then \(a^p \equiv a \pmod{p}\) for any element \(a \in \Z_p\), where
\[\Z_p = \{0, 1, \dots, p-1\}\]
As an example, let \(p = 7\). Then:
\[1^6 \equiv 1, \quad 2^6 = 64 \equiv 1, \quad 3^6 = 729 \equiv 1, \quad 4^6 \equiv 1, \quad 5^6 \equiv 1, \quad 6^6 \equiv 1 \pmod{7}\]
Fermat’s little theorem can be extended to the product of two distinct primes.
Let \(n = pq\) be a product of two distinct primes, i.e. \(p \ne q\). Let \(a\) be an element of \(\Z_n\). Then
\[a^{1 + k\phi(n)} \equiv a \pmod{n}\]
where \(k \in \N\) and \(\phi(n)\) is Euler’s totient function, whose value is the number of elements in \(\Z_n^+\) that are coprime to \(n\). For a product of two distinct primes \(n = pq\),
\[\phi(n) = (p-1)(q-1)\]
We prove Theorem 265 using Fermat’s little theorem. First, we show that for any \(a \in \Z_n\) where \(n = pq\) is the product of two distinct primes \(p\) and \(q\), if \(a \equiv b \pmod{p}\) and \(a \equiv b \pmod{q}\), then \(a \equiv b \pmod{n}\). By definition of modular arithmetic, we have
\[a - b = kp \quad\text{and}\quad a - b = mq\]
where \(k\) and \(m\) are integers. Then:
\[kp = mq\]
Since \(p\) divides \(kp\), \(p\) also divides \(mq\); however, since \(p\) is a distinct prime from \(q\), this implies that \(p\) divides \(m\) (by Euclid’s lemma, which states that if a prime \(p\) divides \(ab\) where \(a,b\in\Z\), then \(p\) must divide at least one of \(a\) or \(b\)). Thus, \(m = rp\) for some integer \(r\), and we have
\[a - b = mq = rpq = rn\]
so \(a \equiv b \pmod{n}\) as claimed.
Now consider \(a^{1 + k\phi(n)}\). If \(a = 0\), we trivially have
\[a^{1 + k\phi(n)} = 0 = a\]
If \(a \ne 0\), we have:
\[a^{1 + k\phi(n)} = a \cdot \left(a^{p-1}\right)^{k(q-1)} \equiv a \cdot 1^{k(q-1)} \equiv a \pmod{p}\]
by Fermat’s little theorem. (If \(p\) divides \(a\), then both sides are congruent to 0 modulo \(p\), so the congruence still holds.) Similarly:
\[a^{1 + k\phi(n)} = a \cdot \left(a^{q-1}\right)^{k(p-1)} \equiv a \pmod{q}\]
Applying our earlier result, we conclude that \(a^{1 + k\phi(n)} \equiv a \pmod{n}\).
As an example, let \(n = 6\). We have \(\phi(n) = 2\). Then, for \(k = 1\):
\[0^3 \equiv 0, \quad 1^3 \equiv 1, \quad 2^3 = 8 \equiv 2, \quad 3^3 = 27 \equiv 3, \quad 4^3 = 64 \equiv 4, \quad 5^3 = 125 \equiv 5 \pmod{6}\]
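Both of these examples are easy to verify exhaustively; the following Python snippet checks Fermat's little theorem for \(p = 7\) and the extension for \(n = 6\) (the bound on \(k\) is an arbitrary small choice for illustration).

```python
# Fermat's little theorem for p = 7: a^6 = 1 (mod 7) for every nonzero a in Z_7.
p = 7
assert all(pow(a, p - 1, p) == 1 for a in range(1, p))

# The extension for n = 6 = 2 * 3, phi(6) = 2: a^(1 + 2k) = a (mod 6) for every a in Z_6.
n, phi = 6, 2
assert all(pow(a, 1 + k * phi, n) == a for a in range(n) for k in range(5))
```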
Fermat’s little theorem has many proofs; we take a look at a proof that relies solely on the existence of modular inverses (Theorem 248).
Let \(p\) be prime, and let \(a \in \Z_p^+\) be an arbitrary nonzero element of \(\Z_p\). Consider the set of elements
\[S = \{1a, 2a, \dots, (p-1)a\}\]
These elements are all nonzero and distinct when taken modulo \(p\):
Suppose \(ka \equiv 0 \pmod{p}\). Since \(a \in \Z_p^+\), it has an inverse \(a^{-1}\) modulo \(p\), so we can multiply both sides by \(a^{-1}\) to obtain \(k \equiv 0 \pmod{p}\). Since \(k \not\equiv 0 \pmod{p}\) for all elements in \(\{1, 2, \dots, p-1\}\), \(ka \not\equiv 0 \pmod{p}\) for all elements \(ka \in S\).
Suppose \(ka \equiv ma \pmod{p}\). Multiplying both sides by \(a^{-1}\), we obtain \(k \equiv m \pmod{p}\). Since all pairs of elements in \(\{1, 2, \dots, p-1\}\) are distinct modulo \(p\), all pairs of elements in \(S = \{1a, 2a, \dots, (p-1)a\}\) are also distinct modulo \(p\).
Since there are \(p-1\) elements in \(S\), and they are all nonzero and distinct modulo \(p\), the elements of \(S\) when taken modulo \(p\) are exactly those in \(\Z_p^+\). Thus, the products of the elements in each set are equivalent modulo \(p\):
\[1 \cdot 2 \cdots (p-1) \equiv (1a) \cdot (2a) \cdots ((p-1)a) \equiv \left(1 \cdot 2 \cdots (p-1)\right) \cdot a^{p-1} \pmod{p}\]
Since \(p\) is prime, all elements in \(\Z_p^+\) have an inverse modulo \(p\), so we multiply both sides above by \((1^{-1} \times 2^{-1} \times \dots \times (p-1)^{-1})\) to obtain
\[a^{p-1} \equiv 1 \pmod{p}\]
Now that we have established the underlying mathematical facts, we take a look at the RSA encryption protocol.
Suppose Bob wishes to send a message \(m\) to Alice, but the two can only communicate over an insecure channel.
Alice chooses two very large, distinct primes \(p\) and \(q\). We assume without loss of generality that \(m\), when interpreted as an integer (using the bitstring representation of \(m\)), is smaller than \(n = pq\); otherwise, \(m\) can be divided into smaller pieces that are then sent separately using this protocol.
Alice chooses an element
\[e \in \Z_n^+ \text{ such that } \gcd(e, \phi(n)) = 1\]as her public key, with the requirement that it be coprime with \(\phi(n) = (p-1)(q-1)\). She computes the inverse of \(e\) modulo \(\phi(n)\)
\[d \equiv e^{-1} \pmod{\phi(n)}\]as her private key.
Alice sends \(n\) and \(e\) to Bob.
Bob computes
\[c \equiv m^e \pmod{n}\]as the ciphertext and sends it to Alice.
Alice computes
\[m' \equiv c^d \pmod{n}\]with the result that \(m' = m\).
This protocol is efficient. Alice can find large primes using an efficient primality test such as the Fermat test – in fact, she need only do this once, reusing the same parameters \(n,e,d\) for future communication. Computing the inverse of \(e\) can be done efficiently using the extended Euclidean algorithm. Finally, modular exponentiation can also be done efficiently.
Does the protocol work? We have
\[m' \equiv c^d \equiv (m^e)^d \equiv m^{ed} \pmod{n}\]
Since \(d \equiv e^{-1} \pmod{\phi(n)}\), we have
\[ed = 1 + k\phi(n)\]
by definition of modular arithmetic, where \(k \in \N\). By Theorem 265,
\[m^{ed} = m^{1 + k\phi(n)} \equiv m \pmod{n}\]
Thus, Alice does indeed recover the intended message \(m\).
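The following toy Python run of the protocol uses tiny, illustrative parameters (\(p = 61\), \(q = 53\), \(e = 17\)); real RSA keys use primes that are each over a thousand bits long, along with padding schemes not covered here.

```python
# Alice's key generation (p, q, d are kept private; n, e are public).
p, q = 61, 53                 # toy primes; real keys use vastly larger primes
n = p * q                     # n = 3233
phi = (p - 1) * (q - 1)       # phi(n) = 3120
e = 17                        # public exponent, coprime with phi(n)
d = pow(e, -1, phi)           # private exponent: the inverse of e modulo phi(n)

# Bob encrypts a message m < n.
m = 65
c = pow(m, e, n)              # ciphertext c = m^e mod n

# Alice decrypts and recovers m.
assert pow(c, d, n) == m      # c^d = m^(ed) = m (mod n)
```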
Finally, is the protocol secure? The public information that Eve can observe consists of \(n\), \(e\), and \(c \equiv m^e \pmod{n}\). Eve does not know the private parameters \(p,q,d\), which Alice has kept to herself. Can Eve recover \(m\)? The RSA assumption states that she cannot do so efficiently.
There is no algorithm that efficiently computes \(m\) given:
a product of two distinct primes \(n\),
an element \(e \in \Z_n^+\) such that \(\gcd(e, \phi(n)) = 1\), and
\(m^e \pmod{n}\).
Rather than trying to compute \(m\) directly, Eve could attempt to compute \(\phi(n)\), which would allow her to compute \(d \equiv e^{-1} \pmod{\phi(n)}\) and thus decrypt the ciphertext \(c \equiv m^e \pmod{n}\) using the same process as Alice. However, recovering \(\phi(n) = (p-1)(q-1)\) is as hard as factoring \(n = pq\), and the factorization hardness assumption states that she cannot do this efficiently.
There is no efficient algorithm to compute the prime factorization \(p_1, p_2, \dots, p_k\) of an integer \(n\), where
\[n = p_1^{a_1} p_2^{a_2} \cdots p_k^{a_k}\]
for \(a_i \in \Z^+\).
Like the Diffie-Hellman and discrete logarithm assumptions, the RSA and factorization hardness assumptions have not been proven, but they appear to hold in practice. And like the former two assumptions, the latter two do not hold on quantum computers; we will discuss this momentarily.
Suppose that \(n = pq\) is a product of two distinct primes \(p\) and \(q\). Demonstrate that obtaining \(\phi(n)\) is as hard as obtaining \(p\) and \(q\) by showing that given \(\phi(n)\), the prime factors \(p\) and \(q\) can be computed efficiently.
The ElGamal encryption scheme relies on the hardness of the discrete logarithm problem to perform encryption, like Diffie-Hellman does for key exchange. The following describes how Bob can send an encrypted message to Alice using ElGamal. Assume that a large prime \(p\) and generator \(g\) of \(\Z_p\) have already been established.
Alice chooses a private key \(a \in \Z_p^+\), computes \(A \equiv g^a \pmod{p}\) as her public key, and sends \(A\) to Bob.
Bob chooses a private key \(b \in \Z_p^+\), computes \(B \equiv g^b \pmod{p}\) as his public key, and sends \(B\) to Alice.
Alice and Bob both compute the shared key \(k \equiv g^{ab} \pmod{p}\).
Bob encrypts the message \(m\) as
\[c \equiv m \cdot k \equiv m \cdot g^{ab} \pmod{p}\]and sends the ciphertext \(c\) to Alice.
Show how Alice can recover \(m\) from the information she knows (\(p\), \(g\), \(a\), \(A\), \(B\), \(k\), \(c\)).
Demonstrate that if Eve has an efficient algorithm for recovering \(m\) from what she can observe (\(p\), \(g\), \(A\), \(B\), \(c\)), she also has an efficient method for breaking Diffie-Hellman key exchange.
RSA Signatures
Thus far, we have focused on the privacy and confidentiality provided by encryption protocols. We briefly consider authentication and integrity, in the form of a signature scheme using RSA. The goal here is not to keep a secret – rather, given a message, we want to verify the identity of the author as well as the integrity of its contents. The way we do so is to essentially run the RSA encryption protocol “backwards” – rather than having Bob run the encryption process on a plaintext message and Alice the decryption on a ciphertext, we will have Alice apply the decryption function to a message she wants to sign, and Bob will apply the encryption function to the resulting signed message.
Suppose Alice wishes to send a message \(m\) to Bob and allow Bob to verify that he receives the intended message. As with the RSA encryption protocol, Alice computes \((e,n)\) as her public key and sends them to Bob, and she computes \(d \equiv e^{-1} \pmod{\phi(n)}\) as her private key. Then:
Alice computes
\[s \equiv m^d \pmod{n}\]and sends \(m\) and \(s\) to Bob.
Bob computes
\[m' \equiv s^e \pmod{n}\]and verifies that the result is equal to \(m\).
The correctness of this scheme follows from the correctness of RSA encryption; we have:
\[m' \equiv s^e \equiv (m^d)^e \equiv m^{de} \equiv m \pmod{n}\]
Thus, if \(m' = m\), Bob can be assured that Alice sent the message – only she knows \(d\), so only she can efficiently compute \(m^d \pmod{n}\). In essence, the pair \(m, m^d \pmod{n}\) acts as a certificate that the sender knows the secret key \(d\).
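A minimal Python sketch of the signature scheme, with the same toy parameters as the RSA example above (\(p = 61\), \(q = 53\), \(e = 17\)); the exercise below discusses why a practical implementation signs a padded or hashed message rather than the raw message \(m\).

```python
# Alice's keys, as in the toy RSA example above.
p, q = 61, 53
n, phi = p * q, (p - 1) * (q - 1)
e = 17                        # public verification exponent
d = pow(e, -1, phi)           # private signing exponent

# Alice signs a message m by "decrypting" it with her private key.
m = 1234
s = pow(m, d, n)              # signature s = m^d mod n

# Bob verifies with Alice's public key (n, e).
assert pow(s, e, n) == m      # s^e = m^(de) = m (mod n)
assert pow(s, e, n) != 1235   # a different message does not match this signature
```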
In practice, this scheme needs a few more details to avoid the possibility of spoofing, or having someone else send a message purporting to be from Alice. We leave these details as an exercise.
The RSA signature scheme as described above is subject to spoofing – it is possible for a third party to produce a pair \(m, s\) such that \(s \equiv m^d \pmod{n}\).
Describe one way in which someone other than Alice can produce a matching pair \(m, m^d \pmod{n}\).
Propose a modification to the scheme that addresses this issue.
Hint: Think about adding some form of padding to the message to assist in verifying that Alice was the author.
Quantum Computers and Cryptography
The abstraction of standard Turing machines does not seem to quite capture the operation of quantum computers, but there are other models such as quantum Turing machines and quantum circuits that do so. The standard Church-Turing thesis still applies – anything that can be computed on a quantum computer can be computed on a Turing machine. However, a quantum computer is a probabilistic model, so the extended Church-Turing thesis does not apply – we do not know whether quantum computers can solve problems more efficiently than so-called classical computers. There are problems that are known to have efficient probabilistic algorithms on quantum computers but do not have known, efficient probabilistic algorithms on classical computers. Discrete logarithm and integer factorization, the problems that are core to Diffie-Hellman and RSA, are two such problems; they are both efficiently solvable on quantum computers by Shor’s algorithm.
In practice, scalable quantum computers pose significant implementation challenges, and they are a long way from posing a risk to the security of Diffie-Hellman and RSA. As of this writing, the largest number known to have been factored by Shor’s algorithm on a quantum computer is 35, which was accomplished in 2019. This follows previous records of 21 in 2012 and 15 in 2001. Compare this to factorization on classical computers, where the 829-bit RSA-250 number was factored in 2020. Typical keys currently used in implementations of RSA have at least 2048 bits, and both quantum and classical computers are far from attacking numbers of this size. Regardless, post-quantum cryptography is an active area of research, so that if quantum computers do eventually become powerful enough to attack Diffie-Hellman or RSA, other cryptosystems can be used that are more resilient against quantum attacks.
In complexity-theoretic terms, the decision versions of discrete logarithm and integer factorization are in the complexity class \(\BQP\), which is the quantum analogue of \(\BPP\); in other words, \(\BQP\) is the class of languages that have efficient two-sided-error randomized algorithms on quantum computers. We know that
\[\BPP \subseteq \BQP\]
but we do not know whether this containment is strict. And like the relationship between \(\BPP\) and \(\NP\), most complexity theorists do not believe that \(\BQP\) contains all of \(\NP\). Neither the decision version of discrete logarithm nor that of integer factorization is believed to be \(\NP\)-complete – they are believed (but not proven) to be \(\NP\)-intermediate, i.e. in the class \(\NPI\) of languages that are in \(\NP\) but neither in \(\P\) nor \(\NP\)-complete. It is known that if \(\P \ne \NP\), then the class \(\NPI\) is not empty.
On the other hand, if \(\P = \NP\), then computational security is not possible – such security relies on the existence of hard problems that are efficiently verifiable, and no such problem exists if \(\P = \NP\).