\[\def\gets{:=} \def\N{\mathbb{N}} \def\Z{\mathbb{Z}} \def\R{\mathbb{R}} \def\Q{\mathbb{Q}} \def\O{\mathrm{O}} \def\Prop{\mathbb{P}} \def\Algorithm{\textbf{Algorithm}~} \def\Subroutine{\textbf{Subroutine}~} \def\Let{\textbf{Let}~} \def\Find{\textbf{Find}~} \def\Compute{\textbf{Compute}~} \def\For{\textbf{For}~} \def\While{\textbf{While}~} \def\If{\textbf{If}~} \def\Else{\textbf{Else}~} \def\ElseIf{\textbf{Else If}~} \def\Return{\textbf{Return}~} \def\return{~\textbf{return}~} \def\Output{\textbf{Output}~} \def\output{~\textbf{output}~} \def\tot{~\textrm{to}~} \def\andt{~\textrm{and}~} \def\ort{~\textrm{or}~} \def\Forallt{\textbf{For all}~} \def\forallt{~\textbf{for all}~} \def\Foreacht{\textbf{For each}~} \def\foreacht{~\textbf{for each}~} \newcommand{\abs}[1]{|#1|} \newcommand{\lrabs}[1]{\left|#1\right|} \newcommand{\parens}[1]{\left(#1\right)} \newcommand{\brackets}[1]{\left[#1\right]} \newcommand{\floor}[1]{\left\lfloor#1\right\rfloor} \newcommand{\ceil}[1]{\left\lceil#1\right\rceil} \def\gcd{\mathrm{gcd}} \def\qst{q_{start}} \def\qacc{q_{accept}} \def\qrej{q_{reject}} \newcommand{\lang}[1]{L_\mathrm{#1}} \def\atm{\lang{ACC}} \def\htm{\lang{HALT}} \def\ehtm{\lang{\varepsilon\text{-}HALT}} \def\eqtm{\lang{EQ}} \def\etm{\lang{\varnothing}} \def\oninput{\text{ on input }} \def\oninputs{\text{ on inputs }} \def\Run{\textbf{Run}~} \def\ont{~\textrm{on}~} \def\Accept{\textbf{Accept}~} \def\accept{~\textbf{accept}~} \def\accepts{~\textrm{accepts}~} \def\Reject{\textbf{Reject}~} \def\reject{~\textbf{reject}~} \def\rejects{~\textrm{rejects}~} \def\Halt{\textbf{Halt}~} \def\halt{~\textbf{halt}~} \newcommand{\inner}[1]{\langle #1 \rangle} \def\Tr{\le_\mathrm{T}} \def\pmr{\le_\mathrm{p}} \def\Construct{\textbf{Construct}~} \def\bquote{\text{"}} \def\equote{\text{"}} \def\RD{\mathsf{R}} \def\RE{\mathsf{RE}} \def\coRE{\mathsf{coRE}} \def\DTIME{\mathsf{DTIME}} \def\P{\mathsf{P}} \def\NP{\mathsf{NP}} \def\coNP{\mathsf{coNP}} \def\MAZE{\mathrm{MAZE}} \def\TSP{\mathrm{TSP}} \def\SAT{\mathrm{SAT}} \def\TSAT{\mathrm{3SAT}} \def\VC{\mathrm{VERTEX\text{-}COVER}} \def\SC{\mathrm{SET\text{-}COVER}} \def\HC{\mathrm{HAMCYCLE}} \def\HP{\mathrm{HAMPATH}} \def\IS{\mathrm{INDEPENDENT\text{-}SET}} \def\CLIQUE{\mathrm{CLIQUE}} \def\SSUM{\mathrm{SUBSET\text{-}SUM}} \def\KNAPSACK{\mathrm{KNAPSACK}} \def\MAXCUT{\mathrm{MAX\text{-}CUT}} \def\Ex{\mathbb{E}} \def\Var{\mathrm{Var}} \newcommand{\lrPr}[1]{\Pr\brackets{#1}} \newcommand{\lrEx}[1]{\Ex\brackets{#1}} \newcommand{\lrVar}[1]{\Var\parens{#1}} \newcommand{\lrexp}[1]{\exp\parens{#1}} \def\MSAT{\mathrm{Max}\text{-}\mathrm{E3SAT}} \def\PRIMES{\mathrm{PRIMES}} \def\RP{\mathsf{RP}} \def\coRP{\mathsf{coRP}} \def\BPP{\mathsf{BPP}} \def\ZPP{\mathsf{ZPP}} \def\Xj{X^{(j)}} \def\BQP{\mathsf{BQP}} \def\NPI{\mathsf{NPI}} \newcommand{\expp}[1]{\exp\left(#1\right)}\]

Supplemental: Computability

Applying Rice’s Theorem

Rice’s theorem gives us a third tool for proving the undecidability of a language, in addition to the direct proof we saw for \(\atm\) and Turing reduction from a known undecidable language. Given a language \(L\), we need to do the following to apply Rice’s theorem:

  1. Define a semantic property \(\Prop\), i.e. a set of languages.

  2. Show that \(\lang{\Prop} = L\). In other words, we show that the language

    \[\lang{\Prop} = \{\inner{M} : M \text{ is a program and } L(M) \in \Prop\}\]

    consists of the same set of elements as \(L\).

  3. Show that \(\Prop\) is nontrivial. This requires demonstrating that there is some recognizable language \(A \in \Prop\) and another recognizable language \(B \notin \Prop\).

We can then conclude by Rice’s theorem that \(L\) is undecidable.

Example 189

Define \(\lang{\Sigma^*}\) as follows:

\[\lang{\Sigma^*} = \{\inner{M} : M \text{ is a program that accepts all inputs}\}\]

We use Rice’s theorem to show that \(\lang{\Sigma^*}\) is undecidable. Let \(\Prop\) be the semantic property:

\[\Prop = \{\Sigma^*\}\]

The property contains just a single language. Observe that \(M\) accepts all inputs exactly when \(L(M) = \Sigma^*\). Thus, we can express \(\lang{\Sigma^*}\) as:

\[\lang{\Sigma^*} = \{\inner{M} : M \text{ is a program and } L(M) = \Sigma^*\}\]

This is the exact definition of \(\lang{\Prop}\), so we have \(\lang{\Sigma^*} = \lang{\Prop}\).

We proceed to show that there is a recognizable language in \(\Prop\) and another not in \(\Prop\). \(\Sigma^*\) is a recognizable language in \(\Prop\) – we can recognize \(\Sigma^*\) with a program that accepts all inputs. \(\emptyset\) is a recognizable language not in \(\Prop\) – we can recognize \(\emptyset\) with a program that rejects all inputs. Thus, \(\Prop\) is nontrivial.

Since \(\Prop\) is nontrivial, by Rice’s theorem, \(\lang{\Prop}\) is undecidable. Since \(\lang{\Prop} = \lang{\Sigma^*}\), \(\lang{\Sigma^*}\) is undecidable.

Example 190

Define \(\lang{A376}\) as follows:

\[\lang{A376} = \{\inner{M} : M \text{ is a program that accepts all strings of length less than } 376\}\]

We use Rice’s theorem to show that \(\lang{A376}\) is undecidable. Define \(S_{376}\) to be the set of all strings whose length is less than 376:

\[S_{376} = \{x \in \Sigma^* : \abs{x} < 376\}\]

Then let \(\Prop\) be the semantic property:

\[\Prop = \{L \subseteq \Sigma^*\ : S_{376} \subseteq L\}\]

The property contains all languages that themselves contain all of \(S_{376}\). If \(M\) accepts all inputs of length less than 376, then \(S_{376} \subseteq L(M)\). We thus have:

\[\begin{split}\lang{A376} &= \{\inner{M} : M \text{ is a program that accepts all strings of length less than } 376\}\\ &= \{\inner{M} : M \text{ is a program and } S_{376} \subseteq L(M)\}\\ &= \{\inner{M} : M \text{ is a program and } L(M) \in \Prop\}\\ &= \lang{\Prop}\end{split}\]

We proceed to show that there is a recognizable language in \(\Prop\) and another not in \(\Prop\). \(\Sigma^*\) is a recognizable language in \(\Prop\) – a program that recognizes \(\Sigma^*\) accepts all inputs, including all those of length less than 376. \(\emptyset\) is a recognizable language not in \(\Prop\) – a program that recognizes \(\emptyset\) does not accept any inputs, including those of length less than 376. Thus, \(\Prop\) is nontrivial.

Since \(\Prop\) is nontrivial, by Rice’s theorem, \(\lang{\Prop}\) is undecidable. Since \(\lang{\Prop} = \lang{A376}\), \(\lang{A376}\) is undecidable.

Example 191

Define \(\lang{DEC}\) as follows:

\[\lang{DEC} = \{\inner{M} : M \text{ is a program and } L(M) \text{ is decidable}\}\]

We use Rice’s theorem to show that \(\lang{DEC}\) is undecidable. Let \(\Prop\) be the semantic property:

\[\Prop = \{L \subseteq \Sigma^*\ : L \text{ is decidable}\}\]

The property contains all decidable languages, and \(\lang{DEC} = \lang{\Prop}\).

We proceed to show that there is a recognizable language in \(\Prop\) and another not in \(\Prop\). \(\Sigma^*\) is a recognizable language in \(\Prop\) – we can both decide and recognize \(\Sigma^*\) with a program that accepts all inputs. \(\atm\) is a recognizable language not in \(\Prop\) – we know that \(\atm\) is undecidable, and it is recognized by the universal Turing machine \(U\). Thus, \(\Prop\) is nontrivial.

Since \(\Prop\) is nontrivial, by Rice’s theorem, \(\lang{\Prop}\) is undecidable. Since \(\lang{\Prop} = \lang{DEC}\), \(\lang{DEC}\) is undecidable.

Rice’s theorem is not applicable to all undecidable languages. Some examples of languages where Rice’s theorem cannot be applied are:

  • The language

    \[\lang{REJ} = \{(\inner{M}, x) : M \text{ is a program and $M$ rejects $x$}\}\]

    does not consist of source codes of programs on their own, so we cannot define a property \(\Prop\) such that \(\lang{\Prop} = \lang{REJ}\). We must show undecidability either directly or through a Turing reduction.

  • The language

    \[\lang{SmallTM} = \{\inner{M} : M \text{ has fewer than 100 states}\}\]

    is concerned with the structure of \(M\) rather than its language \(L(M)\). This is a syntactic property rather than a semantic property, and it is decidable by examining the source code of \(M\).

  • The languages

    \[\begin{split}\lang{R376} &= \{\inner{M} : M \text{ is a program that rejects all strings of length less than } 376\}\\ \lang{L376} &= \{\inner{M} : M \text{ is a program that loops on all strings of length less than } 376\}\end{split}\]

    do not have a direct match with a semantic property. In both cases, \(L(M) \cap S_{376} = \emptyset\), but the two languages \(\lang{R376}\) and \(\lang{L376}\) are clearly distinct and non-overlapping. In fact, a program \(M_1\) that rejects all strings is in \(\lang{R376}\) while a program \(M_2\) that loops on all strings is in \(\lang{L376}\), but both programs have the same language \(L(M_1) = L(M_2) = \emptyset\). Thus, both \(\lang{R376}\) and \(\lang{L376}\) contain some programs and exclude others with the same language, so it is impossible to define a property \(\Prop\) such that \(\lang{\Prop} = \lang{R376}\) or \(\lang{\Prop} = \lang{L376}\).

    However, both \(\lang{R376}\) and \(\lang{L376}\) are actually undecidable. While we cannot apply Rice’s theorem, we can show undecidability via a Turing reduction.

Thus, Rice’s theorem allows us to take a shortcut for languages of the right format, but we still need the previous tools we saw for languages that do not follow the structure required by the theorem.

Computable Functions and Kolmogorov Complexity

Previously, we only considered decision problems, for which the answer is either yes or no. We formalized such a problem as a language \(L\) over an alphabet \(\Sigma\), where \(L\) is the set of all yes instances. Then solving a decision problem means being able to determine whether \(x \in L\) for all inputs \(x \in \Sigma^*\).

We can define a function \(f_L : \Sigma^* \to \{0, 1\}\) corresponding to the language \(L\):

\[\begin{split}f_L(x) = \begin{cases} 1 &\mbox{ if $x \in L$}\\ 0 &\mbox{ if $x \notin L$} \end{cases}\end{split}\]

A program that decides \(L\) also computes the function \(f_L\), meaning that it determines the value \(f_L(x)\) for any input \(x\). For an arbitrary function \(f : \Sigma^* \to \Sigma^*\), we say that \(f\) is computable if there exists a program that outputs \(f(x)\) given the input \(x\).

Computability is a generalization of decidability for arbitrary functions. Decidability only applies to functions whose codomain is \(\{0, 1\}\), while computability applies to any function whose codomain is comprised of finite-length strings.

In the Turing-machine model, we represent a result in \(\{0, 1\}\) with specific reject and accept states. For a function whose codomain is \(\Sigma^*\), we cannot define a new state for each element in \(\Sigma^*\) – there are infinitely many elements in \(\Sigma^*\), but the set of states \(Q\) for a Turing machine must be finite. Instead, we consider a Turing machine to output the string \(f(x)\) if it reaches a final state with \(f(x)\) written on its tape.

For a concrete programming model, we rely on whatever convention is used in that model for output, such as returning a value from a function or writing something to a standard output stream. In pseudocode, we typically just state “output \(w\)” or “return \(w\)” as an abstraction of outputting the value \(w\).

Given any alphabet \(\Sigma\), there are uncountably many functions \(f : \Sigma^* \to \Sigma^*\). Since a Turing machine \(M\) can compute at most one function and the set of Turing machines is countable, there are uncountably many uncomputable functions over any alphabet \(\Sigma\).

Exercise 192

Use diagonalization to show that the set of functions \(f : \Sigma^* \to \Sigma^*\) is uncountable when \(\Sigma\) is the unary alphabet \(\Sigma = \{1\}\).

As a specific example of an uncomputable function, consider the problem of data compression. Given a string \(w\), we would like to encode its data as a shorter string \(w_c\) that still allows us to recover the original string \(w\). This is known as lossless compression, and many algorithms for lossless compression are in use including the GIF format for images, ZIP and GZIP for arbitrary files, and so on. (There are lossy formats as well such as JPEG and MP3, where some information is lost as part of the compression process.) It can be demonstrated by a counting argument that any lossless compression algorithm has uncompressible strings for all lengths \(n \ge 1\), where the result of compressing the string is no shorter than the original string.

Exercise 193

Let \(C\) be an arbitrary compression algorithm for binary strings. Define \(C(w)\) to be the result of compressing the string \(w\), where \(w \in \{0, 1\}^*\) and \(C(w) \in \{0, 1\}^*\).

  1. Prove that for all \(n \ge 1\), there are at most \(2^n - 1\) binary strings \(w\) such that the corresponding compressed string \(C(w)\) has length strictly less than \(n\).

  2. Conclude that for all \(n \ge 1\), there exists some uncompressible string \(w\) of length \(n\) such that \(C(w)\) has length at least \(n\).

We specifically consider “compression by program,” where we compress a string \(w\) by writing a program that, given an empty input, outputs the string \(w\). Assume we are working in a concrete programming language \(U\). We define the Kolmogorov complexity of \(w\) in language \(U\) as the length of the shortest program in \(U\) that outputs \(w\), given an empty string. We denote this by \(K_U(w)\), and we have:

\[K_U(w) = \min\{\abs{\inner{M}} : \text{$M$ is a program in language $U$ that outputs $w$, given an empty input}\}\]

Observe that \(K_U(w) = \O(\abs{w})\) for any \(w \in \{0, 1\}^*\) – we can just hardcode the string \(w\) in the program itself. For instance, the following is a C++ program that outputs \(w = 0101100101\):

#include <iostream>

int main() {
  std::cout << "0101100101";
}

To output a different value \(w'\), we need only swap out the hardcoded string \(w\) for \(w'\). The total length of the program that has a string hardcoded is a constant number of symbols, plus the length of the desired output string itself. Thus, there exists a program of length \(\O(\abs{w})\) that outputs \(w\), so the shortest program that does so has length no more than \(\O(\abs{w})\), and \(K_\text{C++}(w) = \O(\abs{w})\). The same is true for any other programming language \(U\).

For some strings \(w\), we can define a much shorter program that outputs \(w\). Consider \(w = 0^m\) for some large number \(m\), meaning that \(w\) consists of \(m\) zeros. We can define a short program to output \(w\) as follows:

#include <iostream>

int main() {
  for (int i = 0; i < m; ++i)
    std::cout << "0";
}

As a concrete example, let \(m = 1000\). The program that hardcodes \(w = 0^m\) would have a length on the order of 1000. But the program that follows the looping pattern above is:

#include <iostream>

int main() {
  for (int i = 0; i < 1000; ++i)
    std::cout << "0";
}

Here, we hardcode \(m\) rather than \(0^m\). The former has length \(\O(\log m)\) (for \(m = 1000\), four digits in decimal or ten in binary), as opposed to the latter that has length \(m\). Thus, we achieve a significant compression for \(w = 0^m\), and we have \(K_\text{C++}(0^m) = \O(\log m)\).

We now demonstrate that the function \(K_U(w)\) is uncomputable for any programming language \(U\). Assume for the purposes of contradiction that \(K_U(w)\) is computable. For a given length \(n\), we define the program \(Q_n\) as follows:

\[\begin{split}&\Algorithm Q_n:\\ &~~~\Forallt x \in \{0, 1\}^n:\\ &~~~~~~\Compute K_U(x)\\ &~~~~~~\If K_U(x) \ge n, \output x \andt \textbf{halt}\\\end{split}\]

As mentioned previously, any compression algorithm has uncompressible strings for every length \(n \ge 1\). Thus, \(Q_n\) will find a string \(w_n\) such that \(K_U(w_n) \ge n\), output it, and halt.

Since \(Q_n\) outputs \(w_n\) given an empty input, by definition, we have \(K_U(w_n) \le \abs{Q_n}\); that is, the Kolmogorov complexity of \(w_n\) in language \(U\) is no more than the length of \(Q_n\), since \(Q_n\) itself is a program that outputs \(w_n\). How long is \(Q_n\)? The only part of \(Q_n\) that depends on \(n\) is the hardcoded value \(n\) in the loop that iterates over \(\{0, 1\}^n\). As we saw before, it takes only \(\O(\log n)\) symbols to encode a value \(n\). Thus, we have \(\abs{Q_n} = \O(\log n)\).

We have arrived at a contradiction: we have demonstrated that \(K_U(w_n) \ge n\) and also that \(K_U(w_n) \le \abs{Q_n} = \O(\log n)\). These two conditions cannot be simultaneously satisfied. Since we have reached a contradiction, our original assumption that \(K_U(w)\) is computable must be false, and \(K_U(w)\) is an uncomputable function.