Computability
Introduction to Computability
We now turn our attention to fundamental questions about computation. While we have seen several algorithmic approaches to solving problems, when faced with a new problem, the first question we should consider is whether it is solvable at all. Otherwise, we might waste a lot of time on a fruitless effort to find an algorithm where none exists!
Before we can reason about what problems are solvable on a computer, we first need to define, in a general and abstract way, what a problem is, and what a computer (or algorithm) is. Rather than basing our answers on specific hardware and software technologies, programming languages, and implementation details (which go in and out of fashion over time), we want to develop simple, fundamental abstractions that are capable of modeling all kinds of computations. These abstractions will be easier to reason about than real computers and programs. Moreover, our abstractions will ideally be strong enough to capture all possible implementations of “computation”, so that results in our model will hold for all real computers, now and into the future.
Formal Languages
We start by defining a unifying abstraction to represent computational problems. The simplest kind of output a problem can have is a yes-or-no answer, like in this question:
Is there a flight from Detroit to New York for less than $100?
We can generalize the question above into a predicate that takes an input value:
Is there a flight from Detroit to \(x\) for less than $100?
For any particular destination \(x\), the answer to this question is still either yes or no. We can further generalize this predicate to work with multiple input values:
Is there a flight from \(x\) to \(y\) for less than \(z\)?
While this predicate is expressed as taking three inputs \(x\), \(y\), and \(z\), we can equivalently consider it to be a predicate that takes a single input tuple \(w = (x, y, z)\) of those three components. Thus, we lose no generality in restricting to predicates that have a single input.
Another name for a predicate (which we will use more often) is a decision problem: given an input, the answer is simply a yes-or-no decision.
A predicate is applicable to some universe of inputs, such as all airports or cities, for instance. (In programming languages, a type is used to represent a universe of values.) The predicate is true (has a “yes” answer) for some subset of the inputs, and is false (a “no” answer) for all the other inputs. We call the set of “yes” instances the language of the predicate. The following is the language defined by our second predicate above:
\[L_{\text{getaways}} = \set{x : \text{there is a flight from Detroit to } x \text{ for less than } \$100} \; \text{.}\]
Therefore, we have recast the question “is there a flight from Detroit to \(x\) for less than $100?” as the decision problem “is \(x \in L_{\text{getaways}}\)?” Any predicate can be recast in such a way, as the decision problem of determining whether the input is in the corresponding language, and vice-versa.
The complement language is the subset of inputs that are not in the language, i.e., those for which the answer is “no”. We represent the complement using an overbar:
\[\overline{L} = \set{x : x \notin L} \; \text{.}\]
So far, we have unified computational problems under the abstraction of languages and their decision problems, of determining whether the input is in the language. Now we also unify the notion of “input” under a common abstraction: as a string over an alphabet.
To represent inputs, we first fix an alphabet, which we often denote by the Greek letter capital-Sigma, written \(\Sigma\). (Warning: this is the same Greek letter used to denote summation, but in this context it has a completely different meaning. [1])
An alphabet is some nonempty, finite set of symbols.
As a few examples:
\(\Sigma_{\text{binary}} = \set{0, 1}\) is the alphabet consisting of just the symbols \(0\) and \(1\).
\(\Sigma_{\text{lowercase}} = \set{a, b, \ldots, z}\) is the alphabet consisting of lowercase English letters.
\(\Sigma_{\text{ASCII}}\) can be defined as the alphabet consisting of all the symbols in the ASCII character set.
\(\Sigma_{\text{Greek}} = \set{\alpha, \beta, \gamma, \ldots, \omega}\) is the alphabet consisting of the lower-case Greek letters.
\(\Sigma_{\text{BigTen}}\) can be defined as the alphabet consisting of all the schools in the Big Ten conference: Michigan, Indiana, Wisconsin, Ohio State, etc. (To depict them visually, we might use their logos.)
We emphasize that an alphabet can be any nonempty set of elements of any kind, as long as it is finite. So, for example, the set of natural numbers \(\N = \set{0, 1, 2, \ldots}\) is not a valid alphabet.
The binary alphabet \(\Sigma = \set{0,1}\) is often taken as the “default” choice, both because it is the “simplest” nontrivial alphabet, and because modern computers operate on 0s and 1s at the lowest level. But fundamentally, we can use whatever alphabet we find convenient for our specific purpose.
A string over an alphabet \(\Sigma\) is a finite, ordered sequence of symbols from \(\Sigma\).
We stress that a string must have finite length. (Below we sometimes preface the term “string” by “(finite-length)” for emphasis, but this is redundant and does not change the meaning.)
For example, a binary string consists of any finite sequence of 0s and 1s (the symbols of \(\Sigma_{\text{binary}} = \set{0,1}\)), while every (lower-case) English word is a sequence of letters from the alphabet \(\Sigma_{\text{lowercase}} = \set{a, b, \ldots, z}\). (Not every string over this alphabet is a real English word, however.)
The concept of a string appears in most programming languages, which usually specify a “string” data type for sequences of elements over a certain character set (e.g., ASCII or UTF); the character set is the string’s alphabet. Programming languages often specify notation for string literals, such as enclosing a sequence of characters within matching double quotes. This notation itself is not part of the string data; for instance, the string literal "abcd" in C++ and Python represents the four-character [2] sequence \(abcd\), and the quotation marks are not part of this sequence.
In C++, the character-array string representation also includes a null terminator to indicate the end of the string, but that is an implementation detail. Other implementations do not use this representation.
Note that the empty sequence of symbols is a valid string, over any alphabet. For instance, "" represents an empty string in C++ and Python. In computability, the standard notation for the empty string is the symbol \(\varepsilon\) (written \varepsilon in formatting systems like LaTeX).
We typically display a nonempty string simply by writing out its symbols in order. For instance, \(1011\) is a string over the alphabet \(\Sigma_{\text{binary}}\), consisting of the symbol \(1\), followed by the symbol \(0\), etc. The length of a string \(w\) is the number of symbols in \(w\), and is denoted \(\abs{w}\); for example, \(\abs{1011} = 4\).
To indicate the set of all strings of a certain length over an alphabet, we use “product” and “power” notation for sets. Recall that the set product \(A \times B\) is the set of all pairs where the first component is an element of \(A\) and the second component is an element of \(B\). So, for \(\Sigma = \set{0,1}\), its product with itself is \(\Sigma \times \Sigma = \set{00, 01, 10, 11}\), and similarly for products of more copies of \(\Sigma\). For a set \(A\) and a non-negative integer \(k\), the notation \(A^k\) means the set product of \(k\) copies of \(A\). For example:
\[\Sigma^3 = \Sigma \times \Sigma \times \Sigma = \set{000, 001, 010, 011, 100, 101, 110, 111} \; \text{.}\]
In general, then, \(\Sigma^k\) is the set of length-\(k\) strings over \(\Sigma\).
The notation \(\Sigma^*\) denotes the set of all (finite-length) strings over the alphabet \(\Sigma\). Here, the “star” superscript \({}^*\) is known as the Kleene star operator, whose formal definition is
\[\Sigma^* = \bigcup_{k \in \N} \Sigma^k = \Sigma^0 \cup \Sigma^1 \cup \Sigma^2 \cup \cdots \; \text{.}\]
The above definition says that \(\Sigma^*\) is made up of all strings of length 0 over \(\Sigma\), all strings of length 1 over \(\Sigma\), all strings of length 2, and so on, over all finite lengths \(k\). For example,
\[\set{0,1}^* = \set{\varepsilon, 0, 1, 00, 01, 10, 11, 000, 001, \ldots}\]
is the set of all (finite-length) binary strings. It is very important to understand that while \(\Sigma^*\) is an infinite set (for any alphabet \(\Sigma\))—i.e., it has infinitely many elements (strings)—each individual string in \(\Sigma^*\) has some finite length. This is analogous to the natural numbers: there are infinitely many of them, but each one has some finite value (there is no natural number “infinity”).
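To make these definitions concrete, the following C++ sketch (an illustration of ours, not part of the formal development) enumerates \(\set{0,1}^k\) for \(k = 0, 1, 2, 3\) by repeatedly taking a product with the alphabet; raising the loop bound enumerates ever more of \(\set{0,1}^*\), even though every individual string printed has finite length.

#include <iostream>
#include <string>
#include <vector>

int main() {
    const std::string sigma = "01";           // the alphabet Sigma = {0, 1}
    std::vector<std::string> current = {""};  // Sigma^0 = { epsilon }
    for (int k = 0; k <= 3; ++k) {            // print Sigma^0, ..., Sigma^3
        std::cout << "Sigma^" << k << ":";
        for (const std::string &w : current)
            std::cout << " \"" << w << "\"";
        std::cout << "\n";
        std::vector<std::string> next;        // Sigma^(k+1) = Sigma^k x Sigma
        for (const std::string &w : current)
            for (char c : sigma)
                next.push_back(w + c);
        current = std::move(next);
    }
}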
A language \(L\) over an alphabet \(\Sigma\) is a subset of \(\Sigma^*\), i.e., \(L \subseteq \Sigma^*\). In other words, it is a set of (finite-length) strings over \(\Sigma\).
The following are examples of languages:
\(L_1 = \set{11y : y \in \set{0,1}^*}\) is the language over \(\Sigma_{\text{binary}}\) consisting of all binary strings that begin with two ones.
\(L_2 = \set{hello, world}\) is the language over \(\Sigma_{\text{lowercase}}\) consisting of just the two strings \(hello\) and \(world\).
The empty language (empty set) \(\emptyset\) is a language over any alphabet \(\Sigma\), which consists of no strings.
It is important to understand that the empty language \(\emptyset\) is different from the empty string \(\varepsilon\): the former is a set that has no elements, while the latter is an individual string (ordered sequence) that has no characters. (The difference is akin to that between the values produced by set() and str() with no arguments in Python, or std::set<std::string>{} and std::string{} in C++.) Moreover, the language \(\set{\varepsilon}\) is different still: it is the set that consists of just one string, and that element is the empty string.
As illustrated above, some languages are finite, while others have infinitely many strings.
According to the definition, we can view the words of the English language itself as comprising a formal language over the English alphabet: it is a set of strings made up of English characters. (However, note that this view does not capture any of the grammar or meaning of the English language, just its set of words.)
In computing, we often use the binary alphabet \(\set{0, 1}\). Any data value can be represented as a binary string—in fact, real computers store all data in binary, from simple text files to images, audio files, and this very document itself. Thus, the languages we work with generally consist of binary strings. When we express a language such as
what we actually mean can be more formally written as:
We often use angle-brackets notation \(\inner{x}\) to represent the binary encoding of the value \(x\). We can then express the above more concisely as:
However, for notational simplicity we often elide the angle brackets, making the binary encoding of the input implicit.
Now that we have a formal concept of a language (i.e., the “yes” instances of a decision problem), solving a decision problem entails determining whether a particular input is a member of that language. An algorithm \(M\) that solves a problem \(L\) is one that:
takes some arbitrary input \(x\),
performs some computations,
outputs “yes” — in other words, accepts — if \(x \in L\);
outputs “no” — in other words, rejects — if \(x \notin L\).
If this is the case, we say that \(M\) decides \(L\).
Can every language \(L\) be decided? In other words, is it the case that for any language \(L\), there exists some algorithm \(M\) that decides \(L\)? To answer this question, we need a formal definition of what an algorithm really is.
Overview of Automata
In computability theory, an abstract computing device is known as an automaton (plural: automata). There are numerous different abstract models of computation, such as state machines, recursive functions, lambda calculus, von Neumann machines, cellular automata, and so on. We will primarily concern ourselves with the Turing-machine model, though we will first briefly examine a more restricted variant called (deterministic) finite automata.
Both the finite-automata and Turing-machine models are forms of state machines. A machine includes a finite set of states, each representing a discrete status of a computation, and the state transitions the machine follows as it computes. An analogous concept in a C++ or Python program is a line of code (or program counter in a compiled executable): a program has a finite number of lines, execution of the program is at a single line at any point in time, and the program transitions from one line to another as it runs (possibly revisiting lines over time, as in a loop).
In a state machine, state transitions are based on what is read from the input, what is in the machine’s memory (if it has a separate memory), or both. We represent states and transitions graphically as follows:
States are drawn as vertices in the graph, often labeled by names. A directed edge represents a transition, and the label denotes the conditions under which the transition is taken, as well as any side effects (e.g., writing a symbol to memory) of the transition. Mathematically, we represent the states as a finite set, and state transitions as a transition function, with states and other conditions as components of the domain, and states and side effects as components of the codomain.
The primary difference between the two models we’ll discuss (finite automata and Turing machines) is what kind of “memory” the model has access to. This has a significant effect on the computational power of a machine. More generally, the Chomsky hierarchy is a collection of various classes of languages, where each class corresponds to what can be computed on a particular kind of state machine:
Regular languages correspond to the computational power of finite automata (FA in the diagram below), which have no memory beyond their own states.
Context-free languages correspond to pushdown automata (PDA in the diagram below), which have a single “stack” as memory, and can interact only with the top of the stack (pushing to and popping from it).
Context-sensitive languages are computable by linear-bounded automata (LBA in the diagram below), which have access to memory that is linear in size with respect to the input, and which can be accessed in any position.
Recursively-enumerable languages, also called recognizable languages, correspond to the power of Turing machines (TM in the diagram below), which have access to unbounded memory that can be accessed in any position.
Each of these classes of languages is strictly contained within the next, demonstrating the effect that the memory system has on computational power.
We next examine the finite-automata model in more detail, before turning our attention to Turing machines.
Finite Automata
The simplest state-machine model is that of a finite automaton, which consists of a finite number of states and no additional memory. The machine is in exactly one state at a time, and a computational step involves reading a symbol from the input string and transitioning to the next state (which may be the same state as before) based on what symbol is read. This model and slight variations of it are known by many different names, including finite automata, deterministic finite automata (DFA), finite-state automata, finite-state machine, finite acceptor, and others. We will use the term finite automata to refer to the model.
The following is a graphical representation of a finite automaton over the input alphabet \(\Sigma = \set{a,b}\):
This automaton has six states: \(q_0, q_a, q_{ab}, q_{abb}, q_{abba}, q_{\text{other}}\). The state \(q_0\) is the special initial state, which is the state in which computation begins. Graphically, this is denoted by an incoming edge with no label and no originating vertex. This particular automaton also has a single accept state, \(q_{abba}\), depicted with a double circle. If the computation terminates in this state, the machine accepts the input; otherwise, the machine rejects it. Finally, transitions between states are depicted as directed edges, with labels corresponding to the input symbol(s) that trigger the transition.
A finite automaton runs on an input string, performing a state transition for each symbol of the string, in sequence. The machine does a single pass over the input, and the computation terminates when (and only when) the entire input has been read. The result is acceptance if the final state is an accept state, and rejection otherwise.
As an example, we trace the execution of the above finite automaton on the input string \(abba\). In the diagrams below, the top-left shows how much of the input has been read, and the current state is represented by a shaded vertex. Initially, the machine is in the start state \(q_0\), and none of the input string has been read yet:
The first input symbol is an \(a\), so the machine makes the transition labeled by an \(a\) from the state \(q_0\), which goes to the state \(q_a\).
The second step reads the symbol \(b\) from the input, so the machine transitions from \(q_a\) to \(q_{ab}\).
The third step reads the symbol \(b\), and the corresponding transition is from the state \(q_{ab}\) to \(q_{abb}\).
The fourth step reads an \(a\), causing a transition from \(q_{abb}\) to \(q_{abba}\).
The entire input has now been read, so the computation is complete. The machine terminates in the state \(q_{abba}\); since this is an accepting state, the machine has accepted the input \(abba\).
The following is an example of running the machine on input \(aba\):
As before, the first symbol is an \(a\), so the machine transitions from the start state \(q_0\) to the state \(q_a\).
The second symbol is again a \(b\), so the machine goes from state \(q_a\) to \(q_{ab}\).
The third step reads the symbol \(a\), so the machine transitions from \(q_{ab}\) to \(q_{\text{other}}\).
The machine has processed the entire input, so the computation is complete and the machine terminates. Since the final state \(q_{\text{other}}\) is not an accepting state, the machine has rejected the input \(aba\).
For this automaton, the only string of input symbols that leads to termination in an accepting state is \(abba\), so this finite automaton decides the language
\[L(M) = \set{abba}\]
consisting of only the string \(abba\). Note that an input like \(abbab\) causes the automaton to pass through the accepting state \(q_{abba}\) during the computation, but it ultimately terminates in the non-accept state \(q_{\text{other}}\), so the machine rejects the input \(abbab\).
Formal Definition
A finite automaton is a five-tuple
\[M = (Q, \Sigma, \delta, q_0, F)\]
where:
\(\Sigma\) is the (finite) input alphabet; an input to the automaton is any (finite) string over this alphabet (i.e., an element of \(\Sigma^*\)).
\(Q\) is the finite set of the automaton’s states.
\(q_0 \in Q\) is the initial state.
\(F \subseteq Q\) is the subset of accepting states. If, after reading the entire input string, the machine terminates in one of these states, it is said to accept the string; otherwise, it is said to reject the string.
\(\delta\) is the transition function, which maps a state and input symbol to a next state:
\[\delta \colon Q \times \Sigma \to Q \; \text{.}\](Recall that \(Q \times \Sigma\) is the set of all pairs whose first component is a state in \(Q\) and whose second component is a symbol in \(\Sigma\).) This function defines the automaton’s transitions as follows: if it is in state \(q \in Q\) and reads the next input symbol \(\sigma \in \Sigma\), then it transitions to the next state \(\delta(q,\sigma) \in Q\).
We emphasize that \(\delta\) must be a function, i.e., for every state-symbol pair, there must be exactly one next state.
We often depict the states visually as vertices of a graph, and we depict the state transitions as directed edges from states to new states, labeled by the corresponding symbols. Specifically, if \(\delta(q,\sigma) = q'\), then we have an edge from \(q\) to \(q'\), labeled by \(\sigma\). (If there are multiple transitions from \(q\) to \(q'\) for different symbols, we usually use the same edge for all of them, and label it by a comma-separated list of all the corresponding symbols.) Because \(\delta\) is a function, this means that for each state, there must be exactly one outgoing edge for each alphabet symbol.
Let us now see the formal definition of the example automaton given above.
The input alphabet is \(\Sigma = \set{a,b}\).
The set of states is
\[Q = \set{q_0, q_a, q_{ab}, q_{abb}, q_{abba}, q_{\text{other}}} \; \text{.}\]The initial state is the one we named \(q_0\).
The subset \(F\) of accepting states consists of the single state \(q_{abba}\), i.e., \(F = \set{q_{abba}}\).
The transition function can be specified in list form, as:
\[\begin{split}\delta(q_0, a) &= q_a\\ \delta(q_0, b) &= q_{\text{other}}\\ \delta(q_a, a) &= q_{\text{other}}\\ \delta(q_a, b) &= q_{ab}\\ \delta(q_{ab}, a) &= q_{\text{other}}\\ \delta(q_{ab}, b) &= q_{abb}\\ \delta(q_{abb}, a) &= q_{abba}\\ \delta(q_{abb}, b) &= q_{\text{other}}\\ \delta(q_{abba}, a) &= q_{\text{other}}\\ \delta(q_{abba}, b) &= q_{\text{other}}\\ \delta(q_{\text{other}}, a) &= q_{\text{other}}\\ \delta(q_{\text{other}}, b) &= q_{\text{other}} \; \text.\end{split}\]Alternatively, it can be given in tabular form:
old state \(q\) | \(\delta(q,a)\) | \(\delta(q,b)\)
---|---|---
\(q_0\) | \(q_a\) | \(q_{\text{other}}\)
\(q_a\) | \(q_{\text{other}}\) | \(q_{ab}\)
\(q_{ab}\) | \(q_{\text{other}}\) | \(q_{abb}\)
\(q_{abb}\) | \(q_{abba}\) | \(q_{\text{other}}\)
\(q_{abba}\) | \(q_{\text{other}}\) | \(q_{\text{other}}\)
\(q_{\text{other}}\) | \(q_{\text{other}}\) | \(q_{\text{other}}\)
In either case, notice that for each state-symbol pair, there is exactly one corresponding next state.
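As a concrete aside, the transition table above translates directly into code. The following C++ sketch (our own illustration; the names are ours) represents \(\delta\) as a lookup table and simulates the automaton with a single loop over the input, one transition per symbol:

#include <string>

// States, in the order q0, qa, qab, qabb, qabba, qother.
enum State { Q0, QA, QAB, QABB, QABBA, QOTHER };

// delta[q][s]: the next state from state q on symbol s,
// where s is 0 for 'a' and 1 for 'b', transcribing the table above.
const State delta[6][2] = {
    {QA,     QOTHER},  // from q0
    {QOTHER, QAB},     // from qa
    {QOTHER, QABB},    // from qab
    {QABBA,  QOTHER},  // from qabb
    {QOTHER, QOTHER},  // from qabba
    {QOTHER, QOTHER},  // from qother
};

bool accepts(const std::string &input) {
    State q = Q0;                // computation begins in the initial state
    for (char c : input)
        q = delta[q][c == 'b'];  // one transition per input symbol
    return q == QABBA;           // accept iff we terminate in an accept state
}

For instance, accepts("abba") returns true, while accepts("aba") and accepts("abbab") return false, matching the traces above.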
The language of a finite automaton \(M\) is the set of strings that the automaton accepts: \(L(M) = \set{x \in \Sigma^* : M \accepts x}\). We also say that \(M\) decides this language: it accepts every string in \(L(M)\) and rejects every string not in \(L(M)\).
For the example automaton \(M\) above, we have
\[L(M) = \set{abba} \; \text{.}\]
Consider the finite automaton \(M = (Q, \Sigma, \delta, q_0, F)\) where \(Q = \set{q_0, q_1, q_2}\), \(\Sigma = \set{a, b}\), \(F = \set{q_1}\), and the transition function \(\delta\) is given by \(\delta(q_0, a) = q_0\), \(\delta(q_0, b) = q_1\), and \(\delta(q_1, \sigma) = \delta(q_2, \sigma) = q_2\) for each \(\sigma \in \Sigma\).
This machine has three states, of which \(q_1\) is the lone accept state. The following is a graphical representation:
The machine starts in state \(q_0\) and stays there as long as it reads just \(a\) symbols. If it then reads a \(b\), the machine transitions to \(q_1\). Any subsequent symbol moves the machine to \(q_2\), where it stays until it reads the rest of the input. Since \(q_1\) is the only accepting state, the machine accepts only strings that end with a single \(b\). Thus, the language of the machine is
\[L(M) = \set{a^n b : n \in \N} \; \text{,}\]
i.e., the set of strings that start with any number of \(a\)s, followed by a single \(b\).
Finite automata have many applications, including software for designing and checking digital circuits, lexical analyzers of most compilers (usually the first phase of compilation, which splits a source file into individual words, or “tokens”), scanning large bodies of text (pattern matching), and reasoning about many kinds of systems that have a finite number of states (e.g., financial transactions, network protocols, and so on).
However, finite automata are not strong enough to model all possible computations. For instance, no finite automaton decides the language
\[L = \set{0^n 1^n : n \in \N}\]
consisting of strings composed of an arbitrary number of \(0\)s followed by the same number of \(1\)s. The basic intuition for why this is the case is that, since a finite automaton does not have any memory beyond its own states, it cannot reliably count how many \(0\)s or \(1\)s it has seen (beyond a certain fixed number), so it will output the incorrect answer on some input strings. See below for a rigorous proof that formalizes this intuition.
Yet we can easily write a program in most any programming language that decides \(L\). The following is an example in C++. (To be precise, this program does not quite work, because if the input starts with enough \(0\)s, the count variable, which has type int, will overflow and lose track of the number of \(0\)s that have been read. This is exactly the issue with trying to use a finite automaton to decide \(L\); to overcome it, we would need to use a data type that can store arbitrarily large integers.)
#include <iostream>

// Decides L = {0^n 1^n : n in N}; returns 1 to accept and 0 to reject.
int main() {
    int count = 0;  // number of 0s read so far, minus the number of 1s
    char ch;
    // Read the leading 0s.
    while ((std::cin >> ch) && ch == '0') {
        ++count;
    }
    // If there is no 1 next, accept only if the input is empty (n = 0).
    if (!std::cin || ch != '1') {
        return !std::cin && count == 0;
    }
    --count; // already read one 1
    // Match the remaining 1s against the count of 0s.
    while ((std::cin >> ch) && ch == '1') {
        --count;
    }
    // Accept iff the entire input was read and the counts balance.
    return !std::cin && count == 0;
}
Thus, we need a more powerful model of computation to characterize the capabilities of real-world programs.
We prove that no DFA decides the language \(L = \set{0^n 1^n : n \in \N}\). Let \(M\) be an arbitrary DFA; we will show that \(M\) does not decide \(L\) by exhibiting an input on which \(M\) returns the wrong output, i.e., \(M\) rejects a string in \(L\), or accepts a string not in \(L\).
Let \(s\) be the number of states of \(M\), which is finite by definition of a DFA. Consider the execution of \(M\) on an input that begins with \(0^{s+1}\). Since \(M\) has only \(s\) states, by the pigeonhole principle, there must be some state \(q \in Q\) that is visited more than once while \(M\) reads these leading zeros. Specifically, there exist some \(0 \leq i < j \leq s+1\) such that \(M\) is in state \(q\) after having read \(0^i\), and also after having read \(0^j\).
Now, consider running \(M\) on the strings \(0^i 1^i \in L\) and \(0^j 1^i \notin L\) (where the latter holds because \(i \neq j\)). Notice that \(M\) must output the same decision on both of these strings, because it is in the same state \(q\) after reading the initial \(i\) or \(j\) zeros, and because both strings have \(1^i\) after those zeros. However, one of these strings is in \(L\) and the other is not, so \(M\) outputs the wrong decision on one of them, and therefore does not decide \(L\), i.e. \(L(M) \neq L\), as claimed.
The reasoning above is a particular example of applying the pumping lemma. If a language is regular (i.e., can be decided by a DFA), then every sufficiently long string in the language has some section that can be repeated arbitrarily, or “pumped”, to produce another string that is also a member of the language. The pumping lemma can be used to prove that a language is non-regular, as we did above for \(L = \set{0^n 1^n : n \in \N}\).
Turing Machines
A Turing machine is a powerful abstract model for a computer, proposed by Alan Turing in 1936. The abstraction was designed to model pencil-and-paper computations by a human or other (possibly mechanical) “brain” that is following some prescribed set of rules. The essential features of the model are as follows:
Each sheet of paper can hold a finite number of symbols from some finite alphabet.
The amount of paper is unlimited – if we run out of space, we can just buy more paper.
The “brain” can look at and modify only one symbol on the paper at a time.
Though there may be an arbitrary amount of data stored on the paper, the “brain” can retain only a fixed amount at any time.
We model these features more specifically as follows, though we emphasize that many of these choices are fairly arbitrary, and can be adjusted without affecting the power of the model. (See Equivalent Models for an example.)
The paper is represented by an infinite tape that is divided into individual cells, each of which holds one symbol at a time (we view “blank” as a symbol of its own). For concreteness, in this text we use a “uni-directionally infinite” tape that has a leftmost cell and continues indefinitely to the right. (One can also consider a “bi-directionally infinite” tape without affecting the strength of the model.)
The tape has a read/write head that is positioned over a single cell at a time, and it can read the contents of the cell, write a new symbol to that cell, and move one position to the left or right.
Finally, the brain is modeled using a kind of finite-state machine: there is a finite set of states, with each state corresponding to what the brain “holds” at a specific time. Except for the states that terminate the machine, each state uses the symbol read by the head to determine what symbol to write (at the same cell), which direction to move the head, and what the next state will be.
The following is a pictorial representation of this model.
We formalize the notion of a Turing machine as follows.
A Turing machine is a seven-tuple
\[M = (\Sigma, \Gamma, Q, \qst, \qacc, \qrej, \delta)\]whose components are as follows:
\(\Sigma\) is the (finite) input alphabet. An input to the machine is any (finite) string over this alphabet (i.e., an element of \(\Sigma^*\)).
\(\Gamma\) is the (finite) tape alphabet, which must contain \(\Sigma\) as a subset (i.e., \(\Sigma \subseteq \Gamma\)), in addition to a special blank symbol \(\bot \in \Gamma\) that is not in the input alphabet (i.e., \(\bot \notin \Sigma\)). It may contain other elements as well. At all times, every symbol on the tape is an element of \(\Gamma\).
\(Q\) is the finite set of states.
\(\qst \in Q\) is the initial state.
\(\qacc, \qrej \in Q\) are the (distinct) accept state and reject state, respectively. Together, they comprise the set of final states \(F = \set{\qacc, \qrej}\).
\(\delta\) is the transition function. Its input is a non-final state and a tape symbol (as read by the head), and its output is a new state, a new tape symbol (to be written by the head), and a direction for the head to move (left or right). Formally, we have
\[\delta \colon (Q \setminus F) \times \Gamma \to Q \times \Gamma \times \set{L, R}\]where \(L\) and \(R\) represent “left” and “right,” respectively. The domain components are the current (non-final) state and the tape symbol read by the head; the codomain components are the new state, the symbol to write at the head’s position, and the direction in which to move the head.
We now describe the process by which a Turing machine computes when given an input string, which can be any (finite) string over the input alphabet \(\Sigma\). First, the machine is initialized as follows.
Tape contents: the input string appears written on the tape from left to right, starting at the leftmost cell, with one symbol per cell. Every other cell contains the blank symbol \(\bot\).
The head is positioned at the leftmost cell of the tape.
The initial “active” state is \(\qst\).
The following illustrates the initial configuration of a machine when the input is the binary string \(111010111\):
A computational step consists of an application of the transition function to the current configuration of a machine—i.e., its active state, tape contents, and head position—to produce a new configuration. The machine’s active state must be a non-final state – otherwise, the machine has halted and no further computation is done. To perform a computational step from some non-final active state \(q \in Q \setminus F\), the machine:
reads the symbol \(s \in \Gamma\) at the tape cell at which the head is positioned;
evaluates the transition function to get \((q' \in Q, s' \in \Gamma, D \in \set{L,R}) = \delta(q,s)\);
makes \(q' \in Q\) the new active state;
writes \(s'\) to the cell at which the head is positioned;
moves the head left or right according to the value of \(D\).
We stress that the transition function completely determines what happens in steps 3-5, based entirely on the active state and the symbol that is read by the head. That is, given the current state \(q\) and tape symbol \(s \in \Gamma\), the output \((q', s', D) = \delta(q,s)\) of the transition function gives the new active state \(q'\) (which may be the same as the original state \(q\)), the symbol \(s'\) to write (which may be the same as the original symbol \(s\)), and the direction \(D\) in which to move the head (left or right). By convention, if the head is at the leftmost cell of the tape and the direction is left, then the head stays in the same (leftmost) position.
A Turing machine computes by repeatedly executing the above computational step, halting only when it transitions to one of the two final states \(\qacc\) or \(\qrej\). If the machine ever reaches the state \(\qacc\), the machine is said to accept the input. If the machine ever reaches \(\qrej\), it is said to reject the input. As we will discuss more later, there is a third possibility: that the machine never reaches a final state at all, and continues applying the computational step indefinitely. In this case, we say that the machine loops forever, or just loops, on the input.
The main differences between finite automata and Turing machines are summarized as follows:
Property | Finite Automata | Turing Machines
---|---|---
Has an (unbounded) read/write memory (tape) | No | Yes
Can have zero, or more than one, accept state(s) | Yes | No
Has a reject state | No | Yes
Terminates when accept/reject state is reached | No | Yes
Direction input is read | Right | Right or Left
Must terminate when entire input is read | Yes | No
As an example of a Turing machine, here we define a simple machine that accepts any binary string composed entirely of ones (including the empty string), and rejects any string that contains a zero. The components of the seven-tuple are as follows:
\(\Sigma = \set{0,1}\),
\(\Gamma = \set{0, 1, \bot}\),
\(Q = \set{\qst, \qacc, \qrej}\),
the transition function \(\delta \colon \set{\qst} \times \set{0, 1, \bot} \to \set{\qst, \qacc, \qrej} \times \set{0, 1, \bot} \times \set{L, R}\) is specified by listing its output on each input:
\[\begin{split}\delta(\qst, 0) &= (\qrej, 0, R)\\ \delta(\qst, 1) &= (\qst, 1, R)\\ \delta(\qst, \bot) &= (\qacc, \bot, R) \; \text.\end{split}\]We can also specify the transition function in tabular format:
old state | read symbol | new state | written symbol | direction
---|---|---|---|---
\(\qst\) | \(0\) | \(\qrej\) | \(0\) | \(R\)
\(\qst\) | \(1\) | \(\qst\) | \(1\) | \(R\)
\(\qst\) | \(\bot\) | \(\qacc\) | \(\bot\) | \(R\)
The machine has just three states: the initial state \(\qst\) and the two final states \(\qacc,\qrej\). In each computational step, it reads the symbol at the tape head, going to the reject state if it is a zero, staying in \(\qst\) if it is a one, and transitioning to the accept state if it is a \(\bot\). In all three cases, the machine leaves the contents of the cell unchanged (it writes the same symbol as it reads) and moves the head to the right, to the next input symbol (or to the first blank that appears after the input string).
We have described the machine above by explicitly writing out its seven-tuple, including its entire transition function. More commonly, we describe a machine using a state diagram instead. Such a diagram is a labeled graph, with a vertex for each state and edges representing transitions between states. The following is an example of a transition:
The edge goes from state \(q_1\) to state \(q_2\), which are the state components of the transition function’s input and output. The remaining components are noted on the edge label. The example diagram above means that \(\delta(q_1, 1) = (q_2, 0, R)\): when the machine is in state \(q_1\) and it reads a \(1\) at the tape head, it transitions to state \(q_2\), writes a \(0\) at the tape head, and moves the head to the right.
We emphasize that, because the state-transition function must be a function on the domain \((Q \setminus F) \times \Gamma\), for each non-final state in the diagram there must be exactly one outgoing edge (a transition) for each symbol in the tape alphabet \(\Gamma\). Otherwise, the state-transition function would either be incomplete (missing an output for some input) or inconsistently defined (having multiple different outputs for the same input). Similarly, there cannot be an outgoing arrow from either of the final states.
The state diagram for the machine we defined above with a seven-tuple is as follows:
Let’s run this machine on the input \(111010111\). The machine starts in the initial state, with the input written on the tape (followed by blanks) and the head in the leftmost position:
The cell at the head holds a \(1\). The transition function (looking at either the seven-tuple or the state diagram) tells us that \(\delta(\qst, 1) = (\qst, 1, R)\), so the machine stays in the same state, writes the symbol \(1\) at the head, and moves the head to the right.
Again, we have a \(1\) at the head location. Again, the machine stays in the same state, leaves the symbol unchanged, and moves the head to the right.
Once again, there is a \(1\) at the head, resulting in the same actions as in the previous two steps.
The machine now has a \(0\) at the head. The transition function has that \(\delta(\qst, 0) = (\qrej, 0, R)\), so the machine transitions to the state \(\qrej\), leaves the \(0\) unchanged, and moves the head to the right.
The machine is now in a final state, namely, \(\qrej\), so the computation halts. Since the machine reached the reject state, it rejects the input \(111010111\).
Had the input been composed entirely of ones, the machine would have stayed in \(\qst\), moving the head one cell to the right in each step. After examining all the input symbols, it would reach a cell that contains the blank symbol \(\bot\). The transition function tells us that the machine would then move to the accept state \(\qacc\) (and leave the blank symbol unchanged, and move the head to the right). Thus, the machine would accept any input consisting entirely of ones. (This includes the empty string, because at startup the tape would consist entirely of blanks, so in the very first computational step the machine would read the blank symbol and transition to the accept state \(\qacc\).)
On any input string, this machine will eventually reach a final state. (For this conclusion we are relying on the fact that by definition, every string has finite length.) In the worst case, it examines each input symbol once, so it runs in linear time with respect to the size of the input. While this machine halts on any input, we will soon see that this is not the case for all machines.
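Because a computational step is completely mechanical, a Turing machine is also easy to simulate in software. The following C++ sketch (a minimal illustration of ours, using '_' to stand for the blank symbol \(\bot\)) runs the machine just defined; note that for a general machine, the while loop might never exit, which is precisely the “loops forever” behavior.

#include <iostream>
#include <map>
#include <string>
#include <tuple>
#include <utility>

const char BLANK = '_';  // stands in for the blank symbol
enum State { QST, QACC, QREJ };

// The transition function: (state, read symbol) -> (new state, written symbol, direction).
const std::map<std::pair<State, char>, std::tuple<State, char, char>> delta = {
    {{QST, '0'},   {QREJ, '0',   'R'}},
    {{QST, '1'},   {QST,  '1',   'R'}},
    {{QST, BLANK}, {QACC, BLANK, 'R'}},
};

bool accepts(std::string tape) {
    State q = QST;         // the initial active state
    std::size_t head = 0;  // the head starts at the leftmost cell
    while (q != QACC && q != QREJ) {             // halt only in a final state
        if (head >= tape.size()) tape += BLANK;  // the tape is unbounded to the right
        auto [q2, write, dir] = delta.at({q, tape[head]});
        tape[head] = write;                      // write at the head,
        q = q2;                                  // change the active state,
        if (dir == 'R') ++head;                  // and move the head; moving left
        else if (head > 0) --head;               // at the leftmost cell stays put
    }
    return q == QACC;
}

int main() {
    std::cout << accepts("111") << ' ' << accepts("111010111") << '\n';  // prints 1 0
}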
As a second example, consider the following machine:
This machine has the input alphabet \(\Sigma = \set{0,1}\), the tape alphabet \(\Gamma = \set{0, 1, \bot}\), six states \(Q = \{\qst, q_{\text{scan}}, q_{\text{end}}, q_{\text{back}}, \qacc, \qrej\}\), and the following transition function:
old state | read symbol | new state | written symbol | direction
---|---|---|---|---
\(\qst\) | \(0\) | \(q_{\text{scan}}\) | \(\bot\) | \(R\)
\(\qst\) | \(1\) | \(\qrej\) | \(1\) | \(R\)
\(\qst\) | \(\bot\) | \(\qacc\) | \(\bot\) | \(R\)
\(q_{\text{scan}}\) | \(0\) | \(q_{\text{scan}}\) | \(0\) | \(R\)
\(q_{\text{scan}}\) | \(1\) | \(q_{\text{scan}}\) | \(1\) | \(R\)
\(q_{\text{scan}}\) | \(\bot\) | \(q_{\text{end}}\) | \(\bot\) | \(L\)
\(q_{\text{end}}\) | \(0\) | \(\qrej\) | \(0\) | \(R\)
\(q_{\text{end}}\) | \(1\) | \(q_{\text{back}}\) | \(\bot\) | \(L\)
\(q_{\text{end}}\) | \(\bot\) | \(\qrej\) | \(\bot\) | \(R\)
\(q_{\text{back}}\) | \(0\) | \(q_{\text{back}}\) | \(0\) | \(L\)
\(q_{\text{back}}\) | \(1\) | \(q_{\text{back}}\) | \(1\) | \(L\)
\(q_{\text{back}}\) | \(\bot\) | \(\qst\) | \(\bot\) | \(R\)
Let us trace the execution of this machine on the input \(0011\). The machine starts in the initial state, with the input written on the tape (followed by blanks) and the head in the leftmost position:
The cell at the head has a zero. The transition function tells us that \(\delta(\qst, 0) = (q_{\text{scan}}, \bot, R)\), so the machine transitions to state \(q_{\text{scan}}\), writes a blank symbol at the head, and moves the head to the right.
The cell at the head now has a zero. The transition function tells us that \(\delta(q_{\text{scan}}, 0) = (q_{\text{scan}}, 0, R)\), so the machine stays in \(q_{\text{scan}}\), leaves the symbol \(0\) under the head, and moves the head to the right.
The cell at the head now has a one. The transition function tells us that \(\delta(q_{\text{scan}}, 1) = (q_{\text{scan}}, 1, R)\), so the machine stays in \(q_{\text{scan}}\), leaves the symbol \(1\) under the head, and moves the head to the right.
Again, the cell at the head now has a one, so the machine stays in the same state, leaves the cell unchanged, and moves the head to the right.
Now the cell at the head has a blank symbol, so the transition function tells us that the machine transitions to \(q_{\text{end}}\), leaves the blank in the cell, and moves the head to the left.
The cell at the head has a one, so the transition function tells us that the machine transitions to \(q_{\text{back}}\), writes a blank to the cell, and moves the head to the left.
The cell at the head has a one, so the transition function tells us that the machine stays in \(q_{\text{back}}\), leaves the one in the cell, and moves the head to the left.
The cell at the head now has a zero, so the transition function tells us that the machine stays in \(q_{\text{back}}\), leaves the zero in the cell, and moves the head to the left.
The cell at the head now has a blank, so the transition function tells us that the machine transitions to \(\qst\), leaves the blank in the cell, and moves the head to the right.
The cell at the head now has a zero, so the machine transitions to \(q_{\text{scan}}\), writes a blank to the cell, and moves the head to the right.
The cell at the head now has a one, so the machine stays in \(q_{\text{scan}}\), leaves the one in the cell, and moves the head to the right.
The cell at the head has a blank symbol, so the machine transitions to \(q_{\text{end}}\), leaves the blank in the cell, and moves the head to the left.
The cell at the head has a one, so the machine transitions to \(q_{\text{back}}\), writes a blank to the cell, and moves the head to the left.
The cell at the head has a blank, so the machine transitions to \(\qst\), leaves a blank in the cell, and moves the head to the right.
The cell at the head has a blank, so the machine transitions to \(\qacc\), leaves a blank in the cell, and moves the head to the right.
The machine has reached the accept state \(\qacc\), so it halts and accepts.
By generalizing the above example, we can see that this machine accepts any input string that consists of some nonnegative number of zeros followed by the same number of ones, and rejects otherwise. In other words, it decides the language
\[L = \set{0^n 1^n : n \in \N} \; \text{.}\]
Recall that we proved above that this language cannot be decided by a finite automaton. However, we have just seen that it can be decided by a Turing machine.
How powerful is the Turing-machine model? The Church-Turing thesis asserts the following:
A function can be computed by an “effective” or “mechanical” procedure—in other words, an algorithm—if and only if it can be computed by a Turing machine.
The terms “effective” and “mechanical” are not precisely and formally defined, so the Church-Turing thesis is not a statement that can be proved or disproved. Instead, it is an assertion that any well-defined procedure consisting of a finite number of precise instructions, which can be executed without any ingenuity or creativity, and which halts after a finite number of steps and produces a correct answer, can be represented by a Turing machine. Alternatively, it can be seen as the definition of what “algorithm” formally means: anything that can be represented by a Turing machine.
Support for the Church-Turing thesis can be found in the fact that many quite different-looking proposed models of computation have been shown to be equivalent to the Turing-machine model: whatever can be computed in such a model can also be computed by a Turing machine, and vice-versa. These include the lambda calculus and its variants, general recursive functions, most widely used computer programming languages, and more. So, despite the simplicity of the Turing-machine model, we have good reason to believe that it captures the essential nature of computation.
The Language of a Turing Machine
Now that we have formal definitions of both languages and Turing machines, which respectively formalize the notions of problems and algorithms (or programs), we connect them together. Recall that a Turing machine \(M\) has an input alphabet \(\Sigma\), and \(\Sigma^*\) is the set of all possible inputs to the machine. When \(M\) is run on an input \(x \in \Sigma^*\), which we often denote with the notation \(M(x)\), there are three possible, mutually exclusive behaviors:
\(M\) accepts \(x\), often written as “\(M(x)\) accepts”: \(M\) eventually halts due to reaching the accept state \(\qacc\);
\(M\) rejects \(x\), often written as “\(M(x)\) rejects”: \(M\) eventually halts due to reaching the reject state \(\qrej\);
\(M\) loops on \(x\), often written as “\(M(x)\) loops”: \(M\) never reaches a final state, and its execution continues forever.
The language \(L(M)\) of a Turing machine \(M\) is the set of strings that the machine accepts:
\[L(M) = \set{x \in \Sigma^* : M \accepts x} \; \text{.}\]
By definition, every Turing machine \(M\) has a corresponding language \(L(M)\), and \(x \in L(M)\) if and only if \(M(x)\) accepts; thus, \(x \notin L(M)\) if and only if \(M(x)\) rejects or loops.
For example, for the machine above that accepts all strings (and only those strings) that consist entirely of ones, we have that \(L(M) = \set{1}^* = \set{\varepsilon, 1, 11, 111, \ldots}\).
Some Turing machines halt (i.e., accept or reject) on every input; they do not loop on any input. Such a machine is called a decider, and it is said to decide its language.
A Turing machine \(M\) decides a language \(L \subseteq \Sigma^*\) if:
\(M\) accepts every \(x \in L\), and
\(M\) rejects every \(x \notin L\).
Equivalently:
\(M\) halts on every \(x \in \Sigma^*\), and
\(x \in L\) if and only if \(M\) accepts \(x\).
A language is decidable if some Turing machine decides it.
The equivalence between the two pairs of conditions can be seen as follows: the first pair of properties immediately implies the latter pair. Conversely, given the latter pair, the first property implies that on every input, \(M\) either accepts or rejects, and then the second property implies that \(M\) accepts every \(x \in L\) and rejects every \(x \notin L\), as needed.
In general, a Turing machine might not decide any language, because it might loop on one or more inputs. But if \(M\) is a decider—i.e., it does not loop on any input—then \(M\) decides its language \(L(M)\), and does not decide any other language. In other words, a particular Turing machine decides at most one language.
We also briefly mention a relaxation of the notion of deciding, called recognizing.
A Turing machine \(M\) recognizes a language \(L \subseteq \Sigma^*\) if:
\(M\) accepts every \(x \in L\), and
\(M\) rejects or loops on every \(x \notin L\).
Equivalently: \(x \in L\) if and only if \(M\) accepts \(x\).
A language is recognizable if some Turing machine recognizes it.
In comparison to deciding (Definition 61), here \(M\) must still accept every string in the language, but it is not required to halt on every input: it may loop on any string not in the language (indeed, it may loop on all such strings!). So, if a machine decides a language, it also recognizes that language, but not necessarily vice-versa. Observe that, by definition, every Turing machine \(M\) recognizes exactly one language, namely, the machine’s language \(L(M)\). See Recognizability for further details and results.
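As a concrete (if contrived) illustration of the difference, the following C++ sketch recognizes, but does not decide, the language of binary strings that contain at least one \(1\). Following the convention of the earlier program, it returns 1 to accept; on strings not in the language, it loops forever rather than rejecting.

#include <iostream>

int main() {
    char ch;
    while (std::cin >> ch) {
        if (ch == '1') return 1;  // accept: the input contains a 1
    }
    while (true) {}  // no 1 found: loop forever instead of rejecting
}

Of course, this particular language is decidable; replacing the final loop with a rejection (return 0) yields a decider. Recognizability is interesting precisely for languages where no such fix is possible.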
Altogether, we now have the following formalizations:
A language is the formalization of a decision problem.
A Turing machine is the formalization of a program or algorithm.
A machine deciding a language is the formalization of a program solving a decision problem.
Thus, our original question about whether a given problem is solvable by a computer is the same as asking whether its corresponding language is decidable.
Decidable Languages
By definition, a language is decidable if some Turing machine decides it, i.e., the machine accepts every string in the language and rejects every string not in the language. Does this mean that, to demonstrate that a language is decidable, we must formally define the seven-tuple of such a Turing machine? Fortunately, no. Because all known computer programming languages, including valid pseudocode, can be simulated by a Turing machine (see the Church-Turing thesis), we can write an algorithm in such a language and be assured that there is a corresponding Turing machine. As an example, consider the following language:
\[\lang{GCD} = \set{(a, b) : a, b \in \N \text{ and } \gcd(a, b) = 1} \; \text{.}\]
An algorithm to decide this language is as follows:

DecideGCD on input \((a, b)\): run Euclid\((a, b)\); if it returns \(1\), accept, and otherwise, reject.
We now analyze this algorithm to show that it decides \(\lang{GCD}\), according to Definition 61:
If \((a, b) \in \lang{GCD}\), then \(\gcd(a, b) = 1\) by definition, so the call to Euclid returns 1, hence DecideGCD accepts \((a, b)\).
If \((a, b) \notin \lang{GCD}\), then \(\gcd(a, b) \neq 1\), so the call to Euclid returns a value not equal to 1, hence DecideGCD rejects \((a, b)\).
Alternatively, we can analyze the algorithm as follows: first, it halts on any input, because Euclid does (recall that Euclid runs in \(\O(\log(a+b))\) iterations, in the worst case). Second, \((a,b) \in \lang{GCD}\) if and only if \(\gcd(a,b) = 1\), which holds if and only if DecideGCD\((a,b)\) accepts, by the correctness of Euclid and direct inspection of the pseudocode of DecideGCD.
Thus, DecideGCD decides \(\lang{GCD}\), so \(\lang{GCD}\) is a decidable language.
In general, to demonstrate that a particular language is decidable, we first write an algorithm, and then analyze the algorithm to show that it decides the language, according to Definition 61. Depending on how the algorithm is written, it may be more convenient to establish one pair of properties or the other from Definition 61. As in the above example, we will often show both ‘styles’ of analysis, although only one is needed for a particular proof.
Here we show some examples of how performing certain operations on decidable languages preserves decidability.
Let \(L\) be any decidable language. Then \(L' = L \cup \set{\varepsilon}\) is also decidable.
By definition, since \(L\) is decidable, there exists some Turing machine \(D\) that decides \(L\). We use it to construct another machine \(D'\) that decides \(L'\) (more precisely, because \(D\) exists, the following machine also exists):

\(D'\) on input \(x\): if \(x = \varepsilon\), accept. Otherwise, run \(D(x)\) and output its answer (accept if it accepts, reject if it rejects).
We analyze the behavior of \(D'\) on an arbitrary input string \(x\), as follows:
If \(x \in L' = L \cup \set{\varepsilon}\), then either \(x = \varepsilon\) or \(x \in L\) (or both). In the first case, \(D'(x)\) accepts by the first line of pseudocode. In the second case, \(D(x)\) accepts because \(D\) decides \(L\). So, in either case, \(D'(x)\) accepts, as needed.
If \(x \notin L' = L \cup \set{\varepsilon}\), then \(x \neq \varepsilon\) and \(x \notin L\). Therefore, the first line of pseudocode does not accept, and then on the second line \(D(x)\) rejects, so \(D'(x)\) rejects, as needed.
So, by Definition 61, \(D'\) does indeed decide \(L'\), hence \(L'\) is decidable.
Alternatively, we can use the other pair of conditions in Definition 61 to analyze \(D'\) as follows. First, \(D'\) halts on every input, because its first line does not loop, and on the second line \(D\) does not loop because it is a decider. Next,
\[\begin{split}D'(x) \accepts &\iff x = \varepsilon \ort D(x) \accepts\\ &\iff x = \varepsilon \ort x \in L\\ &\iff x \in L \cup \set{\varepsilon} = L' \; \text{.}\end{split}\]
We emphasize that for this second style of analysis, it is very important that all the implications hold in both “directions” (i.e., be “if and only if” statements), which needs to be carefully checked.
For any decidable languages \(L_1\) and \(L_2\), their union \(L = L_1 \cup L_2 = \set{x : x \in L_1 \ort x \in L_2}\) is also decidable.
Since \(L_1\) and \(L_2\) are decidable, there exist machines \(M_1\) and \(M_2\), respectively, that decide them. That is, \(M_1\) accepts every string in \(L_1\) and rejects every string not in \(L_1\), and similarly for \(M_2\) and \(L_2\). We use them to construct a new machine \(M\) that decides \(L\) (again, because \(M_1,M_2\) exist, so does the following machine \(M\)):

\(M\) on input \(x\): run \(M_1(x)\); if it accepts, accept. Then run \(M_2(x)\); if it accepts, accept. Otherwise, reject.
We analyze the behavior of \(M\) on an arbitrary input string \(x\), as follows:
If \(x \in L = L_1 \cup L_2\), then \(x \in L_1\) or \(x \in L_2\) (or both). In the former case, \(M_1(x)\) accepts and hence \(M(x)\) accepts on its first line, and in the latter case, \(M_2(x)\) accepts and hence \(M(x)\) accepts on its second line. Either way, \(M(x)\) accepts, as needed.
If \(x \notin L\), then \(x \notin L_1\) and \(x \notin L_2\). So, both \(M_1(x)\) and \(M_2(x)\) reject, and hence \(M(x)\) reaches its final line and rejects, as needed.
So, by Definition 61, \(M\) does indeed decide \(L\), hence \(L\) is decidable.
Alternatively, we can use the other pair of conditions in Definition 61 to analyze \(M\) as follows. First, \(M\) halts on every input, because its calls to \(M_1\) and \(M_2\) halt (since they are deciders). Next, by the properties of \(M_1,M_2\) and the definition of \(M\),
\[\begin{split}x \in L = L_1 \cup L_2 &\iff x \in L_1 \ort x \in L_2\\ &\iff M_1(x) \accepts \ort M_2(x) \accepts\\ &\iff M(x) \accepts \; \text{,}\end{split}\]
where the last line holds by inspection of the code of \(M\).
Example 65 demonstrates that the class of decidable languages is closed under union: if we take the union of any two members of the class, the result is also a member of the class, i.e., it is a decidable language.
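In programming terms, the closure construction is just sequential composition of halting subroutines. The following C++ sketch illustrates it with two stand-in deciders (the particular languages here are our own choices, purely for illustration):

#include <string>

// Stand-in deciders for two example languages; each halts on every input.
bool decides_L1(const std::string &x) {  // L1: strings that start with '0'
    return !x.empty() && x.front() == '0';
}
bool decides_L2(const std::string &x) {  // L2: strings that end with '1'
    return !x.empty() && x.back() == '1';
}

// A decider for the union of L1 and L2, mirroring the machine M above.
// It halts on every input precisely because both subroutines do.
bool decides_union(const std::string &x) {
    if (decides_L1(x)) return true;  // first line: accept if x is in L1
    if (decides_L2(x)) return true;  // second line: accept if x is in L2
    return false;                    // final line: reject
}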
Show that the class of decidable languages is closed under:
intersection (i.e., if \(L_1\) and \(L_2\) are decidable, then so is \(L_1 \cap L_2\));
complement (i.e., if \(L\) is decidable, then so is \(\overline{L}\)).
Is every language decidable (by a Turing machine)? In other words, can every decision problem be solved by some algorithm? We will consider this question below starting with the section on Diagonalization, following a digression on alternative but equivalent models of Turing machines.
Equivalent Models
As an illustration of the power of the simple “one-tape” Turing-machine model defined above, we will demonstrate that an extended model, that of the two-tape Turing machine, is actually no more powerful than the original one-tape model. Similar equivalences can be demonstrated for other natural variations of the Turing-machine model, such as one with a two-way infinite tape, one with a two-dimensional (or any finite-dimensional) tape, one with (a finite number of) multiple heads, one with nondeterministic operations, and combinations thereof.
Similar to the one-tape version, the two-tape model consists of a seven-tuple:
\[M = (\Sigma, \Gamma, Q, \qst, \qacc, \qrej, \delta) \; \text{.}\]
The components \(\Sigma\), \(\Gamma\), \(Q\), \(\qst\), \(\qacc\), and \(\qrej\) are exactly as in the one-tape model. The transition function \(\delta\), however, now takes two tape symbols as input, and outputs two tape symbols and two directions—in all cases, one for each tape head. Formally, we have
\[\delta \colon (Q \setminus F) \times \Gamma^2 \to Q \times \Gamma^2 \times \set{L, R}^2 \; \text{.}\]
(Recall that the notation \(\Gamma^2\) is shorthand for \(\Gamma \times \Gamma\), and similarly \(\set{L,R}^2 = \set{L,R} \times \set{L,R}\).) In more detail, the domain components are the current (non-final) state and the two symbols read by the heads (one from each tape), and the codomain components are the new state, the two symbols to write (one at each head), and the two directions in which to move the heads.
The initial state of the machine has the input written on the first tape, followed by blank symbols, and only blank symbols on the second tape. Both heads begin at the leftmost cells of their respective tapes.
In each step, the machine is in some state \(q\) and reads one symbol from each tape, say, \(\gamma_1\) and \(\gamma_2\). The transition function maps \((q, \gamma_1, \gamma_2)\) to the next state, symbols to write on each tape, and directions to move each head:
\[\delta(q, \gamma_1, \gamma_2) = (q', \gamma_1', \gamma_2', d_1, d_2) \; \text{.}\]
The machine transitions to state \(q'\), writes \(\gamma_1'\) to the first tape and moves its head in the direction \(d_1\), and writes \(\gamma_2'\) to the second tape and moves its head in the direction \(d_2\).
We now show that the one-tape and two-tape models are equivalent, by proving that anything that can be computed in one model can also be computed by the other. In other words, for every one-tape Turing machine, there is a corresponding two-tape machine that, for each input, has exactly the same output behavior (accept, reject, or loop); and likewise, for every two-tape machine, there is a corresponding one-tape machine.
The first direction is conceptually trivial: for any one-tape Turing machine, we can construct an equivalent two-tape machine that just ignores its second tape. Formally, the two-tape machine has the same tape and input alphabets \(\Sigma\) and \(\Gamma\), the same set of states \(Q\), and the same start and final states \(\qst\), \(\qacc\), and \(\qrej\). The transition function \(\delta_2\) is slightly more subtle. For each entry
\[\delta(q, \gamma) = (q', \gamma', d)\]of the one-tape machine, the two-tape machine’s transition function has the corresponding entries
\[\delta_2(q, \gamma, \gamma_2) = (q', \gamma', \gamma_2, d, R)\]for every \(\gamma_2 \in \Gamma\). In other words, in each computational step, the two-tape machine ignores the contents of its second tape and moves its second head to the right, and does whatever the one-tape machine does with the first tape. (Note that even though the second tape consists of all blanks, to have a well-defined transition function \(\delta_2\) we must define the function for all pairs of tape symbols, which is why we need to consider every \(\gamma_2 \in \Gamma\).) Since the machine’s behavior is solely dependent on the contents of the first tape, the two-tape machine makes exactly the same transitions as the one-tape machine on a given input, and it ends up accepting, rejecting, or looping on each input exactly as the one-tape machine does.
The other direction of the equivalence is significantly more complicated: for an arbitrary two-tape machine, we need to construct a one-tape machine that produces the same output behavior on any given input. There are many ways to do this, and we describe one such construction at a high level.
First, we consider how a one-tape machine can store the contents of two tapes, as well as two head positions. A key observation is that at any point during the two-tape machine’s computation, the amount of actual data (i.e., excluding trailing blank cells) on the two tapes is finite. Thus, we can store that data on a single tape just as well. We use the following format:
We use a special \(\#\) symbol (which we include in the tape alphabet of the one-tape machine) to separate the (finite) contents of each tape, and to indicate the endpoints of those contents. Between these markers, the contents of each tape are represented using the same tape symbols as in the two-tape machine, except that we denote the position of each head by placing a special ‘dot’ symbol \(\bullet\) (which we also include in the tape alphabet) to the left of the cell at which the head is currently located. Thus, if \(\Gamma\) is the tape alphabet of the two-tape machine, then the tape alphabet of the one-tape machine is
\[\Gamma' = \Gamma \cup \set{\#, \bullet}.\]
We won’t give a formal definition of the states or transition function, but we will describe how the machine works at a high level. Given an input \(w_1 w_2 \dots w_n\), the one-tape machine does the following:
Rewrite the tape to be in the correct format:
\[\# \bullet w_1 \cdots w_n \# \bullet \bot \#\]
This entails shifting each input symbol two cells to the right, placing a \(\#\) marker in the first cell and a dot in the second, placing a \(\#\) marker after the last input symbol (at the first blank cell), writing a dot symbol in the next cell, followed by a blank symbol, followed by one more \(\#\) marker. (A small code sketch of this initial encoding appears after this list.)
Simulate each step of the two-tape machine:
Scan the tape to find the two dot symbols, representing the positions of each head, and ‘remember’ the symbols stored to the right of the dots. (Because there are only a finite number of possibilities, the machine can ‘remember’ these symbols via a finite number of states, without writing anything to the tape.)
Transition to a new state according to the transition function of the two-tape machine (depending on the current state and the symbols immediately to the right of the dots).
Replace the symbols to the right of each dot with the symbols to be written to each tape, as specified in the transition function of the two-tape machine.
Move the dots in the appropriate directions. If a dot is to move to the left but there is a \(\#\) marker there, then it stays in the same position (corresponding to what happens when a tape head tries to move left when at the leftmost tape cell). On the other hand, if the dot is to move to the right but there is a \(\#\) marker one more cell to the right, shift the marker and all the symbols following it (up to the final \(\#\) marker) one cell to the right, placing a blank symbol in the space created by the shift. This ensures that there is a valid symbol to the right of the dot (i.e., at the corresponding head) that can be read in the next simulated step.
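As noted in step 1 above, the initial encoding is simple enough to express concretely. The following is a small C++ sketch of our own, not part of the formal construction, using the ordinary characters '#', '*', and '_' as stand-ins for the symbols \(\#\), \(\bullet\), and \(\bot\):

#include <iostream>
#include <string>

// Build the initial one-tape encoding of an input w, with '#' as the
// separator, '*' as the head-position dot, and '_' as the blank symbol.
std::string initialEncoding(const std::string& w) {
    return "#*" + w + "#*_#";
}

int main() {
    std::cout << initialEncoding("0110") << "\n";  // prints #*0110#*_#
}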
Constructing this one-tape machine from the definition of the two-tape machine is a tedious, but completely mechanical, process. The important point is that the one-tape machine perfectly simulates the operation of the original two-tape machine, even though it typically takes many steps to carry out what the original machine does in a single step. If the two-tape machine accepts an input, then the one-tape simulation will do so as well, and similarly if the original machine rejects or loops.
We have demonstrated that a two-tape Turing-machine model is equivalent to the simpler, one-tape model. In general, a computational model or programming language that is powerful enough to simulate any one-tape Turing machine is said to be Turing-complete. In the other direction, a Turing-complete model is said to be Turing-equivalent if it can be simulated by a Turing machine; recall that the Church-Turing thesis asserts that any effective procedure can be simulated by a Turing machine.
All real-world programming languages, including C++, Python, and even LaTeX, are Turing-complete, and all of them can be simulated by a Turing machine. Thus, the results we prove about computation using the Turing-machine model also apply equivalently to real-world models. The simplicity of Turing machines makes such results easier to prove than if we were to reason about C++ or other complicated languages.
Diagonalization
Is every language decidable (by a Turing machine)? We have seen that every machine has an associated language (of strings the machine accepts), but does every language have a corresponding machine that decides it? We consider this question next.
One way of answering the question of whether there exists an undecidable language is to compare the “number” of languages with the “number” of machines. Let \(\mathcal{L}\) be the set of all languages, and let \(\mathcal{M}\) be the set of all machines. If we could demonstrate that \(\mathcal{L}\) is a “larger” set than \(\mathcal{M}\), i.e., that “\(\abs{\mathcal{M}} < \abs{\mathcal{L}}\)”, then there are “more” languages than machines, and it follows that there must be some language that does not have a corresponding machine.
However, both \(\mathcal{M}\) and \(\mathcal{L}\) are infinite sets, which is why we have put the terms “number”, “more”, etc. in “scare quotes”—these terms typically apply only to finite quantities. In order to make the above approach work, we need appropriate tools to reason about the sizes of infinite sets.
Countable Sets
We refer to the size of a set as its cardinality, which measures the number of elements in the set. For a finite set, its cardinality is an element of the natural numbers \(\N\). For an infinite set, on the other hand, the number of elements is not a natural number. [3] For our purposes, however, we need only to reason about the relative sizes of infinite sets to answer questions like whether “\(\abs{\mathcal{M}} < \abs{\mathcal{L}}\)”.
There are other kinds of numbers that can represent the sizes of infinite sets, but they are beyond the scope of this text.
To start with, we recall some terminology related to functions. A (total) function \(f \colon A \to B\) associates, or maps, each element \(a \in A\) to some element \(f(a) \in B\). (This is in contrast to a partial function, which may leave some elements of \(A\) unmapped.)
A (total) function \(f \colon A \to B\) is injective, or one-to-one, if it maps each element of \(A\) to a different element of \(B\). In other words, \(f\) is injective if
\[f(a) \neq f(a') \text{ for all } a, a' \in A \text{ with } a \neq a'.\]
If an injective function \(f \colon A \to B\) exists, then we write \(\abs{A} \leq \abs{B}\), and say that “\(A\) is no larger than \(B\).”
Intuitively, the existence of an injective function from \(A\) to \(B\) shows that \(B\) has “at least as many elements as” \(A\), which is why we write \(\abs{A} \leq \abs{B}\). The power of comparing cardinalities via injective functions, as opposed to just by counting elements, is that it is also meaningful and works in (some) expected ways even for infinite sets. For example, it is possible to show the following.
Prove that if \(A \subseteq B\), then \(\abs{A} \leq \abs{B}\). (Hint: there is a trivial injective function from \(A\) to \(B\).)
Prove that if \(\abs{A} \leq \abs{B}\) and \(\abs{B} \leq \abs{C}\), then \(\abs{A} \leq \abs{C}\). (Hint: compose injective functions.)
However, we strongly caution that in the context of infinite sets, the notation \(\abs{A} \leq \abs{B}\) has just the specific meaning given by Definition 68; it may not satisfy other properties that you are accustomed to for comparing finite quantities. For example, we cannot necessarily add or subtract on both sides of the \(\leq\) symbol and preserve the inequality. When in doubt, refer to the definition.
We are most interested in reasoning about how the sizes of various sets compare to “standard” infinite sets like the natural numbers \(\N\). We first define the notion of a “countable” set, which is one that is “no larger than” the set \(\N\) of natural numbers.
A set \(S\) is countable if \(\abs{S} \leq \abs{\N}\), i.e., if there exists an injective function \(f \colon S \to \N\) from \(S\) to the natural numbers \(\N\).
For example, the set \(S = \set{a, b, c}\) is countable, since the following function is injective:
\[f(a) = 0, \quad f(b) = 1, \quad f(c) = 2.\]
In fact, any finite set \(S\) is countable: we can just list all the elements of \(S\), assigning them to natural numbers in the order that we listed them. Moreover, this idea doesn’t just apply to finite sets: the same strategy can apply to an infinite set, as long as we have some way of listing its elements in some order, as the following lemma shows.
A set \(S\) is countable if and only if there is some enumeration (or list) of its elements as \(S = \set{s_0, s_1, s_2, \ldots}\), in which each \(s \in S\) appears (at least once) as \(s = s_i\) for some \(i \in \N\).
First we show that if there is some enumeration \(s_0, s_1, s_2, \ldots\) of the elements in \(S\), then there is an injective function \(f\) from \(S\) to \(\N\). For each \(s \in S\), simply define \(f(s) = i\) to be the smallest (i.e., first) index \(i\) for which \(s=s_i\). By assumption, every \(s \in S\) has such an index, so \(f\) is a (total) function. And \(f\) is injective: if \(s \neq s'\), then \(f(s) \neq f(s')\) because two different elements of \(S\) cannot occupy the same position in the list.
Now we show the opposite direction, that if there is an injective function \(f \colon S \to \N\), then there is an enumeration of \(S\) (in which each element appears exactly once, in fact). Essentially, we list the elements of \(S\) in “sorted order” by their associated natural numbers (under \(f\)). Formally, we define the sequence as: \(f^{-1}(0)\) if it exists, then \(f^{-1}(1)\) if it exists, then \(f^{-1}(2)\) if it exists, etc. Here \(f^{-1}(i) \in S\) denotes the element of \(S\) that \(f\) maps to \(i \in \N\), if it exists; note that such an element is unique because \(f\) is injective. Since \(f\) is a (total) function, every \(s \in S\) maps to exactly one \(i \in \N\), so \(s\) will appear exactly once in the enumeration, as claimed.
As a concrete example of the second direction in the above proof, consider the injective function \(f \colon \set{x,y,z} \to \N\) defined by \(f(x)=376\), \(f(y)=203\), and \(f(z)=475\). Exactly three values of \(f^{-1}(i)\) for \(i \in \N\) exist—namely, for \(i=203,376,475\)—and they yield the enumeration \(f^{-1}(203)=y, f^{-1}(376)=x, f^{-1}(475)=z\) according to the above proof.
Lemma 72 allows us to prove that a set is countable by describing an enumeration and showing that every element of the set appears somewhere in it, rather than explicitly defining an injective function from the set to the naturals. Finding an enumeration can be more convenient in many cases, as we will now see with some examples.
We show that the set of integers \(\Z\) is countable. Observe that a first attempt of listing the non-negative integers, then the negative integers, does not work:
\[0, 1, 2, 3, \ldots, -1, -2, -3, \ldots\]
There are infinitely many non-negative integers, so we never actually reach the negative ones. In other words, the above is not a valid enumeration of the integers, because there is no finite (natural number) position at which \(-1\) (or any other negative number) appears.
An attempt that does work is to order the integers by their absolute values, which interleaves the positive and negative elements:
\[0, 1, -1, 2, -2, 3, -3, \ldots\]
An arbitrary positive integer \(i\) appears at (zero-indexed) position \(2i-1\), and an arbitrary negative integer \(-i\) appears at position \(2i\). Thus, each integer appears at some finite position in the list.
The above corresponds to the following injective function \(f \colon \Z \to \N\):
\[f(x) = \begin{cases} 2x - 1 & \text{if } x > 0 \\ -2x & \text{if } x \leq 0. \end{cases}\]
Therefore, \(\Z\) is countable; since it is also infinite, it is countably infinite.
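As a quick sanity check, here is a minimal C++ rendering of this function (ours, with long and unsigned long standing in for \(\Z\) and \(\N\)):

#include <cassert>

// The injection f : Z -> N from the interleaved enumeration
// 0, 1, -1, 2, -2, ...: positives map to odd naturals, the rest to evens.
unsigned long f(long x) {
    return x > 0 ? 2 * x - 1 : -2 * x;
}

int main() {
    assert(f(0) == 0 && f(1) == 1 && f(-1) == 2 && f(2) == 3 && f(-2) == 4);
}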
We now show that the set of positive rational numbers \(\Q^+\) is countable. A positive rational number \(q \in \Q^+\) can be written as the ratio \(x/y\) of a pair of positive integers \(x,y \in \N^+\). So we start by showing that the set of pairs of natural numbers \(\N \times \N\) is countable.
We use a similar idea to what we did to show that \(\Z\) is countable, i.e., demonstrating an interleaving of the elements that results in a valid enumeration. To get some inspiration, we can write down the pairs in two dimensions:
To demonstrate an injective mapping to \(\N\), however, we need to describe a one-dimensional list. Observe that we can do so by listing pairs \((x,y)\) in order by the “anti-diagonals” satisfying \(x+y=k\) for \(k=2, 3, 4, \ldots\). Each anti-diagonal has a finite number of elements (specifically, \(k-1\)), so for any particular pair, we eventually reach it after listing a finite number of diagonals.
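In fact, the position of each pair can be computed explicitly. The following sketch (ours, not from the text) computes the zero-indexed position of a pair \((x,y)\) with \(x, y \geq 1\) under this anti-diagonal ordering, matching the count of \(k-1\) pairs per anti-diagonal:

#include <cassert>

// Zero-indexed position of (x, y), x, y >= 1, when pairs are listed
// anti-diagonal by anti-diagonal (x + y = 2, 3, 4, ...), and each
// anti-diagonal is listed in order of increasing x.
unsigned long pairIndex(unsigned long x, unsigned long y) {
    unsigned long k = x + y;                        // this pair's anti-diagonal
    unsigned long earlier = (k - 1) * (k - 2) / 2;  // pairs on diagonals 2..k-1
    return earlier + (x - 1);                       // offset within diagonal k
}

int main() {
    // The enumeration begins (1,1), (1,2), (2,1), (1,3), (2,2), (3,1), ...
    assert(pairIndex(1, 1) == 0 && pairIndex(1, 2) == 1 && pairIndex(2, 1) == 2);
    assert(pairIndex(1, 3) == 3 && pairIndex(2, 2) == 4 && pairIndex(3, 1) == 5);
}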
We can proceed in the same way for \(\Q^+\). Observe that there are now duplicate elements in the enumeration; for example, \(2/2\) is equal to \(1/1\). If we wish to, we can avoid duplicates by simply skipping over them when we encounter them. However, Lemma 72 allows an enumeration to contain duplicates, so skipping them is not necessary.
Since we can construct an enumeration of the positive rationals \(\Q^+\), this set is countable.
Show that the set \(\Q\) of all rationals (positive, negative, and zero) is countable. (Hint: use the fact that \(\Q^+\) is countable along with the idea from Example 74.)
Uncountable Sets
So far, we have demonstrated that several infinite sets are countable. But it turns out that not every set is countable; some are uncountable. How can we show this? We need to show that there does not exist any injective function from the set to the naturals. Equivalently (by Lemma 72), we need to show that there is no list that enumerates every element of the set.
In general, it is challenging to prove that there does not exist an object having certain properties. It is not enough to show that several attempts to construct such an object fail, because maybe some other clever attempt we have not thought of yet could work! Instead, we can sometimes prove the non-existence of an object via an indirect route: proof by contradiction. That is, we assume that an object having the properties does exist, and then use it to derive a logical contradiction. It follows that no such object exists.
The above strategy is often successful for showing the uncountability of certain sets. As a first example, we prove the following theorem using a technique called diagonalization. We assume, for the purpose of contradiction, that an enumeration of the set exists, and then we “use this enumeration against itself” to construct an element of the set that is not in the enumeration—a contradiction. The technique is called “diagonalization” because we construct the special element by going “down the diagonal” of the assumed enumeration and making different choices, so that the element differs from each element in the list in at least one position.
The set of real numbers in the interval \((0,1)\) is uncountable.
Suppose for the sake of establishing a contradiction that this set is countable, which means that there is an enumeration \(r_0, r_1, r_2, \ldots\) of the reals in \((0,1)\). If we imagine writing the decimal expansions of these elements as the rows of an infinite table, the result looks something like the following:
The decimal expansion in each row continues indefinitely to the right. If a number has a finite decimal representation, we pad it to the right with infinitely many zeros.
We now use this list to construct a real number \(r^* \in (0,1)\) in a certain way as follows: we choose the \(i\)th digit of \(r^*\) (zero-indexed, after the decimal point) so that it differs from the \(i\)th digit of element \(r_i\) in the list. We also choose these digits to not have an infinite sequence of 0s or 9s, so that \(r^*\) has a unique decimal expansion. (This avoids the possibility that \(r^*\) has two different decimal expansions, like \(0.1999\cdots = 0.2000\cdots\).) For example:
Here, \(r_0\) has 1 as its 0th digit, so we arbitrarily choose 2 as the 0th digit of \(r^*\). Similarly, \(r_1\) has 3 as its 1st digit, so we arbitrarily choose 4 as the 1st digit of \(r^*\); \(r_2\) has 6 as its 2nd digit, so we arbitrarily choose 7 for the 2nd digit of \(r^*\), and so on. Thus, \(r^*\) differs from \(r_i\) in the \(i\)th digit.
We now make a critical claim: \(r^*\) does not appear in the assumed enumeration of the elements of \((0,1)\). To see this, consider some arbitrary position \(i\) and the value \(r_i\) that appears there in the enumeration. By construction, the \(i\)th digits of \(r^*\) and \(r_i\) are different, and \(r^*\) has a unique decimal expansion, so \(r^* \neq r_i\). Since the position \(i\) was arbitrary, this means that \(r^*\) does not appear in the list.
We have therefore arrived at a contradiction, because we assumed at the start that every real number in \((0,1)\) appears somewhere in the enumeration, but \(r^* \in (0,1)\) is a real number that does not. So, our original assumption must be false, and therefore the set of real numbers in \((0,1)\) is uncountable, as claimed.
If we try to adapt the proof of Theorem 77 to prove that the set \(\Z\) of integers is uncountable, where does the attempted proof fail (as it must, because \(\Z\) is countable by Example 74)? Hint: what is different about the decimal expansions of real numbers versus integers?
As we will see next, diagonalization is a very powerful and general technique for proving not just the uncountability of certain sets, but also the undecidability of certain languages.
The Existence of an Undecidable Language
Returning to our original question of how the set of machines \(\mathcal{M}\) is related to the set of languages \(\mathcal{L}\), we can show using the above techniques that \(\mathcal{M}\) is countable, whereas \(\mathcal{L}\) is uncountable. We can use this to show that there are languages that are not decided by any Turing machine, i.e., they are undecidable. In fact, we can even show that there are uncountably more undecidable languages than decidable languages. Thus, in a sense that can be made formal, “almost all” languages are undecidable!
Use diagonalization to show that the set of all languages \(\mathcal{L}\) is uncountable, and formalize and rigorously prove the above claims.
We leave the formalization of these countability arguments as an exercise, and instead give a direct diagonalization-based proof that there is an undecidable language. For concreteness, we work with machines and languages having the input alphabet \(\Sigma=\set{0,1}\), but all the arguments straightforwardly generalize to any finite alphabet \(\Sigma\).
To set up the proof, we first show that the set of input strings, and the set of Turing machines, are both countable.
The (infinite) set \(\set{0,1}^*\) of binary strings is countable.
We can enumerate the binary strings by length, as follows:
This works because there are only finitely many strings of any fixed finite length (one string of length zero, two strings of length one, four strings of length two, etc.), and each string in \(\set{0,1}^*\) has some finite length, so it will eventually appear in the enumeration.
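For concreteness, here is a small C++ sketch (ours) that generates the first few strings in exactly this length-then-lexicographic order:

#include <iostream>
#include <string>
#include <vector>

// List the first n binary strings in the order e, 0, 1, 00, 01, 10, 11, ...
std::vector<std::string> firstBinaryStrings(std::size_t n) {
    std::vector<std::string> out = {""};    // the empty string comes first
    for (std::size_t i = 0; out.size() < n; ++i) {
        out.push_back(out[i] + "0");        // each listed string of length k
        out.push_back(out[i] + "1");        // yields two strings of length k+1
    }
    out.resize(n);
    return out;
}

int main() {
    for (const auto& s : firstBinaryStrings(7))
        std::cout << '"' << s << "\" ";     // prints "" "0" "1" "00" "01" "10" "11"
}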
The (infinite) set \(\mathcal{M}\) of Turing machines is countable.
The key observation is that any Turing machine \(M\) has a finite description, and hence can be encoded as a (finite) binary string \(\inner{M} \in \Sigma^*\) in some unambiguous way. To see this, notice that all the components of the seven-tuple are finite: the alphabets \(\Sigma,\Gamma\), the set of states \(Q\) and the special states \(\qst,\qacc,\qrej\), and the transition function \(\delta\). In particular, \(\delta\) has a finite domain and codomain, so we can encode its list of input/output pairs as a (finite) binary string.
Since there is an injective encoding function from \(\mathcal{M}\) to \(\Sigma^*\), and \(\Sigma^*\) is countable by Lemma 81, \(\mathcal{M}\) is countable as well. (See Exercise 70 to justify this rigorously.)
We can now state and prove our main theorem.
There is an undecidable language \(A \subseteq \Sigma^* = \set{0,1}^*\).
We proceed by diagonalization (but directly, not by contradiction). As shown above, both \(\Sigma^* = \set{x_0, x_1, x_2, \ldots}\) and the set of Turing machines \(\mathcal{M} = \set{M_0, M_1, M_2, \ldots}\) are countable. So, we can imagine an infinite, two-dimensional table with machines enumerated along the rows, and input strings along the columns:
The \((i,j)\)th entry of this table indicates whether machine \(M_i\) accepts input string \(x_j\). For example, we have here that \(M_0\) accepts the string \(x_0=\varepsilon\) but does not accept the string \(x_1=0\), whereas \(M_1\) accepts both of those strings but does not accept the string \(x_2=1\).
Now consider the diagonal of this table, which indicates whether machine \(M_i\) accepts input string \(x_i\), for each \(i \in \N\). By definition, \(x_i \in L(M_i)\) if and only if the \(i\)th diagonal entry in the table is “yes”.
We now construct the language \(A \subseteq \Sigma^*\) to correspond to the “negated diagonal.” Specifically, for each string \(x_i\), we include it in \(A\) if and only if \(M_i\) does not accept \(x_i\). Formally:
\[A = \set{x_i \in \Sigma^* : x_i \notin L(M_i)}.\]
By the above definition, for every \(i \in \N\), we have that \(x_i \in A\) if and only if \(x_i \notin L(M_i)\), so \(A \neq L(M_i)\). Therefore, no machine \(M_i\) in the enumeration decides \(A\). Since \(M_0, M_1, M_2, \ldots\) enumerates every Turing machine, we conclude that no machine decides \(A\)—it is undecidable.
(In fact, we have shown even more: since \(A \neq L(M_i)\) for all \(i \in \N\), there is no Turing machine that even recognizes \(A\), i.e., \(A\) is unrecognizable. See Recognizability for details.)
The above proof establishes the existence of an undecidable language, but the language is rather contrived: it is constructed based on the enumerations of machines and inputs, and the behaviors of these machines on these inputs. The associated decision problem does not seem like a “natural” problem we would care to solve in the first place, so perhaps it isn’t so important that this problem is unsolvable.
In the upcoming sections, we will prove that many quite natural and practically important languages of interest are also undecidable.
“Natural” Undecidable Problems
I don’t care to belong to any club that will have me as a member. — Groucho Marx
Suppose that while visiting a new town, you come across a barber shop with the following sign in its window:
Barber \(B\) is the best barber in town! \(B\) cuts the hair of all those people in town, and only those, who do not cut their own hair.
In other words, for any person \(X\) in the town, there are two possibilities:
If \(X\) cuts their own hair, then \(B\) does not cut \(X\)’s hair.
If \(X\) does not cut their own hair, then \(B\) cuts \(X\)’s hair.
Assuming that the sign is true, we now ask the question: does \(B\) cut their own hair? Since the barber is a person in the town, we can substitute \(X = B\) into the two cases above, and get:
If \(B\) cuts their own hair, then \(B\) does not cut \(B\)’s hair.
If \(B\) does not cut their own hair, then \(B\) cuts \(B\)’s hair.
Both cases result in a contradiction! Thus, our assumption that the sign is true must be incorrect. (Or perhaps the barber is not a person…)
This is known as the barber paradox. While its current form may seem like an idle amusement, in what follows we will see that we can devise an analogous paradox for Turing machines, which will yield a “natural” undecidable language. Then, we will use this undecidable language to show that there are many other natural and practically important undecidable languages.
Code as Input
Recall from the proof of Lemma 83 that any Turing machine itself can be unambiguously represented as a binary string, by encoding the (finite) components of its seven-tuple. We refer to this string encoding of a machine as its code, and denote it as usual with angle brackets: \(\inner{M}\) is the code of machine \(M\).
Next, because the code of a machine is just a binary string, we can contemplate the following fascinating idea: we can give the code of one machine as input to another machine! We can even give a program its own code as input.
There are many useful examples of this kind of pattern in everyday computing:
An interpreter is a program that takes the code of some arbitrary program as input, and runs it.
A compiler is a program that takes the code of some arbitrary program as input, and converts it to some other form (e.g., machine-level instructions).
A debugger is a program that takes the code of some arbitrary program as input, and runs it interactively under the control of the user (e.g., step by step, or until certain conditions on the state of the program are met, etc.).
An emulator is a program that takes the code of some arbitrary program written for a certain kind of hardware device, and runs it on some other hardware device.
Here are some examples of how, in practice, a C++ program could take its own code as input:
$ g++ prog.cpp -o prog.exe # compile the program into an executable
$ ./prog.exe prog.cpp # pass the code filename as a command-line argument
$ ./prog.exe < prog.cpp # pass the code via standard input
$ ./prog.exe "`cat prog.cpp`" # pass the code as a single command-line argument
In the above example, we first passed the code prog.cpp as the input to the g++ compiler. The g++ compiler itself is an example of a self-hosting program: it is compiled by passing its own code to itself! A quine is an analogous concept: a program that outputs its own code.
The Barber Language
With the concept of code as input in hand, we now devise a computational analog of the barber paradox. The analogy arises from the following correspondences:
instead of people, we consider Turing machines;
instead of the hair of a person, we consider the code of a TM;
instead of cutting the hair of a person, we consider accepting the code of a TM.
The advertisement in the barber shop then becomes:
Turing machine \(B\) is the best TM! It accepts the code of all Turing machines, and only those, that do not accept their own code.
More formally, the above says that the language \(L(B)\) of machine \(B\) is the following “barber” language:
\[\Lbarber = \set{\inner{M} : M \text{ is a Turing machine and } M(\inner{M}) \text{ does not accept}}.\]
This is because \(B\) accepts the code \(\inner{M}\) of a Turing machine \(M\) if and only if \(M\) does not accept its own code \(\inner{M}\).
Does such a Turing machine \(B\) exist? The following theorem proves that it does not, because its very existence would be a contradiction.
The language \(\Lbarber\) is undecidable (and even unrecognizable), i.e., no Turing machine decides (or even recognizes) \(\Lbarber\).
Assume for the sake of contradiction that there is a Turing machine \(B\) for which \(L(B) = \Lbarber\). Just like we asked whether the barber cuts their own hair, let us now ask the analogous question: does \(B\) accept its own code, i.e., does \(B(\inner{B})\) accept? (Since \(B\) is a Turing machine, it has some code \(\inner{B}\), and the behavior of \(B(\inner{B})\) is well defined: it either accepts or it does not.)
If \(B(\inner{B})\) accepts, then \(\inner{B} \notin \Lbarber\) by definition, so by our assumption, \(B(\inner{B})\) does not accept.
If \(B(\inner{B})\) does not accept, then \(\inner{B} \in \Lbarber\) by definition, so by our assumption, \(B(\inner{B})\) accepts.
Again, both cases result in a contradiction. Thus, our initial assumption was incorrect: no Turing machine \(B\) has the property that \(L(B) = \Lbarber\), i.e., no Turing machine decides \(\Lbarber\)—it is undecidable. In fact, we have proved even more: that no Turing machine even recognizes \(\Lbarber\).
It is interesting to observe that the definition of \(\Lbarber\) and proof of Theorem 87 can also be seen as a kind of diagonalization. Since the set of TMs is countable by Lemma 83, there is an enumeration \(M_0, M_1, M_2, \ldots\) of all Turing machines. Consider a two-dimensional infinite “table” with this enumeration along the rows, and the codes \(\inner{M_0}, \inner{M_1}, \inner{M_2}, \ldots\) along the columns. Entry \((i,j)\) of the table corresponds to whether \(M_i(\inner{M_j})\) accepts. The language corresponding to the “negated” diagonal, which is therefore not decided (or even recognized) by any TM, consists of exactly those strings \(\inner{M_i}\) for which \(M_i(\inner{M_i})\) does not accept—and this is exactly the definition of \(\Lbarber\)!
The undecidable language \(\Lbarber\) is fairly “natural”, because it corresponds to the decision problem of determining whether a given program accepts its own code. (As we have seen, running a program on its own code is a natural and potentially useful operation.) Next, we will build upon the undecidability of \(\Lbarber\) to prove the undecidability of other, arguably even more “natural” languages that are concerned with the behavior of given programs on inputs that are typically not their own code.
The Acceptance Language and Simulation
We now define the language that corresponds to the following decision problem: given a Turing machine and a string, does the machine accept the string?
The “acceptance” language for Turing machines is defined as
\[\atm = \set{(\inner{M}, x) : M \text{ is a Turing machine and } M(x) \text{ accepts}}.\]
Unlike the undecidable languages considered above, this language is recognizable by a certain Turing machine \(U\), as we now explain. Given input \((\inner{M},x)\), machine \(U\) examines the code of the input machine \(M\) and “simulates” running whatever steps that machine would take when run on the input string \(x\), finally outputting the same decision (if it ever halts).
An interpreter is a real-world incarnation of this concept. For example, we can provide the source code of a Python program to the Python interpreter, which will read and execute the code:
$ python prog.py
Similarly, a general-purpose CPU can be seen as an interpreter: it is a fixed “program” (in this case, hardware) that, when given the source code (in this case, machine-level instructions) of some program, runs that program. An emulator can also be seen as a kind of interpreter.
A Turing machine \(U\) that is an interpreter of this kind is known as a universal Turing machine. Observe that \(U(\inner{M},x)\) has the following behavior:
If \(M(x)\) accepts, then \(U(\inner{M}, x)\) accepts.
If \(M(x)\) rejects, then \(U(\inner{M}, x)\) rejects.
If \(M(x)\) loops, then \(U(\inner{M}, x)\) loops.
Therefore, \(U\) recognizes \(\atm\) (i.e., \(L(U) = \atm\)), because it accepts every \((\inner{M},x) \in \atm\) and rejects or loops on every \((\inner{M},x) \notin \atm\). (See Definition 62.)
There are many examples of universal Turing machines in the literature, and their descriptions can be found elsewhere. For our purposes, what is important is the existence of universal machines, not the details of their implementation.
The idea of a machine \(U\) simulating the execution of a machine \(M\) whose code \(\inner{M}\) is provided as input to \(U\) is quite different from what we did in the proof of Lemma 65. In that proof, \(M_1, M_2\) are fixed machines that exist by hypothesis, and the constructed machine \(M\) has \(M_1, M_2\) “built in” to it as “subroutines”. For Turing machines, this can be done by including the states and transitions of the subroutine machine as part of the calling machine, with appropriate transitions between certain states of the two machines to represent the subroutine call and return value.
In practice, the analogous concept is using a library of pre-existing code. For example, in C++, we can use the #include directive, like so:
#include "M1.hpp" // include code of M1
#include "M2.hpp" // include code of M2
int M(string x) {
M1(x); // invoke M1
...
}
Now that we have shown that the language \(\atm\) is recognizable, the natural next question is whether it is decidable. For it to be decidable, there must exist some Turing machine \(C\) that has the following behavior on input \((\inner{M}, x)\):
If \(M\) accepts \(x\), then \(C(\inner{M}, x)\) accepts.
If \(M\) does not accept \(x\) (i.e., it either rejects or loops), then \(C(\inner{M}, x)\) rejects.
Observe that a universal Turing machine \(U\) does not meet these requirements, because if \(M\) loops on \(x\), then \(U(\inner{M}, x)\) loops; it does not reject. Thus, even though \(U\) recognizes \(\atm\) (i.e., \(L(U) = \atm\)), it loops on some inputs and therefore does not decide \(\atm\) (or any other language). In fact, it turns out that no Turing machine decides \(\atm\).
The language \(\atm\) is undecidable.
Assume for the sake of contradiction that there exists a Turing machine \(C\) that decides the language \(\atm\). Below we use \(C\) to define another Turing machine \(B\) that decides the “barber” language \(\Lbarber\). Since we previously showed that \(\Lbarber\) is undecidable (see Theorem 87), this is a contradiction, and our original assumption was false. Hence no Turing machine decides \(\atm\), i.e., \(\atm\) is undecidable.
We wish to define a Turing machine \(B\) that, when given the code \(\inner{M}\) of some arbitrary machine \(M\) as input, determines whether \(M(\inner{M})\) accepts. The key idea is that \(B\) can use the machine \(C\) to determine this, because \(C\) decides \(\atm\) by assumption: given \((\inner{M'},x)\) for any Turing machine \(M'\) and any string \(x\) as input, \(C\) will correctly determine whether \(M'(x)\) accepts. So, for \(B\) to achieve its goal, it will invoke \(C\) on the machine \(M'=M\) and input string \(x=\inner{M}\). The precise definition of \(B\) is as follows: on input \(\inner{M}\), run \(C(\inner{M}, \inner{M})\); if it accepts, then reject, and if it rejects, then accept.
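In the C++ sidebar style used elsewhere in this chapter, and with a subroutine C standing in for the assumed decider, a minimal sketch of \(B\) is:

#include <string>
using std::string;

bool C(const string& M_code, const string& x);  // assumed: decides A_TM

// B decides the barber language: <M> is in L_barber iff M(<M>) does not
// accept, i.e., iff C(<M>, <M>) rejects, so B negates C's answer.
// Note that B never runs M itself, so it halts whenever C does.
bool B(const string& M_code) {
    return !C(M_code, M_code);
}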
We analyze the behavior of \(B\) on an arbitrary input string \(\inner{M}\) as follows:
If \(\inner{M} \in \Lbarber\), then \(M(\inner{M})\) does not accept by definition of \(\Lbarber\), so \((\inner{M},\inner{M}) \notin \atm\) by definition of \(\atm\), so \(C(\inner{M},\inner{M})\) rejects because \(C\) decides \(\atm\), so \(B(\inner{M})\) accepts by construction of \(B\), as needed.
Conversely, if \(\inner{M} \notin \Lbarber\), then \(M(\inner{M})\) accepts by definition of \(\Lbarber\), so \((\inner{M},\inner{M}) \in \atm\) by definition of \(\atm\), so \(C(\inner{M},\inner{M})\) accepts because \(C\) decides \(\atm\), so \(B(\inner{M})\) rejects by construction of \(B\), as needed.
Therefore, by Definition 61, \(B\) decides \(\Lbarber\), as claimed.
Alternatively, we can analyze \(B\) using the equivalent form of Definition 61, as follows. First, because \(B\) simply calls \(C\) as a subroutine and outputs the opposite answer, and \(C\) halts on any input (because \(C\) is a decider by hypothesis), \(B\) also halts on any input. Next, by definitions of the languages and assumption on \(C\),
\[\inner{M} \in \Lbarber \iff M(\inner{M}) \text{ does not accept} \iff (\inner{M}, \inner{M}) \notin \atm \iff C(\inner{M}, \inner{M}) \text{ rejects} \iff B(\inner{M}) \text{ accepts},\]
as required.
Observe that in the proof of Theorem 90, we showed that the (previously established) undecidability of \(\Lbarber\) implies the undecidability of the new language \(\atm\). We did this by contradiction, or from another perspective, by establishing the contrapositive statement: that if \(\atm\) is decidable, then \(\Lbarber\) is decidable.
We proved this by using any hypothetical Turing machine \(C\) that decides \(\atm\) as a subroutine to construct a Turing machine \(B\) that decides \(\Lbarber\). Importantly, \(B\) did not simulate its input machine \(M\) on the code \(\inner{M}\), because doing this might cause \(B\) to loop. Instead, \(B\) “asked” \(C\) whether \(M(\inner{M})\) accepts, and negated the answer. We do not know how \(C\) determines its answer, but this does not matter: \(C\) decides \(\atm\) by hypothesis, so it halts and returns the correct answer on any input it is given.
A transformation that uses a hypothetical decider for one language as a subroutine to construct a decider for another language is called a Turing reduction. This is a very powerful and general approach for relating the (un)decidability of one language to another. More generally, reductions of various kinds are one of the most important and central ideas in theoretical computer science. See the next section and Turing Reductions for further examples and details.
The Halting Problem
Recall that the only difference between a universal Turing machine \(U\) (which we know exists) and a decider for \(\atm\) (which we proved does not exist) is how they behave on inputs \((\inner{M}, x)\) where \(M\) loops on \(x\): while \(U(\inner{M},x)\) also loops, a decider for \(\atm\) must reject such \((\inner{M},x)\). On all other inputs, \(U\) halts and returns the correct answer. So, if we could just first determine whether \(M(x)\) halts, we could decide \(\atm\). The decision problem of determining whether a given machine \(M\) halts on a given input \(x\) is called the halting problem. Since a decider for the halting problem would yield a decider for \(\atm\), this implies that the halting problem must be undecidable as well!
We now formalize this reasoning. We define \(\htm\), the language corresponding to the halting problem, as follows.
The “halting” language for Turing machines is defined as
\[\htm = \set{(\inner{M}, x) : M \text{ is a Turing machine and } M(x) \text{ halts}}.\]
So:
If \(M(x)\) accepts, then \((\inner{M}, x) \in \htm\).
If \(M(x)\) rejects, then \((\inner{M}, x) \in \htm\).
If \(M(x)\) loops, then \((\inner{M}, x) \notin \htm\).
The language \(\htm\) is undecidable.
Assume for the sake of contradiction that there exists a Turing machine \(H\) that decides the language \(\htm\). Below we use \(H\) to define another Turing machine \(C\) that decides \(\atm\). Since we previously showed that \(\atm\) is undecidable (see Theorem 90), this is a contradiction, and our original assumption was false. Hence no Turing machine decides \(\htm\), i.e., \(\htm\) is undecidable.
We wish to define a Turing machine \(C\) that, when given input \((\inner{M},x)\) for some arbitrary machine \(M\) and arbitrary string \(x\), determines whether \(M(x)\) accepts. As discussed above, \(C\) could simulate \(M(x)\), as in a universal Turing machine. However, this might loop, which is the only case that \(C\) must avoid. The key idea is that \(C\) can first use \(H\) to determine whether \(M(x)\) loops, and then act appropriately. The precise definition of \(C\) is as follows: on input \((\inner{M}, x)\),
1. Run \(H(\inner{M}, x)\); if it rejects, then reject.
2. Simulate \(M(x)\) using a universal Turing machine, and output the same decision (accept or reject).
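As a sidebar-style C++ sketch, with assumed subroutines H (the oracle) and U (an interpreter that returns \(M\)'s decision whenever \(M(x)\) halts, and loops otherwise):

#include <string>
using std::string;

bool H(const string& M_code, const string& x);  // assumed: decides H_TM
bool U(const string& M_code, const string& x);  // assumed: interpreter; loops if M(x) loops

// C decides A_TM: first ask H whether M(x) halts; only when it does is
// the simulation guaranteed to terminate.
bool C(const string& M_code, const string& x) {
    if (!H(M_code, x)) return false;  // M(x) loops, so it certainly does not accept
    return U(M_code, x);              // safe to simulate: M(x) halts
}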
We analyze the behavior of \(C\) on an arbitrary input string \((\inner{M}, x)\) as follows:
If \((\inner{M},x) \in \atm\), then \(M(x)\) accepts and thus halts, so \((\inner{M},x) \in \htm\) and thus the call to \(H(\inner{M},x)\) accepts, so \(C\) reaches its second line, where the simulation of \(M(x)\) accepts, so \(C\) accepts, as needed.
If \((\inner{M},x) \notin \atm\), then \(M(x)\) either rejects or loops.
If it loops, then \((\inner{M},x) \notin \htm\) so the call to \(H(\inner{M},x)\) rejects, and \(C\) rejects, as needed.
If it rejects, then \((\inner{M},x) \in \htm\) so the call to \(H(\inner{M},x)\) accepts, so \(C\) reaches its second line, where the simulation of \(M(x)\) rejects, so \(C\) rejects, again as needed.
Therefore, by Definition 61, \(C\) decides \(\atm\), as claimed.
We have proved that \(\htm\) is undecidable. This is quite unfortunate, since the halting problem is a fundamental problem in software and hardware design. One of the most basic questions we can ask about a program’s correctness is whether it halts on all inputs. Since \(\htm\) is undecidable, there is no general method to answer this question even for a single input, much less all inputs.
Construct an elementary proof that \(\htm\) is undecidable by following the steps below. This was the first “existence” proof for an undecidable language, and it is due to Alan Turing.
Suppose there exists a Turing machine \(H\) that decides \(\htm\). Using \(H\), design a Turing machine \(M\) for which \(M(\inner{T})\) halts if and only if \(H(\inner{T},\inner{T})\) rejects, for any Turing machine \(T\).
Devise an input \(x\) for \(M\) that yields a contradiction, and conclude that \(H\) does not exist, i.e., \(\htm\) is undecidable.
Turing Reductions
Observe that the above proofs of the undecidability of the acceptance language \(\atm\) (Theorem 90) and the halting language \(\htm\) (Theorem 93) follow a common pattern for showing that some language \(B\) is undecidable:
First assume for the sake of contradiction that \(B\) is decidable, i.e., there is some Turing machine \(M_B\) that decides it.
Use \(M_B\) as a subroutine to construct a decider \(M_A\) for some known undecidable language \(A\).
Since \(M_A\) decides the undecidable language \(A\), we have reached a contradiction. So the original assumption that \(B\) is decidable must be false, hence \(B\) is undecidable.
This kind of pattern is known as a reduction from language \(A\) to language \(B\). That is, we reduce the task of solving \(A\) to that of solving \(B\), so that if \(B\) is solvable, then \(A\) is solvable as well. We next formalize and abstract out this pattern, and apply it to show the undecidability of more languages.
In practice, almost all programs make use of pre-existing components, often called libraries, to accomplish their intended tasks. As an example, consider a “hello world” program in C:
#include <stdio.h>

int main() {
    printf("Hello world!");
    return 0;
}
This program uses the standard stdio.h library, invoking the printf() function from that library. Observe that we do not need to know how printf() works to write this program—we need only know what it does (i.e., its input-output behavior). As long as the function does what it is claimed to do, the above program will work correctly—even if the implementation of printf() were to change to a completely different (but still correct) one. In other words, the program merely treats printf() as a “black box” that prints its argument.
From a certain perspective, the above program reduces the task of printing the specific string Hello world! to that of printing an arbitrary string. Since the latter task is accomplished by the printf() function, we can use it rather trivially to accomplish the former task.
A Turing reduction from a language \(A\) to a language \(B\) is a Turing machine that decides \(A\) given access to an oracle (or “black box”) that decides \(B\). If such a reduction exists, we say that \(A\) Turing-reduces to \(B\), and write \(A \leq_T B\). [4]
As discussed more below, the “direction” of the reduction is very important. Upon first exposure (or even after many years of exposure) to this concept, the terminology “\(A\) reduces to \(B\)” may appear to be in conflict with the notation \(A \leq_T B\). The phrase “\(A\) reduces to \(B\)” refers to the problems’ “solvability”, and essentially means that solving \(A\) “comes down to” solving \(B\). For example, addition of multi-digit numbers “reduces to” addition of single digits, because we can accomplish the former via the latter. By contrast, the notation \(A \leq_T B\) refers to the problems’ intrinsic “hardness”, and essentially says that \(A\) is “no harder to solve” than \(B\) is (ignoring efficiency).
An oracle (or “black box”) that decides a language is some abstract entity that, whenever it is invoked (or queried) on any string, it correctly identifies whether that string is in the language—it accepts if so, and rejects if not. Importantly, it does not loop on any input—a useful property that reductions often exploit.
We can think of an oracle as a subroutine or “library function” (see the sidebar) that correctly performs its stated task on any input. However, it is not necessarily—and in many advanced settings, is not—a Turing machine. Oracles can potentially perform tasks that no Turing machine can do, like decide undecidable problems.
A reduction may query its oracle any (finite) number of times, on any string(s) of its choice—including none at all. However, the reduction may not “look inside” the oracle or rely in any way on how it works internally—it can use the oracle only as a “black box” subroutine.
The notation \(A \leq_T B\) reflects the intuition that, ignoring efficiency, problem \(A\) is “no harder to solve” than problem \(B\) is: if we can somehow solve \(B\), then we can also solve \(A\). However, we emphasize that the statement \(A \leq_T B\) does not require that \(B\) is actually solvable by an algorithm. It is merely a conditional statement that, if we have access to some hypothetical solver for \(B\) (as an oracle), then we can use it to solve \(A\) by an algorithm. Consistent with this intuition, one can show that \(\leq_T\) is transitive:
Formally prove that if \(A \leq_T B\) and \(B \leq_T C\), then \(A \leq_T C\).
We now prove the first formal implication of having a Turing reduction, which conforms to the above intuition.
Suppose that \(A \leq_T B\). If \(B\) is decidable, then \(A\) is also decidable.
Since \(A \leq_T B\), by Definition 96 there is a Turing machine \(M_A\) that decides \(A\) when given access to an oracle that decides \(B\). Since \(B\) is decidable, there is a Turing machine \(M_B\) that decides \(B\). Thus, in the reduction \(M_A\), we can implement its oracle using (the code of) \(M_B\). This results in an ordinary Turing machine that decides \(A\) (without relying on any oracle), so \(A\) is indeed decidable.
The following important corollary is merely the contrapositive of Lemma 98: [5]
Suppose that \(A \leq_T B\). If \(A\) is undecidable, then \(B\) is also undecidable.
We can also prove this by contradiction, since contraposition and contradiction are closely related. We are given that \(A \leq_T B\) and that \(A\) is undecidable. Suppose for the purposes of contradiction that \(B\) is decidable. Then by Lemma 98, \(A\) is also decidable, which contradicts what was given. So, our assumption about \(B\) is false, i.e., \(B\) is undecidable.
Lemma 100 is a powerful tool for proving that a language \(B\) is undecidable: it suffices to identify some other language \(A\) that is already known to be undecidable, and then prove that \(A \leq_T B\). In particular, we do not need to set up and repeat all the surrounding “boilerplate” structure of a proof by contradiction. However, it is critical that we establish the reduction in the proper “direction”! That is, we must show how to decide \(A\) using an oracle for \(B\), not the other way around.
Observe that our proofs of Theorem 90 and Theorem 93 above actually showed that \(\Lbarber \leq_T \atm\) and \(\atm \leq_T \htm\), respectively. This is because we constructed a Turing machine \(B\) that decides \(\Lbarber\) using any hypothetical decider \(C\) for \(\atm\) as an oracle (i.e., a black-box subroutine); similarly, we constructed a Turing machine \(C\) that decides \(\atm\) using any hypothetical decider \(H\) for \(\htm\) as an oracle.
Given that \(A \leq_T B\), there are four possibilities for the (un)decidability of \(A\) or \(B\), but only two of them imply anything about the (un)decidability of the other language:
| Hypothesis | Implies |
| \(A\) is decidable | nothing |
| \(A\) is undecidable | \(B\) is undecidable |
| \(B\) is decidable | \(A\) is decidable |
| \(B\) is undecidable | nothing |
In general, the other two possibilities don’t imply anything about the decidability of the other language. In fact, we can show the following:
Prove that, if \(A\) is decidable, then \(A \leq_T B\) for any language \(B\), regardless of whether \(B\) is decidable or not. (Hint: a reduction is not required to use its oracle.)
The fourth case, where \(B\) is undecidable, is more subtle. It is not the case that \(A \leq_T B\) for any language \(A\). In fact, there is a hierarchy of “degrees” of undecidable languages, and there are even pairs of “incomparable” undecidable languages \(A,B\) for which neither \(A \leq_T B\) nor \(B \leq_T A\) holds. These topics are beyond the scope of this text.
The Halts-on-Empty Problem
We saw previously that the halting-problem language \(\htm\) is undecidable. In this section we show that a more restricted form of this language is also undecidable, via a Turing reduction that employs a very useful “hard-coding” technique. The restricted language corresponds to the decision problem of determining whether a given machine halts on the empty-string input.
The “halts on the empty string” language for TMs is defined as
\[\ehtm = \set{\inner{M} : M \text{ is a Turing machine and } M(\varepsilon) \text{ halts}}.\]
It is not too hard to see that \(\ehtm \leq_T \htm\), i.e., given access to an oracle \(H\) that decides \(\htm\), there is a Turing machine \(E\) that decides \(\ehtm\): namely, on input \(\inner{M}\), \(E\) simply runs \(H(\inner{M}, \varepsilon)\) and outputs the same answer.
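Rendered as a sidebar-style C++ sketch, with the oracle as an assumed subroutine H, the machine \(E\) is a one-liner:

#include <string>
using std::string;

bool H(const string& M_code, const string& x);  // assumed: decides H_TM

// E decides EHALT_TM by asking H whether M halts on the empty string.
bool E(const string& M_code) {
    return H(M_code, "");
}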
(We leave the routine analysis as an exercise.) However, as we can see from the table above, this implies nothing about the decidability of \(\ehtm\), because \(\htm\) is undecidable. Instead, to show that \(\ehtm\) is undecidable, we should Turing-reduce from some undecidable language \(A\) to \(\ehtm\), i.e., we should prove that \(A \leq_T \ehtm\).
Below we formally prove that \(\htm \leq_T \ehtm\), but first we discuss the key issues. Let \(E\) be an oracle that decides \(\ehtm\), i.e.:
\(E\) halts on any input, and
\(M'(\varepsilon)\) halts if and only if \(E(\inner{M'})\) accepts, for any Turing machine \(M'\).
(See the equivalent form of Definition 61, which is more convenient for the present treatment.)
To show that \(\htm \leq_T \ehtm\), we need to construct a reduction (a Turing machine) \(H\) that decides \(\htm\) given access to the oracle \(E\). Specifically, we need it to be the case that:
\(H\) halts on any input, and
\(M(x)\) halts if and only if \(H(\inner{M}, x)\) accepts, for any Turing machine \(M\) and string \(x\).
Hence, we need to design the reduction \(H\) to work for an input pair—a machine \(M\) and a string \(x\)—even though its oracle \(E\) takes only one input—a machine \(M'\). So, we wish to somehow transform the original \(M\) and \(x\) into just a machine \(M'\) on which to query \(E\), so that \(E\)’s (correct) answer on \(M'\) will reveal the correct answer for the original inputs \(M\) and \(x\).
A very useful strategy in this kind of circumstance is to have the reduction “hard-code” the original inputs into the code of a related new machine that the oracle can properly handle. In the present case, the reduction will construct the code for a new program \(M'\) that ignores its own input and simply runs \(M\) on \(x\). Importantly, the code for this \(M'\) can be constructed from the code for \(M\) and \(x\) simply by “syntactic” processing, not running any of this code (see the sidebar). By construction, \(M'\) halts on \(\varepsilon\) (or any other string) if and only if \(M\) halts on \(x\). Because \(E\) correctly determines whether the former is the case by hypothesis, it also implicitly reveals whether the latter is the case—which is exactly what the reduction needs to determine. We now proceed more formally.
We have that \(\htm \leq_T \ehtm\), so \(\ehtm\) is undecidable.
Let \(E\) be an oracle that decides \(\ehtm\), which has the properties stated above. We claim that the following Turing machine \(H\) decides \(\htm\) given access to \(E\): on input \((\inner{M}, x)\), construct the code \(\inner{M'}\) of a machine \(M'\) that, on any input \(w\), ignores \(w\) and runs \(M(x)\); then run \(E(\inner{M'})\) and output the same answer.
The constructed machine \(M'\) takes an input \(w\), which it ignores, and then it just runs \(M\) on the original input \(x\). So, we have the following key property: \(M(x)\) halts if and only if \(M'(w)\) halts on all \(w\), and in particular, if and only if \(M'(\varepsilon)\) halts.
We now analyze the behavior of \(H\) on an arbitrary input \((\inner{M},x)\). First, \(H\) halts on any input, because it just constructs (but does not run!) the code for \(M'\) from \(M\) and \(x\), and then invokes \(E\), which halts by hypothesis. Second, by the definition of \(\htm\), the key property above, the hypothesis on \(E\), and the definition of \(H\),
\[(\inner{M}, x) \in \htm \iff M(x) \text{ halts} \iff M'(\varepsilon) \text{ halts} \iff E(\inner{M'}) \text{ accepts} \iff H(\inner{M}, x) \text{ accepts},\]
as needed. So, by Definition 61, \(H\) decides \(\htm\), as claimed.
We stress that a very important point about the above proof of Theorem 103 is that the reduction \(H\) merely constructs \(M'\) and queries \(E\) on it; \(H\) does not run/simulate \(M'\) or \(M\). This is important because \(M'\) itself runs some arbitrary input code \(M\) that may loop on \(x\), so we cannot “risk” running it in a machine that we want to decide some language. Instead, \(H\) queries \(E\) on \(\inner{M'}\), which is “safe” because \(E\) halts on any input, by hypothesis. Indeed, \(E\) correctly determines whether \(M'(\varepsilon)\) halts, which holds if and only if \(M(x)\) halts, by construction of \(M'\).
Suppose that we have a C++ library for an interpreter \(U\). Then given the C++ code for a program \(M\) and an input string \(x\), we can easily construct C++ code for a program that ignores its input and just runs \(M\) on \(x\). The full definition of the reduction \(H\) (which decides \(\htm\)) given subroutine \(E\) (an oracle that decides \(\ehtm\)) is as follows:
int H(string M_code, string x) {
    // Construct (but do not run!) the code Mp of M', which ignores its
    // input w and uses the interpreter U to run M on the hard-coded x.
    string Mp =
        string("#include \"U.hpp\"\n")
        + "int Mprime(string w) {\n"
        + "    return U(\"" + M_code + "\", \"" + x + "\");\n"
        + "}";
    return E(Mp);
}
We hardcode the inputs M_code and x as part of the code Mp for Mprime(). [6] Since M_code is code in string format, we cannot run it directly, but we can pass it to the interpreter U to simulate its execution on x.
Technically, since M_code and x may contain newlines and other characters that are not allowed in a C++ string literal, we would need to either process the strings to replace such characters with their respective escape sequences, or use raw string literals.
More Undecidable Languages and Turing Reductions
Here we define a variety of other languages and prove them undecidable, using the reduction techniques introduced above.
Define
\[\lang{FOO} = \set{\inner{M} : M \text{ is a Turing machine and } M(foo) \text{ accepts}}.\]
We show that \(\atm \leq_T \lang{FOO}\), hence that \(\lang{FOO}\) is undecidable. In fact, the proof easily generalizes to work for any fixed string in place of \(foo\), or even any fixed subset of strings.
Let \(F\) be an oracle that decides \(\lang{FOO}\), i.e.,
\(F\) halts on any input, and
\(M(foo)\) accepts if and only if \(F(\inner{M})\) accepts.
We claim that the following Turing machine \(C\) decides \(\atm\) given access to oracle \(F\): on input \((\inner{M}, x)\), construct the code \(\inner{M'}\) of a machine \(M'\) that ignores its input, runs \(M(x)\), and outputs the same decision (if any); then run \(F(\inner{M'})\) and output the same answer. (Interestingly, observe that the code of \(C\) is identical to that of the reduction \(H\) from the proof of Theorem 103, though they are meant to decide different languages, and their oracles are different.)
The key property here is that \(M(x)\) accepts if and only if \(M'(foo)\) accepts, i.e., \(foo \in L(M')\). Indeed, this is true more generally for any fixed string or fixed subset of strings, not just \(foo\).
We analyze the behavior of \(C\) on an arbitrary input \((\inner{M},x)\). First, \(C\) halts on any input, because it just constructs (the code for) a machine \(M'\) from that of \(M\) and \(x\), and then invokes \(F\), which halts by hypothesis. Second, by the definition of \(\atm\), the key property above, the hypothesis on \(F\), and the definition of \(C\),
\[(\inner{M}, x) \in \atm \iff M(x) \text{ accepts} \iff M'(foo) \text{ accepts} \iff F(\inner{M'}) \text{ accepts} \iff C(\inner{M}, x) \text{ accepts},\]
as needed. So, by Definition 61, \(C\) decides \(\atm\), as claimed.
Define
\[\etm = \set{\inner{M} : M \text{ is a Turing machine and } L(M) = \emptyset}.\]
In other words, \(\etm\) is the set of (the codes of) machines that do not accept any input. This language corresponds to the decision problem of determining whether a given program accepts no strings at all, i.e., whether there is no string that it accepts.
We stress that \(\etm\) is not the empty language \(\emptyset = \set{}\). The latter is the language that contains no strings, and it is decidable because it is decided by the trivial Turing machine \(M_\text{reject}\) that ignores its input and immediately rejects. On the other hand, \(\etm\) consists of (the codes of) infinitely many different machines. For example, (the codes of) the following machines are all elements of \(\etm\): the just-described machine \(M_\text{reject}\), a modified \(M_\text{reject}\) that does one operation before rejecting, the machine \(M_\text{loop}\) that ignores its input and loops, the machine that rejects \(\varepsilon\) but loops on every other string, etc.
We show that \(\atm \leq_T \etm\), hence \(\etm\) is undecidable. The reduction is almost identical to the ones above, with just a small tweak: the reduction negates the output of its oracle. Let \(E\) be an oracle that decides \(\etm\). We claim that the following Turing machine \(C\) decides \(\atm\) given access to oracle \(E\): on input \((\inner{M}, x)\), construct the code \(\inner{M'}\) of a machine \(M'\) that ignores its input, runs \(M(x)\), and accepts if \(M(x)\) accepts; then run \(E(\inner{M'})\) and output the opposite answer.
Observe that if \(M(x)\) accepts, then \(L(M') = \Sigma^* \neq \emptyset\), whereas if \(M(x)\) does not accept, then \(L(M') = \emptyset\). So, we have the key property that \((\inner{M},x) \in \atm\) if and only if \(\inner{M'} \notin \etm\).
We analyze the behavior of \(C\) on an arbitrary input \((\inner{M},x)\). First, \(C\) halts by similar reasoning as in the prior examples. Second, by the key property above, the hypothesis on \(E\), and the definition of \(C\),
\[(\inner{M}, x) \in \atm \iff \inner{M'} \notin \etm \iff E(\inner{M'}) \text{ rejects} \iff C(\inner{M}, x) \text{ accepts}.\]
So, by Definition 61, \(C\) decides \(\atm\), as claimed.
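A sidebar-style C++ sketch of this reduction, with the oracle as an assumed subroutine E and a hypothetical helper make_Mprime (not from the text) that emits the code of \(M'\) by hard-coding \(M\) and \(x\), as in the earlier sidebar:

#include <string>
using std::string;

bool E(const string& M_code);  // assumed: decides E_TM
// hypothetical: emits the code of M', which ignores its input and runs M on x
string make_Mprime(const string& M_code, const string& x);

// C decides A_TM: M(x) accepts iff L(M') is nonempty, i.e., iff <M'>
// is NOT in E_TM, so C negates the oracle's answer.
bool C(const string& M_code, const string& x) {
    return !E(make_Mprime(M_code, x));
}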
Define
\[\eqtm = \set{(\inner{M_1}, \inner{M_2}) : M_1, M_2 \text{ are Turing machines and } L(M_1) = L(M_2)}.\]
This language corresponds to the decision problem of determining whether two given programs accept exactly the same strings, i.e., whether they are “functionally identical” in terms of which strings they accept.
Since this language is concerned with the languages of two given machines, it is convenient to reduce from another language that also involves the language of a given machine, namely, \(\etm\). We show that \(\etm \leq_T \eqtm\), hence \(\eqtm\) is undecidable. Let \(Q\) be an oracle that decides \(\eqtm\), i.e.,
\(Q\) halts on any input, and
\((\inner{M_1}, \inner{M_2}) \in \eqtm\) if and only if \(Q(\inner{M_1}, \inner{M_2})\) accepts.
The challenge here is that, given (the code of) a machine \(M\) (an instance of \(\etm\)), the reduction should transform it into (the codes of) two machines \(M_1\) and \(M_2\), so that \(Q\)’s (correct) answer about whether \(L(M_1)=L(M_2)\) will reveal whether \(L(M) = \emptyset\). The key idea is for the reduction to let \(M_1=M\), and to construct machine \(M_2\) so that \(L(M_2)=\emptyset\); then \(L(M) = \emptyset\) if and only if \(L(M_1)=L(M_2)\). There are many such machines that would work as \(M_2\), but the one that just immediately rejects is simplest.
We claim that the following Turing machine \(E\) decides \(\etm\) given access to oracle \(Q\):
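\(E\) on input \(\inner{M}\):
1. Construct (the code of) the Turing machine \(M_2\) that ignores its input and immediately rejects.
2. Run \(Q(\inner{M}, \inner{M_2})\) and output its answer.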
Observe that \(L(M_2) = \emptyset\), trivially. So, we have the key property that \(\inner{M} \in \etm\) if and only if \((\inner{M},\inner{M_2}) \in \eqtm\).
We analyze the behavior of \(E\) on an arbitrary input \(\inner{M}\). First, \(E\) halts because it just constructs the (fixed) machine \(M_2\) and queries \(Q\), which halts by hypothesis. Second, by the key property above, the hypothesis on \(Q\), and the definition of \(E\),
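\[
\inner{M} \in \etm \iff (\inner{M}, \inner{M_2}) \in \eqtm \iff Q(\inner{M}, \inner{M_2}) \text{ accepts} \iff E(\inner{M}) \text{ accepts},
\]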
as needed. So, by Definition 61, \(E\) decides \(\etm\), as claimed.
We have seen that \(\atm \leq_T \htm\). We now show the other direction, that \(\htm \leq_T \atm\).
Let \(C\) be an oracle that decides \(\atm\), i.e.,
\(C\) halts on every input, and
\(M(x)\) accepts if and only if \(C(\inner{M},x)\) accepts.
We wish to construct a Turing machine \(H\) that decides \(\htm\), i.e.,
\(H\) halts on every input, and
\(M(x)\) halts (accepts or rejects) if and only if \(H(\inner{M},x)\) accepts.
The only case where the behavior of \(H\) needs to differ from that of \(C\) is where \(M(x)\) rejects; we need \(H\) to accept, whereas \(C\) rejects. So, to use \(C\) to get the correct answer in this case, we construct a new program \(M'\) that just negates the accept/reject decision (if any) of \(M\).
We claim that the following Turing machine \(H\) decides \(\htm\) given access to oracle \(C\):
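\(H\) on input \((\inner{M}, x)\):
1. Construct (the code of) a Turing machine \(M'\) that, on input \(w\), simulates \(M(w)\) and negates its decision (if any): if the simulation accepts, reject; if it rejects, accept.
2. Run \(C(\inner{M}, x)\); if it accepts, accept.
3. Run \(C(\inner{M'}, x)\); if it accepts, accept. Otherwise, reject.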
We analyze the behavior of \(H\) on an arbitrary input \((\inner{M},x)\). First, \(H\) halts on any input, because it just constructs the code of \(M'\) from that of \(M\) and \(x\), and twice queries \(C\), which halts by hypothesis. Second, by the definition of \(\htm\), the construction of \(M'\), the hypothesis on \(C\), and the definition of \(H\),
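\[
\begin{aligned}
(\inner{M},x) \in \htm &\iff M(x) \text{ accepts or rejects} \\
&\iff M(x) \text{ accepts or } M'(x) \text{ accepts} \\
&\iff C(\inner{M},x) \text{ accepts or } C(\inner{M'},x) \text{ accepts} \\
&\iff H(\inner{M},x) \text{ accepts},
\end{aligned}
\]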
as needed. So, by Definition 61, \(H\) decides \(\htm\), as claimed.
Observe that the reduction \(H\) invoked the oracle \(C\) twice on different inputs. In general, to prove that \(A \leq_T B\), the Turing machine that decides \(A\) may call the oracle that decides \(B\) any (finite) number of times, including more than once, or even not at all.
Prove that \(\atm \leq_T \ehtm\). (Hint: there is a fairly complex direct proof by reduction, and a very short proof using results that have already been stated.)
Prove that the language
is undecidable.
Determine, with proof, whether the following language is decidable:
Determine, with proof, whether the following language is decidable:
Wang Tiling
All the examples of undecidable languages we have seen thus far have been concerned with the behavior of Turing machines. However, there are many other languages whose definitions appear to have nothing to do with Turing machines or computation, and yet are also undecidable! It turns out that solving these problems would require solving unsolvable problems about computation, because we can “embed” the halting problem into these problems.
Here we give an overview of the undecidability of a certain geometric problem, that of tiling a plane using some set of shapes.
The computational question is: given a finite set \(S\) of shapes, called tiles, is it possible to tile the plane using only the tiles from \(S\)? We may use as many copies of each tile as we need, but they may not overlap or leave any uncovered space. For the purposes of this discussion, we also disallow rotation of tiles. (If we want to allow certain rotations, we can just include them in the tile set.)
An example of a valid tiling is the following from Study of Regular Division of the Plane with Reptiles by M. C. Escher: Here, the plane is tiled by a regular pattern using three reptile-shaped tiles.
Another example more akin to a jigsaw puzzle uses the following set of tiles:
First, we can fit the tiles together like so:
Then we can repeat this pattern to tile the whole plane:
While the two examples above admit periodic tilings, it is also possible to tile the plane in a non-periodic manner. The following is an example of a Penrose tiling that does so:
Not all tile sets admit tilings of the entire plane; for example, the set consisting of just a regular pentagon cannot do so. A natural question is whether there is an algorithm that, given an arbitrary (finite) set \(S\) of tile shapes, determines whether it is possible to tile the plane with that set. We define the corresponding language \(\lang{TILE}\) as follows:
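\[
\lang{TILE} = \set{\inner{S} : S \text{ is a finite set of tiles that can tile the plane}}.
\]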
Is \(\lang{TILE}\) decidable? The mathematician Hao Wang posed this and related questions in the early 1960s—conjecturing that the answer is “yes”—but in 1966, his student Robert Berger proved that, amazingly, the answer is no! That is, there is no algorithm that, given an arbitrary (but finite) set of tile shapes, correctly determines whether those shapes can be used to tile the plane.
The key insight behind the proof is that tiling with certain shapes can be made to correspond exactly with the execution of a given Turing machine. A bit more precisely, for any Turing machine \(M\) there is a corresponding (computable) tile set \(S_M\) for which determining whether \(S_M\) tiles the plane is equivalent to determining whether \(M(\varepsilon)\) halts—i.e., whether \(\inner{M} \in \ehtm\). This insight is the heart of the proof that \(\ehtm \leq_T \lang{TILE}\), so \(\lang{TILE}\) is indeed undecidable.
For the rest of the discussion, we restrict to the slightly simpler problem of determining whether a tile set \(S\) can tile the upper-right quadrant of the plane, rather than the whole plane. That is, the language we consider is:
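\[
\lang{QTILE} = \set{\inner{S} : S \text{ is a finite set of tiles that can tile the upper-right quadrant of the plane}}.
\]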
To simplify the picture and avoid complicated-looking shapes, we restrict our attention to tiles that are unit squares with a “color” on each edge. Two tiles may be adjacent only if their shared edges overlap completely, and the colors of these edges match. We can think of each color as representing some distinct “cut out” and “pasted on” shape, so that only tiles with matching colors “fit together” in the required way. (As mentioned before, tiles may not be rotated or flipped.) These are known as Wang tiles or Wang dominoes, after the mathematician Hao Wang, who studied the computability of domino questions like this one.
We also give the boundary of the quadrant its own designated color; a tile edge that lies along the boundary must match it.
The question then is whether a given set of tiles can tile the whole quadrant under these constraints. For the tile set above, we can see that it cannot. Only tile 2 may be placed in the bottom-left corner, then only tile 3 and tile 1 may appear to its right and above it, respectively. While we can continue placing more copies of tile 1 in the upward direction, no tile has a left boundary that is colored red, so we cannot fill out the rest of the quadrant to the right.
While we were able to reason that the quadrant cannot be tiled with this specific set, no algorithm can reach a correct conclusion for every given tile set. To demonstrate this, we will construct a tile set for which any tiling corresponds exactly to the execution of an arbitrary given Turing machine. For concreteness, we will illustrate this for a single, simple machine, but the same process can be done for any given machine.
Before proceeding to our construction, we define a notation for representing the execution state, or configuration, of a Turing machine. We need to capture several details that uniquely describe the configuration of the machine, which consists of:
the machine’s active state,
the contents of the tape, and
the position of the head.
We represent these details by an infinite sequence over the (finite) alphabet \(\Gamma \cup Q\). Specifically, the sequence contains the full contents of the tape, plus the active state \(q \in Q\) immediately to the left of the symbol for the cell at which the head is located. Thus, if the input to a machine is \(s_1 s_2 \dots s_n\) and the initial state is \(q_0\), the following encodes the starting configuration of the machine:
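\[
q_0\, s_1 s_2 \dots s_n \bot \bot \bot \cdots
\]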
Since the head is at the leftmost cell and the state is \(q_0\), the string has \(q_0\) to the left of the leftmost tape symbol. Then if the transition function gives \(\delta(q_0, s_1) = (q', s', R)\), the new configuration is:
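\[
s'\, q'\, s_2 \dots s_n \bot \bot \bot \cdots
\]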
The first cell has been modified to \(s'\), the machine is in state \(q'\), and the head is over the second cell, represented here by writing \(q'\) to the left of that cell’s symbol.
As a more concrete example, the configuration of the machine we saw previously that accepts strings containing only ones, when run on the input \(111010111\), is as follows at each step:
We will encode such a configuration string using an infinite sequence of tiles, where the “colors” along the top edges of the tiles indicate the configuration.
For a concrete running example, consider the simple machine that just alternately writes two different symbols to the tape. Given an initial blank tape (i.e., \(\varepsilon\) as the input), the machine will write symbols indefinitely, never terminating.
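Concretely, the machine has two states \(q_0\) and \(q_1\), and its transition function is given by \(\delta(q_0, \bot) = (q_1, a, R)\) and \(\delta(q_1, \bot) = (q_0, b, R)\): starting in \(q_0\) on a blank tape, it writes \(a\) and \(b\) in alternation, moving rightward forever.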
We start constructing our tile set by introducing a tile for each symbol in the tape alphabet. Each tile’s top and bottom edge is “colored” by the symbols appearing there (respectively), and the left and right edges have the “blank” color.
We next examine the transition function. For each left-moving transition
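\[
\delta(p, i) = (q, j, L)
\]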
and each element of the tape alphabet \(s \in \Gamma\), we introduce the following pair of tiles:
This represents part of the configuration when applying the transition. The bottom of the right tile indicates that we are originally in state \(p\) with the head on the symbol \(i\), and the bottom of the left tile indicates there is a symbol \(s\) in the cell to the left. The tops of the tiles represent the result of the transition. The new state is \(q\) and the head moves to the left, which we indicate by the top of the left tile. The symbol in the right tile is overwritten with \(j\), which we indicate at the top of the right tile. Finally, we label the adjacent edges with \(q\) to enable the two tiles to be adjacent to each other, but not to the “symbol” tiles constructed above (which have “blank”-colored left and right edges).
Similarly, for each right-moving transition
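\[
\delta(p, i) = (q, j, R)
\]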
and tape symbol \(s\), we introduce the following pair of tiles:
Here, the head moves to the right, so the original state \(p\) appears at the bottom in the left tile and the new state \(q\) appears at the top in the right tile.
For our simple Turing machine above, we have just two rightward transitions and three tape symbols, so the resulting transition tiles are those below:
(We have shown duplicate tiles here, but since we are ultimately constructing a set, we would merge any duplicates into one.)
Finally, we introduce “start” tiles to encode the initial state of the machine. Since we are interested in the empty string as input, only blank symbols appear in our start tiles. We have a corner tile that encodes the start state and the initial head position all the way to the left, and we have a bottom tile that encodes the remaining blank cells:
The full set of tiles for the simple Turing machine, with duplicates removed, is as follows:
Now that we have constructed our tile set, we can consider how a tiling of the quadrant would look, if one exists. First, the only tile that can go in the bottom-left corner of the quadrant is our corner start tile.
Once we place the corner tile, the only tile that can go to its right is a bottom start tile, and likewise to its right, indefinitely.
The resulting bottom row encodes the initial state of the machine: every cell contains the blank symbol (since the input is \(\varepsilon\)), the initial state is \(q_0\), and the head is at the leftmost cell. The position of the head is indicated by which tile has both a state and a symbol in its top edge.
Now that the bottom row is determined, we consider what tiles may go above that row. Only one of the tiles has \((q_0, \bot)\) at its bottom edge, and it is one of the tiles corresponding to the transition \(\delta(q_0, \bot) = (q_1, a, R)\). This forces us to use one of the other tiles for that transition to its right, and since that tile’s bottom edge must also match the top edge of the tile below it, we use the corresponding transition tile that has \(\bot\) on its bottom edge. By similar reasoning, the remaining tiles in the row must be symbol tiles for \(\bot\).
The configuration represented in the second row above corresponds to the machine having written \(a\) in the first cell, being in the state \(q_1\), with its head on the second cell. This is exactly the result of the transition.
To fill the next row, we must use a symbol tile for \(a\) at the left. Then, we must use tiles for the transition \(\delta(q_1, \bot) = (q_0, b, R)\). Finally, we must fill out the rest of the row with symbol tiles for \(\bot\).
Again, the configuration represented here corresponds exactly with the state of the machine after two steps. The next row is similarly determined.
If the machine halts after some step, then there is no way to fill in the next row of the tiling, because there are no transition tiles for the final states \(\qacc, \qrej\)—so the quadrant cannot be tiled. Conversely, if the machine runs forever, then every row can be filled with corresponding tiles, so the quadrant can be tiled. In other words, the question of whether the quadrant can be tiled is equivalent to the question of whether the machine loops on input \(\varepsilon\). Stated more precisely, the key property is as follows: the code of the machine is in \(\ehtm\) if and only if the corresponding tile set is not in \(\lang{QTILE}\).
The following formalizes our proof of undecidability of \(\lang{QTILE}\), by showing that \(\ehtm \leq_T \lang{QTILE}\) via a Turing reduction. Suppose that \(T\) is an oracle that decides \(\lang{QTILE}\). We claim that the following Turing machine \(E\) decides \(\ehtm\) given access to \(T\):
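\(E\) on input \(\inner{M}\):
1. Construct the tile set \(S_M\) corresponding to \(M\), as described above (the symbol, transition, and start tiles).
2. Run \(T(\inner{S_M})\) and output the opposite answer.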
We analyze the behavior of \(E\) on an arbitrary input \(\inner{M}\). First, \(E\) halts because constructing the tile set is a finite computation based on the finite code—alphabets, states, transition function—of \(M\), and \(T\) halts by hypothesis. (We stress that the reduction does not simulate/run \(M\), or attempt to tile the quadrant; it just “syntactically” converts \(M\)’s code into a finite set of tiles.) Second, by definition of \(\ehtm\), the key property of the tile set, the hypothesis on \(T\), and the definition of \(E\),
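\[
\begin{aligned}
\inner{M} \in \ehtm &\iff M(\varepsilon) \text{ halts} \iff \inner{S_M} \notin \lang{QTILE} \\
&\iff T(\inner{S_M}) \text{ rejects} \iff E(\inner{M}) \text{ accepts},
\end{aligned}
\]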
as needed. Thus, \(E\) decides \(\ehtm\), and we have demonstrated that \(\ehtm \leq_T \lang{QTILE}\). Since \(\ehtm\) is undecidable, so is \(\lang{QTILE}\).
Modify the construction we used above to show instead that \(\htm \leq_T \lang{QTILE}\).
(Hint: only the set of start tiles needs to be changed, and the set will be different for different machine inputs.)
Recognizability
Recall that there is a relaxation of the notion of deciding, called recognizing. For convenience we repeat Definition 62 here.
A Turing machine \(M\) recognizes a language \(L \subseteq \Sigma^*\) if:
\(M\) accepts every \(x \in L\), and
\(M\) rejects or loops on every \(x \notin L\).
Equivalently: \(x \in L\) if and only if \(M\) accepts \(x\).
A language is recognizable if some Turing machine recognizes it.
Notice that, as in the definition of “decides”, the machine still must accept every string in the language. However, here there is no requirement that the machine halts on all inputs; it may loop on some (or even all) inputs that are not in the language. In other words, any output the machine produces is correct, but for inputs where the correct answer is “no” (and only for such inputs), the machine need not produce an output at all.
Tautologically, any Turing machine \(M\) recognizes exactly one language, namely, the language \(L(M) = \set{x \in \Sigma^* : M(x) \text{ accepts}}\) of those strings the machine accepts. Also trivially, a machine that decides a language also recognizes that same language, but not necessarily vice-versa.
A language may be undecidable but recognizable. We have seen several examples of undecidable languages, including \(\atm\), \(\htm\), and \(\ehtm\), and in our coverage of simulation we saw that the universal Turing machine \(U\) recognizes the language \(L(U) = \atm\), so \(\atm\) is recognizable. And as we will see below, \(\htm\) and \(\ehtm\) are recognizable as well.
Is every language recognizable? We have already seen that this is not the case: the diagonalization proof of Theorem 85 constructed a (contrived) unrecognizable language, and Theorem 87 showed that the “barber” language is also unrecognizable. In this section we will see a technique for showing that other languages are unrecognizable as well.
But before dealing with unrecognizability, how can we show that a language \(L\) is recognizable? Similar to how we show that a language is decidable, we need only give a Turing machine that recognizes \(L\). For example, there is a straightforward recognizer for the halting language \(\htm\).
The language \(\htm\) is recognizable.
We claim that the following Turing machine recognizes \(\htm\).
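\(H_r\) on input \((\inner{M}, x)\):
1. Run \(M(x)\) (via simulation).
2. Accept.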
We analyze the behavior of \(H_r\) on an arbitrary input \((\inner{M},x)\):
If \((\inner{M}, x) \in \htm\), then \(M(x)\) halts. In this case, the simulation of \(M(x)\) in \(H_r\) terminates, and \(H_r\) proceeds to accept \((\inner{M}, x)\), as needed.
If \((\inner{M}, x) \notin \htm\), then \(M(x)\) loops. In this case, the simulation of \(M(x)\) in \(H_r\) does not terminate (and the second line is never reached), so \(H_r\) loops on \((\inner{M}, x)\).
Therefore, by Definition 62, \(H_r\) recognizes \(\htm\).
Adapt the proof of Theorem 115 to prove that \(\ehtm\) is recognizable.
As another example, in Lemma 65 we proved that the class of decidable languages is closed under the union operation, i.e., if \(L_1, L_2\) are any decidable languages, then their union \(L = L_1 \cup L_2 = \set{x : x \in L_1 \ort x \in L_2}\) is also decidable. Does the same hold with “decidable” replaced by “recognizable”? It does indeed, but adapting the proof to work with recognizability is more subtle, as we now discuss.
Because \(L_1\) and \(L_2\) are recognizable, there exist Turing machines \(M_1\) and \(M_2\) that recognize \(L_1\) and \(L_2\), respectively. Let’s consider whether the machine \(M\) we constructed in the proof of Lemma 65 recognizes \(L = L_1 \cup L_2\):
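\(M\) on input \(x\):
1. Run \(M_1(x)\); if it accepts, accept.
2. Run \(M_2(x)\); if it accepts, accept.
3. Reject.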
Unfortunately, this machine does not necessarily recognize \(L\), because \(M_1\) may not be a decider. Specifically, suppose that \(x \notin L_1\) but \(x \in L_2\), so \(x \in L\), and therefore we need \(M(x)\) to accept in order for \(M\) to recognize \(L\). Now suppose that \(M_1(x)\) happens to loop—which it may do, because it is merely a recognizer for \(L_1\). Then \(M(x)\) loops as well—but this is incorrect behavior! Even though \(M_2(x)\) is guaranteed to accept, \(M(x)\) never reaches the point of simulating it. As a potential fix, we could try swapping the first two lines of \(M\), but that attempt would fail due to a symmetric issue.
The issue here is similar to what we encountered in the proof that the set of integers is countable (Example 74). Our attempt to enumerate the integers by first listing the nonnegative integers failed, because we would never exhaust them to reach the negative integers. The solution was to interleave the nonnegative and negative integers, so that every specific integer would eventually be reached.
Here we can follow an analogous strategy called alternation, interleaving the simulated executions of \(M_1\) and \(M_2\):
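\(M\) on input \(x\):
1. Alternately run one step of \(M_1(x)\) and one step of \(M_2(x)\); if either simulation accepts, immediately accept.
2. If both simulations have rejected, reject.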
How can we alternate between the executions of \(M_1(x)\) and \(M_2(x)\)? When simulating their executions in a Turing machine, we run one step of \(M_1\) (i.e., apply its transition function once), then one step of \(M_2\), then another step of \(M_1\), then another step of \(M_2\), and so on. (If a simulated machine halts, then we do not run any further steps of it, because such steps are not defined.) Similarly, real computers simulate parallel execution of multiple programs on a single processor by rapidly switching between programs. So, we can alternate executions both in theory and in practice.
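To make the technique concrete, here is a minimal Python sketch of alternation; this is our own illustration, not part of the formal construction. Each simulated execution is modeled as a generator that yields once per simulated step and returns the string 'accept' or 'reject' when it halts; all names here are illustrative.

def alternate(run1, run2):
    """Interleave two step-by-step simulations: accept if either accepts,
    reject once both have rejected, and otherwise run forever ("loop")."""
    runs = [run1, run2]
    decisions = [None, None]  # becomes 'accept' or 'reject' once a run halts
    while None in decisions:
        for i in (0, 1):
            if decisions[i] is None:
                try:
                    next(runs[i])  # simulate one more step of this execution
                except StopIteration as halt:
                    decisions[i] = halt.value
                    if halt.value == 'accept':
                        return 'accept'
    return 'reject'  # both simulations rejected

# Example: one simulation that loops, one that accepts after one step.
def loops(x):
    while True:
        yield

def accepts(x):
    yield
    return 'accept'

print(alternate(loops("w"), accepts("w")))  # prints: accept

Note that if one simulation rejects and the other loops, the sketch runs forever, which models the machine \(M\) looping, exactly as the analysis below requires.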
We now analyze the behavior of the above \(M\) on an arbitrary input \(x\):
If \(x \in L = L_1 \cup L_2\), then \(x \in L_1\) or \(x \in L_2\) (or both). So, at least one of \(M_1(x), M_2(x)\) accepts, because \(M_i\) is a recognizer for \(L_i\). Since \(M\) alternates between the simulated executions of \(M_1(x)\) and \(M_2(x)\), it will eventually reach a point at which one of them accepts, so \(M(x)\) accepts, as needed.
If \(x \notin L\), then \(x \notin L_1\) and \(x \notin L_2\). So, neither \(M_1(x)\) nor \(M_2(x)\) accepts; each one either rejects or loops. There are two cases: if both \(M_1(x)\) and \(M_2(x)\) reject, then \(M(x)\) will reach its final line and reject. Otherwise, at least one of \(M_1(x), M_2(x)\) loops, so the alternating simulation runs forever, i.e., \(M(x)\) loops. Either way, \(M(x)\) rejects or loops, as needed.
Alternatively and more succinctly, we can see by definition of \(L\), hypothesis on \(M_1\) and \(M_2\), and definition of \(M\),
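\[
x \in L \iff x \in L_1 \ort x \in L_2 \iff M_1(x) \text{ accepts or } M_2(x) \text{ accepts} \iff M(x) \text{ accepts}.
\]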
Therefore, by Definition 62, \(M\) recognizes \(L\), hence \(L\) is recognizable. We have just proved the following theorem.
For any recognizable languages \(L_1\) and \(L_2\), their union \(L = L_1 \cup L_2 = \set{x : x \in L_1 \ort x \in L_2}\) is also recognizable. In other words, the class of recognizable languages is closed under union.
Adapt the proof of Theorem 118 to prove that the class of recognizable languages is closed under intersection.
Is the class of recognizable languages closed under complement, like the class of decidable languages is? In other words, if \(L\) is a recognizable language, is its complement \(\overline{L}\) necessarily recognizable? It turns out that the answer is no! We will prove this somewhat indirectly, by showing that if a language and its complement are both recognizable, then the language is decidable.
If a language \(L\) and its complement \(\overline{L}\) are both recognizable, then \(L\) is decidable (and by symmetry, so is \(\overline{L}\)).
By hypothesis, there exist Turing machines \(M\) and \(\overline{M}\) that recognize \(L\) and \(\overline{L}\), respectively. Since \(M\) recognizes \(L\),
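\[
x \in L \iff M(x) \text{ accepts}.
\]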
And since \(\overline{M}\) recognizes \(\overline{L}\),
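\[
x \in \overline{L} \iff \overline{M}(x) \text{ accepts}.
\]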
So, for every input \(x\), exactly one of \(M(x)\) and \(\overline{M}(x)\) accepts, and \(x \in L\) if and only if \(M(x)\) accepts and \(\overline{M}(x)\) does not accept.
We now claim that the following Turing machine \(M'\) decides \(L\). The key idea is to alternate between the executions of \(M\) and \(\overline{M}\) on the input, and see which one accepts to determine whether the input is in \(L\).
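\(M'\) on input \(x\):
1. Alternately run one step of \(M(x)\) and one step of \(\overline{M}(x)\).
2. If \(M(x)\) accepts, accept; if \(\overline{M}(x)\) accepts, reject.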
We analyze the behavior of \(M'\) on an arbitrary input \(x\). First, \(M'(x)\) halts, because as noted above, exactly one of \(M(x)\) and \(\overline{M}(x)\) accepts, and either case causes \(M'(x)\) to halt. Second, by the properties of \(M\) and \(\overline{M}\) stated above and the definition of \(M'\),
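\[
x \in L \iff M(x) \text{ accepts} \iff M'(x) \text{ accepts}.
\]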
So, by Definition 61, \(M'\) decides \(L\), hence \(L\) is decidable.
Unrecognizable Languages
The contrapositive of Theorem 120 is that if a language is undecidable, then either it or its complement (or both) is unrecognizable. This gives us a tool to show that a language is unrecognizable.
If a language \(L\) is undecidable, then at least one of \(L\) and \(\overline{L}\) is unrecognizable. In particular, if \(L\) is undecidable and its complement \(\overline{L}\) is recognizable, then \(L\) is unrecognizable.
From Corollary 122 we can immediately conclude that the complement languages [7]
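\[
\overline{\atm} = \set{(\inner{M}, x) : M(x) \text{ does not accept}}
\]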
and
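\[
\overline{\htm} = \set{(\inner{M}, x) : M(x) \text{ loops}}
\]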
are unrecognizable.
To be precise, the complement language \(\overline{\atm}\) consists of all strings that are not of the form \((\inner{M},x)\) where \(M\) is a Turing machine that accepts \(x\) (and similarly for \(\overline{\htm}\)). Depending on the input encoding, this might include “malformed” strings that do not encode any Turing machine-string pair \((\inner{M},x)\) at all. In this case, the above description of \(\overline{\atm}\) would not be correct, because it does not include such strings. However, we can assume without loss of generality that every string encodes some input object of the proper form, simply by redefining any “malformed” strings to represent some fixed “default” object. Throughout the text we implicitly assume this, to simplify the descriptions of complement languages.
A language whose complement is recognizable (without any other restriction on the language itself) is called co-recognizable.
A language \(L\) is co-recognizable if \(\overline{L}\) is recognizable.
Since the class of decidable languages is closed under complement, and every decidable language is recognizable, any decidable language is both recognizable and co-recognizable. In fact, Theorem 120 implies that the class of decidable languages is the intersection of the class of recognizable languages and the class of co-recognizable languages. [8] (Be careful about interpreting the preceding sentence: it is not saying anything about the intersection of languages themselves; it is referring to the intersection of classes of languages, i.e., the set of languages that are members of both classes.)
Since every program \(M\) recognizes exactly one language \(L(M)\), it “co-recognizes” exactly one language \(\overline{L(M)}\). Since the set of Turing machines is countable (see Lemma 83), and since each Turing machine (co-)recognizes exactly one language, the set of (co-)recognizable languages is also countable. Because the set of all languages is uncountable, “almost all” languages are neither recognizable nor co-recognizable.
The class of recognizable languages is denoted \(\RE\) (short for recursively enumerable, an equivalent definition of recognizable), and the class of co-recognizable languages is denoted \(\coRE\). The class of decidable languages is denoted \(\RD\) (short for recursive). We have \(\RE \cap \coRE = \RD\). Some languages are neither in \(\RE\) nor in \(\coRE\), but how to prove this is beyond the scope of this text.
We show that the language
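\[
\lang{NO-FOO} = \set{\inner{M} : foo \notin L(M)}
\]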
is unrecognizable.
We first observe that \(\lang{NO-FOO} = \overline{\lang{FOO}}\), where \(\lang{FOO}\) is the undecidable language from Example 105. Since the complement of any undecidable language is undecidable, \(\lang{NO-FOO}\) is undecidable as well.
We now show that \(\lang{FOO}\) is recognizable. The following program recognizes it:
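\(R_{foo}\) on input \(\inner{M}\):
1. Run \(M(foo)\) (via simulation).
2. If it accepts, accept; otherwise, reject.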
This is because
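\[
\inner{M} \in \lang{FOO} \iff foo \in L(M) \iff M(foo) \text{ accepts} \iff R_{foo}(\inner{M}) \text{ accepts}.
\]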
Since \(\lang{NO-FOO}\) is undecidable and its complement language \(\lang{FOO}\) is recognizable, by Corollary 122 we conclude that \(\lang{NO-FOO}\) is unrecognizable.
We show that the language
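\[
\lang{NO-FOO-BAR} = \set{\inner{M} : foo \notin L(M) \text{ and } bar \notin L(M)}
\]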
is unrecognizable.
Its complement language is \(\lang{FOO-OR-BAR} = \set{\inner{M} : foo \in L(M) \ort bar \in L(M)}\). As noted in Example 105, this language is undecidable, so \(\lang{NO-FOO-BAR}\) is undecidable as well.
We now show that \(\lang{FOO-OR-BAR}\) is recognizable. The following program recognizes it:
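\(R'\) on input \(\inner{M}\):
1. Alternately run one step of \(M(foo)\) and one step of \(M(bar)\) (via simulation); if either simulation accepts, immediately accept.
2. If both simulations have rejected, reject.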
This is because
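\[
\inner{M} \in \lang{FOO-OR-BAR} \iff M(foo) \text{ accepts or } M(bar) \text{ accepts} \iff R'(\inner{M}) \text{ accepts}.
\]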
Since \(\lang{NO-FOO-BAR}\) is undecidable and its complement language \(\lang{FOO-OR-BAR}\) is recognizable, by Corollary 122 we conclude that \(\lang{NO-FOO-BAR}\) is unrecognizable.
Dovetailing
Here we introduce a powerful simulation technique called “dovetailing”, which we use to prove that the language
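\[
\etm = \set{\inner{M} : L(M) = \emptyset}
\]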
from Example 106 is unrecognizable. Because we already showed that it is undecidable, by Corollary 122, it suffices to show that its complement language
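\[
\overline{\etm} = \set{\inner{M} : L(M) \neq \emptyset}
\]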
is recognizable. To do so, we define a recognizer \(R\) for \(\overline{\etm}\). We require the following behavior for an arbitrary input machine \(M\):
If \(M\) accepts at least one input string, then \(R(\inner{M})\) must accept;
If \(M\) does not accept any input string, then \(R(\inner{M})\) must reject or loop.
In other words, we require that \(\inner{M} \in \overline{\etm}\) if and only if \(R(\inner{M})\) accepts.
Given (the code of) an arbitrary machine \(M\), we don’t know which input(s) it accepts, if any. If we wanted to determine whether \(M\) accepts at least one input from some finite set \(S = \set{x_1, \ldots, x_n}\), then similar to how we showed that the union of two recognizable languages is recognizable, we could alternate among simulated executions of \(M\) on each element \(x_i \in S\), until one of those executions accepts (if ever). Unfortunately, in the present context, the set of possible inputs to \(M\) is the (countably) infinite set \(\Sigma^* = \set{x_1, x_2, x_3, \ldots}\), so this basic alternation technique does not work—it would run one step of \(M(x_1)\), then one step of \(M(x_2)\), then one step of \(M(x_3)\), and so on, without ever returning to \(M(x_1)\).
More explicitly, we want to simulate running \(M\) on infinitely many inputs \(x \in \Sigma^*\) for an unbounded number of steps \(s \in \N^+\) “in parallel,” so that the simulation will ultimately accept if (and only if) \(M\) accepts at least one of the inputs \(x_i\) within some (finite) number of steps \(s_j\). Essentially, we have the product \(\Sigma^* \times \N^+\) of two countably infinite sets, and we need to come up with an ordering so that we will eventually reach each element \((x_i, s_j) \in \Sigma^* \times \N^+\) of this product.
Similarly to how our enumeration of the integers (Example 74) inspired the idea of alternating simulations of two executions, our enumeration of the rationals (Example 75) inspires a similar scheduling of (countably) infinitely many executions, as illustrated below. This process is called dovetailing.
Our simulation proceeds in rounds. In round \(i\), we perform a single additional step of each execution whose input index is at most \(i\). More precisely, we define \(R\) as follows, where \(\Sigma^* = \set{x_1, x_2, x_3, \ldots}\) is our standard enumeration of all input strings.
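\(R\) on input \(\inner{M}\):
1. For each round \(i = 1, 2, 3, \ldots\): run one further step of the simulation of \(M(x_k)\), for each \(k = 1, \ldots, i\).
2. If any simulated execution accepts, immediately accept.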
In the first round, \(R\) simulates the first step of the execution of \(M\) on the first input \(x_1\). In the second round, \(R\) simulates another step on \(x_1\), and the first one on \(x_2\). In the third round, \(R\) simulates another step on \(x_1\) and \(x_2\), and the first step on \(x_3\), and so on. Observe that any particular round \(i\) executes a finite number \(i\) of steps in total, so the next round will run. And, any particular input \(x_i\) begins its execution in round \(i\), i.e., after a finite number of steps, and its execution continues in every subsequent round until it halts (if ever).
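The round structure can also be expressed as a short Python sketch, reusing the generator convention from the alternation sketch above; again, this is our own illustration. Here machine(x) is assumed to be a generator that yields once per simulated step and returns its decision, and inputs is an infinite enumeration of \(\Sigma^*\).

from itertools import count

def dovetail(machine, inputs):
    """In round i, start the i-th execution and run one more step of each
    execution started so far; accept as soon as any execution accepts."""
    inputs = iter(inputs)  # an enumeration x_1, x_2, x_3, ... of strings
    sims = []              # simulations started so far (None once halted)
    for i in count(1):     # round i = 1, 2, 3, ...
        sims.append(machine(next(inputs)))
        for j, sim in enumerate(sims):
            if sim is None:
                continue   # execution j+1 already halted; skip it
            try:
                next(sim)  # one more step of execution j+1
            except StopIteration as halt:
                sims[j] = None
                if halt.value == 'accept':
                    return 'accept'  # M accepts some input, so accept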
More formally, we analyze the behavior of \(R\) as follows. By inspection of its code, \(R(\inner{M})\) accepts only if some (simulated) \(M(x_i)\) accepts, i.e., \(R(\inner{M}) \accepts \Longrightarrow \inner{M} \in \overline{\etm}\). For the other direction, if \(\inner{M} \in \overline{\etm}\), then there exists some \(x_i \in \Sigma^*\) and \(s_j \in \N^+\) for which \(M(x_i)\) accepts on step \(s_j\). Then \(R(\inner{M})\) will accept in round \(i+j-1\), because it runs the first step of the simulation on \(x_i\) in round \(i\), and it runs one more step of that simulation in each subsequent round. So, we have shown that \(\inner{M} \in \overline{\etm} \iff R(\inner{M})\) accepts, as needed.
An interesting point is that \(R\) does not reject any input, i.e., for every \(\inner{M} \in \etm\), we have that \(R(\inner{M})\) loops. This is because \(R\) keeps simulating \(M\) on an endless supply of additional inputs, searching for one that \(M\) accepts. But because there is no such input, the search never ends.
To recap, we have seen that \(\etm\) is undecidable, and that its complement \(\overline{\etm}\) is recognizable, so by Corollary 122, \(\etm\) is unrecognizable.
Determine, with proof, whether the following language is recognizable, co-recognizable, or both:
Determine, with proof, whether the following language is recognizable, co-recognizable, or both:
Rice’s Theorem
We previously proved (see Example 106) that
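\[
\etm = \set{\inner{M} : L(M) = \emptyset}
\]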
is undecidable.
Let’s define a similar language and determine whether it too is undecidable. Define
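\[
\epstm = \set{\inner{M} : L(M) = \set{\varepsilon}}.
\]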
As we saw with \(\etm\) and \(\emptyset = \set{}\), the languages \(\epstm\) and \(\set{\varepsilon}\) are not the same. The latter is the language containing only the single element \(\varepsilon\), and it is decidable because it is decided by the following Turing machine:
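On input \(w\):
1. If \(w = \varepsilon\), accept.
2. Otherwise, reject.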
On the other hand, \(\epstm\) is an infinite set consisting of (the codes of) all Turing machines that accept only the empty string \(\varepsilon\).
Here we prove that \(\atm \leq_T \epstm\), hence \(\epstm\) is undecidable. Suppose that \(E\) is an oracle that decides \(\epstm\). We claim that the following Turing machine \(C\) decides \(\atm\) given access to \(E\):
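\(C\) on input \((\inner{M}, x)\):
1. Construct (the code of) a Turing machine \(M'\) that behaves as follows on input \(w\): if \(w \neq \varepsilon\), reject; otherwise, simulate \(M(x)\) and accept if the simulation accepts (if it rejects, reject).
2. Run \(E(\inner{M'})\) and output its answer.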
We analyze \(C\) as follows. Observe that if \(M(x)\) accepts, then \(L(M') = \set{\varepsilon}\), because \(M'\) accepts on \(w=\varepsilon\) and rejects all other \(w\). Whereas if \(M(x)\) does not accept, then \(L(M') = \emptyset \neq \set{\varepsilon}\) because \(M'\) does not accept any string. So overall, \(M(x)\) accepts if and only if \(L(M') = \set{\varepsilon}\), which implies that \((\inner{M},x) \in \atm \iff C(\inner{M},x)\) accepts. And since \(C\) halts on any input by inspection, \(C\) decides \(\atm\).
We can generalize the reasoning from the previous two proofs as follows.
Let \(A\) be a recognizable language, and define the language
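\[
L_A = \set{\inner{M} : L(M) = A},
\]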
consisting of (the codes of) all Turing machines \(M\) for which \(L(M) = A\). Then \(L_A\) is undecidable.
Before proving the theorem, we remark that since \(A\) is recognizable, there exists a Turing machine \(R\) for which \(L(R) = A\). In fact, there are infinitely many such machines. Given any Turing machine \(R\) that recognizes \(A\), we can construct a distinct Turing machine \(R'\) that also recognizes \(A\): letting \(Q\) be the set of states of \(R\), we construct \(R'\) by adding to \(Q\) a new state \(q'\) that has no incoming transitions; this does not affect the behavior of the machine. Since \(L_A\) has infinitely many elements, it cannot be decided by just hardcoding its elements into a Turing machine. Indeed, \(L_A\) is undecidable, as we now prove.
We may assume that \(A \neq \emptyset\): if \(A = \emptyset\), then \(L_A = \etm\), which we have already shown is undecidable (see Example 106).
We show that \(\atm \leq_T L_A\). Letting \(D\) be an oracle that decides \(L_A\), we construct a Turing machine \(C\) that decides \(\atm\) given access to \(D\). Since \(A\) is recognizable, there exists a Turing machine \(R\) that recognizes \(A\), i.e., \(L(R)=A\), which we also use in our construction of \(C\):
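\(C\) on input \((\inner{M}, x)\):
1. Construct (the code of) a Turing machine \(M'\) that behaves as follows on input \(w\): first simulate \(M(x)\); if that simulation accepts, then simulate \(R(w)\) and accept if it accepts.
2. Run \(D(\inner{M'})\) and output its answer.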
(Note that it is important here that \(R\) is a Turing machine, not a “black-box” oracle, because we need to build its code into the code of the Turing machine \(M'\).)
We first analyze the behavior of \(M'\):
If \(M(x)\) accepts, then \(L(M') = A\), because \(M'\) accepts exactly those strings that \(R\) accepts, and \(L(R)=A\) by hypothesis.
Conversely, if \(M(x)\) does not accept, then \(L(M') = \emptyset \neq A\), because \(M'(w)\) either loops (if \(M(x)\) loops) or rejects (if \(M(x)\) rejects) for any input \(w\).
So altogether, we have the key property that \((\inner{M},x) \in \atm\) if and only if \(L(M')=A\).
The behavior of \(C\) on an arbitrary input \((\inner{M},x)\) is as follows. First, \(C\) halts, because it just constructs \(M'\) and queries \(D\) on it, which halts. Second, by the key property above, the hypothesis on \(D\), and the definition of \(C\),
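\[
(\inner{M},x) \in \atm \iff L(M') = A \iff \inner{M'} \in L_A \iff D(\inner{M'}) \text{ accepts} \iff C(\inner{M},x) \text{ accepts}.
\]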
So by Definition 61, \(C\) decides \(\atm\), as claimed.
It is worth mentioning that Theorem 128 holds only for recognizable languages \(A\). If \(A\) is unrecognizable, then \(L_A\) is actually decidable! This is because by definition of unrecognizable, there is no Turing machine \(M\) for which \(L(M) = A\), so \(L_A = \emptyset\), which is trivially decided by the machine that rejects all inputs.
We can further generalize Theorem 128 by considering sets of languages, rather than just a single language in isolation.
A semantic property is a set of languages.
We often use the symbol \(\Prop\) to represent a semantic property. The following is an example of a semantic property:
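\[
\Prop_\infty = \set{L \subseteq \Sigma^* : L \text{ is infinite}}.
\]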
\(\Prop_\infty\) is the set of languages that have infinitely many elements. Examples of languages in \(\Prop_\infty\) include \(\Sigma^*\), \(\set{x \in \Sigma^* : x \text{ has no $1$s}}\) (assuming \(\Sigma \neq \set{1}\)), and every undecidable language (but not every decidable language, because some of these are finite). Examples of languages that are not in \(\Prop_\infty\) include \(\emptyset\) and \(\set{x \in \Sigma^* : \abs{x} < 100}\).
A semantic property \(\Prop\) is trivial if either every recognizable language is in \(\Prop\), or no recognizable language is in \(\Prop\).
Examples of trivial properties include:
\(\Prop = \emptyset\), which has no recognizable language in it;
the set of all recognizable languages \(\Prop = \RE\), which has every recognizable language in it;
the set of all unrecognizable languages \(\Prop = \overline{\RE}\), which has no recognizable language in it; and
the set of all languages \(\Prop = \mathcal{P}(\Sigma^*)\), which has every recognizable language in it.
Examples of nontrivial properties include:
\(\Prop = \set{\emptyset, \Sigma^*}\), because it has the recognizable language \(\emptyset\) but not the recognizable language \(\set{\varepsilon}\);
\(\Prop = \set{\Lbarber,\atm,\htm, \text{every finite language } L}\), because it has the recognizable language \(\atm\) but not the recognizable language \(\overline{\etm}\); and
the set of decidable languages \(\Prop = \RD\), because some recognizable languages (like \(\Sigma^*\)) are decidable, and other recognizable languages (like \(\atm\) and \(\htm\)) are not decidable.
We now generalize our notion of a language of (codes of) Turing machines, from machines that recognize a specific language \(A\), to machines that recognize some language in a semantic property \(\Prop\). Define \(\Lprop\) as follows:
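\[
\Lprop = \set{\inner{M} : L(M) \in \Prop}.
\]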
The next two results show that whether \(\Lprop\) is decidable is determined entirely by whether \(\Prop\) is trivial.
If \(\Prop\) is a trivial semantic property, then \(\Lprop\) is decidable.
If \(\Prop\) is trivial, then it either has all recognizable languages, or no recognizable language.
Case 1: \(\Prop\) has all recognizable languages. Then the code of every Turing machine \(M\) is in \(\Lprop\): since \(L(M)\) is recognizable by definition, \(L(M) \in \Prop\), so \(\inner{M} \in \Lprop\). So, \(\Lprop\) is decided by the machine that accepts any valid Turing-machine code.
Case 2: \(\Prop\) has no recognizable language. Then the code of no program \(M\) is in \(\Lprop\): since \(L(M)\) is recognizable by definition, \(L(M) \notin \Prop\), so \(\inner{M} \notin \Lprop\). So, \(\Lprop\) is decided by the machine that just rejects its input.
We now state and prove a very general and powerful theorem known as Rice’s theorem.
If \(\Prop\) is a nontrivial semantic property, then \(\Lprop\) is undecidable.
We show that \(\atm \leq_T \Lprop\), hence \(\Lprop\) is undecidable. Letting \(D\) be an oracle that decides \(\Lprop\), we construct a Turing machine \(C\) that decides \(\atm\) given access to \(D\).
First, we consider the case where \(\emptyset \notin \Prop\). Since \(\Prop\) is nontrivial, there is some recognizable language \(A \in \Prop\), and \(A \neq \emptyset\) because \(\emptyset \notin \Prop\). Since \(A\) is recognizable, there is a Turing machine \(R_A\) that recognizes it. We construct \(C\) exactly as in the proof of Theorem 128:
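\(C\) on input \((\inner{M}, x)\):
1. Construct (the code of) a Turing machine \(M'\) that behaves as follows on input \(w\): first simulate \(M(x)\); if that simulation accepts, then simulate \(R_A(w)\) and accept if it accepts.
2. Run \(D(\inner{M'})\) and output its answer.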
Very similar to our analysis in the proof of Theorem 128, because \(A \in \Prop\) and \(\emptyset \notin \Prop\), we have the key property that \((\inner{M},x) \in \atm\) if and only if \(L(M') \in \Prop\). The analysis of the behavior of \(C\) is also very similar: \(C\) halts on every input, and \((\inner{M},x) \in \atm \iff C(\inner{M},x)\) accepts, so \(C\) decides \(\atm\), as needed.
The case where \(\emptyset \in \Prop\) proceeds symmetrically: by nontriviality, there is some recognizable language \(B \notin \Prop\) with \(B \neq \emptyset\), and some Turing machine \(R_B\) that recognizes \(B\). Then we construct \(C\) as follows (the only difference with the \(C\) constructed above is that it uses the recognizer \(R_B\) instead of \(R_A\), and it negates the output of the oracle \(D\)):
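\(C\) on input \((\inner{M}, x)\):
1. Construct (the code of) a Turing machine \(M'\) that behaves as follows on input \(w\): first simulate \(M(x)\); if that simulation accepts, then simulate \(R_B(w)\) and accept if it accepts.
2. Run \(D(\inner{M'})\) and output the opposite answer.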
The analysis is symmetric to the above, based on the key property that \((\inner{M},x) \in \atm\) if and only if \(L(M') \notin \Prop\). (This is why \(C\) negates the answer of \(D\).) So, \(C\) decides \(\atm\), as needed.
Rice’s Theorem and Program Analysis
Rice’s theorem has profound implications for compilers and program analysis. While the formulation given above is with respect to the language of a Turing machine, the typical expression of Rice’s theorem in the field of programming languages is more expansive:
Any “nontrivial” question about the runtime behavior of a given program is undecidable.
Here, “nontrivial” essentially means there are some programs that have the behavior in question, and others that do not. [9]
The behavior in question must be concerned with program meaning, or semantics, and it must be robust with respect to semantics-preserving transformations. For instance, a question like, “Does the program halt in less than a thousand steps?” is not about the program’s meaning, and the answer is different for different programs that have identical semantics, so the question is not robust.
As an example, let’s consider the question of whether a given Turing machine, when run on a given input \(x\), ever writes the symbol \(1\) to its tape. The language is defined as follows:
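\[
\lang{WritesOne} = \set{(\inner{M}, x) : M(x) \text{ writes a } 1 \text{ on its tape at some step of its execution}}.
\]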
This language is not in the form required by our formulation of Rice’s theorem, both because the membership condition is not determined by \(L(M)\), and because the input has an extra argument \(x\). However, we can show that it is undecidable via reduction, by proving that \(\htm \leq_T \lang{WritesOne}\). That is, given an oracle \(W\) that decides \(\lang{WritesOne}\), we can construct a Turing machine \(H\) that decides \(\htm\).
First, given any Turing machine \(M\), we can construct a new machine \(M'\) having the property that \(M'\) writes a \(1\) to its tape if and only if \(M\) halts (when \(M\) and \(M'\) are run on the same input). Start by introducing a new symbol \(\hat{1}\) to the tape alphabet of \(M'\), and modifying the transition function so that it interprets both \(1\) and \(\hat{1}\) as \(1\) when reading from the tape, but it always writes \(\hat{1}\) instead of \(1\). (This distinction is needed because the input string may have \(1\)s in it.) Then, replace \(\qacc\) and \(\qrej\) with new states \(q'_a\) and \(q'_r\), add new accept/reject states \(\qacc'\)/\(\qrej'\), and include the transitions \(\delta(q'_a, \gamma) = (\qacc', 1, R)\) and \(\delta(q'_r, \gamma) = (\qrej', 1, R)\) for all \(\gamma\) in the modified tape alphabet. For any input \(x\), by construction, \(M'(x)\) reaches \(q'_a\) or \(q'_r\) if and only if \(M(x)\) halts, then \(M'(x)\) writes a \(1\) in the next step, and this is the only way for \(M'(x)\) to write a \(1\). Thus, \(M'(x)\) writes a \(1\) if and only if \(M(x)\) halts.
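To make this transformation concrete, the following Python sketch rewrites a transition table represented as a dict mapping (state, symbol) pairs to (new state, written symbol, direction) triples. The representation and all names are our own assumptions; in particular, the original names q_acc and q_rej are reused here for the new final states \(\qacc'\)/\(\qrej'\).

def writes_one_transform(delta, tape_alphabet, q_acc='q_acc', q_rej='q_rej'):
    """Build M''s transition table from M's, so that M' writes a real '1'
    if and only if M halts (on the same input)."""
    HAT = '1^'  # fresh stand-in symbol for 1; assumed not in tape_alphabet
    pre = {q_acc: 'q_acc_pre', q_rej: 'q_rej_pre'}  # renamed halting states
    new_delta = {}
    for (q, s), (q2, s2, d) in delta.items():
        q2 = pre.get(q2, q2)           # route halting into the pre-halt states
        s2 = HAT if s2 == '1' else s2  # never write a real 1 during simulation
        new_delta[(q, s)] = (q2, s2, d)
        if s == '1':                   # treat the stand-in like a 1 when read
            new_delta[(q, HAT)] = (q2, s2, d)
    # From the pre-halt states, write a real 1, then truly accept/reject.
    for g in set(tape_alphabet) | {HAT}:
        new_delta[('q_acc_pre', g)] = (q_acc, '1', 'R')
        new_delta[('q_rej_pre', g)] = (q_rej, '1', 'R')
    return new_delta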
Using the above transformation, we define the reduction \(H\) as follows:
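\(H\) on input \((\inner{M}, x)\):
1. Construct (the code of) the machine \(M'\) from \(M\), as described above.
2. Run \(W(\inner{M'}, x)\) and output its answer.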
Clearly, \(H\) halts on any input. And since \(M'(x)\) writes a \(1\) if and only if \(M(x)\) halts, \(W\) accepts \((\inner{M'}, x)\) if and only if \((\inner{M},x) \in \htm\). So, \(H\) decides \(\htm\), as needed.
A similar reduction exists for any nontrivial question about program behavior, for any Turing-complete language. (We need the language to be Turing-complete because the reduction embeds an arbitrary Turing machine into a program in that language.) The implication is that there is no algorithm that does perfect program analysis; any such analysis produces the wrong result (or no result at all) for some nonempty set of inputs.
Suppose we wish to analyze a given program for some nontrivial program behavior \(B\). Define the language
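\[
L_B = \set{\inner{M} : M \text{ has behavior } B}.
\]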
We might hope for an analysis algorithm \(A\) having the following properties:
if \(\inner{M} \in L_B\), then \(A\) accepts \(\inner{M}\);
if \(\inner{M} \notin L_B\), then \(A\) rejects \(\inner{M}\).
Unfortunately, we have seen that such an algorithm does not exist, because \(L_B\) is undecidable. We must settle for an imperfect analysis. We have several choices:
A complete analysis accepts all inputs that have the behavior \(B\) (i.e., those in \(L_B\)), but also may accept or loop on some that do not have behavior \(B\). That is, a complete analysis \(A_C\) satisfies:
if \(\inner{M} \in L_B\), then \(A_C(\inner{M})\) accepts;
for \(\inner{M} \notin L_B\), there is no general guarantee about what \(A_C(\inner{M})\) does—it may accept, reject, or loop.
So, if a complete analysis rejects a program, then the program is guaranteed not to have the behavior \(B\). But if the analysis accepts the program, it may or may not have behavior \(B\). (And in general, we cannot detect if the analysis loops on a program.)
Observe that the following analysis is complete, but useless: ignore the program and just accept. In order to be useful, a complete analysis will usually have some guarantee that it rejects any program from some important subclass of those that do not have behavior \(B\) (but not all such programs).
A sound analysis rejects all inputs that do not have behavior \(B\) (i.e., those not in \(L_B\)), but also may reject or loop on some that do have behavior \(B\). That is, a sound analysis \(A_S\) satisfies:
for \(\inner{M} \in L_B\), there is no general guarantee on what \(A_S(\inner{M})\) does—it may accept, reject, or loop;
if \(\inner{M} \notin L_B\), then \(A_S(\inner{M})\) rejects.
So, if a sound analysis accepts a program, then the program is guaranteed to have the behavior \(B\). But if the analysis rejects the program, it may or may not have behavior \(B\).
Observe that the following analysis is sound, but useless: ignore the program and just reject. In order to be useful, a sound analysis will usually have some guarantee that it accepts any program from some important subclass of those that have behavior \(B\) (but not all such programs).
An analysis that is both incomplete and unsound might answer incorrectly on any input program, regardless of whether the program has behavior \(B\). So, the output of the analysis does not guarantee anything about the program’s behavior.
Because an incomplete and unsound analysis comes with no guarantees about its output in any case, we typically design analyses that are either complete or sound (but not both).
A type system is a concrete example of a program analysis, and the usual choice for static type systems is to use a sound analysis, rather than a complete one. This means that if the compiler accepts a program, we are guaranteed that the program will not have a type error at runtime—it is “safe”. However, the compiler does reject some programs that are free of runtime type errors. The burden is on the programmer to write their program in such a way that the compiler can guarantee it is “safe”. Often, the compiler can provide some useful guarantee that it will accept any “safe” program from some broad class.