Computability
Introduction to Computability¶
We now turn our attention to fundamental questions about computation. While we have seen several algorithmic approaches to solving problems, the first question we should answer when faced with a new problem is whether or not it is solvable at all. Otherwise, we can waste a lot of time on a fruitless effort to find a solution where none exists.
Before we can reason about what problems are solvable on a computer, we first need abstract definitions of what a problem is and what is a computer. Rather than getting bogged down in implementation details, we want to focus on abstract representations that strip away all unnecessary details. These abstractions will be easier to reason about than real computers and programs. Moreover, our abstract model will capture all possible implementations, so if we prove something holds in our model, it holds for all real computers.
Formal Languages¶
We start by defining an abstraction to represent problems. The simplest problems are those that have a yes or no answer, such as:
Is there a flight between Detroit and New York for less that $100?
Such a question is a decision problem – we want a decision on whether the answer is yes or no.
We can generalize the question above into a predicate that takes an input value:
Is there a flight between Detroit and \(x\) for less than $100?
Given any particular destination \(x\), the answer to this question is still either yes or no.
We can further generalize this predicate to work with multiple input values:
Is there a flight between \(x\) and \(y\) for less than \(z\)?
While the predicate is expressed as taking three inputs \(x\), \(y\), and \(z\), we can equivalently consider it to be a predicate that takes a single input \(w = (x, y, z)\) that is a tuple of those three components. Thus, we lose no generality in restricting ourselves to predicates with a single input.
A predicate is applicable to some universe of inputs, such as all possible airports or cities, for instance. (In programming languages, a type is used to represent a universe of values.) The predicate is true (a yes answer) for some set of inputs and is false (a no answer) for the other inputs. We call the set of yes instances a language. The following is the language defined by our second question above:
The complement language is the set of inputs for which the answer is no. We represent a complement with an overbar:
Formally, a language is defined over some alphabet that encodes the input. An alphabet is just a nonempty, finite set of symbols, and we use the Greek letter \(\Sigma\) to represent an alphabet.
\(\Sigma_{binary} = \{0, 1\}\) is an alphabet consisting of just the symbols \(0\) and \(1\).
\(\Sigma_{lowercase} = \{a, b, \dots, z\}\) is an alphabet consisting of lowercase English letters.
A string over an alphabet \(\Sigma\) is a sequence of symbols from
\(\Sigma\). For example, a binary string consists of any number of
symbols from \(\Sigma_{binary}\), while an English word is a string over
the alphabet \(\Sigma_{lowercase}\). The concept of a string appears in
most programming languages, which usually specify a string data type
that consists of a sequence of elements over a character set (e.g.
ASCII or UTF) – the character set is the string’s alphabet.
Programming languages often specify notation for string literals,
such as enclosing a sequence of characters within matching double
quotes. This notation itself is not part of the string data; for
instance, the string literal "abcd"
in C++ and Python represents
the four-character 1 sequence \(abcd\), and the quotes are not part
of this sequence.
- 1
In C++, the character-array string representation includes a null terminator as part of its implementation, but that is an implementation detail, and other implementations (e.g. the
std::string
type in C++ itself) do not use this same strategy.
An empty sequence of symbols is also a valid string – for instance,
the notation ""
represents an empty string in C++ and Python. We
use a different notation, \(\varepsilon\) (called a varepsilon in some
formatting systems such as LaTeX), which is the standard notation in
formal languages for an empty string. The empty string is a valid
string over any alphabet \(\Sigma\), as it consists of zero symbols from
that alphabet.
We typically denote a nonempty string by just writing out its symbols in order. For instance, \(1011\) is a string over the alphabet \(\Sigma_{binary}\), consisting of the symbol \(1\), followed by the symbol \(0\), followed by \(1\) and \(1\). The notation \(\abs{w}\) refers to the length (number of symbols) of the string \(w\); for instance, \(\abs{1011} = 4\).
We can use powers of a set to specify all strings of a specific length over the characters in that set. In general, the notation \(S^2\) for a set \(S\) is shorthand for \(S \times S\), the set product of \(S\) with itself. (The set product \(A \times B\) is the set of all pairs whose first component is chosen from \(A\) and second component from \(B\).) Similarly, \(S^3\) is shorthand for \(S \times S \times S\). For example:
We denote \(\Sigma^*\) as the set of all finite-length strings over an alphabet \(\Sigma\). Here, the superscript \({}^*\) is the Kleene star, and it represents taking zero or more elements from the preceding set. Thus,
In other words, \(\Sigma^*\) consists of all strings of length 0 over \(\Sigma\), plus all strings of length 1 over \(\Sigma\), all strings of length 2, and so on for all finite lengths. As an example,
is the set of all finite-length binary strings. Observe that while each element of \(\Sigma^*\) has finite length for any alphabet \(\Sigma\), the set of all finite-length strings \(\Sigma^*\) is infinite for any alphabet \(\Sigma\).
A language \(L\) is just a set of finite-length strings over some alphabet \(\Sigma\). Since \(\Sigma^*\) is the set of all finite-length strings over \(\Sigma\), we have that \(L \subseteq \Sigma^*\).
The following are examples of languages:
\(L_1 = \{11y : y \in \{0,1\}^*\}\) is a language over \(\Sigma_{binary}\), and it consists of all finite-length binary strings that begin with two ones.
\(L_2 = \{hello, world\}\) is a language over \(\Sigma_{lowercase}\), and it consists of just the two strings \(hello\) and \(world\).
The empty set \(\emptyset\) is a language over any alphabet \(\Sigma\), consisting of no strings. Note that \(\emptyset\) is distinct from \(\varepsilon\) – the former is a set of strings, while the latter is an individual string. (The difference is akin to that between the values produced by
set()
andstr()
with no arguments in Python, orstd::set<std::string>{}
andstd::string{}
in C++.)
As illustrated above, some languages are finite while others consist of infinitely many strings.
Under the formal definition of a language, we can think of the English language itself as a subset of the set of all possible finite-length strings over the English alphabet (including lowercase and uppercase letters, space and punctuation markers, and decimal digits). More specifically, it is just the subset of strings that obey the syntactic and semantic rules for English.
In computing, we often restrict ourselves to the binary alphabet \(\{0, 1\}\). Any data value can be represented as a binary string – in fact, real computers store all data in binary, from simple text files to images, audio files, and this very document itself. Thus, the languages we will work with generally consist of binary strings. When we express a language such as
what we actually mean can be more formally expressed as:
We sometimes use the notation \(\inner{x}\) to represent the binary encoding of the value \(x\). We can then express the above more concisely as:
However, we often elide the angle brackets as they are cumbersome, and the binary encoding of the inputs is implicitly assumed rather than explicitly denoted.
Now that we have a formal concept of languages (the yes instances of a problem), solving a problem entails determining whether a particular input is a member of that language (whether it is a yes instance). Thus, a program \(M\) that solves a problem \(L\) is one that:
takes an input \(x\)
performs some computations
halts and returns 1 if \(x \in L\) – the program accepts
halts and returns 0 if \(x \notin L\) – the program rejects
We say that \(M\) decides \(L\) – given an input \(x\), \(M\) performs computations to decide whether \(x \in L\) or not.
Is it possible to decide every language \(L\)? In other words, is it the case that for any language \(L\), there exists some program \(M\) that decides \(L\)? To answer this question, we need a formal definition of what a program is, or more generally, what a computer is.
Overview of Automata¶
In computability theory, an abstract computing device is known as an automaton (plural: automata). There are numerous different abstract models of computation, such as state machines, recursive functions, lambda calculus, von Neumann machines, cellular automata, and so on. We will primarily concern ourselves with the Turing-machine model, though we will first briefly examine a more restricted variant, finite automata.
Both the finite-automata and Turing-machine models are forms of state machines. A machine consists of a finite set of states, each representing a discrete point in a computation, and the machine can transition between different states. An analogous concept in a real program is a line of code (or program counter in a compiled executable) – a program has a finite number of lines, execution of a program is at a single line at any point in time, and the program can transition to a different line.
In a state machine, computation transitions between states based on what is read from the input, what is contained in memory (if the machine has memory), or both. We represent states and transitions graphically as follows:
States are drawn as vertices in the graph, labeled by the name of the state. A directed edge represents a transition, and the label denotes the conditions under which the transition is taken, as well as external side effects (e.g. writing a symbol to memory) of taking the transition. Mathematically, we encode the states as a set and transitions between states using a transition function, with states and other conditions as part of the domain, and states and side effects as part of the codomain.
The primary difference between the two models we will discuss is whether the model has access to memory cells. This has a significant effect on the computational power of a state machine. More generally, the Chomsky hierarchy is a containment hierarchy of various classes of languages, where each class corresponds to what can be computed on a particular kind of state machine:
Regular languages correspond to the computational power of finite automata (FA in the diagram below), which have no memory cells.
Context-free languages correspond to pushdown automata (PDA in the diagram below), which have a single stack as memory and can only interact with the top of the stack (pushing to and popping from it).
Context-sensitive languages are computable by linear-bounded automata (LBA in the diagram below), which have access to storage that is linear in size with respect to the input, but the storage allows access to any of the memory cells.
Recursively-enumerable languages, also called recognizable languages, correspond to the power of Turing machines (TM in the diagram below), which have access to unbounded memory that allows access to any cell.
Each of these classes of languages is strictly contained within the next, demonstrating the effect that storage has on computational power.
We proceed to examine the finite-automata model in more detail before turning our attention to Turing machines.
Finite Automata¶
The simplest state-machine model is that of finite automata, which consist of a finite number of states and no additional storage. The machine is in only one state at a time, and a computational step involves reading a symbol from the input string and transitioning to another state (or the same one) based on what symbol is read. This model and slight variations of it are known by many different names, including finite automata, deterministic finite automata (DFA), finite-state automata, finite-state machine, finite acceptor, and others. We will use the term finite automata to refer to the model.
The following is a graphical representation of a finite automaton over the input language \(\Sigma = \{a,b\}\):
This automaton has six states: \(q_0, q_a, q_{ab}, q_{abb}, q_{abba}, q_{other}\). One of the states, \(q_0\), is the special initial state, which is the state in which computation starts. Graphically, this is denoted by an incoming edge with no label and no start vertex. This particular automaton also has a single accept state, \(q_{abba}\), depicted with a double circle – if the computation terminates in this state, the machine accepts the input, otherwise the input is rejected. Finally, transitions between states are depicted as directed edges, with label corresponding to the input symbol(s) that trigger the transition.
A finite automaton works on an input string, performing a computational step for each element of that string. The machine only does a single pass over the input – computation terminates exactly when the input has been exhausted. The result is acceptance if the final state is an accept state, rejection otherwise.
As an example, we trace the execution of the finite automaton above on the input string \(abba\). We keep track of the input in the top-left of the diagrams below, and the current state is represented by a shaded vertex. Initially, the machine is in the start state \(q_0\):
The first input symbol is an \(a\), so the machine makes the transition specified for reading an \(a\) in the state \(q_0\) – move to the state \(q_a\).
The second step reads the symbol \(b\) from the input, so the machine transitions from \(q_a\) to \(q_{ab}\).
In the third step, a \(b\) is read, and the corresponding transition is from the state \(q_{ab}\) to \(q_{abb}\).
The fourth step reads an \(a\), causing a transition from \(q_{abb}\) to \(q_{abba}\).
The input has now been exhausted, so no more transitions are possible. The machine terminates in the state \(q_{abba}\); since this is an accepting state, the machine has accepted the input \(abba\).
The following is an example of running the machine on input \(aba\):
As before, the first symbol is an \(a\), so the machine transitions from the start state \(q_0\) to the state \(q_a\).
The second symbol is again a \(b\), so the machine goes from state \(q_a\) to \(q_{ab}\).
The third step reads the symbol \(a\), so the machine transitions from \(q_{ab}\) to \(q_{other}\).
The machine has reached the end of the input, so it terminates. Since the final state \(q_{other}\) is not an accepting state, the machine rejects the input \(aba\).
The only sequence of input symbols that leads to termination in an accept state is \(abba\), so this finite automaton computes the language
consisting only of the string \(abba\). Observe that an input such as \(abbab\) passes through the accept state as part of the computation, but it terminates in a non-accept state (\(q_{other}\)), so the machine rejects such an input.
Formal Definition¶
Formally, a finite automaton is defined as a five-tuple:
The components of this five-tuple are as follows:
\(Q\) is the finite set of states. In the machine above,
\[Q = \{q_0, q_a, q_{ab}, q_{abb}, q_{abba}, q_{other}\}\]\(\Sigma\) is the input alphabet; an input string is a finite sequence of symbols from this alphabet (i.e. the string is an element of \(\Sigma^*\)). In the automaton above, \(\Sigma = \{a,b\}\).
\(\delta\) is the transition function, which maps a state and input symbol to a new state:
\[\delta: Q \times \Sigma \to Q\]The transition function is depicted above as the edges between state vertices. We can also define the function in list form:
\[\begin{split}\delta(q_0, a) &= q_a\\ \delta(q_0, b) &= q_{other}\\ \delta(q_a, a) &= q_{other}\\ \delta(q_a, b) &= q_{ab}\\ \delta(q_{ab}, a) &= q_{other}\\ \delta(q_{ab}, b) &= q_{abb}\\ \delta(q_{abb}, a) &= q_{abba}\\ \delta(q_{abb}, b) &= q_{other}\\ \delta(q_{abba}, a) &= q_{other}\\ \delta(q_{abba}, b) &= q_{other}\\ \delta(q_{other}, a) &= q_{other}\\ \delta(q_{other}, b) &= q_{other}\end{split}\]Alternatively we can express the function in tabular form:
old state \(q\)
\(\delta(q,a)\)
\(\delta(q,b)\)
\(q_0\)
\(q_a\)
\(q_{other}\)
\(q_a\)
\(q_{other}\)
\(q_{ab}\)
\(q_{ab}\)
\(q_{other}\)
\(q_{abb}\)
\(q_{abb}\)
\(q_{abba}\)
\(q_{other}\)
\(q_{abba}\)
\(q_{other}\)
\(q_{other}\)
\(q_{other}\)
\(q_{other}\)
\(q_{other}\)
\(q_0 \in Q\) is the initial state.
\(F \subseteq Q\) is the set of accept states. In the machine above, \(F\) only has a single element:
\[F = \{q_{abba}\}\]
The language of a finite automaton is the set of strings that the automaton accepts. We denote this language by \(L(M)\) for a machine \(M\). For the automaton \(M\) above, we have
Consider the finite automaton \(M = (Q, \Sigma, \delta, q_0, F)\) where:
This machine has three states, of which \(q_1\) is the lone accept state. The following is a graphical representation:
The machine starts in state \(q_0\) and stays there as long as it reads just \(a\) symbols. When it sees a \(b\), the machine transitions to \(q_1\). Any subsequent symbol moves the machine to \(q_2\), where it stays as it consumes the rest of the input. Since \(q_1\) is the only accepting state, the machine only accepts strings that have a single \(b\), which must be the last element of the string. Thus, the language of the machine is
which consists of strings that start with any number of \(a\)’s followed by a single \(b\).
Finite automata have many applications, including software for designing and checking digital circuits, lexical analyzers of most compilers (usually the first phase of compilation, which chops a source file into individual words or tokens), scanning large bodies of text (pattern matching), and reasoning about many kinds of systems that have a finite number of states (e.g. financial transactions, network protocols, and so on). However, they do not model all possible computations. For instance, finite automata cannot compute the language
consisting of strings composed of an arbitrary number of \(0\)’s followed by an equal number of \(1\)’s – since a finite automaton does not have storage, it cannot maintain a counter of the number of \(0\)’s or \(1\)’s it has seen. Yet we can easily write a program in most any programming language that computes \(L\). The following is an example in C++:
#include <iostream>
int main() {
int count = 0;
char ch;
while ((std::cin >> ch) && ch == '0') {
++count;
}
if (!std::cin || ch != '1') {
return !std::cin && count == 0;
}
--count; // already read one 1
while ((std::cin >> ch) && ch == '1') {
--count;
}
return !std::cin && count == 0;
}
Thus, we need a more powerful model of computation to characterize the capabilities of real-world programs.
Turing Machines¶
A Turing machine is a powerful model for an abstract computer, invented by Alan Turing in 1936. The abstraction was designed to model pencil-and-paper computations. The core assumptions in the design of the model are as follows:
The paper is divided into discrete cells.
Each cell contains only a single symbol out of some finite set of possible symbols.
The paper is infinite – if we run out of space, we can always buy more paper.
We can only look at one cell at a time.
Though there may be an arbitrary amount of information on the paper, we can only keep a fixed amount in our brain at any time.
We model the latter using a finite set of states, with each state corresponding to a different set of information in our brain. The paper itself is represented by a tape that is bounded to the left but unbounded to the right, and the tape is divided into individual cells. Lastly, there is a read/write head that is positioned over a single cell, and it can read the contents of the cell, write a new symbol to that cell, and move one position to the left or right. The following is a pictorial representation of this model:
We formalize the definition of a Turing machine \(M\) as a seven-tuple:
The seven components of the tuple are as follows:
\(Q\) is the set of states, and it must have a finite number of elements.
\(\Sigma\) is the input alphabet. Typically, we use binary as the input alphabet, in which case \(\Sigma = \{0, 1\}\). However, we may use other input alphabets on occasion.
\(\Gamma\) is the tape alphabet. Usually, \(\Gamma = \Sigma \cup \{\bot\}\), where \(\bot\) is a special blank symbol that is distinct from any symbol in \(\Sigma\). On occasion, \(\Gamma\) will have additional elements as well, but it will always be the case that \(\Sigma \cup \{\bot\} \subseteq \Gamma\).
\(\qst\) is the initial state, and it is an element of \(Q\), i.e. \(\qst \in Q\).
\(\qacc\) and \(\qrej\) are the accept state and reject state respectively, and they are also elements of \(Q\). Together, they comprise the set of final states \(F = \{\qacc, \qrej\}\).
\(\delta\) is the transition function. It takes a non-final state and a tape symbol as input, and its output is a new state, a new tape symbol, and a direction (left or right). Formally, we have
\[\delta: (Q \setminus F) \times \Gamma \to Q \times \Gamma \times \{L, R\}\]where \(L\) and \(R\) represent “left” and “right,” respectively. The details of each component of the transition function are as follows:
\[\delta: \underbrace{(Q \setminus F)}_{\text{Non-terminal state}} \times \underbrace{\Gamma}_{\text{What is on the tape}} \to \underbrace{Q}_{\text{New state}} \times \underbrace{\Gamma}_{\text{Symbol to write}} \times \underbrace{\{L, R\}}_{\text{How to move TM head}}\]
A Turing machine is in exactly one state at any time. Initially, this is \(\qst\). The input to the machine is written on the tape, starting at the leftmost cell, with one symbol per cell. To the right of the input, every cell is initially set to the blank symbol \(\bot\). The head is positioned over the leftmost cell at the start of computation. The following illustrates the initial configuration of a machine when the input is the binary string \(111010111\):
A computational step involves an application of the transition function to the current configuration of a machine to produce a new configuration. The machine must be in a non-final state – otherwise, the machine has halted and no further computation is done. To perform a computational step, the machine:
is in some non-final state \(q \in Q\setminus F\);
reads the symbol \(s\) under the head;
transitions to some state \(q' \in Q\);
writes a symbol \(s'\) under the head;
moves the head left or right.
The transition function determines what happens in steps 3-5, based on what the situation is in steps 1-2. Given the current state \(q\) and tape symbol \(s\), \(\delta(q,s) = (q', s', d)\) gives us the new state (which may be the same as the original state \(q\)), the new symbol (which may be the same as the original symbol \(s\)), and the direction in which to move the head (\(L\) or \(R\)).
A Turing machine computes by repeatedly applying the transition function to perform computational steps, halting only when it reaches a final state \(\qacc\) or \(\qrej\). If the machine reaches the state \(\qacc\), the machine accepts the input. If the machine reaches \(\qrej\), it rejects the input. As we will see later, there is a third possibility, that the machine never reaches a final state. In this case, we say that the machine loops.
The main differences between finite automata and Turing machines are as follows:
Behavior |
Finite Automata |
Turing Machines |
---|---|---|
Tape alphabet a strict superset of input alphabet |
No |
Yes |
Can have no or more than one accept state |
Yes |
No |
Has a reject state |
No |
Yes |
Terminates when final state is reached |
No |
Yes |
Terminates when input is exhausted |
Yes |
No |
Direction head can move |
Right |
Right or Left |
Can write to the tape |
No |
Yes |
As an example of a Turing machine, we consider a simple machine that accepts any binary string composed of just ones (including the empty string), rejecting any string that contains a zero. The components of the seven-tuple are as follows:
\(Q = \{\qst, \qacc, \qrej\}\)
\(\Sigma = \{0, 1\}\)
\(\Gamma = \{0, 1, \bot\}\)
We describe \(\delta: \{\qst\} \times \{0, 1, \bot\} \to \{\qst, \qacc, \qrej\} \times \{0, 1, \bot\} \times \{L, R\}\) by listing the result of all possible inputs:
\[\begin{split}\delta(\qst, 0) &= (\qrej, 0, R)\\ \delta(\qst, 1) &= (\qst, 1, R)\\ \delta(\qst, \bot) &= (\qacc, \bot, R)\end{split}\]We can also list the results in tabular format:
old state
old symbol
new state
new symbol
direction
\(\qst\)
\(0\)
\(\qrej\)
\(0\)
\(R\)
\(\qst\)
\(1\)
\(\qst\)
\(1\)
\(R\)
\(\qst\)
\(\bot\)
\(\qacc\)
\(\bot\)
\(R\)
The machine has just three states, the initial state and the two final states. In each step, it reads the current symbol, transitioning to the reject state if it is a zero, staying in \(\qst\) if it is a one, and transitioning to the accept state if it is a \(\bot\). In all three cases, the machine leaves the contents of the cell unchanged and moves the head to the right, on to the next input symbol.
We have described the machine above by explicitly writing out its seven-tuple. More commonly, we describe a machine with a state diagram instead. Such a diagram is a labeled graph, with a vertex for each state and edges representing transitions between states. The following is an example of a transition:
The edge goes from state \(q_1\) to state \(q_2\), and these are the state components of the input and output of the transition function. The remaining components are noted on the edge label. Thus, the diagram above means that \(\delta(q_1, 1) = (q_2, 0, R)\) – when the machine is in state \(q_1\) and it sees a \(1\) under the head, it transitions to state \(q_2\), writes a \(0\) under the head, and moves the head to the right.
The state diagram for the machine we described with the seven-tuple above is as follows:
Let us proceed to run this machine on the input \(111010111\). The machine starts in the initial state, with the input written on the tape and the head in the leftmost position:
The cell underneath the head contains a one. The transition function (looking at either the seven-tuple or the state diagram) tells us that \(\delta(\qst, 1) = (\qst, 1, R)\), so the machine stays in the same state, leaves the symbol \(1\) under the head, and moves the head to the right.
Again, we have a one underneath the head. Again, the machine stays in the same state, leaves the symbol unchanged, and moves to the right.
Once again, there is a one underneath the head, resulting in the same actions as the previous two steps.
The machine now has a zero under the head. The transition function has that \(\delta(\qst, 0) = (\qrej, 0, R)\), so the machine transitions to the state \(\qrej\), leaves the \(0\) unchanged, and moves the head to the right.
We are now in a final state, so no further steps are possible. Since the machine reached the reject state, it rejects the input \(111010111\).
Had the input been composed solely of ones, the machine would have stayed in \(\qst\), moving the head one cell to the right in each step. After examining all the input symbols, it would reach a cell that contains the blank symbol \(\bot\). The transition function tells us that the machine would move to the accept state \(\qacc\), leave the blank symbol unchanged, and move the head to the right. Thus, the machine would accept any input consisting only of ones.
This machine always reaches a final state. In the worst case, it examines every input symbol, so it runs in time linear with respect to the size of the input. While this machine always halts, we will soon see that this is not the case for all machines.
As a second example, consider the following machine:
This machine has the input alphabet \(\Sigma = \{0, 1\}\), the tape alphabet \(\Gamma = \{0, 1, \bot\}\), the six states \(Q = \{\qst, q_{scan}, q_{end}, q_{back}, \qacc, \qrej\}\), and the following transition function:
old state |
old symbol |
new state |
new symbol |
direction |
\(\qst\) |
\(0\) |
\(q_{scan}\) |
\(\bot\) |
\(R\) |
\(\qst\) |
\(1\) |
\(\qrej\) |
\(1\) |
\(R\) |
\(\qst\) |
\(\bot\) |
\(\qacc\) |
\(\bot\) |
\(R\) |
\(q_{scan}\) |
\(0\) |
\(q_{scan}\) |
\(0\) |
\(R\) |
\(q_{scan}\) |
\(1\) |
\(q_{scan}\) |
\(1\) |
\(R\) |
\(q_{scan}\) |
\(\bot\) |
\(q_{end}\) |
\(\bot\) |
\(L\) |
\(q_{end}\) |
\(0\) |
\(\qrej\) |
\(0\) |
\(R\) |
\(q_{end}\) |
\(1\) |
\(q_{back}\) |
\(\bot\) |
\(L\) |
\(q_{end}\) |
\(\bot\) |
\(\qrej\) |
\(\bot\) |
\(R\) |
\(q_{back}\) |
\(0\) |
\(q_{back}\) |
\(0\) |
\(L\) |
\(q_{back}\) |
\(1\) |
\(q_{back}\) |
\(1\) |
\(L\) |
\(q_{back}\) |
\(\bot\) |
\(\qst\) |
\(\bot\) |
\(R\) |
Let us trace the execution of this machine on the input \(0011\). The machine starts in the initial state, with the input written on the tape and the head in the leftmost position:
The cell underneath the head contains a zero. The transition function tells us that \(\delta(\qst, 0) = (q_{scan}, \bot, R)\), so the machine transitions to state \(q_{scan}\), writes a blank symbol under the head, and moves the head to the right.
The cell underneath the head now contains a zero. The transition function tells us that \(\delta(q_{scan}, 0) = (q_{scan}, 0, R)\), so the machine stays in \(q_{scan}\), leaves the symbol \(0\) under the head, and moves the head to the right.
The cell underneath the head now contains a one. The transition function tells us that \(\delta(q_{scan}, 1) = (q_{scan}, 1, R)\), so the machine stays in \(q_{scan}\), leaves the symbol \(1\) under the head, and moves the head to the right.
Again, the cell underneath the head now contains a one, so the machine stays in the same state, leaves the cell unchanged, and moves the head to the right.
Now the cell underneath the head contains a blank symbol, so the transition function tells us that the machine transitions to \(q_{end}\), leaves the blank in the cell, and moves the head to the left.
The cell underneath the head contains a one, so the transition function tells us that the machine transitions to \(q_{back}\), writes a blank to the cell, and moves the head to the left.
The current cell contains a one, so the transition function tells us that the machine stays in \(q_{back}\), leaves the one in the cell, and moves the head to the left.
The current cell now contains a zero, so the transition function tells us that the machine stays in \(q_{back}\), leaves the zero in the cell, and moves the head to the left.
The current cell contains a blank, so the transition function tells us that the machine transitions to \(\qst\), leaves the blank in the cell, and moves the head to the right.
We have a zero in the current cell, so the machine transitions to \(q_{scan}\), writes a blank to the cell, and moves the head to the right.
We now have a one in the current cell, so the machine stays in \(q_{scan}\), leaves the one in the cell, and moves the head to the right.
We now have a blank, so the machine transitions to \(q_{end}\), leaves the blank in the cell, and moves the head to the left.
We have a one in the current cell, so the machine transitions to \(q_{back}\), writes a blank to the cell, and moves the head to the left.
The current cell contains a blank, so the machine transitions to \(\qst\), leaves a blank in the cell, and moves the head to the right.
The current cell contains a blank, so the machine transitions to \(\qacc\), leaves a blank in the cell, and moves the head to the right.
The machine has reached the accept state \(\qacc\), so it halts and accepts.
The language of this machine is the set of strings that contain a number of zeros followed by the same number of ones:
Thus, even though this language cannot be computed by a finite automaton, it can be solved by a Turing machine.
How powerful is the Turing-machine model? The Church-Turing thesis states the following:
If a function is computable by any mechanical device (i.e. a computer), it is computable by a Turing machine.
Thus, in spite of its simplicity, the Turing-machine model encompasses all possible computation.
Equivalent Models¶
As an illustration of the power of the simple Turing-machine model above, we will demonstrate that a modified model, that of the two-tape Turing machine, is no more powerful than the simple model. Similar to the one-tape version, the two-tape model consists of a seven-tuple:
The components \(Q\), \(\Gamma\), \(\Sigma\), \(\qst\), \(\qacc\), and \(\qrej\) are as in the one-tape model. The transition function \(\delta\), however, now takes two tape symbols as input (one under each head), writes two tape symbols as output, and moves each of the two heads independently. Formally, we have
The notation \(\Gamma^2\) is shorthand for \(\Gamma \times \Gamma\). In more detail, the components of the transition function are as follows:
The initial state of the machine has the input written on the first tape, followed by blank symbols, and only blank symbols on the second tape. Both heads begin at the leftmost cell of the respective tape.
In each step, the machine is in some state \(q\) and reads one symbol from each tape, \(\gamma_1\) and \(\gamma_2\) for example. The transition function maps \((q, \gamma_1, \gamma_2)\) to a new state, new symbols to write on each tape, and a direction to move each head:
The machine transitions to the new state \(q'\), writes \(\gamma_1'\) to the first tape and moves its head in the direction \(d_1\), and writes \(\gamma_2'\) to the second tape and moves its head in the direction \(d_2\).
We show that the one-tape and two-tape models are equivalent by proving that anything that can be computed on one model can also be computed by the other, and vice versa. In other words, for every one-tape Turing machine, there is an equivalent two-tape machine that has the same behavior on each input, and for every two-tape machine, there is a one-tape machine with the same behavior.
The first direction is trivial: given an arbitrary one-tape Turing machine, we can construct an equivalent two-tape machine that just ignores its second tape. Formally, the two machines have the same set of states \(Q\), tape and input alphabets \(\Gamma\) and \(\Sigma\), and start and final states \(\qst\), \(\qacc\), and \(\qrej\). For each mapping of the transition function of the one-tape machine
the two-tape machine has the corresponding mapping
In other words, it just moves the second head to the right in each step, while maintaining the blank symbols on the second tape. Since the behavior is solely dependent on the first tape, the two-tape machine makes the exact same transitions as the one-tape machine on a given input, and it ends up accepting, rejecting, or looping on the input exactly when the one-tape machine does so.
The other direction is a bit more complicated: given an arbitrary two-tape machine, we need to construct a one-tape machine that does the same computation on any given input. There are many ways to do this, and we describe one such construction at a high level.
First, we consider how a one-tape machine can encode the contents of two tapes, as well as two head positions. A key observation is that the amount of actual data (i.e. excluding trailing blank cells) on the two tapes is finite at any point in a computation. Thus, we can encode that data on a single tape just as well. We use the following format:
We use a special \(\#\) marker to separate the (finite) contents of each tape, as well as to indicate the ends. Between these markers, the contents of each tape are represented using the same tape symbols, except that we denote the position of each head by placing a dot symbol to the left of the cell at which the head is located. Thus, if \(\Gamma\) is the tape alphabet of the two-tape machine, then the tape alphabet of the one-tape machine is
We won’t give a formal definition of the states or transition function, but we will describe how the machine works. Given an input \(w_1 w_2 \dots w_n\), the one-tape machine does the following:
Rewrite the tape to be in the correct format:
\[\# \bullet w_1 \cdots w_n \# \bullet \bot \#\]This entails shifting each input symbol by two spaces, placing a \(\#\) marker in the first cell and a dot in the second, placing a \(\#\) marker after the last input symbol (at the first blank cell), writing a dot symbol in the next cell, followed by a blank symbol, followed by one more \(\#\) marker.
Simulate each step of the two-tape machine:
Scan the tape to find the dot symbols, representing the positions of each head.
Transition to a new state according to the transition function of the two-tape machine (depending on the current state and the symbol immediately to the right of each dot).
Replace the symbols to the right of each dot with the symbols to be written to each tape, as specified in the transition function of the two-tape machine.
Move the dots in the appropriate direction. If the dot is to move to the left but there is a \(\#\) marker there, then it stays in the same position, corresponding to what happens when a tape head tries to move left when at the leftmost cell. On the other hand, if the dot is to move to the right but there is a \(\#\) marker one more cell to the right, the machine needs to shift the marker and all following symbols (up to the last \(\#\) marker) one cell to the right, placing a blank symbol in the space created by the shift. This ensures that there is a valid symbol to the right of the dot (i.e. underneath the corresponding head) that can be read in the next simulated step.
Constructing this one-tape machine is a mechanical process from the definition of the corresponding two-tape machine, though we won’t provide all the details. The important result is that this one-tape machine does simulate the operation of the original two-tape machine, even if it takes many steps to carry out what the original machine does in a single step. If the two-tape machine accepts an input, then the one-tape simulation will also do so eventually, and similarly if the original machine rejects or loops, so will the one-tape machine.
We have demonstrated that a two-tape Turing-machine model is equivalent to the simple, one-tape model previously described. In general, a computational model or programming language that is equivalent to the model of one-tape Turing machines is said to be Turing-complete. Most real-world programming languages, such as C++, Python, and even LaTeX, are Turing-complete. Thus, the results we prove about computation using the Turing-machine model also apply to real-world models, and the simplicity of Turing machines makes those results easier to prove than if we were to reason about C++ or other complicated languages.
The Language of a Turing Machine¶
Now that we have formal definitions of both languages and Turing machines, which formalize the notions of a problem and a program, we can see how the two can be related. A Turing machine \(M\) has an input alphabet \(\Sigma\), and \(\Sigma^*\) is the set of all possible inputs to the machine. There are three possible behaviors for a machine \(M\) on input \(x\):
\(M\) accepts \(x\) – when it receives \(x\) as the input, \(M\) eventually reaches \(q_{accept}\), halting execution in the accept state
\(M\) rejects \(x\) – when it receives \(x\) as the input, \(M\) eventually reaches \(q_{reject}\), halting execution in the reject state
\(M\) loops on \(x\) – when it receives \(x\) as the input, \(M\) never reaches a final state, and execution continues forever
Thus, a machine accepts some subset of inputs, rejecting or looping on inputs outside of that set. Since a language is a subset of \(\Sigma^*\), we can define the language of a machine to be exactly the set of strings that the machine accepts:
As a result, every machine \(M\) has a corresponding language \(L(M)\). If some input \(x\) is not in the language of \(M\), i.e. \(x \notin L(M)\), then \(M\) either rejects \(x\) or loops on it. In other words, if \(A = L(M)\) is the language of \(M\), then:
If \(x \in A\), \(M\) accepts \(x\).
If \(x \notin A\), \(M\) rejects or loops on \(x\).
For the machine above that accepts strings consisting only of ones, we have \(L(M) = \{1\}^*\).
The most useful machines are those that halt on every input. Such a machine is a decider, and it must either accept or reject every input. A decider is not allowed to loop on any inputs.
If a machine \(M\) never loops, we say that \(M\) decides its language \(L(M)\). In other words, if \(M\) decides some language \(A\), then:
If \(x \in A\), \(M\) accepts \(x\).
If \(x \notin A\), \(M\) rejects \(x\).
Observe that this is more restrictive than a machine \(M\) having \(A\) as its language (\(L(M) = A\)). The latter allows \(M\) to either reject or loop on an \(x \notin A\). But if \(M\) decides \(A\), then it is only allowed to reject \(x \notin A\). In either case, \(M\) must accept \(x \in A\). Thus, we can conclude that \(M\) decides \(A\) if \(L(M) = A\) and \(M\) halts on every input.
We now have the following formalizations:
A language is a formalization of a decision problem.
A Turing machine is an abstraction of a program.
A machine deciding a language is a formalization of a program solving a decision problem.
A language \(A\) is decidable if there exists some machine \(M\) that decides it. Thus, our original question about whether a problem is solvable boils down to whether its corresponding language is decidable.
Proving a Language is Decidable¶
Must we construct a Turing machine to demonstrate that a language is decidable? No! The Church-Turing thesis allows us to write an algorithm in any language, including pseudocode, and if we can do that, we could construct a corresponding Turing machine. As an example, consider the following language:
An algorithm to decide this language is as follows:
This machine always halts, since \(Euclid(\max(a, b), \min(a, b))\) takes \(\O(\log(a+b))\) steps to compute. Furthermore, the machine has the correct behavior for inputs that are in and are not in \(\lang{GCD}\):
If \((a, b) \in \lang{GCD}\), \(\gcd(a, b) = 1\), so \(Euclid(\max(a, b), \min(a, b)) = 1\) and \(D_{\text{GCD}}\) accepts \((a, b)\).
If \((a, b) \notin \lang{GCD}\), \(\gcd(a, b) > 1\), so \(Euclid(\max(a, b), \min(a, b)) > 1\) and \(D_{\text{GCD}}\) rejects \((a, b)\).
Thus, \(D_{\text{GCD}}\) decides \(\lang{GCD}\), so \(\lang{GCD}\) is a decidable language.
In general, to demonstrate a language is decidable, we write an algorithm to decide it and then analyze the algorithm to prove that it is a proper decider for the language:
The algorithm halts on (accepts or rejects) all inputs.
The algorithm accepts all strings in the language.
The algorithm rejects all strings not in the language.
Suppose language \(L\) is decidable. We show that \(L' = L \cup \{\varepsilon\}\) is also decidable.
Since \(L\) is decidable, there exists some Turing machine \(D\) that decides \(L\). We can use it define another machine \(E\) as follows:
This machine halts on all inputs since the only nontrivial thing it does is call \(D\), and \(D\) halts since it is a decider. We analyze \(E\)’s behavior as follows:
If \(x = \varepsilon\), the first line of \(E\) accepts \(x\).
If \(x \ne \varepsilon\) and \(x \in L\), then \(D\) accepts \(x\), so \(E\) accepts \(x\).
If \(x \ne \varepsilon\) and \(x \notin L\), then \(D\) rejects \(x\), so \(E\) rejects \(x\).
Since \(E\) accepts all strings in \(L' = L \cup \{\varepsilon\}\) and rejects all strings not in \(L'\), it is a decider for \(L'\), and \(L'\) is decidable.
Are all languages decidable? We have seen that every machine is associated with a language, but does every language have a corresponding machine? We will consider this question next.
Diagonalization¶
One way of answering the question of whether or not there exists an undecidable language is to compare the number of languages with the number of machines. Let \(\mathcal{L}\) be the set of all languages, and let \(\mathcal{M}\) be the set of all machines. If we can demonstrate that \(\mathcal{L}\) is a larger set than \(\mathcal{M}\), i.e. \(\abs{\mathcal{M}} < \abs{\mathcal{L}}\), then there are more languages than machines, and it follows that there must be some language that does not have a corresponding machine. However, both \(\mathcal{M}\) and \(\mathcal{L}\) are infinite sets, so we need the appropriate tools to reason about the size of an infinite set.
Countable Sets¶
We refer to the size of a set as its cardinality, which measures the number of elements the set contains. For a finite set, its cardinality is an element of the natural numbers \(\N\). The number of elements in an infinite set, on the other hand, cannot be represented as a natural number 2. For our purposes, however, we need only to reason about the relative sizes of infinite sets to answer questions such as whether \(\abs{\mathcal{M}} < \abs{\mathcal{L}}\).
- 2
There are other kinds of numbers that can represent the sizes of infinite sets, but they are beyond the scope of this text.
To start with, we define some terminology related to functions. A total function \(f : A \to B\) maps each element \(a \in A\) to an element \(f(a) \in B\). (This is in contrast to a partial function, which may leave some elements of \(A\) unmapped.) We can characterize functions by properties of the mapping.
A total function \(f : A \to B\) is injective or one-to-one if each element \(a \in A\) is mapped to a different element \(f(a) \in B\). In other words, \(f\) is injective if
A total function \(f : A \to B\) is surjective or onto if each element \(b \in B\) is mapped to from at least one element \(a \in A\). In other words, \(f\) is surjective if
A total function \(f : A \to B\) is bijective if it is both injective (one-to-one) and surjective (onto).
We can use the existence of injective, surjective, and bijective functions to relate the sizes of two sets \(A\) and \(B\):
If an injection (one-to-one function) \(f : A \to B\) exists, then it must be the case that \(\abs{A} \le \abs{B}\). This is because each element in \(A\) is mapped to a different element of \(B\), and there must be at least as many elements in \(B\) as in \(A\) for this to be possible.
If a surjection (onto function) \(f : A \to B\) exists, then it must be the case that \(\abs{A} \ge \abs{B}\). This is because each element in \(B\) is mapped to from some element of \(A\), and since \(f\) maps each element of \(A\) to a single element of \(B\), there must be at least as many elements in \(A\) as in \(B\) for this to be possible.
If a bijection \(f : A \to B\) exists, then both \(\abs{A} \le \abs{B}\) and \(\abs{A} \ge \abs{B}\) hold, so it must be that \(\abs{A} = \abs{B}\).
We are most interested in reasoning about how the sizes of infinite sets compare to “standard” infinite sets such as the naturals \(\N\). We define the term countable, which relates the size of a set to that of \(\N\).
A set \(S\) is countable if and only if there exists an injection (one-to-one function) \(f: S \to \N\) from \(S\) to the natural numbers \(\N\).
Observe that the definition of countable above only requires that the function be injective (one-to-one), and it need not be surjective (onto). Thus, if a set \(S\) is countable, it only tells us that \(\abs{S} \le \abs{\N}\). For example, the set \(S = \{a, b, c\}\) is countable, since the following function is one-to-one:
In fact, any finite set \(S\) is countable – we can just list all the elements of \(S\), assigning them natural numbers in the order that we listed them. However, this doesn’t just apply to finite sets – the same logic applies to an infinite set if we can list its elements in some order. We can do so exactly when we can construct an injection (one-to-one function) \(f: S \to \N\). If we have a list, we can construct such a function using the same process as above, and if we have an injection \(f\), we can list the elements by the ordering determined by \(f\): start with the element \(s\) with the minimal value for \(f(s)\), then the one with the minimal value from the remaining elements, and so on. Since the natural numbers are well-ordered by the less-than operator \(<\), there is always some minimal value in any nonempty subset of \(\N\).
A set \(S\) is countable if and only if we can construct a list of all the elements of \(S\).
If \(S\) is an infinite set and is countable, we can actually construct a bijection between \(S\) and \(\N\): list the elements of \(S\) in order (e.g. according to an injective/one-to-one function \(f\)), then number the elements in the list with successive naturals starting from 0. Since the list is infinitely long, there will be some element \(s_n \in S\) associated with each element \(n \in \N\), namely the \(n\)th item from our list of the elements of \(S\). Thus, a countably infinite set \(S\) has the same cardinality as the naturals, so that \(\abs{S} = \abs{\N}\) for such a set.
In practice, we do not always define an explicit injection (one-to-one function) \(f: S \to \N\) to list the elements of \(S\). Instead, we can list a set \(S\) by describing how the list is constructed and showing that an arbitrary element \(s \in S\) appears at some finite position in that list.
We show that the set of integers \(\Z\) is countable. Observe that listing the nonnegative integers first followed by the negative integers does not work – there are infinitely many nonnegative integers, so we never get to the negative ones:
Since an element like \(-1\) does not appear at some finite position in this list, it is not a valid listing of the integers.
What does work is to order the integers by their absolute value, which interleaves the positive and negative elements:
An arbitrary positive integer \(i\) appears at (zero-indexed) position \(2i-1\), and an arbitrary negative integer \(-i\) appears at position \(2i\). Thus, any arbitrary integer appears at some finite position on the list.
Formally, we can define a one-to-one function \(f: \Z \to \N\) as follows:
Thus, \(\Z\) is countably infinite.
We show that the set of positive rational numbers \(\Q^+\) is countable. A rational number \(q \in \Q^+\) can be represented by a pair of integers \(x/y\), where \(x,y \in \N^+\). So we start by showing that the set of pairs of natural numbers \(\N \times \N\) is countable.
We use a similar idea as what we did to show that \(\Z\) is countable, i.e. demonstrating an interleaving of the elements that results in a valid list. Since we are working with pairs, we have two degrees of freedom from the elements of the pairs. Thus, we can write them out in two dimensions:
To demonstrate a one-to-one mapping to \(\N\), however, we need to describe a one-dimensional list. Observe that we can do so by taking elements in order of the diagonals – each diagonal has a finite number of elements, and we eventually reach an arbitrary element after a finite number of diagonals.
We are ordering elements \((x,y) \in \N \times \N\) according to the value of \(x + y\), and an arbitrary element \((x,y)\) appears at a position that is \(\O((x + y)^2)\) from the beginning.
We can proceed the same way with \(\Q^+\), but observe that there are duplicate elements; for example, \(2/2\) is the same element as \(1/1\). However, we can avoid duplicates by simply ignoring them when we encounter them; in other words, when we construct our list, we skip over an item if it is a duplicate of an element already in the list.
Since we can construct a list of the elements in \(\Q^+\), the set is countable.
Show that the set of all rationals \(\Q\) is countable.
Uncountable Sets¶
So far, we have demonstrated that several infinite sets are countable. How do we show that a set \(S\) is uncountable? We need to show that there is no injective (one-to-one) function \(f: S \to \N\). Equivalently, we can demonstrate that there is no list that enumerates all the elements of \(S\).
As an example, consider the set of real numbers in the interval \([0, 1)\). Assume for the purposes of contradiction that this set is countable. We can then list all the elements in order, and the result looks something like the following:
Here, we have represented each element in decimal form. Most real numbers do not have a finite representation in decimal, so the digits continue indefinitely to the right. If a number does have a finite decimal representation, we pad it to the right with infinitely many zeros.
We now use this list to construct a real number \(r^* \in [0, 1)\) as follows: we choose the \(i\)th digit (zero-indexed, after the decimal) of \(r^*\) such that it differs from the \(i\)th digit of item \(r_i\) in the list. For example:
Here, \(r_0\) has 1 as its 0th digit, so we choose 2 as the 0th digit of \(r^*\). Similarly, \(r_1\) has 3 as its 1st digit so we choose 4 as the 1st digit of \(r^*\), \(r_2\) has 6 as its 2nd digit so we choose 7 for the 2nd digit of \(r^*\), and so on 3. Thus, \(r^*\) differs from \(r_i\) in the \(i\)th digit.
\(r^*\) does not appear on the list of the elements of \([0, 1)\).
Suppose \(r^*\) appears on the list at some concrete position \(i\). However, \(r^*\) was constructed such that its \(i\)th digit differs from the \(i\)th digit of \(r_i\), the element at the \(i\)th position on the list. Since \(r^*\) and \(r_i\) differ in the \(i\)th digit, \(r^* \ne r_i\), which contradicts our assumption that \(r^*\) appears on the list at position \(i\). Since we made no assumptions about the position \(i\), this reasoning applies to all possible positions, which means that \(r^*\) does not appear at any position on the list.
Though \(r^*\) differs from any number in our list, it is a valid element of \([0, 1)\). We have arrived at a contradiction, since we previously stated that the list contains all the elements of the set. Since we have a contradiction, our original assumption that \([0, 1)\) is countable must be false.
This technique is called diagonalization, and it is a general technique for showing that a set is uncountable. Since we cannot list the reals in \([0, 1)\), we cannot list the full set \(\R\). Thus, \(\R\) is uncountable – no injection (one-to-one) function exists from \(\R\) to \(\N\), and we can conclude that \(\abs{\N} < \abs{\R}\).
- 3
Observe that \(r_3\) has 9 as its 3rd digit, but we choose 1 rather than 0 as the 3rd digit of \(r^*\). This is to avoid the technicality that when the digits after the \(k\)th digit are all 9, the number has an equivalent representation where the \(k\)th digit is one larger and the remaining digits are all 0; for example, \(0.4232\overline{9} = 0.4233\). Choosing a 1 rather than a 0 ensures that the number we construct is actually a different number than the one in the list.
The Existence of an Undecidable Language¶
Returning to our original question of how the set of machines \(\mathcal{M}\) is related to the set of languages \(\mathcal{L}\), we can show using the techniques above that \(\mathcal{M}\) is countable while \(\mathcal{L}\) is uncountable. This implies that \(\abs{\mathcal{M}} < \abs{\mathcal{L}}\), and it immediately follows that there are languages that are not associated with any machine.
We leave the countability argument as an exercise and instead demonstrate a direct diagonalization proof that there is some language \(A\) that is not associated with any machine in \(\mathcal{M}\). We first note that the set of finite binary strings \(\{0, 1\}^*\) is countable, since we can list its elements in order of increasing length:
We also use the fact that \(\mathcal{M}\) is countable and therefore its elements can be listed as well. Without loss of generality, we assume that each machine takes binary strings as input (the input values can be re-encoded in binary if this is not the case). We can then construct an infinite, two-dimensional table with machines enumerated along one dimension and inputs along the other:
The entries of this table are whether the machine corresponding to the given row accepts the input corresponding to the given column. For example, we have here that \(M_0\) accepts the empty string but rejects or loops on the string 0, while \(M_1\) accepts both of those strings but rejects or loops on the string 1.
Consider the diagonal of this table, which contains the result of machine \(M_i\) applied to the input \(x_i\) corresponding to column \(i\). By definition, we have that \(x_i \in L(M_i)\) if \(M_i\) accepts \(x_i\). In other words, \(x_i\) is in the language of \(M_i\) if the corresponding entry in the table above is “yes”.
Now construct language \(A\) as follows:
This means that \(A\) contains \(x_i\) exactly when \(L(M_i)\) does not contain \(x_i\). In other words, for each \(i\), we have that \(x_i \in A\) if \(M_i\) rejects or loops on \(x_i\). But if \(M_i\) accepts \(x_i\), then \(x_i \notin A\).
We now have a language that differs from the language of any machine in our list. In particular, \(A\) differs from \(L(M_i)\) with respect to element \(x_i\) – if \(x_i \in L(M_i)\), then \(x_i \notin A\), but if \(x_i \notin L(M_i)\), then \(x_i \in A\). Since our list exhaustively enumerates all machines, there is no machine whose language is \(A\). If there is no machine whose language is \(A\), there is no machine that decides \(A\), so \(A\) is undecidable 4. This means that no computer in the universe can solve it, no matter how many resources we throw at it.
We can go further – we know that that \(\mathcal{M}\) is countable but \(\mathcal{L}\) is uncountable, so there are far more (in fact, uncountably infinitely more) undecidable languages than decidable languages. Thus, most problems are actually unsolvable!
Observe that our proof above was nonconstructive – we showed that there exists some undecidable language, but we did not show that a specific language is undecidable. We will do so next, demonstrating the undecidability of a specific language of interest.
- 4
Not all machines are deciders, and we have shown that \(A\) is not the language of any machine, not just deciders. Thus, \(A\) is not just undecidable but unrecognizable, a concept we will see later.
Prove that the set of all machines \(\mathcal{M}\) is countable.
Hint: Use the fact that \(\{0, 1\}^*\) is countable and what we know about binary strings.
Use diagonalization to show that the set of all languages \(\mathcal{L}\) is uncountable.
The Halting Problem¶
I don’t care to belong to any club that will have me as a member. — Groucho Marx
Suppose upon visiting a new town, you come across a barbershop with the following sign in its window:
Barber \(B\) is the best barber in town! \(B\) cuts the hair of those, and only those, who do not cut their own hair.
In other words, if person \(X\) lives in town, there are two possibilities:
\(X\) cuts their own hair. Then \(B\) does not cut \(X\)’s hair.
\(X\) does not cut their own hair. Then \(B\) cuts \(X\)’s hair.
Assuming the sign is true and that \(B\) is also a resident of the town, does \(B\) cut their own hair? Substituting \(X = B\) into the two cases above, we have:
\(B\) cuts their own hair. Then \(B\) does not cut \(B\)’s hair.
\(B\) does not cut their own hair. Then \(B\) cuts \(B\)’s hair.
Both cases result in a contradiction! Thus, our assumption that the sign is true and that \(B\) lives in town must be incorrect.
This is known as the barber paradox, and we will see that we can construct a similar situation out of Turing machines. In particular, we will define a specific language \(\atm\), and we will assume that there exists a decider \(C\) for \(\atm\). We will then be able to construct another decider \(D\) such that:
\(D\) accepts the source code of those, and only those, programs that do not accept their own source code as input.
If we consider whether \(D\) accepts its own source code, we end up in a similar contradiction as that of the barber paradox. Thus, our assumption that there exists a decider for \(\atm\) is incorrect.
Before we proceed with this construction, we first review some relevant details about data representations and Turing machines. Every data value can be represented as a binary string, and in fact, real computers store all data in binary format. A Turing machine itself can be represented as a binary string, e.g. by taking its seven-tuple description and encoding it into binary. We refer to this string representation of a machine as its source code, and we typically denote it with angle brackets (e.g. \(\inner{M}\) is the source code of machine \(M\)). Since the individual components of a seven-tuple (the set of states, input and tape alphabets, transition function, etc.) are all finite, the source code of a Turing machine is a binary string of finite length.
We have also seen the Church-Turing thesis, which states that any computable function can be computed by a Turing machine. A direct implication of the thesis is that if we can write a program in some other model, such as C++ or even pseudocode, we can write a Turing machine that is equivalent to that program. Thus, when reasoning about Turing machines, we often give pseudocode descriptions rather than doing all the work of defining an equivalent seven-tuple Turing machine.
Similarly, as discussed previously, if all algorithms that can be implemented on a Turing machine can also be implemented in some other programming model \(PL\), \(PL\) is said to be Turing-complete. Real-world programming languages, such as C++, Python, and even LaTeX, are generally Turing-complete.
Since the source code of a program is just a binary string, we can pass it as the input to another program. We can even pass a program its own source code as input! Examples of doing so in C++ are as follows:
$ g++ prog.cpp -o prog.exe
$ ./prog.exe prog.cpp # pass the source filename as a command-line argument
$ ./prog.exe < prog.cpp # pass the source code to standard input
$ ./prog.exe "`cat prog.cpp`" # pass the source code as a single command-line argument
A compiler or interpreter is itself a program that operates on the
source code of other programs. In the example above, we passed the
source code prog.cpp
as a command-line argument to the g++
compiler. The g++
compiler itself is self-hosting: it is
compiled by passing its own source code to itself 5.
Recall that a language \(A\) is decidable if there exists a program \(M\) that decides it:
If \(x \in A\), then \(M\) accepts \(x\).
If \(x \notin A\), then \(M\) rejects \(x\).
We now consider whether the union of two decidable languages is decidable.
Let \(A_1\) and \(A_2\) be decidable languages, and let \(B = A_1 \cup A_2\). In other words, \(B\) is the set of strings that are in either \(A_1\) or \(A_2\) or both: \(B = \{x : x \in A_1 \ort x \in A_2\}\). Is \(B\) a decidable language?
Since \(A_1\) and \(A_2\) are decidable, there exist machines \(M_1\) and \(M_2\), respectively, that decide them. Then \(M_1\) accepts exactly those strings that are in \(A_1\), rejecting every other string, and similarly for \(M_2\) and \(A_2\). We can then construct a new machine \(M\) as follows:
We analyze this machine as follows:
If \(x \in A_1\), then \(M_1\) accepts \(x\) since \(M_1\) is a decider for \(A_1\). Since \(M_1\) accepts, so does \(M\).
If \(x \notin A_1\), then \(M_1\) rejects \(x\) since \(M_1\) is a decider (a decider cannot loop). So running \(M_1\) on \(x\) eventually terminates, and \(M\) proceeds to invoke \(M_2\) on \(x\). Then:
If \(x \in A_2\), \(M_2\) accepts \(x\), and so does \(M\).
If \(x \notin A_2\), \(M_2\) rejects \(x\) since \(M_2\) is a decider. Thus, the invocation of \(M_2\) on \(x\) terminates, and \(M\) proceeds to reject \(x\).
Thus, \(M\) accepts all inputs that are in \(A_1\) or \(A_2\), and it rejects an input \(x\) if \(x \notin A_1 \wedge x \notin A_2\). Since \(B = A_1 \cup A_2\), this means that \(M\) accepts all inputs in \(B\) and rejects all inputs not in \(B\), so \(M\) is a decider for \(B\). Since there exists a machine that decides \(B\), \(B\) is a decidable language.
Example 54 demonstrates that the class of decidable languages is closed under union, meaning that if we apply the union operation to two members of the class, the result is also a member of the class, i.e. a decidable language.
Show that the class of decidable languages is closed under:
intersection (i.e if \(A\) and \(B\) are decidable, so is \(A \cap B\))
complement (i.e. if \(A\) is decidable, so is \(\overline{A}\))
The reasoning in Example 54 assumed that a program \(M\) can invoke a separate program \(M'\) on an input. There are two ways we accomplish this:
Incorporate \(M'\) as a subroutine directly in \(M\) itself. For a Turing machine, this can be done by inserting the state diagram for \(M'\) into \(M\), with the addition of a transition between some state of \(M\) to the start state of \(M'\), and likewise from each of the final states of \(M'\) to states in \(M\). In programming languages such as C++, we can do similarly with a
#include
directive 6:#include "Mprime.hpp" // include declaration of Mprime int M(string x) { Mprime(x); // invoke Mprime ... }
Other languages include analogous
import
directives.- 6
Typically, we
#include
header files in C and C++ to pull in declarations from another translation unit and then invoke the compiler/linker on both translation units. But we can also#include
source.cpp
files directly if we wish.
Pass the source code of \(M'\) to \(M\) and have \(M\) simulate the execution of \(M'\). In other words, \(M\) examines the source code for \(M'\) and performs exactly the steps that are specified in that source code. An interpreter is a real-world incarnation of this concept. For example, we can pass the source code of a Python program to a Python interpreter:
$ python prog.py
The Python interpreter reads the source code in
prog.py
, executing each statement contained therein one after the other.
When it comes to Turing machines, this second concept of an interpreter is that of a universal Turing machine. Such a machine \(U\) is given a pair of inputs \((\inner{M}, x)\), where the first input is the source code of some machine \(M\) and the second input is the actual input on which we want to run \(M\). Then \(U\) simulates the execution of \(M\) on \(x\), so that:
If \(M\) accepts \(x\), then \(U\) accepts the pair \((\inner{M}, x)\).
If \(M\) rejects \(x\), then \(U\) rejects the pair \((\inner{M}, x)\).
If \(M\) loops on \(x\), then \(U\) loops on the pair \((\inner{M}, x)\).
There are many examples of universal Turing machines in the literature, and their descriptions can be found elsewhere. For our purposes, what is important is the existence of universal machines, not the details of their implementation.
Since \(U\) is itself a Turing machine, it has a corresponding language \(L(U)\):
Thus, \((\inner{M}, x) \in L(U)\) exactly when \(M\) accepts \(x\). If \(M\) rejects or loops on \(x\), then \((\inner{M}, x) \notin L(U)\).
The language of a universal Turing machine consists of pairs of machines and inputs, where the machine accepts the corresponding input. We use the notation \(\atm\) to represent this language, the language of acceptance (\(\mathrm{ACC}\) is shorthand for “acceptance”).
Now that we have a definition for this interesting language \(\atm\), the obvious next question is whether or not \(\atm\) is decidable. For it to be decidable, there must exist some decider \(C\) that takes a pair of inputs \((\inner{M}, x)\) and has the following behavior:
If \(M\) accepts \(x\), then \(C\) accepts \((\inner{M}, x)\).
If \(M\) does not accept \(x\) (i.e. \(M\) rejects or loops on \(x\)), then \(C\) rejects \((\inner{M}, x)\).
Observe that a universal Turing machine \(U\) does not meet these requirements. In particular:
If \(M\) loops on \(x\), then \(U\) loops on \((\inner{M}, x)\).
If \(M\) loops on \(x\), then \(C\) rejects \((\inner{M}, x)\).
Thus, even though \(L(U) = L(C) = \atm\), \(U\) loops on some inputs and therefore is not a decider, while \(C\) must halt on all inputs. We know that \(U\) exists, but we do not yet know whether or not \(C\) does.
There is no decider for the language \(\atm\).
Assume for the purposes of contradiction that there exists a decider \(C\) for the language \(\atm\). We can then define a new program \(D\) as follows:
Given that \(C\) exists, we can trivially implement \(D\) as a program that invokes \(C\) as a subroutine. Since \(C\) always halts, so does \(D\).
Observe that \(C\) takes the pair of inputs \((\inner{M}, x)\), while \(D\) takes only a single input \(\inner{M}\). \(D\) then invokes \(C\) on the pair \((\inner{M}, \inner{M})\).
By assumption, we have that:
If \(M\) accepts \(\inner{M}\), then \(C\) accepts \((\inner{M}, \inner{M})\).
If \(M\) rejects or loops on \(\inner{M}\), then \(C\) rejects \((\inner{M}, \inner{M})\).
As we discussed previously, it is perfectly valid for a program to take its own source code as input – the source code is just a binary string like any other.
Given the behavior of \(C\) above, we have by the construction of \(D\) that:
If \(M\) accepts \(\inner{M}\), then \(D\) rejects \(\inner{M}\).
If \(M\) rejects or loops on \(\inner{M}\), then \(D\) accepts \(\inner{M}\).
In other words, we have the following two possibilities for a program \(M\):
\(M\) accepts its own source code. Then \(D\) does not accept \(M\)’s source code.
\(M\) does not accept its own source code. Then \(D\) accepts \(M\)’s source code.
Thus, \(D\) accepts the source code of those, and only those, programs that do not accept their own source code.
We now have a situation like the barber paradox: what does \(D\) do when given its own source code as its input? Substituting \(M = D\) gives us the two possibilities:
\(D\) accepts its own source code. Then \(D\) does not accept \(D\)’s source code.
\(D\) does not accept its own source code. Then \(D\) accepts \(D\)’s source code.
Both cases lead to a contradiction! Since we arrive at a contradiction in all cases, our original assumption that the decider \(C\) exists must be incorrect. Thus, no decider exists for \(\atm\), and \(\atm\) is undecidable.
Previously, we demonstrated with diagonalization that some undecidable language exists. We now have shown that \(\atm\) is an explicit example of an undecidable language. We will use that result to show that other, specific languages are undecidable as well.
Recall that the only difference between a universal Turing machine \(U\), which we know to exist, and the nonexistent decider \(C\) is how they behave when given a pair \((\inner{M}, x)\) such that \(M\) loops on \(x\). For inputs \(x\) on which \(M\) halts, \(U\) and \(C\) have the exact same behavior. If we could just predetermine whether \(M\) halts on a given input \(x\), we could use actually construct \(C\) from \(U\). Determining whether a machine \(M\) halts on an input \(x\) is the halting problem. Since we just argued that if we could solve the halting problem, we could also decide \(\atm\), this implies that the halting problem is undecidable as well!
We proceed to formalize this reasoning. Define \(\htm\), the language corresponding to the halting problem, as follows:
We have:
If \(M\) accepts \(x\), then \((\inner{M}, x) \in \htm\).
If \(M\) rejects \(x\), then \((\inner{M}, x) \in \htm\).
If \(M\) loops on \(x\), then \((\inner{M}, x) \notin \htm\).
There is no decider for the language \(\htm\).
Assume for the purposes of contradiction that there exists a decider \(H\) for the language \(\htm\). We can then construct a decider \(C\) for \(\atm\) as follows:
Since this program does little more than run \(H\) as a subroutine and simulate the execution of \(M\), we can trivially construct it if we have a decider \(H\) and an interpreter \(U\). We know that \(U\) exists, so the only assumption we are making here is that \(H\) exists as well.
We now analyze the behavior of \(C\) on \((\inner{M}, x)\):
If \(M\) accepts \(x\), then \((\inner{M}, x) \in \htm\), so \(H\) accepts \((\inner{M}, x)\). The program \(C\) proceeds to run \(M\) on \(x\). \(M\) accepts \(x\), so \(C\) accepts as well.
If \(M\) rejects \(x\), then \((\inner{M}, x) \in \htm\), so \(H\) accepts \((\inner{M}, x)\). The program \(C\) proceeds to run \(M\) on \(x\). \(M\) rejects \(x\), so \(C\) rejects as well.
If \(M\) loops on \(x\), then \((\inner{M}, x) \notin \htm\), so \(H\) rejects \((\inner{M}, x)\). Then \(C\) rejects as well.
Observe that \(C\) accepts \((\inner{M}, x)\) exactly when \(M\) accepts \(x\), and that it rejects \((\inner{M}, x)\) when \(M\) either rejects or loops on \(x\). Thus, \(C\) accepts exactly those inputs that are in \(\atm\) and rejects all other inputs, so \(C\) is a decider for \(\atm\). This contradicts our known fact that \(\atm\) has no decider. Since we have obtained a contradiction, our original assumption that a decider \(H\) exists for \(\htm\) is incorrect, and \(\htm\) is undecidable.
We have demonstrated that \(\htm\) is undecidable; this is quite unfortunate, since it is a fundamental problem in software and hardware design. Given a program, one of the most basic questions we can ask about its correctness is whether it halts on all inputs. However, since \(\htm\) is undecidable, we cannot answer this question in general even for a single input, much less the space of all possible inputs.
Our proof of the undecidability of the halting problem follows a common pattern for showing that a language \(B\) is undecidable:
Assume that \(B\) is decidable. Then it has some decider \(M_B\).
Use \(M_B\) to construct a decider \(M_A\) for a known undecidable language \(A\).
Since \(M_A\) is a decider for \(A\) but we know that \(A\) is undecidable, we have a contradiction. So the assumption that \(B\) is decidable must be false.
This pattern is an instance of reducing problem \(A\) to problem \(B\): we can reduce the task of solving \(A\) to that of solving \(B\), so that if we have a solver for \(B\), we can construct a solver for \(A\). We proceed to formalize this reasoning and apply it to more examples.
Construct an elementary proof that \(\htm\) is undecidable by following the steps below. This was the first “existence” proof for an undecidable language, and it dates back to Alan Turing.
Suppose there exists a Turing Machine \(M\) that decides \(\htm\). Using \(M\), design a Turing Machine \(M'\) such that for any Turing Machine \(T\), \(M'(\inner{T})\) halts if and only if \(M(\inner{T},\inner{T})\) rejects.
Design an input \(x\) for \(M'\) that creates paradoxical behavior, and conclude that \(\htm\) is undecidable.
Reducibility¶
In the real world, almost all programs make use of existing components or libraries to accomplish their intended task. As an example, we take a look at a “hello world” program in C:
#include <stdio.h>
int main() {
printf("Hello world!");
return 0;
}
This program uses the standard stdio.h
library, invoking the
printf()
function located in that library. Observe that we do not
need to know how printf()
works to write this program – we need
only know what it does. As long as the function does what it claims to
do, our program will work correctly, even if the implementation of
printf()
were to be swapped out with a completely different one.
Thus, we treat printf()
as a black box that prints the given
arguments according to the format specification in the C language
standard.
In a sense, our program reduces the task of printing Hello
world!
to that of a generic printf()
function. As long as we
have access to the latter, we can use it to rather trivially
accomplish our task. A reduction then is a transformation of one task
to another, and if a reduction exists from task \(A\) to task \(B\), then
\(A\) is no harder than \(B\). We define this notion formally for
languages as follows:
Language \(A\) is Turing reducible to language \(B\), written \(A \Tr B\), if there exists a program \(M_A\) that decides \(A\) using a black box \(M_B\) that decides \(B\). The definition of \(M_A\) in terms of \(M_B\) is called a Turing reduction from language \(A\) to language \(B\).
The program \(M_A\) can query the black box \(M_B\) any number of times, including zero, one, or multiple times. It can send any input to \(M_B\) – the input to \(M_B\) need not be the same as the input to \(M_A\).
The intuition behind the notation \(A \Tr B\) is that \(A\) is no harder to solve than \(B\), since we know that we can construct a solver for \(A\) from a solver for \(B\). We demonstrated previously that \(\atm \Tr \htm\), since we were able to construct a decider \(C\) for \(\atm\) from a black box \(H\) that decides \(\htm\).
Note that \(A \Tr B\) does not require that a decider for \(B\) actually exist. It is merely a conditional statement that if we do have access to a decider for \(B\), we can go ahead and construct a decider for \(A\) using the decider for \(B\) as a black box. In the case of \(\atm \Tr \htm\), the decider \(H\) for \(\htm\) does not actually exist.
Suppose \(A \Tr B\). Then if \(B\) is decidable, \(A\) is also decidable.
Since \(A \Tr B\), we have by definition that we can construct a decider \(M_A\) for \(A\) that uses a decider for \(B\) as a black box. Since \(B\) is decidable, a decider \(M_B\) does indeed exist for \(B\). Thus, we can plug \(M_B\) in as the black box in our definition for \(M_A\), resulting in a concrete decider for \(A\). Since we now have an actual decider for \(A\), \(A\) is decidable.
The contraposition 7 of Theorem 62 gives us the following:
Suppose \(A \Tr B\). Then if \(A\) is undecidable, \(B\) is also undecidable.
- 7
We can also demonstrate this through contradiction, since contraposition and contradiction are closely related. We have that \(A \Tr B\) and that \(A\) is undecidable. Suppose for the purposes of contradiction that \(B\) is decidable. This means that there is some decider \(M_B\) for \(B\). Since \(A \Tr B\), we can use \(M_B\) as a black box to construct a decider \(M_A\) for \(A\). This contradicts the fact that \(A\) is undecidable, so our assumption that \(B\) is decidable must be false.
This gives us a powerful tool to demonstrate that other languages are undecidable, once we have a known undecidable language. Given that a language \(A\) is known to be undecidable, we just have to show that \(A \Tr B\) to prove that \(B\) is also undecidable. We did exactly that for the halting problem, demonstrating that \(\atm \Tr \htm\).
If we know that a particular language is decidable or undecidable, we can proceed to perform a reduction with that language as either \(A\) or \(B\) in the reduction \(A \Tr B\). Observe that there are four possibilities, but only two give us information about the decidability of the other language:
Known Fact |
Conclusion |
\(A\) decidable |
? |
\(A\) undecidable |
\(B\) undecidable |
\(B\) decidable |
\(A\) decidable |
\(B\) undecidable |
? |
Thus, if we know that a language \(L\) is decidable, we can use it as the right-hand side of a reduction \(A \Tr L\) to show that \(A\) is decidable. On the other hand, if we know that a language \(L\) is undecidable, we can use it as the left-hand side of a reduction \(L \Tr B\) to show that \(B\) is undecidable. The other two possibilities tell us nothing about the decidability of the other language. In fact, if \(A\) is decidable, we can show that \(A \Tr B\) for any language \(B\) 8 – the decider for \(A\) simply ignores the black-box decider for \(B\) and solves \(A\) directly.
- 8
The same is not true for the fourth case, when \(B\) is undecidable – it is not the case that if \(B\) is undecidable, any language \(A\) is reducible to \(B\). There are in fact hierarchies of undecidable languages, but they are beyond the scope of this text.
We saw previously that \(\htm\) is undecidable. We proceed to show that a more restricted form of this language, known as \(\ehtm\), is also undecidable. Define \(\ehtm\) as follows:
Clearly we can decide \(\ehtm\) given a decider for \(\htm\):
This demonstrates that \(\ehtm \Tr \htm\), but as we can see from the table above, that tells us nothing about the decidability of \(\ehtm\). Instead, we must show that an undecidable language is reducible to \(\ehtm\), not the other way around.
We show that \(\ehtm\) is undecidable by performing the Turing reduction \(\htm \Tr \ehtm\).
Suppose that \(E\) is a black box that decides \(\ehtm\). Then \(E\) is given the single input \(\inner{M}\), the source code of a program \(M\). We have:
If \(M\) halts on (accepts or rejects) \(\varepsilon\), then \(E\) accepts \(\inner{M}\).
If \(M\) loops on \(\varepsilon\), then \(E\) rejects \(\inner{M}\).
To show that \(\htm \Tr \ehtm\), we need to use \(E\) as a black box to construct \(H\), a decider for \(\htm\). Specifically, \(H\) is given the pair of inputs \((\inner{M}, x)\), and we require that:
If \(M\) halts on \(x\), then \(H\) accepts \((\inner{M}, x)\).
If \(M\) loops on \(x\), then \(H\) rejects \((\inner{M}, x)\).
We need to write a program that works on two inputs, but the black box we have at our disposal only takes one input. We need to somehow transform our two original inputs into a single input that can be passed to \(E\). A typical strategy is to construct a new program on the fly that hardcodes the two original inputs, transforming them into a program that is suitable to be passed to the black box. The result is as follows:
The newly constructed program \(M'\) takes an input \(w\), but it ignores that input and just runs \(M\) on the original input \(x\). Thus:
If \(M\) accepts \(x\), then \(M'\) accepts all inputs \(w\).
If \(M\) rejects \(x\), then \(M'\) rejects all inputs \(w\).
If \(M\) loops on \(x\), then \(M'\) loops on all inputs \(w\).
If \(M\) accepts or rejects \(x\), then \(M'\) halts on all inputs, which means that it halts on the specific input \(\varepsilon\). On the other hand, if \(M\) loops on \(x\), then \(M'\) loops on all inputs, including the specific input \(\varepsilon\).
Observe that \(H\) merely constructs \(M'\) and passes it as an argument to \(E\). \(H\) does not actually run \(M'\). The result from \(E\) is as follows:
If \(M'\) halts on \(\varepsilon\), then \(E\) accepts \(\inner{M'}\), so \(H\) accepts \((\inner{M}, x)\).
If \(M'\) loops on \(\varepsilon\), then \(E\) rejects \(\inner{M'}\), so \(H\) rejects \((\inner{M}, x)\).
As we pointed out previously, \(M'\) halts on \(\varepsilon\) exactly when \(M\) halts on \(x\), and \(M'\) loops on \(\varepsilon\) exactly when \(M\) loops on \(x\). We thus have:
If \(M\) halts on \(x\), \(M'\) halts on \(\varepsilon\). Then \(E\) accepts \(\inner{M'}\), so \(H\) accepts \((\inner{M}, x)\).
If \(M\) loops on \(x\), \(M'\) loops on \(\varepsilon\). Then \(E\) rejects \(\inner{M'}\), so \(H\) rejects \((\inner{M}, x)\).
Thus, \(H\) accepts \((\inner{M}, x)\) when \(M\) halts on \(x\), and \(H\) rejects \((\inner{M}, x)\) when \(M\) loops on \(x\). This makes \(H\) a decider for the language \(\htm\).
Since we constructed a decider \(H\) for \(\htm\) from a decider \(E\) for \(\ehtm\), we have demonstrated that \(\htm \Tr \ehtm\). Since \(\htm\) is undecidable, it follows that \(\ehtm\) is also undecidable.
- 9
Since
M_source
andx
may contain newlines and other characters that are not allowed in a C++ string literal, we would need to either process the strings to replace such characters with their respective escape sequences or use raw string literals. The following is the relevant line of code using raw literals:+ " return U(R\"@" + M_source + "@\", R\"@" + x + "@\");\n"
Here,
@
represents a sequence of characters that does not appear inM_source
orx
.
Let
We show that \(\lang{FOO}\) is undecidable via the Turing reduction \(\atm \Tr \lang{FOO}\). Suppose that \(F\) is a black box that decides \(\lang{FOO}\). It takes a single input \(\inner{M}\), and we have:
if \(M\) accepts \(foo\), then \(foo \in L(M)\) and \(F\) accepts \(\inner{M}\)
if \(M\) rejects or loops on \(foo\), then \(foo \notin L(M)\) and \(F\) rejects \(\inner{M}\)
We need to construct a decider \(C\) for \(\atm\) that takes two inputs \(\inner{M}\) and \(x\) and:
if \(M\) accepts \(x\), then \(C\) accepts \((\inner{M}, x)\)
if \(M\) rejects or loops on \(x\), then \(C\) rejects \((\inner{M}, x)\)
Passing \(\inner{M}\) directly to \(F\) does not tell us anything about whether \(M\) accepts a given input \(x\). Instead, we construct a new program \(M'\) such that:
if \(M\) accepts \(x\), then \(M'\) accepts \(foo\)
if \(M\) rejects or loops on \(x\), then \(M'\) rejects or loops on \(foo\)
The following is one such program:
Here, by “answer as \(M\)”, we mean accept if \(M\) accepts and reject if \(M\) rejects. Observe that both \(M\) and \(x\) are hardcoded into the source code for \(M'\).
The full construction of the decider \(C\) is as follows:
We analyze \(C\):
If \(M\) accepts \(x\), then \(M'\) accepts \(foo\) and \(foo \in L(M')\). Thus, \(F\) accepts \(\inner{M'}\), and \(C\) accepts \((\inner{M}, x)\).
If \(M\) does not accept \(x\), then \(M'\) does not accept \(foo\) and \(foo \notin L(M')\). Thus, \(F\) rejects \(\inner{M'}\), and \(C\) rejects \((\inner{M}, x)\).
Since \(C\) accepts \((\inner{M}, x)\) exactly when \(M\) accepts \(x\), \(C\) is a decider for \(\atm\), and we have shown \(\atm \Tr \lang{FOO}\). We conclude that \(\lang{FOO}\) is undecidable.
Define
This corresponds to the problem of determining whether two programs accept the same set of inputs. We show that \(\eqtm\) is undecidable using the reduction \(\atm \Tr \eqtm\). Suppose \(Q\) is a black-box decider for \(\eqtm\). Then it takes two inputs \(\inner{M_1}\) and \(\inner{M_2}\) and:
if \(L(M_1) = L(M_2)\), then \(Q\) accepts \((\inner{M_1}, \inner{M_2})\)
if \(L(M_1) \ne L(M_2)\), then \(Q\) rejects \((\inner{M_1}, \inner{M_2})\)
A decider \(C\) for \(\atm\) takes two inputs \(\inner{M}\) and \(x\) and accepts exactly when \(M\) accepts \(x\). We do not care how \(M\) behaves on inputs other than \(x\), so we likely do not want to pass \(\inner{M}\) directly to \(Q\). Instead, we construct a new program \(M_1\) such that it has a different language depending on whether \(M\) accepts \(x\). In particular, we will arrange for \(L(M_1) = \Sigma^*\) if \(M\) accepts \(x\), and \(L(M_1) = \varnothing\) if \(M\) does not accept \(x\). The following definition of \(M_1\) does the trick:
We still need a second argument \(M_2\) to pass to \(Q\) – we fix \(M_2\) such that it accepts everything, so that \(L(M_1) = \Sigma^* = L(M_2)\) exactly when \(M\) accepts \(x\). Then the full definition of \(C\) is as follows:
We analyze \(C\):
If \(M\) accepts \(x\), then \(M_1\) accepts all inputs, so \(L(M_1) = \Sigma^* = L(M_2)\). Thus, \(Q\) accepts \((\inner{M_1}, \inner{M_2})\), and \(C\) accepts \((\inner{M}, x)\).
If \(M\) does not accept \(x\), then \(M_1\) does not accept any input, so \(L(M_1) = \varnothing \ne L(M_2)\). Thus, \(Q\) rejects \((\inner{M_1}, \inner{M_2})\), and \(C\) rejects \((\inner{M}, x)\).
Thus, \(C\) is a decider for \(\atm\), and we have shown \(\atm \Tr \eqtm\). We conclude that \(\eqtm\) is undecidable.
We have seen that \(\atm \Tr \htm\). We now show that \(\htm \Tr \atm\). Assume that we have a black-box decider \(C\) for \(\atm\). We have:
if \(M\) accepts \(x\), then \(C\) accepts \((\inner{M}, x)\)
if \(M\) rejects \(x\), then \(C\) rejects \((\inner{M}, x)\)
if \(M\) loops on \(x\), then \(C\) rejects \((\inner{M}, x)\)
We need to construct a decider \(H\) for \(\htm\) such that:
if \(M\) accepts \(x\), then \(H\) accepts \((\inner{M}, x)\)
if \(M\) rejects \(x\), then \(H\) accepts \((\inner{M}, x)\)
if \(M\) loops on \(x\), then \(H\) rejects \((\inner{M}, x)\)
The only case where the behavior of \(C\) and \(H\) differs is when \(M\) rejects \(x\); if we were to invoke \(C\) on \((\inner{M}, x)\), we would get the right answer for when \(M\) accepts or loops on \(x\). To use \(C\) to get the correct answer when \(M\) rejects \(x\), we can construct a new program \(M'\) that accepts \(x\) when \(M\) rejects \(x\). There are several ways to do so; here is one:
We can then invoke \(C\) twice, once on \(M\) and once on \(M'\). We define \(H\) as follows:
We analyze \(H\) as follows:
If \(M\) accepts \(x\), then \(C\) accepts \((\inner{M}, x)\), so \(H\) accepts \((\inner{M}, x)\).
If \(M\) rejects \(x\), then \(M'\) accepts all inputs, so \(C\) accepts \((\inner{M'}, x)\) and \(H\) accepts \((\inner{M}, x)\).
If \(M\) loops on \(x\), then \(M'\) loops on all inputs, so \(C\) rejects both \((\inner{M}, x)\) and \((\inner{M'}, x)\). Then \(H\) rejects \((\inner{M}, x)\).
Thus, \(H\) decides \(\htm\), and we have demonstrated that \(\htm \Tr \atm\).
Observe that we invoked the black box \(C\) twice. In general, for the Turing reduction \(A \Tr B\), the decider for \(A\) may call the black-box decider for \(B\) any number of times, including zero or more than once.
Prove that \(\atm \Tr \ehtm\).
Let
Prove that \(\lang{\varepsilon\text{-}ACC}\) is undecidable.
Determine, with proof, whether or not the following language is decidable:
Determine, with proof, whether or not the following language is decidable:
Wang Tiling¶
The examples of undecidable problems we have seen thus far have been concerned with the behavior of Turing machines or programs in general. However, undecidable problems are everywhere – as we discussed previously, there are more undecidable problems than decidable ones. We take a look at undecidability in the context of a geometric problem, that of tiling a plane.
Given a finite set of shapes \(S\), is it possible to tile a plane using only tiles from \(S\)? We are allowed to use as many of each tile as we want, but we are not allowed to overlap them or leave any gaps between the tiles we use. For the purposes of this discussion, we will also disallow rotation of tiles – typically, we can just increase the tile set with rotated copies instead.
An example of a tiling is the following from Study of Regular Division of the Plane with Reptiles by M. C. Escher:
Here, the plane is tiled with a regular pattern using three reptile-shaped tiles.
Another example more akin to a jigsaw puzzle is using the following set of tiles:
We can construct a regular grouping of tiles that fit together:
We can then repeat this pattern to tile the whole plane:
While the two examples above admit tilings that are periodic, it is also possible to tile the plane in a non-periodic manner. The following is an example of a Penrose tiling that does so:
Not all tile sets \(S\) admit tilings of the entire plane. Given a set of tiles \(S\), we would like to use an algorithm to determine whether it is possible to tile the plane with that set. We define the language \(\lang{TILE}\) as follows:
Is \(\lang{TILE}\) decidable? It turns out the answer is no. We can use tiling to simulate a Turing machine, and we can construct a tile set \(S_M\) from a Turing machine \(M\) such that deciding whether \(S_M\) tiles the plane is equivalent to deciding whether \(\inner{M} \in \ehtm\). This means that \(\ehtm \Tr \lang{TILE}\), so that \(\lang{TILE}\) is undecidable.
For the rest of our discussion, we will restrict ourselves to the slightly simpler problem of determining whether a tile set \(S\) can tile a quadrant of the plane rather than a whole plane. Thus, the language we examine will be as follows:
We consider tiles with a particular structure, namely squares with a color on each edge. These are known as Wang tiles.
Two tiles may only be adjacent if the colors match on their shared border. Tiles may not be rotated or flipped. We then pick a color for the boundary of the quadrant; here, we have chosen black for the boundary.
The question then is whether or not a given set of tiles can tile the whole quadrant. For the tile set above, we can see that it cannot. Only tile 2 may be placed in the bottom-left corner, then only tile 3 may be to its right and only tile 1 above. While we can continue placing more copies of tile 1 in the upward direction, no tile has a left boundary that is colored red, so we cannot fill out the rest of the quadrant to the right.
While we were able to reason that in this specific case, the quadrant cannot be tiled with the given set, we cannot do so in general with any computational process. To demonstrate this, we will construct a tile set that simulates the execution of a Turing machine. We will illustrate this on a single, simple machine, but the same process can be done for any machine.
Before proceeding to our construction, we define a notation for representing the execution state, or configuration, of a Turing machine. We need to capture several details that uniquely describe the configuration of the machine:
the state \(q \in Q\) in which the Turing machine is
the contents of the tape
the position of the head
We represent these details with an infinite string over the alphabet \(\Gamma \cup Q\). Specifically, the string contains the full contents of the tape, plus the state \(q \in Q\) immediately to the left of the symbol for the cell at which the head is located. Thus, if the input to a machine is \(s_1 s_2 \dots s_n\) and the initial state is \(q_0\), the following encodes the starting configuration of the machine:
Since the head is at the leftmost cell and the state is \(q_0\), the string has \(q_0\) to the left of the leftmost tape symbol. Then if the transition function gives \(\delta(q_0, s_1) = (q', s', R)\), the new configuration is:
The first cell has been modified to \(s'\), the machine is in state \(q'\), and the head is over the second cell, represented here by writing \(q'\) to the left of that cell’s symbol.
As a more concrete example, the configuration of the machine we saw previously that accepts strings containing only ones, when run on the input \(111010111\), is as follows at each step:
We will encode such a configuration string using an infinite sequence of tiles, where the symbols along the top edges of the tiles comprise the configuration.
We consider a simple machine that just alternately writes two different symbols to the tape. Given an initial blank tape (i.e. \(\varepsilon\) as the input), the machine will write symbols indefinitely, never terminating.
We start constructing our tile set by introducing a tile for each symbol in the tape alphabet. Each tile is “colored” by the symbol at the top and bottom, and the left and right edges are colored blank.
We then examine the transition function. For each leftward transition
and each element of the tape alphabet \(s \in \Gamma\), we introduce the following pair of tiles:
This represents part of the configuration when applying the transition. The bottom of the right tile indicates that we are originally in state \(p\) with the head over the symbol \(i\), and the bottom of the left tile indicates there is a symbol \(s\) in the cell to the left. The top of the tiles represents the result of the transition. The new state is \(q\) and the head moves to the left, which we indicate by the top of the left tile. The symbol in the right tile is overwritten with \(j\), which we indicate at the top of the right tile. Finally, we label the adjacent edges with \(q\) to enable the two tiles to be adjacent in our tiling.
Similarly, for each rightward transition
and tape symbol \(s\), we introduce the following pair of tiles:
Here, the head moves to the right, so we see that the original state \(p\) appears at the bottom in the left tile and the new state \(q\) appears at the top in the right tile.
For our simple Turing machine above, we only have two rightward transitions and three tape symbols, so the resulting transition tiles are those below:
We have repeated tiles here, but since we are constructing a set, the duplicates are consolidated into a single copy.
Finally, we introduce start tiles to encode the initial state of the machine. Since we are interested in the empty string as input, only blank symbols appear in our start tiles. We have a corner tile that encodes the start state and the initial head position all the way at the left, and we have a bottom tile that encodes the remaining blank cells:
The full set of tiles for the simple Turing machine is as follows, with duplicates removed:
Now that we have fleshed out our tile set, we can proceed to attempt to tile the quadrant. The quadrant boundary matches with an empty tile edge. The only tile that can go in the corner is our corner start tile.
Once we place the corner tile, the only tile that can go to the right of it is a bottom start tile. We can place copies of this tile indefinitely to the right.
The resulting bottom row encodes the initial state of the machine: every cell contains the blank symbol (since the input is \(\varepsilon\)), the initial state is \(q_0\), and the head is on the leftmost cell. The position of the head is indicated by which tile has both a state and a symbol in its top edge.
Now that we have the bottom row, we look for tiles that can go above that row. Only one of our tiles has \((q_0, \bot)\) at its bottom, and it is one of the tiles for the transition \(\delta(q_0, \bot) = (q_1, a, R)\). This forces us to use one of the other tiles for that transition as well, and since the symbol to the right is \(\bot\), we use the corresponding transition tile that has \(\bot\) at the bottom. The remaining tiles in the row can be symbol tiles for \(\bot\).
The configuration represented in the second row above is the machine having written \(a\) in the first cell, being in the state \(q_1\), and with the head on the second cell. This is exactly the result of the transition.
For the next row, we have to use a symbol tile for \(a\) at the left. Then we have to use transition tiles for \(\delta(q_1, \bot) = (q_0, b, R)\). Finally, we have to fill out the rest of the row with symbol tiles for \(\bot\).
Again, the configuration encoded here corresponds exactly with the state of the machine after two steps. We can proceed to do the next step as well.
Since each row corresponds to a step of the Turing machine, the question of whether or not the quadrant can be tiled is the same as whether the machine performs infinitely many steps, i.e. whether it loops. Since the input we are working on here is \(\varepsilon\), we can tile the quadrant exactly when the machine \(M\) loops on \(\varepsilon\).
The following construction formalizes our proof of undecidability of \(\lang{QTILE}\) by performing the reduction \(\ehtm \Tr \lang{QTILE}\). Assume we have a decider \(T\) for \(\lang{QTILE}\). We can then construct a decider \(E\) for \(\ehtm\):
We then have the following cases:
If \(M\) halts on \(\varepsilon\), \(T\) rejects \(\inner{S}\), so \(E\) accepts \(\inner{M}\).
If \(M\) loops on \(\varepsilon\), \(T\) accepts \(\inner{S}\), so \(E\) rejects \(\inner{M}\).
Thus, \(E\) is a decider for \(\ehtm\), and we have demonstrated that \(\ehtm \Tr \lang{QTILE}\). Since \(\ehtm\) is undecidable, so is \(\lang{QTILE}\).
Modify the construction we used in the reduction \(\ehtm \Tr \lang{QTILE}\) to show instead that \(\htm \Tr \lang{QTILE}\).
Hint: Only the set of start tiles needs to be changed, and the set will be different for different inputs.
Recognizability¶
We have seen several examples of undecidable languages, including \(\atm\), \(\htm\), and \(\ehtm\). As we discussed before, for a machine \(M\) to decide a language \(A\), it must halt on every input:
If \(x \in A\), \(M\) accepts \(x\).
If \(x \notin A\), \(M\) rejects \(x\).
We also saw that not all machines are deciders. If a machine \(M\) is not a decider, it can still be associated with a language \(A\) if it accepts all inputs in \(A\) and does not accept any inputs not in \(A\):
If \(x \in A\), \(M\) accepts \(x\).
If \(x \notin A\), \(M\) rejects or loops on \(x\).
If this is the case, then \(L(M) = A\) and we say that \(M\) recognizes the language \(A\). Since every machine \(M\) has an associated language \(L(M)\), every machine \(M\) is a recognizer for its own language \(L(M)\). But a machine is only a decider if it halts on every input.
A language \(A\) is recognizable if there exists some machine that recognizes \(A\), i.e. there exists a machine \(M\) such that \(L(M) = A\). A language may be recognizable but undecidable. In fact, we saw that the language of a universal Turing machine (an interpreter) \(U\) is \(L(U) = \atm\). Thus, \(\atm\) is recognizable.
Are all undecidable languages recognizable? In other words, if we give up the requirement of always halting, is there a machine associated with every possible language? Unfortunately, we have already seen this is not the case – we constructed a language \(A\) that is not the language of any machine. Thus, we proved that there exist unrecognizable languages. However, our proof was not constructive, so we have yet to see a concrete example of an unrecognizable language.
How do we show that a language \(L\) is recognizable? Similar to how we show that a language is decidable, we need only write a recognizer for \(L\). We have already seen that the universal Turing machine is a recognizer for \(\atm\). We proceed to construct a recognizer for \(\htm\):
We need to show that if \((\inner{M}, x) \in \htm\), then \(H_r\) accepts \((\inner{M}, x)\), otherwise \(H_r\) rejects or loops on \((\inner{M}, x)\).
If \((\inner{M}, x) \in \htm\), then \(M\) halts on \(x\). In this case, the simulation of \(M\) on \(x\) in \(H_r\) terminates, and \(H_r\) proceeds to accept \((\inner{M}, x)\).
If \((\inner{M}, x) \notin \htm\), then \(M\) loops on \(x\). In this case, the simulation of \(M\) on \(x\) does not terminate, so \(H_r\) loops on \((\inner{M}, x)\).
Thus, \(H_r\) correctly recognizes \(\htm\).
As another example, suppose that \(A_1\) and \(A_2\) are recognizable languages. Is \(B = A_1 \cup A_2 = \{x : x \in A_1 \ort x \in A_2\}\) recognizable? In other words, is the class of recognizable languages closed under the union operation?
We previously demonstrated that the class of decidable languages is closed under union. We proceed to follow a similar strategy here. Since \(A_1\) and \(A_2\) are recognizable, there exist machines \(M_1\) and \(M_2\) that recognize \(A_1\) and \(A_2\), respectively. We construct a machine \(M\) to recognize \(A_1 \cup A_2\):
Unfortunately, this construction does not work when \(M_1\) and \(M_2\) are not deciders. For instance, if \(M_1\) were to loop on \(x\), we would never get to running \(M_2\) on \(x\). If it were the case that \(M_2\) accepts \(x\), \(M\) would never get around to accepting \(x\) since it would be stuck running the non-terminating execution of \(M_1\) on \(x\).
The problem here is similar to what we encountered in showing that the set of integers is countable. Our attempt to do so by listing all the nonnegative integers first failed because we would never exhaust the nonnegative integers to get to the negative integers. The solution was to interleave the nonnegative integers with the negative integers, so that we would eventually reach any specific element. We can follow a similar strategy here of alternation, interleaving the execution of \(M_1\) and \(M_2\):
How can we alternate between the execution of two programs \(M_1\) and \(M_2\)? If we are simulating their execution, this means running one step of \(M_1\) (e.g. applying its transition function once), then one step of \(M_2\), then another step of \(M_1\), and so on. On an actual computer, we can run a program step-by-step through a debugger. Similarly, real computers simulate parallel execution on sequential processing units by rapidly switching between programs. Thus, we can alternate between programs both in theory and in practice.
We proceed to analyze the behavior of \(M\):
If \(x \in A_1\), then \(M_1\) accepts \(x\) since \(M_1\) is a recognizer for \(A_1\). Since \(M\) alternates between the executions of \(M_1\) and \(M_2\) on \(x\), it will eventually get to the point where \(M_1\) accepts (or \(M_2\) accepts first), so \(M\) accepts \(x\).
If \(x \notin A_1\), then \(M_1\) rejects or loops on \(x\). There are two cases depending on whether \(x\) is or isn’t in \(A_2\):
If \(x \in A_2\), \(M_2\) accepts \(x\). Since \(M\) alternates between executing \(M_1\) and \(M_2\), it will eventually reach a point where \(M_2\) accepts \(x\), and \(M\) will also accept \(x\).
If \(x \notin A_2\), \(M_2\) rejects or loops on \(x\). If both \(M_1\) and \(M_2\) reject \(x\), then \(M\) proceeds to reject \(x\). If either of \(M_1\) or \(M_2\) loops on \(x\), then \(M\) will never finish simulating execution of that machine, so \(M\) will loop on \(x\). In either case, \(M\) rejects or loops on \(x\).
Thus, \(M\) accepts all inputs that are in \(A_1\) or \(A_2\), and it rejects or loops on an input \(x\) if \(x \notin A_1 \wedge x \notin A_2\). Since \(B = A_1 \cup A_2\), this means that \(M\) accepts all inputs in \(B\) and rejects or loops on all inputs not in \(B\), so \(M\) is a recognizer for \(B\). Since there exists a machine that recognizes \(B\), \(B\) is a recognizable language.
We conclude that the class of recognizable languages is closed under union, and we can similarly show that it is closed under intersection.
Show that the class of recognizable languages is closed under intersection.
Is the class of recognizable languages closed under complement? In other words, if \(A\) is a recognizable language, is \(\overline{A}\) recognizable?
If a language \(A\) and its complement \(\overline{A}\) are both recognizable, then \(A\) is decidable.
Suppose both \(A\) and \(\overline{A}\) are recognizable. Then there exist machines \(M_1\) and \(M_2\) that recognize \(A\) and \(\overline{A}\), respectively. Since \(M_1\) recognizes \(A\), we have:
If \(x \in A\), then \(M_1\) accepts \(x\).
If \(x \notin A\), then \(M_1\) rejects or loops on \(x\).
Similarly, since \(M_2\) recognizes \(\overline{A}\), we have:
If \(x \in A\), then \(M_2\) rejects or loops on \(x\).
If \(x \notin A\), then \(M_2\) accepts on \(x\).
This follows from the fact that \(A\) and \(\overline{A}\) partition the universe \(\Sigma^*\), so \(x \in A\) means \(x \notin \overline{A}\) and \(x \notin A\) means \(x \in \overline{A}\).
We proceed to construct a decider \(M\) for \(A\):
Then there are the following cases:
If \(x \in A\), then \(M_1\) accepts \(x\), so the simulation of \(M_1\) on \(x\) will eventually accept. On the other hand, \(M_2\) must reject or loop on \(x\), so \(M_2\) will not accept \(x\). Thus, only the first conditional in \(M\) applies, and \(M\) accepts \(x\).
If \(x \notin A\), then \(M_2\) accepts \(x\), so the simulation of \(M_2\) on \(x\) will eventually accept. On the other hand, \(M_1\) must reject or loop on \(x\), so \(M_1\) will not accept \(x\). Thus, only the second conditional in \(M\) applies, and \(M\) rejects \(x\).
Since \(M\) accepts all inputs \(x \in A\) and rejects all inputs \(x \notin A\), \(M\) is a decider for \(A\), and \(A\) is a decidable language.
The contraposition of Claim 74 is that if a language \(A\) is undecidable, then at least one of \(A\) and \(\overline{A}\) is unrecognizable. This gives us a tool to demonstrate that a language is unrecognizable.
If a language \(A\) is undecidable but recognizable, then \(\overline{A}\) is unrecognizable.
We can immediately conclude that
and
are unrecognizable. Since their complements are recognizable, we refer to these languages as co-recognizable.
A language \(L\) is co-recognizable if \(\overline{L}\) is recognizable.
Since the class of decidable languages is closed under complement and all decidable languages are recognizable, a decidable language is both recognizable and co-recognizable. In fact, Claim 74 demonstrates that the class of decidable languages is exactly the intersection of the recognizable and co-recognizable languages 10.
Since every program \(M\) recognizes exactly one language \(L(M)\), it “co-recognizes” exactly one language \(\overline{L(M)}\). As we discussed previously, the set of programs is countable, and since each program corresponds to only a single recognizable and co-recognizable language, the set of recognizable and co-recognizable languages is also countable. As the set of languages is uncountable, most languages are neither recognizable nor co-recognizable.
- 10
The class of recognizable languages is denoted \(\RE\) (short for recursively enumerable, an equivalent definition of recognizable), and the class of co-recognizable languages is denoted \(\coRE\). The class of decidable languages is denoted \(\RD\) (short for recursive). We have \(\RE \cap \coRE = \RD\). It is possible for a language to be neither in \(\RE\) nor in \(\coRE\), but how to prove this is beyond the scope of this text.
Define the langage
We demonstrate that \(\lang{NO\text{-}FOO}\) is unrecognizable.
We start by showing that it is undecidable via the reduction \(\atm \Tr \lang{NO\text{-}FOO}\). Assume we have a decider \(F\) for \(\lang{NO\text{-}FOO}\). We construct a decider \(C\) for \(\atm\) as follows:
We analyze \(C\) as follows:
If \(M\) accepts \(x\), then \(M'\) accepts all inputs, including \(foo\), so \(foo \in L(M)\) and \(F\) rejects \(\inner{M'}\). Then \(C\) accepts \((\inner{M}, x)\).
If \(M\) rejects \(x\), then \(M'\) rejects all inputs, including \(foo\), so \(foo \notin L(M)\) and \(F\) accepts \(\inner{M'}\). Then \(C\) rejects \((\inner{M}, x)\).
If \(M\) loops on \(x\), then \(M'\) loops on all inputs, including \(foo\), so \(foo \notin L(M)\) and \(F\) accepts \(\inner{M'}\). Then \(C\) rejects \((\inner{M}, x)\).
Thus, \(C\) is a decider for \(\atm\). We have demonstrated that \(\atm \Tr \lang{NO\text{-}FOO}\), and we conclude that \(\lang{NO\text{-}FOO}\) is undecidable.
We now demonstrate that
is recognizable. The following program recognizes it:
We have:
If \(M\) accepts \(foo\), then \(R\) accepts \(\inner{M}\).
If \(M\) rejects \(foo\), then \(R\) rejects \(\inner{M}\).
If \(M\) loops on \(foo\), then \(R\) loops on \(\inner{M}\).
We thus have that \(R\) accepts \(\inner{M}\) when \(\inner{M} \in \overline{\lang{NO\text{-}FOO}}\), and \(R\) rejects or loops on \(\inner{M}\) when \(\inner{M} \notin \overline{\lang{NO\text{-}FOO}}\). Thus, \(R\) is a recognizer for \(\overline{\lang{NO\text{-}FOO}}\), and \(\overline{\lang{NO\text{-}FOO}}\) is recognizable.
Since \(\lang{NO\text{-}FOO}\) is undecidable, its complement is also undecidable as the class of decidable languages is closed under complement. Then we have that \(\overline{\lang{NO\text{-}FOO}}\) is undecidable but recognizable, so we conclude from Corollary 76 that its complement \(\lang{NO\text{-}FOO}\) is unrecognizable.
Define the langage
The same construction as for \(\lang{NO\text{-}FOO}\) shows that \(\atm \Tr \lang{NO\text{-}FOO\text{-}BAR}\), so \(\lang{NO\text{-}FOO\text{-}BAR}\) is undecidable.
We demonstrate that \(\lang{NO\text{-}FOO\text{-}BAR}\) is unrecognizable by showing that its complement
is recognizable. The following is a recognizer for it:
We have:
If \(M\) accepts \(foo\) or accepts \(bar\), then \(R\) accepts \(\inner{M}\).
If \(M\) rejects \(foo\) and rejects \(bar\), then \(R\) rejects \(\inner{M}\).
If \(M\) does not accept either \(foo\) or \(bar\) and loops on at least one of the two, then \(R\) loops on \(\inner{M}\).
We thus have that \(R\) accepts \(\inner{M}\) when \(\inner{M} \in \overline{\lang{NO\text{-}FOO\text{-}BAR}}\), and \(R\) rejects or loops on \(\inner{M}\) when \(\inner{M} \notin \overline{\lang{NO\text{-}FOO\text{-}BAR}}\). Thus, \(R\) is a recognizer for \(\overline{\lang{NO\text{-}FOO\text{-}BAR}}\), and \(\overline{\lang{NO\text{-}FOO\text{-}BAR}}\) is recognizable. We conclude from Corollary 76 that \(\lang{NO\text{-}FOO\text{-}BAR}\) is unrecognizable.
Dovetailing¶
As another example of an unrecognizable language, define \(\etm\) as:
In other words, \(\etm\) is the set of source codes for programs that do not accept any input.
Note that \(\etm\) is not the same language as \(\emptyset = \{\}\). The latter is the language that contains no elements, and the following is a decider for it:
\(\etm\) on the other hand is a language that contains the source codes of programs \(M\) such that \(L(M) = \emptyset\), i.e. \(M\) does not accept any inputs. There are infinitely many such programs; the following is another example distinct from \(D_\emptyset\):
\(M'\) is not a decider since it may loop, but it is still the case that \(L(M') = \emptyset\) since \(M'\) does not accept any inputs. Thus, \(M' \in \etm\).
We proceed to show that \(\etm\) is unrecognizable. First, we must demonstrate that \(\etm\) is undecidable. We do so with the reduction \(\atm \Tr \etm\). Given a decider \(N\) for \(\etm\), we construct a decider \(C\) for \(\atm\):
We analyze this program as follows:
If \(M\) accepts \(x\), then \(M'\) accepts all inputs \(w\), so \(L(M') \ne \emptyset\) and \(\inner{M'} \notin \etm\). Thus, \(N\) rejects \(\inner{M'}\), in which case \(C\) accepts \((\inner{M}, x)\).
If \(M\) does not accept \(x\), then \(M'\) does not accept any inputs \(w\), so \(L(M') = \emptyset\) and \(\inner{M'} \in \etm\). Then \(N\) accepts \(\inner{M'}\), in which case \(C\) rejects \((\inner{M}, x)\).
Observe that \(C\) always halts – it just constructs the new program \(M'\) and runs \(N\) on it, and the latter halts because it is a decider. Thus, \(C\) is a decider, and the language it decides is \(\atm\). This demonstrates that \(\atm \Tr \etm\), and since \(\atm\) is undecidable, \(\etm\) is also undecidable.
Since the class of decidable languages is closed under complement, we can conclude that
is also undecidable.
We now define a recognizer \(R\) for \(\overline{\etm}\). We require the following behavior:
If \(M\) accepts at least one input \(x\), then \(\inner{M} \in \overline{\etm}\), so \(R\) must accept \(\inner{M}\).
If \(M\) does not accept any inputs \(x\), then \(\inner{M} \notin \overline{\etm}\), so \(R\) must reject or loop on \(\inner{M}\).
We don’t know a priori which inputs \(M\) accepts. If we want to determine whether \(M\) accepts at least one input in some finite set \(S = \{w_1, \dots, w_n\}\), we can alternate between executing \(M\) on each element of \(S\) until one of those executions accepts, similar to how we showed that the class of recognizable languages is closed under union. Unfortunately, we do not have a finite set of possible inputs here; rather, we need to determine whether \(M\) accepts an input in the countably infinite set \(\Sigma^*\).
More explicitly, we need to be able to simulate infinitely many inputs \(w \in \Sigma^*\) on infinitely many steps \(s \in \Z^+\). We have the product \(\Sigma^* \times \Z^+\) of two countably infinite sets, and we need to come up with an ordering such that we will eventually reach each element \((w, s) \in \Sigma^* \times \Z^+\) of this product. We can follow a similar ordering as we did when listing the pairs of natural numbers, as illustrated below:
This process is called dovetailing, and it allows us to alternate between a countably infinite set of executions. Our simulation proceeds in rounds, and in round \(i\), we perform a single step of each execution whose index is at most \(i\). To make this concrete, we define the recognizer \(R\) as:
In the first round, \(R\) simulates a step of the execution of \(M\) on the first input \(w_1\). In the second round, \(R\) simulates a step on \(w_1\) and another on \(w_2\). In the third round, a step on each of \(w_1\), \(w_2\), and \(w_3\) gets simulated, and so on. A round \(i\) executes a finite number of steps \(i\). An arbitrary input \(w_k\) begins execution in round \(k\), i.e. after a finite number of steps, and its execution will continue in subsequent rounds until it halts, or indefinitely if it loops. Thus, this dovetailing process will consider all inputs \(w\). \(R\) only terminates when the simulation of \(M\) on a particular input \(w\) terminates in the accept state.
We analyze \(R\) as follows:
If \(\inner{M} \in \overline{\etm}\), then \(M\) accepts some input \(w_j\) in a finite number of steps \(s\). \(R\) simulates execution of \(M\) on \(w_j\) in rounds \([j, j+s)\), and each round takes a finite amount of time. Thus, \(R\) completes its simulation of \(M\) on \(w_j\) in finite time and subsequently accepts \(\inner{M}\).
If \(\inner{M} \notin \overline{\etm}\), then \(M\) does not accept any inputs. Then \(R\) never terminates – it simulates \(M\) on infinitely many inputs \(w\), none of which accept, so \(R\) does not accept. Thus, \(R\) loops on \(\inner{M}\).
Since \(R\) accepts all \(\inner{M} \in \overline{\etm}\) and loops on all \(\inner{M} \notin \overline{\etm}\), \(R\) is a recognizer for \(\overline{\etm}\). Then because \(\overline{\etm}\) is undecidable but recognizable, it follows that \(\etm\) is unrecognizable.
Prove whether the following language is recognizable, co-recognizable, or both:
Prove whether the following language is recognizable, co-recognizable, or both:
Rice’s Theorem¶
We previously demonstrated that \(\etm\) is undecidable, where
Let’s take a look at a similar language and determine whether it too is undecidable. Define \(\lang{\{\varepsilon\}}\) as
As we saw with \(\etm\) and \(\emptyset = \{\}\), the languages \(\lang{\{\varepsilon\}}\) and \(\{\varepsilon\}\) are distinct. The latter is a language containing only the single element \(\varepsilon\), and it is decidable:
On the other hand, \(\lang{\{\varepsilon\}}\) is a language containing source codes of programs, and it is a countably infinite set. We demonstrate that \(\lang{\{\varepsilon\}}\) is undecidable with the reduction \(\atm \Tr \lang{\{\varepsilon\}}\). Assume we have a decider \(E\) for \(\lang{\{\varepsilon\}}\). Then we construct a decider \(C\) for \(\atm\) as follows:
We analyze this program as follows:
If \(M\) accepts \(x\), then \(M'\) accepts the input \(\varepsilon\) and rejects all other inputs, so \(L(M') = \{\varepsilon\}\) and \(\inner{M'} \in \lang{\{\varepsilon\}}\). Thus, \(E\) accepts \(\inner{M'}\), in which case \(C\) accepts \((\inner{M}, x)\).
If \(M\) does not accept \(x\), then \(M'\) does not accept any inputs \(w\), so \(L(M') = \emptyset \ne \{\varepsilon\}\) and \(\inner{M'} \notin \lang{\{\varepsilon\}}\). Then \(E\) rejects \(\inner{M'}\), in which case \(C\) rejects \((\inner{M}, x)\).
Thus, \(\atm \Tr \lang{\{\varepsilon\}}\), and \(\lang{\{\varepsilon\}}\) is undecidable.
We can generalize this reasoning as follows.
Let \(A\) be a recognizable language. Define the language
comprised of source codes of programs \(M\) such that \(L(M) = A\). Then \(L_A\) is undecidable.
We first observe that since \(A\) is recognizable, there exist programs \(M\) such that \(L(M) = A\). In fact, there are infinitely many such programs – given a Turing machine \(M\) that recognizes \(A\), we can construct a distinct Turing machine \(M'\) that also recognizes \(A\). If \(Q\) is the set of states of \(M\), we can construct \(M'\) by adding a new state \(q'\) to \(Q\) without any incoming transitions. Since there are no incoming transitions to \(q'\), the addition of \(q'\) does not affect the behavior of the machine.
Since \(L_A\) has infinitely many elements, we cannot decide it by just hardcoding its elements. In fact, we demonstrate that \(L_A\) is undecidable by reducing \(\atm\) to it. Assume we have a decider \(D\) for \(L_A\). Then we construct a decider \(C\) for \(\atm\). Since \(A\) is recognizable, we know that there exists a recognizer \(R\) for it, which we use in our construction of \(C\). For the purposes of our construction, we assume that \(A \ne \emptyset\), as we have already shown that \(\etm\) is undecidable.
We first analyze the behavior of \(M'\):
If \(M\) accepts \(x\), we have:
If \(w \in A\), then \(R\) accepts \(w\), so \(M'\) accepts \(w\).
If \(w \notin A\), then \(R\) either rejects or loops on \(w\), so \(M'\) rejects or loops on \(w\).
If \(M\) does not accept \(x\), then \(M'\) does not accept any input \(w\) – \(M'\) loops on all inputs if \(M\) loops on \(x\), otherwise \(M'\) rejects all inputs.
Thus, we see that \(L(M') = A\) exactly when \(M\) accepts \(x\); otherwise \(L(M') = \emptyset\). (Again, we are assuming that \(A \ne \emptyset\).) Then the behavior of \(C\) is as follows:
If \(M\) accepts \(x\), then \(L(M') = A\) so \(D\) accepts \(\inner{M'}\). Then \(C\) accepts \((\inner{M}, x)\).
If \(M\) does not accept \(x\), then \(L(M') = \emptyset \ne A\) so \(D\) rejects \(\inner{M'}\). Then \(C\) rejects \((\inner{M}, x)\).
Thus, \(C\) is a decider for \(\atm\), and we have shown that \(\atm \Tr L_A\).
The construction above only works for languages \(A\) that are recognizable. If \(A\) is unrecognizable, then \(L_A\) is decidable – since \(A\) is unrecognizable, there is no program \(M\) such that \(L(M) = A\). Thus, \(L_A = \emptyset\), which is decided by a program that rejects all inputs.
We can further generalize Claim 82 by considering sets of languages rather than just a single language in isolation.
A semantic property is a set of languages.
We often use the symbol \(\Prop\) to represent a semantic property. The following is an example of a semantic property:
\(\Prop_\infty\) is the set of languages that have infinitely many elements. Examples of languages contained in \(\Prop_\infty\) include \(\Sigma^*\), \(\{x \in \Sigma^* : x \text{ contains no $1$'s}\}\) (assuming \(\Sigma\) is not the unary alphabet \(\{1\}\)), and all undecidable languages, and examples that are not in \(\Prop_\infty\) include \(\emptyset\) and \(\{x \in \Sigma^* : \abs{x} < 100\}\).
A semantic property \(\Prop\) is trivial if it either contains all recognizable languages, or if it contains no recognizable languages.
Examples of trivial properties include the empty set, the set of all recognizable languages, the set of all unrecognizable languages, and the set of all languages. The set of decidable languages is nontrivial, since there are both recognizable languages that are decidable as well as recognizable languages that are undecidable.
We now generalize our notion of a language that contains source codes of programs, from programs that recognize a specific language \(A\) to programs that recognize a language contained in a semantic property \(\Prop\). Define \(\lang{\Prop}\) as follows:
Whether or not \(\lang{\Prop}\) is decidable depends on whether \(\Prop\) is trivial.
If \(\Prop\) is a trivial semantic property, then \(\lang{\Prop}\) is decidable.
If \(\Prop\) is trivial, then it either contains all recognizable languages or no recognizable languages.
Case 1: \(\Prop\) contains all recognizable languages. Then every program \(M\) is contained in \(\lang{\Prop}\) – since \(L(M)\) is recognizable by definition, \(L(M) \in \Prop\). We can then write a decider for \(\lang{\Prop}\) that accepts all programs as input.
Case 2: \(\Prop\) contains no recognizable languages. Then no program \(M\) is contained in \(\lang{\Prop}\), since \(L(M)\) is recognizable and therefore not in \(\Prop\). Thus, we can decide \(\lang{\Prop}\) by rejecting all inputs.
We now take a look at Rice’s theorem, which states that \(\lang{\Prop}\) is undecidable for a nontrivial semantic property \(\Prop\).
If \(\Prop\) is a nontrivial semantic property, then \(\lang{\Prop}\) is undecidable.
We show that \(\atm \Tr \lang{\Prop}\): given a decider \(D\) for \(\lang{\Prop}\), we will construct a decider \(C\) for \(\atm\).
First, we consider the case where \(\emptyset \notin \Prop\). Since \(\Prop\) is nontrivial, there must be some recognizable language \(A \in \Prop\). Observe that \(A \ne \emptyset\) since \(A \in \Prop\) while \(\emptyset \notin \Prop\). Since \(A\) is recognizable, there is a recognizer \(R_A\) for it. Then we construct \(C\) as follows:
Similar to our analysis in Proof 83, we have that \(L(M') = A\) when \(M\) accepts \(x\), and \(L(M') = \emptyset\) when \(M\) does not accept \(x\). Then the behavior of \(C\) is as follows:
If \(M\) accepts \(x\), then \(L(M') = A\). Since \(A \in \Prop\), we have \(L(M') \in \Prop\) and \(\inner{M'} \in \lang{\Prop}\). \(D\) is a decider for \(\lang{\Prop}\), so it accepts \(\inner{M'}\). Then \(C\) accepts \((\inner{M}, x)\).
If \(M\) does not accept \(x\), then \(L(M') = \emptyset \notin \Prop\). Then \(D\) rejects \(\inner{M'}\), so \(C\) rejects \((\inner{M}, x)\).
We now consider the case where \(\emptyset \in \Prop\). Since \(\Prop\) is nontrivial, there must be some recognizable language \(B \notin \Prop\). Observe that \(B \ne \emptyset\) since \(B \notin \Prop\) while \(\emptyset \in \Prop\). Since \(B\) is recognizable, there is a recognizer \(R_B\) for it. Then we construct \(C\) as follows:
We now have that \(L(M') = B\) when \(M\) accepts \(x\), and \(L(M') = \emptyset\) when \(M\) does not accept \(x\). Then the behavior of \(C\) is as follows:
If \(M\) accepts \(x\), then \(L(M') = B\). Since \(B \notin \Prop\), we have \(L(M') \notin \Prop\) and \(\inner{M'} \notin \lang{\Prop}\). \(D\) is a decider for \(\lang{\Prop}\), so it rejects \(\inner{M'}\). Then \(C\) accepts \((\inner{M}, x)\).
If \(M\) does not accept \(x\), then \(L(M') = \emptyset \in \Prop\). Then \(D\) accepts \(\inner{M'}\), so \(C\) rejects \((\inner{M}, x)\).
In both cases, \(C\) is a decider for \(\atm\). We have shown that \(\atm \Tr \lang{\Prop}\), so \(\lang{\Prop}\) is undecidable.
Rice’s Theorem and Program Analysis¶
Rice’s theorem has profound implications for compilers and program analysis. While the formulation above is with respect to the language of a program, the typical expression of Rice’s theorem in the field of programming languages is more expansive:
Any nontrivial question about the behavior of a program at runtime is undecidable.
Here, “nontrivial” means there are some programs that have the behavior in question and others that do not 11.
- 11
The behavior in question must be concerned with program meaning, or semantics, and it must be robust with respect to semantics-preserving transformations. For instance, a question such as, “Does the program halt in less than a thousand steps?” is not about the program’s meaning, and the answer is different for different programs that have identical semantics.
As an example, let’s consider the question of whether a Turing machine ever writes the symbol \(1\) to its tape, when run on a given input \(x\). The language we want to decide is as follows:
This language is not in the form required by our formulation of Rice’s theorem. However, we can show it is undecidable by the reduction \(\htm \Tr \lang{WritesOne}\). Given a decider \(W\) for \(\lang{WritesOne}\), we can construct a decider \(H\) for \(\htm\).
First, we observe that we can construct a new program \(M'\) such that \(M'\) writes a \(1\) to its tape exactly when \(M\) halts. Start by replacing the symbol \(1\) with a new symbol \(\hat{1}\) everywhere in \(M\) – in its input alphabet, tape alphabet, and all occurrences in its transition function. Then replace \(\qacc\) and \(\qrej\) with new states \(q_a'\) and \(q_r'\) such that \(\delta(q_a', \gamma) = (q_a', 1, R)\) and \(\delta(q_r', \gamma) = (q_r', 1, R)\) for all \(\gamma\) in the modified tape alphabet. Then \(M'\) reaches \(q_a'\) or \(q_r'\) exactly when \(M\) halts, and \(M'\) proceeds to write a \(1\) in the next step. Thus, \(M'\) writes a \(1\) exactly when \(M\) halts 12.
Using the above transformation, we can write \(H\) as follows:
Since \(M'\) writes a \(1\) exactly when \(M\) halts, \(W\) accepts \((\inner{M'}, x)\) exactly when \(M\) halts on \(x\). Thus, \(H\) is a decider for \(\htm\), and we have shown that \(\htm \Tr \lang{WritesOne}\).
A similar construction can be done for any nontrivial question about program behavior, with respect to any Turing-complete language. The implication is that there is no such thing as a perfect program analysis, and any real program analysis produces the wrong result for some nonempty set of inputs.
- 12
This is an example of a mapping reduction, where we transform an input \(a\) to an input \(b\) such that \(a \in L_1 \Longleftrightarrow b \in L_2\). In this case, we have \((\inner{M}, x) \in \htm \Longleftrightarrow (\inner{M'}, x) \in \lang{WriteOnes}\). We will see more details about mapping reductions later.