Complexity
Introduction to Complexity
In the previous unit, we examined what problems are solvable without any constraints on the time and space resources used by an algorithm. We saw that even with unlimited resources, most problems are not solvable by an algorithm. In this unit, we focus on those problems that are actually solvable and consider how efficiently they can be solved. We start with decision problems (languages) and defer consideration of functional (i.e. search) problems until later.
For a Turing machine, its time complexity is the number of steps until it halts, and its space complexity is the number of tape cells used before halting. We focus primarily on time complexity, and as mentioned previously, we are interested in the asymptotic complexity with respect to the input size, ignoring leading constants, over worst-case inputs. For a particular asymptotic bound \(\O(t(n))\), we can define a class of languages that are decidable by a Turing machine with time complexity \(\O(t(n))\), where \(n\) is the size of the input:
\(\DTIME(t(n))\) is the class of all languages decidable by a Turing machine with time complexity \(\O(t(n))\).
\(\DTIME(t(n))\) is a complexity class, as it consists of languages whose time complexity is within a particular bound. Concrete examples include \(\DTIME(n)\), the class of languages decidable by a Turing machine in linear time, and \(\DTIME(n^2)\), the class of languages decidable in quadratic time by a Turing machine.
When we discussed computability, we defined two classes of languages, the decidable languages (also called recursive and denoted by \(\RD\)) and the recognizable languages (also called recursively enumerable and denoted by \(\RE\)). The definitions of these two classes are actually model-independent – recall that the Church-Turing thesis states that any language that is decidable in a different model is also decidable in the Turing-machine model. Similarly, any language decidable on a Turing machine is also decidable in another Turing-complete model.
Unfortunately, this model-independence does not extend to \(\DTIME(t(n))\). Consider the concrete language [1]:
In a programming model with random access, or even an extended Turing-machine model with two tapes, \(\textrm{PALINDROME}\) can be decided in \(\O(n)\) time, where \(n = \abs{(x, y)}\) – walk pointers inward from both ends of the input, comparing character-by-character. In the standard Turing-machine model where all computation is local, an algorithm to decide this language would need to examine the first character of \(x\), move the head sequentially to the last character of \(y\), then move the head back to examine the second character of \(x\), and so on. Each pair of characters takes \(\O(n)\) steps just to move the head, for a total running time of \(\O(n^2)\). Thus, \(\textrm{PALINDROME} \in \DTIME(n^2)\) and \(\textrm{PALINDROME} \notin \DTIME(n)\), but in other models, it requires only \(\O(n)\) time to decide.
A model-dependent complexity class is very inconvenient for analysis – we wish to analyze algorithms in pseudocode without worrying about details of how an algorithm would be implemented on a Turing machine. Furthermore, we would like to ignore other implementation details such as choice of data structure, which can affect the asymptotic running time.
As a step towards establishing a model-independent complexity class, we note that there is an enormous difference between the growth of polynomial and exponential functions. The following illustrates how several polynomials compare to the growth of \(2^n\):
The vertical axis in this plot is logarithmic, so that the growth of \(2^n\) appears as a straight line. Observe that even a polynomial such as \(n^{10}\) with a relatively large exponent grows far slower than the exponential \(2^n\), with the latter exceeding the former for all \(n \ge 60\).
In addition to the dramatic difference in growth between polynomials and exponentials, we observe that the set of polynomials is closed under composition – if \(f(n) = \O(n^k)\) and \(g(n) = \O(n^m)\), then \(f(g(n)) = \O(n^{km})\), which is also polynomial. This means that if we compose a polynomial-time algorithm with another polynomial-time algorithm, the resulting algorithm is also polynomial in time.
As a final remark, the extended Church-Turing thesis states that polynomial time translates between models of computation:
An algorithm that runs in polynomial time in one deterministic computational model also runs in polynomial time in any other realistic, deterministic model.
We thus establish a model-independent complexity class of languages decidable by a polynomial-time algorithm.
\(\P\) is the class of languages decidable by a polynomial-time Turing machine, where
Given the significant difference between polynomial and exponential time, we informally consider \(\P\) to be the class of efficiently decidable languages. The class \(\P\) includes fundamental problems, such as the decision versions of integer arithmetic and sorting. (Since \(\P\) is a class of decision problems, it does not include the functional variants of these problems.)
Efficiently Verifiable Languages
For some languages, the task of verifying that an input is in the language, given some extra information about the input, may be easier than deciding whether it is in the language from scratch. Consider the following maze, courtesy of Kees Meijer’s maze generator:
At first glance, it is not clear that there is a path from start to finish. However, if someone were to give us an actual path, it is straightforward to follow that path to determine whether it takes us from entrance to exit:
On the other hand, for a maze that has no path from start to finish, no matter what path someone gives us, we would be unable to verify that it takes us from entrance to exit:
As another example, the traveling salesperson problem (TSP) is the task of finding the minimal-weight tour that visits all the vertices of a weighted, complete (i.e. fully connected) graph, starting and ending at the same vertex. Suppose we wish to determine the minimal tour of the following graph:
If someone were to tell us that the path \(A \to B \to D \to C \to A\) is the minimal tour, how could we verify that it is indeed minimal? There doesn’t seem to be an obvious way to do so without considering all other tours and checking whether any of them produce a tour of lower weight. (This particular graph only has three distinct tours, but in general, the number of distinct tours is exponential in the number of vertices in a graph.) On the other hand, if we modify the problem so that we are looking for a tour that comes in under a specific budget, verifying that such a tour exists becomes straightforward. This limited-budget version of TSP is as follows:
Given a graph \(G = (V, E)\) with weight function \(w : E \to \R\), is there a tour of weight at most \(k\) that visits all vertices?
For the graph above with \(k = 60\), the tour \(A \to B \to D \to C \to A\) gives us a total weight of 61, so it does not meet the budget. On the other hand, the tour \(A \to B \to C \to D \to A\) gives us a total weight of 58 and does come under the budget. Verifying that a tour meets the budget is as simple as adding up the weights of the edges along the tour. Thus, if a given graph does have a tour that meets the budget, we can verify this is the case if we are given the actual tour that is under budget. On the other hand, if a graph has no tour that meets the budget, no matter what tour we are given, we will never erroneously conclude that the graph has a tour that comes under budget.
Generalizing the reasoning above, we say that a decision problem \(L\) is efficiently verifiable when:
If an input \(x\) is in \(L\), then there is an efficient way to verify that it is given some additional information, which we call a certificate.
If an input \(x\) is not in \(L\), then there is no additional information, even maliciously produced, that could convince us that \(x\) is in \(L\).
As one more example, consider the language of polynomials that have positive roots:
Is \(x^3 - x - 60\) in \(\mathrm{POLY}\)? Given \(x = 4\) as the certificate, we can indeed verify that \(4^3 - 4 - 60 = 64 - 4 - 60 = 0\), so \(x^3 - x - 60 \in \mathrm{POLY}\). On the other hand, for the polynomial \(x^2 + 1\), no certificate can convince us that this polynomial has a positive root. Since we can compute the value of a polynomial efficiently on a given \(x\), \(\mathrm{POLY}\) is efficiently verifiable.
More formally, a language \(L\) is efficiently verifiable if there exists an algorithm \(V\), called a verifier, that takes two inputs \(x,c\) such that:
The computation \(V(x, c)\) takes polynomial time with respect to \(\abs{x}\), the size of \(x\).
If \(x \in L\), then there exists some certificate \(c\) such that \(V(x, c)\) accepts.
If \(x \notin L\), then \(V(x, c)\) rejects for all certificates \(c\).
This gives rise to the class of languages that are efficiently verifiable.
\(\NP\) is the class of efficiently verifiable languages [2].
Note that \(\NP\) does not stand for “not polynomial” – rather, it is short for “nondeterministic polynomial,” based on an equivalent definition of the class using the computational model of nondeterministic Turing machines. This model is beyond the scope of this text.
Let us return to our first example, that of determining whether a maze has a path from start to finish. A maze can be represented as an undirected graph, with a vertex for each position in the maze and edges between adjacent positions. Then the problem is to determine whether there exists a path between the start vertex \(s\) and the end vertex \(t\). This gives us the language:
We can define a verifier for \(\MAZE\) as follows. The certificate is a sequence of vertices, representing a path.
We analyze this verifier as follows. First, the verifier runs in polynomial time with respect to \(\abs{G}\) – it examines at most \(\abs{V}\) pairs of adjacent vertices, and each pair of vertices can be examined in polynomial time (the exact time depends on how the graph \(G\) is represented). As for correctness, we have:
If \((G,s,t) \in \MAZE\), there exists some path \((s,v_2,\dots, v_{m-1},t)\) between \(s\) and \(t\) in \(G\). Then \(V((G, s, t), (s, v_2, \dots, v_{m-1},t))\) accepts.
If \((G,s,t) \notin \MAZE\), then there is no valid path from \(s\) to \(t\) in \(G\). Any certificate \(c\) will either not start at \(s\) and end at \(t\), or it will have some pair of adjacent vertices \(v_i,v_{i+1}\) such that no edge connects the two vertices. Thus, \(V((G, s, t), c)\) will reject.
Thus, \(V\) is a correct and efficient verifier for \(\MAZE\), so \(\MAZE \in \NP\).
Returning to the limited-budget TSP example, we define the corresponding language:
We take as our certificate the actual tour and define a verifier for \(\TSP\) as follows:
This verifier runs in time polynomial with respect to \(\abs{G}\) – finding duplicates can be done in polynomial time, e.g. by sorting the vertices, and the verifier additionally examines at most \(\abs{V}\) edges. Then if \((G, k) \in \TSP\), the actual tour of weight at most \(k\) is a valid certificate. Given this certificate, \(V\) verifies that it actually is a tour and has weight no more than \(k\), so \(V\) accepts. On the other hand, if \((G, k) \notin \TSP\), then \(V\) rejects all certificates – either the given certificate is not a valid path (e.g. it visits a nonexistent vertex), or it doesn’t start and end at the same vertex, or it doesn’t visit all the vertices, or its total weight is more than \(k\). Thus, \(V\) correctly and efficiently verifies \(\TSP\), and \(\TSP \in \NP\).
P Versus NP
We have now defined two distinct classes of languages:
\(\P\) is the class of languages that can be decided efficiently
\(\NP\) is the class of languages that can be verified efficiently
More formally, for a language \(L\), we have \(L \in \P\) if there exists a polynomial-time algorithm \(D\) such that:
if \(x \in L\), then \(D\) accepts \(x\)
if \(x \notin L\), then \(D\) rejects \(x\)
Similarly, we have \(L \in \NP\) if there exists a polynomial-time algorithm \(V\) such that:
if \(x \in L\), then \(V(x, c)\) accepts for at least one certificate \(c\)
if \(x \notin L\), then \(V(x, c)\) rejects for all certificates \(c\)
How are these two classes related? First, if a language is efficiently decidable, it is trivially efficiently verifiable – a verifier can just ignore the certificate and determine independently whether the input is in the language or not, and it can do so efficiently. Thus, we have
This relationship allows two possibilities:
\(\P \subset \NP\) or
\(\P = \NP\)
The latter possibility would mean that every efficiently verifiable problem is also efficiently decidable. Is that the case? What is the answer to the question
We don’t actually know! This is perhaps the greatest unsolved problem in all of Computer Science.
Consider two of the languages we just saw, \(\MAZE\) and \(\TSP\). We saw that both are in \(\NP\), and in fact, the languages have similar definitions and their verifiers have much in common. However, we know that \(\MAZE \in \P\) – a decider can perform a breadth-first search from \(s\) to see whether it reaches \(t\), and this search can be done efficiently. On the other hand, we do not know whether or not \(\TSP\) is in \(\P\) – we know of no algorithm that efficiently decides \(\TSP\), but we also do not know that such an algorithm does not exist.
How can we resolve the \(\P \stackrel{?}{=} \NP\) question? If the two are not actually equal, one way we could show this is to find a language in \(\NP\) and show that it is not in \(\P\). However, this seems exceedingly difficult – we would need to somehow demonstrate that of all the infinitely many possible algorithms, none of them decide the language in question in polynomial time. On the other hand, showing that \(\P = \NP\) also seems difficult – we would need to demonstrate that for any language in \(\NP\), it is possible to write an efficient decider for that language.
How can we make the task simpler? Suppose we identify a “hard” language \(L\), such that a solution to \(L\) would lead to a solution to all languages in \(\NP\). Then showing \(\P = \NP\) would only require proving that \(L \in \P\). Similarly, if \(L\) is a “hard” language and is also in \(\NP\), it is likely to be easier to prove that \(L \notin P\) than it would be for an “easier” language in \(\NP\).
Informally, we say that a language \(L\) is NP-Hard if a polynomial-time algorithm for \(L\) can be converted to yield a polynomial-time algorithm for all languages in \(\NP\). We will formalize the definition of an \(\NP\)-Hard language later.
Do \(\NP\)-Hard languages actually exist? Indeed they do, and we will discuss one such example next.
The Cook-Levin Theorem
Our first example of an \(\NP\)-Hard problem is Boolean satisfiability. Given a Boolean formula such as
we wish to determine whether there is an assignment of its variables such that the formula is true. A formula is comprised of several components:
variables such as \(x\) and \(y\), which can either be true (often represented as 1) or false (represented as 0)
literals, which are either a variable or its negation (e.g. \(x\) or \(\neg x\)); each appearance is considered to be a distinct literal
operators, which may either be conjunction (and), represented as \(\wedge\), or disjunction (or), represented as \(\vee\)
The size of a formula is the number of literals it contains; the formula \(\phi\) above has size 6.
An assignment is a mapping of the variables in a formula to truth values. We can represent an assignment over \(n\) variables as an \(n\)-tuple, such as \((a_1, \dots, a_n)\). For the formula above, the assignment \((0, 1, 0)\) maps \(x\) to the value 0, \(y\) to the value 1, and \(z\) to the value 0. We use the notation \(\phi(a_1, \dots, a_n)\) to denote the value of \(\phi\) given the assignment \((a_1, \dots, a_n)\).
A satisfying assignment is an assignment that causes the formula as a whole to be true. In the example above, \((0, 1, 0)\) is a satisfying assignment:
On the other hand, \((1, 0, 1)\) is not a satisfying assignment:
A formula \(\phi\) is satisfiable if it has a satisfying assignment. The language corresponding to whether a formula is satisfiable is just the set of satisfiable formulas:
This language is decidable. A formula \(\phi\) of size \(n\) has at most \(n\) variables, so there are at most \(2^n\) possible assignments. A decider can just iterate over each of these assignments, compute the value of the formula on each assignment, accept if an assignment results in a true value, and reject if none of the assignments results in a true value.
\(\SAT\) can also be efficiently verified. The certificate is just an assignment, and the verifier need only compute the value of the formula for the given assignment. This can be done in linear time. By definition, a formula \(\phi\) is satisfiable if and only if it has a satisfying assignment. Then if \(\phi\) is satisfiable, there is a certificate (a satisfying assignment) that causes the verifier to accept, and if \(\phi\) is not satisfiable, no certificate results in the verifier accepting. Thus, \(\SAT\) is efficiently verifiable, and \(\SAT \in \NP\).
Is \(\SAT\) efficiently decidable? The decider we discussed above is not efficient – it takes time exponential in the number of variables in the formula, and the size of the formula (i.e. the number of literals) may be the same as the number of variables. We actually do not know whether \(\SAT\) is efficiently decidable – the question of whether \(\SAT \stackrel{?}{\in} \P\) is unresolved. However, the Cook-Levin theorem states that \(\SAT\) is \(\NP\)-Hard; thus, if \(\SAT\) is actually in \(\P\), then \(\P = \NP\).
\(\SAT\) is \(\NP\)-Hard.
The key idea in the proof of this theorem is that for an arbitrary language \(L \in \NP\) and input \(x\), we can turn an efficient verifier \(V\) for \(L\) and the input \(x\) into a Boolean formula \(\phi_{V,x}\) such that:
if \(x \in L\), then \(\phi_{V,x} \in \SAT\)
if \(x \notin L\), then \(\phi_{V,x} \notin \SAT\)
The size of \(\phi_{V,x}\) will be polynomial in the size of \(x\). Then if \(\SAT\) has an efficient decider \(D_\SAT\), we can decide \(L\) as follows:
Since \(\phi_{V,X}\) can be constructed in polynomial time with respect to the size of \(x\) and \(D_\SAT\) runs in time polynomial with respect to the size of \(\phi_{V,x}\), \(D_L\) as a whole takes polynomial time. Thus, \(D_L\) is an efficient decider for \(L\). So if an efficient decider \(D_\SAT\) for \(\SAT\) exists, then \(L \in \P\). Since this is true for an arbitrary language \(L \in \NP\), we have that if \(\SAT \in \P\), then \(\P = \NP\).
Before we proceed to the construction of \(\phi_{V,x}\), we review the string representation of the configuration of a Turing machine, which we previously discussed in Wang Tiling. A configuration encodes the contents of the machine’s tape, the state \(q \in Q\) in which the machine is, and the position of the head. We represent these details with an infinite string over the alphabet \(\Gamma \cup Q\), where the string contains:
a symbol for the contents of each cell, in order starting from the leftmost cell
the state \(q \in Q\), positioned immediately to the left of the symbol corresponding to the cell on which the head is located
Then if the input is \(x_1 x_2 \dots x_n\) and the initial state is \(q_0\), the following encodes the starting configuration of the machine:
Since the head is at the leftmost cell and the state is \(q_0\), the string has \(q_0\) to the left of the leftmost tape symbol. If the transition function gives \(\delta(q_0, x_1) = (q', x', R)\), the new configuration is:
The first cell has been modified to \(x'\), the machine is in state \(q'\), and the head is over the second cell, represented here by writing \(q'\) to the left of that cell’s symbol.
For a language \(L \in \NP\), there exists an efficient verifier \(V(x, c)\) that runs in time polynomial with respect to the size of \(x\). In other words, there is some fixed \(k\) such that \(V(x, c)\) makes at most \(\abs{x}^k\) steps before halting. Since the head starts on the leftmost cell and can only move one position in each step, this implies that \(V(x, c)\) can only read or write the first \(\abs{x}^k\) cells on the tape. Thus, we only need a string of size \(\abs{x}^k\) to encode the configuration of \(V\) as opposed to an infinite string.
For an input of size \(n\), we can represent the configurations of the machine over the (at most) \(n^k\) steps with an \(n^k \times n^k\) tableau, with one configuration per row. In each row, we place the symbol \(\#\) at the start and end [3].
Technically, this means we need \(n^k + 3\) columns to represent a configuration: \(n^k\) tape cells, plus the state \(q\) and the two \(\#\) symbols. However, this technicality is not important to the proof, so we ignore it for the rest of our discussion.
Each cell of the tableau contains a symbol out of the set \(S = \Gamma \cup Q \cup \{\#\}\), where \(\Gamma\) is the tape alphabet of \(V\) and \(Q\) is the set of states in \(V\). For each cell \(i,j\) and each symbol \(s \in S\), we define a Boolean variable \(t_{i,j,s}\) such that:
This gives us a total of \(\abs{S} \cdot n^{2k}\) variables, which is polynomial in \(n\); \(S\) does not depend on \(n\) in any way, so its size is constant with respect to \(n\).
Given an input \(x\), we construct a formula \(\phi_{V,x}\) over the \(\abs{S} \cdot n^{2k}\) variables \(t_{i,j,s}\), where \(n = \abs{x}\), such that:
if \(V(x, c)\) accepts for some certificate \(c\), then \(\phi_{V,x} \in SAT\)
if \(V(x, c)\) rejects for all certificates \(c\), then \(\phi_{V,x} \notin SAT\)
The formula \(\phi_{V,x}\) will be defined as the conjunction of several subformulas:
The purpose of these subformulas is as follows:
\(\phi_{V,x,start}\) enforces the starting configuration in the first row of the tableau
\(\phi_{V,x,cell}\) ensures that each cell contains exactly one symbol
\(\phi_{V,x,accept}\) ensures that \(V\) reaches an accepting configuration
\(\phi_{V,x,move}\) enforces that each configuration follows from the previous configuration according to the transition function of \(V\)
Starting Configuration
For the starting configuration, we assume an encoding of the input \((x, c)\) such that the symbol \(\$\) separates \(x\) from \(c\). Let \(m\) be the size of the certificate. Then the starting configuration is as follows:
The configuration starts with the \(\#\) symbol. The head is over the leftmost tape cell and the machine is in state \(\qst\), so \(\qst\) is in the second position of the configuration. The symbols of the input \(x\) follow, then the \(\$\) separator and the symbols of the certificate \(c\). The remaining cells are blank, except for the last which contains the symbol \(\#\).
For each of the input symbols \(x_j\), we require the tableau cell at position \(1,j+2\) to contain the symbol \(x_j\). The corresponding variable \(t_{1,j+2,x_j}\) must be true. Thus, we insure the correct input with:
For the cells corresponding to the certificate, we do not actually know in advance what certificate causes \(V\) to accept, or even what the size \(m\) is (other than that it is less than \(n^k\)). As a result, we allow any symbol \(\gamma \in \Gamma\) to appear in the positions [4] after the input \(x\). Then the portion of the formula corresponding to the certificate is:
Putting these pieces together with the rest of the starting configuration, we have:
This formula contains one literal for each column of the input, the state, and the \(\#\) and \(\$\) symbols. It contains \(\abs{\Gamma}\) literals for each possible component of the certificate. Observe that \(\abs{\Gamma}\) is a constant that depends on the verifier \(V\) but not on the input \(x\). Thus, there are a constant number of literals for each element of the starting configuration, for a total of \(\O(n^k)\) literals.
Technically, we need to ensure that blanks only appear at the end of the certificate. We can do so by modifying \(\phi_{certificate}\) such that \(t_{1,i,\gamma}\) is replaced with \((t_{1,i,\gamma} \wedge \neg t_{1,i-1,\bot})\) for \(\gamma \ne \bot\). (In other words, if a non-blank symbol appears at a position in the certificate, then the blank symbol cannot be in the position immediately to its left.) This only increases the size of the subformula by a constant factor.
Cell Consistency
To ensure that each cell \(i,j\) contains exactly one symbol, we need at least one of the variables \(t_{i,j,s}\) to be true:
We also cannot have more than one such variable be true – otherwise the cell would contain more than one symbol. For any pair of distinct symbols \(s \ne u\), we require that at least one of \(t_{i,j,s}\) or \(t_{i,j,u}\) be false:
Putting these together for all cells \(i,j\), we get:
The formula contains \(\O(\abs{S}^2)\) literals for each cell. Since \(S\) does not depend on the input size, there are a constant number of literals per cell, and \(\phi_{V,x,cell}\) has \(\O(n^{2k})\) literals in total.
Accepting Configuration
We ensure that an accepting configuration is reached by requiring at least one cell to have \(\qacc\) as its symbol:
This has one literal per cell, for a total of \(\O(n^{2k})\) literals.
Transitions
To ensure that each row follows from the previous one according to the transition function of \(V\), we divide the tableau into 2-by-3 windows and ensure that each window is valid:
There are \(\abs{S}^6\) distinct possibilities for a 2-by-3 window. However, not all windows are valid according to the behavior of \(V\) and the construction of the tableau. We describe the conditions for valid windows below. We use the following notation in the descriptions and diagrams:
\(\gamma\) for an element of \(\Gamma\)
\(q\) for an element of \(Q\)
\(\rho\) for an element of \(\Gamma \cup Q\)
\(\sigma\) for an element of \(\Gamma \cup \{\#\}\).
For a window to be valid with respect to the \(\#\) symbol, one of the following must hold:
the window does not contain any \(\#\) symbols at all
the window contains the \(\#\) symbol in the first position of both rows
the window contains the \(\#\) symbol in the third position of both rows
The following are valid windows with respect to \(\#\):
The remaining symbols \(\rho_1, \dots, \rho_6\) must be valid according to the rules below.
If a window is not near a state symbol \(q \in Q\), then a transition does not affect the portion of the configuration corresponding to the window, so the top and bottom row match:
To reason about what happens to a window that is near a state symbol, we take a look at what happens to the configuration as a result of a rightward transition \(\delta(q, \gamma) = (q', \gamma', R)\):
The head moves to the right, so \(q'\) appears in the bottom row at the column to the right of where \(q\) appears in the top row. The symbol \(\gamma\) is replaced by \(\gamma'\), though its position moves to the left to compensate for the rightward movement of the state. There are four windows affected by the transition, labeled above in the leftmost column of the window. We thus have the following four valid 2-by-3 windows corresponding to a rightward transition \(\delta(q, \gamma) = (q', \gamma', R)\):
We have used \(\sigma\) for the edges of the window, as it can actually either be a tape symbol or the tableau-edge marker \(\#\).
We now take a look at what happens to the configuration as a result of a leftward transition \(\delta(q, \gamma) = (q', \gamma', L)\):
The head moves to the left, so \(q'\) appears in the bottom row at the column to the left of where \(q\) appears in the top row. As with a rightward transition, we replace the old symbol \(\gamma\) with the new symbol \(\gamma'\). This time, however, it is the symbol to the left of the original position of the head that moves to compensate for the movement of the state. We now have five windows affected by this transition, giving us the following five valid 2-by-3 windows for a leftward transition \(\delta(q, \gamma) = (q', \gamma', L)\):
We also need to account for the case where the head is over the leftmost cell on the tape, in which case the head does not move:
Here, we have three windows affected, but the last actually has the same structure as the last window in the normal case for a leftward transition. Thus, we only need two additional valid windows:
We need one last set of valid windows to account for the machine reaching the accept state before the last row. We allow the accepting configuration to repeat indefinitely as needed by including windows with \(\qacc\) at the same position in both rows:
For each valid window \(w\) and each starting position \(1 \le i \le n^k-1, 1 \le j \le n^k-2\), we define a clause \(\phi_{i,j,w}\) corresponding to that window. Let \(s_1, \dots, s_6\) be the six symbols in \(w\). The clause is then:
For each position, we only need one such clause to be satisfied over the set of valid windows. Let \(W\) be the set of all valid windows. Then we need:
Finally, we need every window in the tableau to be valid, so we have:
The set of valid windows \(W\) does not depend on the input size \(n\), so \(\phi_{i,j,move}\) has a constant number of literals. Then \(\phi_{V,x,move}\) has a different copy of \(\phi_{i,j,move}\) for each cell window that starts at \(i,j\). There are \(\O(n^{2k})\) windows, so \(\phi_{i,j,move}\) has \(\O(n^{2k})\) literals in total.
Conclusion
We have demonstrated how to construct a formula \(\phi_{V,x}\) from an efficient verifier \(V\) for a language \(L \in \NP\) and an input \(x\). The total size of \(\phi_{V,x}\) is \(\O(n^{2k})\) literals, where \(n = \abs{x}\). This is polynomial in the size of \(x\), and the construction can be done in polynomial time.
For the formula to be satisfied, \(\phi_{V,x,accept}\) must be satisfied, and it ensures that an accepting configuration is reached somewhere in the tableau. The remaining pieces of \(\phi_{V,x}\) ensure that the computation leading to that accepting configuration is valid. If \(\alpha\) is a satisfying assignment for \(\phi_{V,x}\), the certificate \(c\) that causes \(V(x, c)\) to accept is comprised of the variables \(t_{1,n+4,\gamma}, \dots t_{1,n^k-1,\gamma}\) that are true in \(\alpha\) (ignoring the trailing \(\bot\) symbols). A satisfying assignment exists if and only if a certificate \(c\) exists such that \(V(x, c)\) accepts.
Thus, if we have an efficient decider \(D_\SAT\) for \(\SAT\), we can construct \(\phi_{V,x}\) and run \(D_\SAT\) on it to decide \(L\). As a result, an efficient decider for \(\SAT\) implies that \(\P = \NP\), and \(\SAT\) is an \(\NP\)-Hard problem.
A problem that is equivalent to formula satisfiability is that of circuit satisfiability. Given a circuit \(C\) composed of \(m\) logic gates over \(n\) binary inputs and with a single binary output, is there a set of input values such that the output is one? In other words, is there an assignment of the inputs such that the circuit is satisfied?
Without loss of generality, assume that the logic gates are each either a unary NOT gate, a binary AND gate, or a binary OR gate. (This is a universal gate set, so any logic circuit can be expressed in terms of these gates.) Since these gates correspond to negation, conjunction, and disjunction, we can mechanically convert a Boolean formula into an equivalent circuit whose size is linear with respect to the size of the formula. The following is a circuit corresponding to the formula \((y \vee (\neg x \wedge z) \vee \neg z) \wedge (\neg x \vee z)\):
What about converting a circuit into a formula? In general, a circuit may have multiple fan-out, meaning that the output of one gate may be used as multiple inputs to subsequent gates. The following is an example:
A naïve translation can result in a combinatorial increase in size – where multiple fan-out occurs, the subformula corresponding to the intermediate result would be repeated. For the circuit above, such a translation would be
The subformulas \(x \wedge y\) and \(x \vee y\) are repeated twice. For larger circuits, a subformula may be repeated many times.
Rather than naïvely repeating, we can introduce new variables corresponding to the output of each gate. For instance, the following is an AND gate with existing variables \(a\) and \(b\) as input and a new variable \(c\) as output:
The relationship between the inputs \(a\) and \(b\) and the output \(c\) is \(c = (a \wedge b)\). A formula cannot directly express equality, but observe that variables \(x\) and \(y\) are equal exactly when either both of them are true, or both are false. Thus, \(x = y\) is equivalent to \((x \wedge y) \vee (\neg x \wedge \neg y)\). For \(c = (a \wedge b)\), we have
In the second step, we applied De Morgan’s laws to move negation inwards, so that only variables and not subformulas are negated.
We can similarly express an OR gate with inputs \(a\) and \(b\) and output \(c\):
For a NOT gate with input \(a\), we can simply express the output as \(\neg a\) without introducing a new variable.
Now that we have subformulas for each gate, the formula for the full circuit is just the conjunction of the subformulas for each gate. Since each gate introduces at most one variable and a subformula of at most six literals, the resulting formula is linear in size with respect to the size of the circuit.
We have demonstrated that formula satisfiability is equivalent to circuit satisfiability. Since the former \(\NP\)-Hard, so is the latter. We will shortly formalize the notion of converting between problems with polynomial-time mapping reductions.
NP-Completeness
The Cook-Levin theorem gives us a process for converting an instance of an arbitrary \(\NP\) language \(L\) to an instance of \(\SAT\), such that:
if \(x \in L\), then \(\phi_{V,x} \in \SAT\)
if \(x \notin L\), then \(\phi_{V,x} \notin \SAT\)
The process for doing so is efficient, taking time polynomial in the size of \(x\). Thus, if we have an efficient decider for \(\SAT\), we can also efficiently decide \(L\) by performing the Cook-Levin conversion and running the \(\SAT\) decider on the resulting formula.
This conversion of an instance of one problem to an instance of another is called a mapping reduction. Formally, a mapping reduction is defined as follows:
A mapping reduction from language \(A\) to language \(B\) is a computable function \(f : \Sigma^* \to \Sigma^*\) such that:
\(x \in A \implies f(x) \in B\)
\(x \notin A \implies f(x) \notin B\)
A mapping reduction \(f\) may or may not be computable efficiently – it is only efficient if \(f\) is polynomial-time computable.
A function \(f\) is polynomial-time computable if \(f(x)\) can computed in time polynomial with respect to \(\abs{x}\), the size of \(x\).
A polynomial-time mapping reduction, also called a polynomial-time reduction or just a polytime reduction, is a mapping reduction \(f\) that is polynomial-time computable.
A language \(A\) is polynomial-time reducible, or polytime reducible, to a language \(B\) if there is a polynomial-time reduction from \(A\) to \(B\). We use the notation
to indicate that \(A\) is polytime reducible to \(B\).
Observe that there are several differences between Turing reducibility and polynomial-time reducibility.
A Turing reduction \(A \Tr B\) requires demonstrating that a decider for \(A\) can be written using a black-box decider for \(B\). A polynomial-time mapping reduction \(A \pmr B\) does not involve a black box. Instead, it requires demonstrating a computable function that converts instances of \(A\) to instances of \(B\) and non-instances of \(A\) to non-instances of \(B\).
There are no efficiency requirements on a Turing reduction – the black box may be invoked any number of times, and the decider that uses the black box may take arbitrary (but finite) time. For a polynomial-time mapping reduction, the conversion function \(f(x)\) must be computable in time polynomial in \(\abs{x}\).
We saw previously that \(A \Tr B\) and \(B\) decidable implies that \(A\) is decidable [5]. A polytime reduction \(A \pmr B\) gives us a similar result with respect to membership in \(\P\).
Denoting the class of decidable languages by \(\RD\), we have:
If \(A \Tr B\) and \(B \in \RD\), then \(A \in \RD\).
If \(A \pmr B\) and \(B \in \P\), then \(A \in \P\).
Since \(B \in \P\), there exists a polynomial-time decider \(D_B\) such that:
if \(y \in B\), then \(D_B\) accepts \(y\) in time \(\O(\abs{y}^k)\) for some constant \(k \ge 1\)
if \(y \notin B\), then \(D_B\) rejects \(y\) in time \(\O(\abs{y}^k)\)
Additionally, since \(A \pmr B\), there exists a function \(f\) such that:
if \(x \in A\), then \(f(x) \in B\)
if \(x \notin A\), then \(f(x) \notin B\)
Furthermore, \(f\) is computable in polynomial time, so there exists a Turing machine \(C\) such that:
given \(x \in A\), then \(C\) outputs \(f(x) \in B\) in \(\O(\abs{x}^m)\) time for some constant \(m \ge 1\)
given \(x \notin A\), then \(C\) outputs \(f(x) \notin B\) in \(\O(\abs{x}^m)\) time
We can use \(C\) and \(D_B\) to construct a polynomial-time decider for \(A\):
We analyze this decider as follows:
If \(x \in A\), then \(y \in B\) since \(C\) computes \(f(x)\), and \(x \in A \implies f(x) \in B\). Then \(D_B\) accepts \(y\), so \(D_A\) accepts \(x\).
If \(x \notin A\), then \(y \notin B\) since \(x \notin A \implies f(x) \notin B\). Then \(D_B\) rejects \(y\), so \(D_A\) rejects \(x\).
Thus, \(D_A\) is a decider for \(A\). Furthermore, \(D_A\) runs in time polynomial with respect to \(\abs{x}\). The invocation of \(C\) on \(x\) takes time \(\O(\abs{x}^m)\), and the size of its output \(y\) can be at most \(\O(\abs{x}^m)\) – a Turing machine can write at most one symbol of the output in each step. Then the invocation of \(D_B\) on \(y\) takes time in \(\O(\abs{y}^k) = \O(\abs{x}^{mk})\). The total time for \(D_A\) is then \(\O(\abs{x}^m + \abs{x}^{mk}) = \O(\abs{x}^{mk})\), which is polynomial in \(\abs{x}\) since \(mk\) is a constant. Thus, \(D_A\) is an efficient decider for \(A\).
Since \(A\) has an efficient decider, we conclude that \(A \in \P\).
The Cook-Levin theorem demonstrates that \(L \pmr \SAT\) for any language \(L \in \NP\). Thus, if \(\SAT \in \P\), all languages in \(\NP\) are in \(\P\), so \(\P = \NP\). We stated previously that informally, a language \(A\) is \(\NP\)-Hard if \(A \in \P\) implies that \(\P = \NP\). We now formalize the definition of \(\NP\)-Hard.
A language \(A\) is \(\NP\)-Hard if for every language \(L \in \NP\), \(L \pmr A\).
A language need not be in \(\NP\) or even decidable to be \(\NP\)-Hard. For instance, \(\atm\) is an \(\NP\)-Hard language; we leave the proof as an exercise.
If a language \(A\) is both in \(\NP\) and also \(\NP\)-Hard, then we say that \(A\) is NP-Complete.
A language \(A\) is \(\NP\)-Complete if \(A \in \NP\) and \(A\) is \(\NP\)-Hard.
Since \(\SAT \in \NP\) and \(\SAT\) is \(\NP\)-Hard, \(\SAT\) is an \(\NP\)-Complete language.
The following demonstrates the relationships between \(\P\), \(\NP\), and \(\NP\)-Hard and \(\NP\)-Complete languages, under the two possibilities \(\P \subset \NP\) and \(\P = \NP\):
If \(\P = \NP\), then all languages in \(\NP\) are \(\NP\)-Complete, except the two trivial languages \(\Sigma^*\) and \(\emptyset\).
To show that a language \(A \ne \SAT\) is \(\NP\)-Hard, do we need to repeat the work of the Cook-Levin theorem on \(A\)? Recall that for showing that a language is undecidable, we could either do so directly like we did for \(\atm\), or we could do so via a Turing reduction from a known undecidable language. Similarly, to show that a language is \(\NP\)-Hard, we can either do so directly as we did for \(\SAT\), or we can do so via a polytime reduction from a known \(\NP\)-Hard language.
If \(A \pmr B\) and \(A\) is \(\NP\)-Hard, then \(B\) is \(\NP\)-Hard.
This claim follows from the fact that polytime reductions are transitive. By definition, \(A\) is \(\NP\)-Hard if for every language \(L \in \NP\), \(L \pmr A\). Since we have \(L \pmr A \pmr B\), we have \(L \pmr B\) by transitivity. Thus, for every language \(L \in \NP\), \(L \pmr B\), and \(B\) is \(\NP\)-Hard.
Combining Theorem 98 and Claim 102, the reduction \(A \pmr B\) gives us the following information:
Known Fact |
Conclusion |
\(A \in \P\) |
? |
\(A\) \(\NP\)-Hard |
\(B\) \(\NP\)-Hard |
\(B \in \P\) |
\(A \in \P\) |
\(B\) \(\NP\)-Hard |
? |
Note, however, that \(L \in \P\) and \(L\) being \(\NP\)-Hard are only mutually exclusive if \(\P \ne \NP\), and we do not know if that is the case.
Show that \(\atm\) is \(\NP\)-Hard.
Show that polytime mapping reductions are transitive. That is, if \(A \pmr B\) and \(B \pmr C\), then \(A \pmr C\).
Show that the languages \(\Sigma^*\) and \(\emptyset\) cannot be \(\NP\)-Complete, even if \(\P = \NP\).
3SAT
As a second example of an \(\NP\)-Complete problem, we consider satisfiability of Boolean formulas with a specific structure. First, we introduce a few more terms that apply to Boolean formulas:
A clause is a disjunction of literals, such as \((a \vee b \vee \neg c \vee d)\).
A Boolean formula is in conjunctive normal form (CNF) if it consists of a conjunction of clauses. An example of a CNF formula is:
\[\phi = (a \vee b \vee \neg c \vee d) \wedge (\neg a) \wedge (a \vee \neg b) \wedge (c \vee d)\]A 3CNF formula is a Boolean formula in conjunctive normal form where each clause has exactly three literals, such as the following:
\[\phi = (a \vee b \vee \neg c) \wedge (\neg a \vee \neg a \vee d) \wedge (a \vee \neg b \vee d) \wedge (c \vee d \vee d)\]
We then define the language \(\TSAT\) as the set of satisfiable 3CNF formulas:
Observe that \(\TSAT\) contains strictly fewer formulas than \(\SAT\), i.e. \(\TSAT \subset \SAT\). Despite this, we will demonstrate that \(\TSAT\) is \(\NP\)-Complete.
First, we must show that \(\TSAT \in \NP\). The verifier is the same as for \(\SAT\), except that it also checks whether the formula is in 3CNF:
The value of \(\phi(\alpha)\) can be computed in linear time, and we can also check that \(\phi\) is in 3CNF and that \(\alpha\) is a valid assignment in linear time, so \(V\) is efficient in \(\abs{\phi}\). Furthermore, if \(\phi\) is in 3CNF and is satisfiable, there is some certificate \(\alpha\) such that \(V(\phi, \alpha)\) accepts, but if \(\phi\) is not in 3CNF or is unsatisfiable, \(V(\phi, \alpha)\) rejects for all \(\alpha\). Thus, \(V\) efficiently verifies \(\TSAT\).
We now show that \(\TSAT\) is \(\NP\)-Hard with the reduction \(\SAT \pmr \TSAT\). Given a Boolean formula \(\phi\), we need to efficiently compute a new formula \(f(\phi)\) such that \(\phi \in \SAT \iff f(\phi) \in \TSAT\). We can do so as follows:
First, convert \(\phi\) into a formula \(\phi'\) in conjunctive normal form. We can do so efficiently, as described in detail below.
Second, convert the CNF formula \(\phi'\) into the 3CNF formula \(f(\phi)\). We can do so as follows:
For each clause consisting of fewer than three literals, repeat the first literal to produce a clause with three literals. For example, \((a)\) becomes \((a \vee a \vee a)\), and \((\neg b \vee c)\) becomes \((\neg b \vee \neg b \vee c)\).
For each clause with more than three literals, subdivide it into two clauses by introducing a new dummy variable. In particular, convert the clause
\[(l_1 \vee l_2 \vee \dots \vee l_m)\]into
\[(z \vee l_1 \vee \dots \vee l_{\floor{m/2}}) \wedge (\neg z \vee l_{\floor{m/2} + 1} \vee \dots \vee l_m)\]Observe that if all \(l_i\) are false, then the original clause is not satisfied. The transformed subformula is not satisfied for any value of \(z\), since it would require \(z \wedge \neg z\), which is impossible. On the other hand, if at least one \(l_i\) is true, then the original clause is satisfied. The transformed subformula can be satisfied by choosing \(z\) to be true if \(i > \floor{m/2}\) or \(z\) to be false if \(i \le \floor{m/2}\). Thus, this transformation preserves the satisfiability of the original clause.
We repeat the process on the clauses of the transformed subformula, until all clauses have size exactly three. Each transformation introduces \(\O(1)\) literals, giving us the (approximate) recurrence
\[T(m) = 2T\parens{\frac{m}{2}} + \O(1)\]By the Master Theorem, we conclude that \(T(m) = \O(m)\), so the recursive transformation increases the size of the formula by only a linear amount.
Applying both transformations gives us a new formula \(f(\phi)\) that is polynomial in size with respect to \(\abs{\phi}\), and the transformations can be done in polynomial time. The result \(f(\phi)\) is in 3CNF, and it is satisfiable exactly when \(\phi\) is. Thus, we have \(\phi \in \SAT \iff f(\phi) \in \TSAT\), and \(f\) is computable in polynomial time, so we have \(\SAT \pmr \TSAT\). Since \(\SAT\) is \(\NP\)-Hard, so is \(\TSAT\).
Having shown that \(\TSAT \in \NP\) and \(\TSAT\) is \(\NP\)-Hard, we conclude that \(\TSAT\) is \(\NP\)-Complete.
If a formula \(\phi\) is not in conjunctive normal form, then its structure is as follows:
Here, \(X_{i,k}\) is an arbitrary subformula. We can transform \(\phi\) to the following:
This transformation preserves satisfiability. To see why this is so, we consider two cases for the original formula \(\phi\):
Case 1: There is an assignment \(\alpha\) such that all of \(X_{i,1}, \dots, X_{i,m_i}\) are true in \(\phi(\alpha)\) for some \(i\). Then the original formula \(\phi(\alpha)\) is satisfied by the \(i\)th term in its disjunction. The new formula \(\phi'\) is satisfied by the same assignment for the original variables, but with the addition of \(z_i = 1\) and \(z_{j\ne i} = 0\). Specifically, \(z_i = 1\) causes the first clause to be satisfied, \(z_{j\ne i} = 0\) result in any clause containing an \(X_{j\ne i, k}\) to be satisfied, and \(X_{i,k} = 1\) result in all clauses containing \(X_{i,k}\) to be satisfied. Thus, all clauses in \(\phi'\) are satisfied.
Case 2: For all assignments \(\alpha\), at least one of \(X_{i,1}, \dots, X_{i,m_i}\) is false in \(\phi(\alpha)\) for all \(i\). Then all terms of the disjunction in the original formula are false, so \(\phi(\alpha)\) as a whole is unsatisfied. Consider an assignment \(\alpha'\) to the new formula \(\phi'\) such that \(\alpha\) and \(\alpha'\) assign the same values to the original variables in \(\phi\). Let \(X_{i,k_i}\) be a false term in \(\phi(\alpha)\) for each \(i\). Then the clause \((\neg z_i \vee X_{i,k_i})\) in \(\phi'(\alpha')\) can only be satisfied by setting \(z_i = 0\). However, if all \(z_i = 0\), then the first clause in \(\phi'(\alpha')\) is unsatisfied, so the entire formula is unsatisfied.
We see that if \(\phi\) is satisfiable by an assignment \(\alpha\), then the new formula \(\phi'\) is satisfiable by a modified assignment \(\alpha'\) as described above. However, if \(\phi\) is unsatisfiable, then so is \(\phi'\). Thus, this transformation preserves the satisfiability of the original formula.
The transformation only increases the size of the formula by a linear amount – the new formula \(\phi'\) contains an additional literal \(\neg z_i\) for each subformula \(X_{i,k}\), plus another \(n\) literals for its first clause, where \(n\) is the number of terms in the outer disjunction of \(\phi\). The number of literals in \(\phi'\) is at most triple the number in \(\phi\).
We can repeat this transformation on the subformulas \(X_{i,k}\) to convert them into CNF formulas \(X_{i,k}'\), followed by an application of the distributive law
to turn \((\neg z_i \vee X_{i,k}')\) into CNF. The latter doubles the size of the formula. Overall, we do at most a linear number of transformations, each of which increases the size of the formula by a linear amount, so the total increase is at most quadratic.
We can modify our proof of the Cook-Levin theorem to directly show that \(\TSAT\) is \(\NP\)-Hard. In particular, we demonstrate how to construct \(\phi_{V,x}\) so that it is in conjunctive normal form. We can then apply the transformation discussed above to turn a CNF formula into a 3CNF one.
We first observe that a CNF formula can be constructed recursively from smaller subformulas:
A formula \(\phi = x\) with a single variable is in CNF – it has the single clause \((x)\).
A formula \(\phi = \phi_1 \wedge \phi_2\) is in CNF if \(\phi_1\) and \(\phi_2\) are in CNF – it consists of the combination of the clauses in \(\phi_1\) and \(\phi_2\).
A formula \(\phi = \phi_1 \vee \phi_2\) is only in CNF if \(\phi_1\) and \(\phi_2\) each have a single clause – \(\phi\) is then just a single clause that combines the literals in \(\phi_1\) and \(\phi_2\).
Examining each piece of \(\phi_{V,x}\), we see:
The starting configuration
\[\large \phi_{V,x,start} = t_{1,1,\#} \wedge t_{1,2,\qst} \wedge \phi_{input} \wedge t_{1,n+3,\$} \wedge \phi_{certificate} \wedge t_{1,n^k,\#}\]is in CNF – it consists of \(n^k\) clauses, each with either a single literal (e.g. \((t_{1,1,\#})\)) or \(\abs{\Gamma}\) literals (for each clause in \(\phi_{certificate}\)).
The cell-consistency subformula
\[\large \phi_{V,x,cell} = \bigwedge_{1 \le i,j \le n^k} \brackets{\bigvee_{s \in S} t_{i,j,s} \wedge \bigwedge_{s \ne u \in S} \parens{\neg t_{i,j,s} \vee \neg t_{i,j,u}}}\]is in CNF. The inner conjunction is a conjunction of disjunctions, so it is clearly in CNF. The inner disjunction is a single clause, so it is in CNF. We combine the two with a conjunction, and the result is also in CNF since the two subformulas are in CNF. Finally, the outer conjunction also produces a CNF formula since the individual pieces are in CNF.
The acceptance subformula
\[\large \phi_{V,x,accept} = \bigvee_{1 \le i,j \le n^k} t_{i,j,\qacc}\]is a single clause in CNF.
The transition formula for a window that starts at \(i,j\)
\[\large \phi_{i,j,move} = \bigvee_{w \in W} \phi_{i,j,w}\]is not in CNF. The individual \(\phi_{i,j,w}\) consist of a sequence of conjunctions, so \(\phi_{i,j,move}\) is a disjunction of conjunctions, rather than a conjunction of disjunctions required for CNF. However, we can convert \(\phi_{i,j,move}\) to CNF by applying the distributive law for \(\vee\) and \(\wedge\). In particular, we have
\[\parens{\bigwedge_{i} a_i} \vee \parens{\bigwedge_{j} b_j} = \bigwedge_{i,j} (a_i \vee b_j)\]We can see that this law holds if we consider two cases:
If all \(a_i\) are true, then the left-hand side is true. The right-hand side is also true – each clause \((a_i \vee b_j)\) contains an \(a_i\), so each clause is satisfied, which makes the formula as a whole true.
This same reasoning applies for the case when all \(b_j\) are true.
If at least one \(a_i\) is false and at least one \(b_j\) is false, then the left-hand side is false. The right-hand side is also false, since there is a clause \((a_i \vee b_j)\) where both \(a_i\) and \(b_j\) are false.
How does the size of the right-hand formula compare to the left-hand one? If the two conjunctions on the left contain \(m\) and \(n\) literals, respectively, then the left-hand side has \(m + n\) total literals. The right-hand side has \(m\cdot n\) clauses, each with two literals, for a total size of \(2mn\). When \(m = n\), we go from \(2m\) literals to a size of \(2m^2\).
Suppose we have \(c\) conjunctions on the left-hand side, each with \(m\) literals. Then repeated application of the distributive law takes us from a formula with \(cm\) literals to one with \(cm^c\). For \(\phi_{i,j,move}\), we go from a size of \(6\abs{W}\) in its original form to a size of \(\abs{W}\cdot 6^{\abs{W}}\) in CNF. This is exponential in the number of distinct valid windows \(\abs{W}\), but this number is a characteristic of the verifier \(V\), not the input \(x\). So even though it is a larger constant than before, it is still a constant.
Once we have \(\phi_{i,j,move}\) in CNF, then
\[\large \phi_{V,x,move} = \bigwedge_{1 \le i \le n^k-1, 1 \le j \le n^k-2} \phi_{i,j,move}\]is also in CNF. And while its size is larger than previously, it is only by a constant factor, and its total size is still \(\O(n^{2k})\) literals.
Since \(\phi_{V,x}\) is just a conjunction of the individual pieces above, it is in CNF if those pieces are each in CNF. We can then apply the CNF-to-3CNF transformation to obtain a formula in 3CNF. Thus, if we can decide \(\TSAT\), we can decide an arbitrary language in \(\NP\), and \(\TSAT\) is \(\NP\)-Hard.
Clique
\(\NP\)-Completeness is not exclusive to Boolean satisfiability problems. Rather, there is a wide range of problems across all fields that are \(\NP\)-Complete. As an example, we take a look at the clique problem.
Given an undirected graph \(G = (V, E)\), a clique is a subset of the vertices \(C \subseteq V\) such that there is an edge in \(G\) between every pair of vertices in the clique. Equivalently, the subgraph induced by the clique \(C\), meaning the vertices in \(C\) and all edges between them, is a complete graph.
The following is an example of a clique of size four, with the highlighted vertices in the clique:
We are often interested in finding the maximal clique in a graph – for instance, if a graph represents a group of people and their individual friend relationships, we might want to find the largest set of mutual friends (hence the name “clique”). However, to obtain a decision problem, we introduce a limited budget as we did for the traveling salesperson problem:
Here, we have defined \(\CLIQUE\) with an exact budget \(k\) rather than a lower bound – if a graph \(G = (V, E)\) has a clique \(C\) of size \(m\), then it has one for any size \(0 \le k \le m\) by removing arbitrary vertices from \(C\). In other words, a subset of a clique is always itself a clique.
To show that \(\CLIQUE\) is \(\NP\)-Complete, we start by demonstrating that it is in \(\NP\). A verifier for \(\CLIQUE\) can take in the graph \(G\) and budget \(k\), as well as a set of vertices as the certificate, and check that the vertices in the set indeed comprise a clique of size \(k\). We can define the verifier as follows:
We first analyze the runtime of the verifier. The first step can be done efficiently – checking whether \(c\) is a subset of \(V\) can be done naïvely in quadratic time via nested loops. (Furthermore, by checking that \(k \le \abs{V}\) and that \(\abs{c} = k\), we limit the loop bounds to the size of the input graph.) The loop in the second step does a quadratic number of iterations, each of which can be done efficiently, so the total time is still polynomial.
We now analyze correctness. If \((G, k) \in \CLIQUE\), then there is some clique \(C = \{v_1, \dots, v_k\} \subseteq V\) of size \(0 \le k \le \abs{V}\). If we pass the set \(C\) as the certificate to the verifier, we have \(\abs{C} = k\) and \(C \subseteq V\), so the certificate is not rejected in the first step. Then since \(C\) is a clique, there is an edge between every pair of vertices in \(C\), so the loop in the second step does not reject. Thus, the verifier accepts \((G, k), C\).
On the other hand, if \((G = (V, E), k) \notin \CLIQUE\), then there is no set of \(k\) vertices \(C\) such that \(C\) comprises a clique of \(G\). If \(k\) is not in the range \([0, \abs{V}]\), the verifier rejects immediately. In addition, for any certificate that is not a subset of \(V\) whose size is \(k\), the verifier rejects in the first step. If the verifier gets to the second step, then the certificate \(c\) must be a set of vertices \(c = \{v_1, \dots, v_k\} \subseteq V\). By assumption, \(G\) does not have a clique of size \(k\), so any set of \(k\) vertices from \(G\) must have a missing edge between some pair of vertices within that set. Since the verifier checks for an edge between all pairs of vertices from the given set, it will find such a missing edge and reject. Thus, when \((G, k) \notin \CLIQUE\), the verifier rejects all certificates.
Since the verifier is correct and efficient, we have demonstrated that \(\CLIQUE\) is efficiently verifiable and thus in \(\NP\).
We now show that \(\CLIQUE\) is \(\NP\)-Hard with a polytime reduction from the known \(\NP\)-Hard \(\TSAT\), i.e. \(\TSAT \pmr \CLIQUE\). To do so, we must define a polynomial-time computable function \(f\) that converts a 3CNF formula \(\phi\) into an instance \((G, k)\) of the clique problem. More specifically, we require:
\(\phi \in \TSAT \implies f(\phi) \in \CLIQUE\)
\(\phi \notin \TSAT \implies f(\phi) \notin \CLIQUE\)
Before considering how we can map formulas to graphs, let’s review the definition of \(\TSAT\). It is the set of satisfiable 3CNF formulas, where a formula has \(m\) clauses of three literals each:
Each literal is either a variable itself (e.g. \(v\)) or its negation (e.g. \(\neg v\)).
A satisfying assignment simultaneously satisfies each clause, so that each clause has at least one literal that is true. For a formula with \(m\) clauses to be satisfiable, there must be some set of \(m\) literals, one from each clause, that can simultaneously be true. If no such set exists, the formula is unsatisfiable.
We can now relate 3CNF-satisfiability to the clique problem. Given a formula \(\phi\) of \(m\) clauses, we can construct a graph \(G\) such that:
Each literal \(\mathscr{l}_i \in phi\) has a corresponding vertex \(v_i\) in \(G\).
A pair of vertices \(v_i\) and \(v_j\) have an edge in \(G\) if the corresponding literals \(\mathscr{l}_i\) and \(\mathscr{l}_j\) are in different clauses of \(\phi\), and the two literals can be simultaneously true. This is exactly when the two literals are not negations of each other (e.g. one is \(x\) and the other is \(\neg x\)) – either they are the same (e.g. they are both \(x\) or are both \(\neg y\)), or they refer to different variables (e.g. one is \(x\) and the other is \(\neg y\)).
The following illustrates the graph constructed for two different formulas with three clauses each, with one formula satisfiable and the other unsatisfiable:
Observe that the satisfiable formula on the left does have a clique of size three (one such clique is highlighted), while the unsatisfiable formula on the right does not have a clique of size three.
The actual conversion algorithm is as follows:
We first analyze the efficiency of this algorithm. The first loop takes linear time with respect to the size of the formula (i.e. the number of literals in the formula), and the second loop takes quadratic time. Thus, the entire conversion takes polynomial time with respect to the input size.
We now analyze correctness. First, we show that \(\phi \in \TSAT \implies f(\phi) \in \CLIQUE\). Since \(\phi \in \TSAT\), it is satisfiable, which means there is some satisfying assignment \(\alpha\) such that each clause in \(\phi\) has at least one literal that is true. Let \(S = \{s_1, s_2, \dots, s_m\}\) refer to a set comprised of one true literal from each clause. Then:
Each pair of literals \(s_i, s_j \in S\) can be true simultaneously, since they are all true under the assignment \(\alpha\).
For each \(s_i, s_j \in S\), there is an edge in \(G\) between their corresponding vertices, since the literals are from different clauses and can be simultaneously true.
Since there is an edge in \(G\) between the vertices corresponding to all pairs \(s_i, s_j \in S\), those vertices constitute a clique in \(G\). Since \(\abs{S} = m\), \(G\) has a clique of size \(m\).
We know show that \(\phi \notin \TSAT \implies f(\phi) \notin \CLIQUE\) by demonstrating the contrapositive \(f(\phi) \in \CLIQUE \implies \phi \in \TSAT\). (Note that we are not inverting the function \(f\) here. Rather, the contrapositive is stating that if the graph and budget produced as the output of \(f\) are in \(\CLIQUE\), then the input formula must have been in \(\TSAT\).) Since \(f(\phi) = (G, m) \in \CLIQUE\), the output graph \(G\) has a clique of size \(m\). Denote the set of vertices comprising the clique as \(C = \{c_1, \dots, c_m\}\). Then:
Since \(C\) is a clique, there is an edge between any pair of vertices \(c_i, c_j\) in \(C\).
The algorithm \(f\) only places an edge between vertices if the corresponding literals are from different clauses. Thus, the \(m\) vertices in \(C\) must come from the \(m\) different clauses of the input formula \(\phi\).
The algorithm \(f\) only places an edge between vertices if their corresponding literals can be true simultaneously. Since there is an edge between every pair of vertices in \(C\), all of their corresponding literals can be true at the same time.
We can define an assignment \(\alpha\) of \(\phi\)’s variables by setting them such that each literal corresponding to the vertices in \(C\) becomes true. Since all the literals can be true simultaneously, there is no conflict when defining this assignment. (Any variables that remain unset after this process can be set arbitrarily.)
Then \(\alpha\) is a satisfying assignment – each literal corresponding to the vertices in \(C\) is from a different clause, and they are all true, so each of the \(m\) clauses has a true literal and is therefore satisfied. Thus, \(\phi\) is satisfiable.
Since \(\phi \in \TSAT \iff f(\phi) \in \CLIQUE\) and \(f\) can be computed efficiently, we have demonstrated that \(\TSAT \pmr \CLIQUE\). Since \(\TSAT\) is \(\NP\)-Hard, we conclude that \(\CLIQUE\) is \(\NP\)-Hard, and since \(\CLIQUE \in \NP\), it is also \(\NP\)-Complete.
Vertex Cover
Given an undirected graph \(G = (V, E)\), a vertex cover is a subset of the vertices \(C \subseteq V\) such that every edge in \(E\) is covered by a vertex in \(C\). An edge \(e = (u, v)\) is covered by \(C\) if at least one of the endpoints \(u,v\) is in \(C\). In other words, \(C\) is a vertex cover for \(G = (V, E)\) if \(C \subseteq V\) and
The following are two vertex covers for the same graph:
The cover on the left has a size of five vertices, while the cover on the right has a size of four.
As a motivating example of the vertex-cover problem, consider a museum structured as a set of interconnected hallways, with exhibits on the walls. The museum needs to hire guards to ensure the security of the exhibits, but guards are expensive, so it seeks to minimize the number of guards hired while still ensuring that each hallway is covered by a guard. The museum layout can be represented as a graph with the hallways as edges and the vertices as hallway endpoints. Then the museum just needs to figure out what the minimal vertex cover is to determine how many guards to hire (or security cameras/motion detectors to install) and where to place them.
As we did for the traveling salesperson problem, we define a limited-budget, decision version of vertex cover:
Here, we have defined \(\VC\) with an exact budget \(k\) rather than an upper bound – observe that if a graph \(G = (V, E)\) has a vertex cover \(C\) of size \(m\), then it has one for any size \(m \le k \le \abs{V}\) by adding arbitrary vertices from \(V \setminus C\) into \(C\). Put in museum terms, if the museum’s budget allows, it can hire more guards than are strictly necessary to cover all hallways.
We now demonstrate that \(\VC\) is \(\NP\)-Hard, leaving the proof that \(\VC \in \NP\) as an exercise. Combining the two, we will be able to conclude that \(\VC\) is \(\NP\)-Complete.
Show that \(\VC \in \NP\).
We show that \(\VC\) is \(\NP\)-Hard via the reduction \(\TSAT \pmr \VC\). To do so, we must define a polynomial-time computable function \(f\) that converts a 3CNF formula \(\phi\) into an instance \((G, k)\) of the vertex-cover problem. More specifically, we require:
\(\phi \in \TSAT \implies f(\phi) \in \VC\)
\(\phi \notin \TSAT \implies f(\phi) \notin \VC\)
Given a 3CNF formula \(\phi\) with \(n\) variables and \(m\) clauses, we construct a graph \(G\) corresponding to that formula. The graph is comprised of gadgets that are subgraphs, representing individual variables and clauses. We will then connect the gadgets together such that the resulting graph \(G\) has a vertex cover of size \(n + 2m\) exactly when \(\phi\) is satisfiable.
The following are our gadgets for variables and clauses:
A gadget for variable \(x\) is a “barbell,” consisting of two vertices labeled by \(x\) and \(\neg x\) connected by an edge. A gadget for a clause \(x \vee y \vee z\) is a triangle, consisting of a vertex for each literal, with the three vertices all connected together.
As a concrete example, consider the formula
We start building the corresponding graph \(G\) by incorporating gadgets for each clause and variable:
Here, we have placed the variable gadgets at the top of the figure and the clause gadgets at the bottom.
The next step in the construction is to connect each vertex in a variable gadget to the vertices in the clause gadgets that have the same literal as the label. For instance, we connect \(x\) in the respective variable gadget to each vertex labeled by \(x\) in the clause gadgets, \(\neg x\) in the variable gadget to each vertex labeled by \(\neg x\) in the clause gadgets, and so on. The result is as follows:
This completes the construction of \(G\), and we have \(f(\phi) = (G, n+2m)\). There are \(3m + 2n\) vertices in \(G\), \(n\) edges within the variable gadgets, \(3m\) edges within the clause gadgets, and \(3m\) edges that cross between the clause and variable gadgets, for a total of \(\O(m + n)\) edges. Thus, \(G\) has size \(\O(n + m)\), and it can be constructed efficiently.
We proceed to show that if \(\phi \in \TSAT\), then \(f(\phi) = (G, n+2m) \in \VC\). Since \(\phi \in \TSAT\), there is some satisfying assignment \(\alpha\) such that \(\phi(\alpha)\) is true. Then \(C\) is a vertex cover for \(G\), where:
For the variable gadget corresponding to each variable \(x\), \(x\) is in \(C\) if \(\alpha(x) = 1\), otherwise \(\neg x\) is in \(C\) (i.e. if \(\alpha(x) = 0\)).
Observe that every edge internal to a variable gadget is covered. Furthermore, since \(\alpha\) is a satisfying assignment, each clause has at least one literal that is true. Then the edge connecting the vertex in the corresponding clause gadget to the same literal in a variable gadget is already covered – the vertex corresponding to that literal in the variable gadget is in \(C\).
Thus, every clause gadget has at least one out of its three “crossing” edges covered by a vertex in a variable gadget, and so far, we have \(n\) vertices in \(C\).
For each uncovered crossing edge \((u, v)\), where \(u\) is in a variable gadget and \(v\) is in a clause gadget, we have \(v \in C\). There are at most two such crossing edges per clause. If a clause has only one uncovered crossing edge, then pick another arbitrary vertex in the clause to be in \(C\). Similarly, if a clause has no uncovered crossing edges, then pick two of its vertices arbitrarily to be in \(C\).
Now all crossing edges are covered by a vertex in a variable gadget, or one in a clause gadget, or both. Furthermore, since we pick exactly two vertices out of each clause gadget, the edges internal to each clause gadget are also covered. Thus, all edges are covered, and \(C\) is a vertex cover. This second step adds two vertices per clause, or \(2m\) vertices. Combined with the previous vertices, \(C\) has \(n + 2m\) vertices, and we have shown that \(G\) has a vertex cover of size \(n + 2m\).
We now show that \(\phi \notin \TSAT \implies (G, n+2m) \notin \VC\). More precisely, we demonstrate the contrapositive \((G, n+2m) \in \VC \implies \phi \in \TSAT\). In other words, if \(G\) has a vertex cover of size \(n + 2m\), then \(\phi\) has a satisfying assignment.
A vertex cover of \(G\) of size \(n + 2m\) must include the following:
exactly one vertex from each variable gadget to cover the edge internal to the variable gadget
exactly two vertices from each clause gadget to cover the edges internal to the clause gadget
This brings us to our total of \(n + 2m\) vertices in the cover. While all internal edges are covered, we must also consider the edges that cross between clause and variable gadgets. With the two vertices from each clause gadget, only two of the three crossing edges are covered by those vertices. The remaining crossing edge must be covered by a vertex from a clause gadget. Thus, if \(G\) has a vertex cover \(C\) of size \(n + 2m\), each clause \(k\) has a crossing edge \((u_l, v_k)\) where:
\(u_l\) is a variable vertex with label \(l\) for some literal \(l\)
\(v_k\) is a vertex in the clause gadget with the same label \(l\)
\(u_l \in C\), so that the crossing edge is covered
Then the assignment \(\alpha\) such that \(\alpha(l) = 1\) if \(u_l \in C\) is a satisfying assignment – for each clause \(k\), we have the true literal \(l\) corresponding to the vertex \(v_k\). Since all clauses are satisfied, \(\phi\) as a whole is satisfied.
This completes our proof that \(\TSAT \pmr \VC\). We conclude from the facts that \(\TSAT\) is \(\NP\)-Hard and \(\VC \in \NP\) that \(\VC\) is \(\NP\)-Complete.
Define the language
\[\IS = \{(G, k) : G \text{ is an undirected graph with an independent set of size } k\}\]An independent set of a graph \(G = (V, E)\) is a subset of the vertices \(I \subseteq V\) such that no two vertices in \(I\) share an edge. Show that \(\VC \pmr \IS\).
Hint: If \(G\) has a vertex cover of size \(m\), what can be said about the vertices that are not in the cover?
Recall the language
\[\CLIQUE = \{(G, k) : G \text{ is an undirected graph that has a clique of size $k$}\}\]Show that \(\IS \pmr \CLIQUE\).
Set Cover
Consider another problem, that of hiring workers for a project. To successfully complete the project, we need workers with some combined set of skills \(S\). For instance, we might need an architect, a general contractor, an engineer, an electrician, a plumber, and so on for a construction project. We have a candidate pool of \(n\) workers, where worker \(i\) has a set of skills \(S_i\). As an example, there may be a single worker who is proficient in plumbing and electrical work. Our goal is to hire the minimal team of workers to cover all the required skills.
We formalize this problem as follows. We are given a set of elements \(S\), representing the required skills. We are also given \(n\) subsets \(S_i \subseteq S\), representing the set of skills that each worker \(i\) has. We want to find the smallest collection of \(S_i\)’s that cover the set \(S\). In other words, we wish to find the smallest set of indices \(C\) such that
As a concrete example, let \(S\) and \(S_1, \dots, S_6\) be as follows:
There are no set covers of size two, but there are several with size three. One example is \(C = {1, 2, 6}\), which gives us
We proceed to define a language corresponding to the set-cover problem. As with vertex cover, we include a budget \(k\) for the size of the cover \(C\).
It is clear that \(\SC \in \NP\) – we can use as a certificate a set of indices \(C\), and the verifier need only check that \(C\) is a size-\(k\) subset of \([1, n]\), and that \(S = \cup_{i\in C} S_i\). This can be done efficiently.
We now perform the reduction \(\VC \pmr \SC\) to demonstrate that \(\SC\) is \(\NP\)-Hard. We need to convert an instance \((G = (V, E), k)\) to an instance \((S, S_1, \dots, S_n, k')\) such that \(G\) has a vertex cover of size \(k\) exactly when \(S\) has a set cover of size \(k'\). Observe that an element of a vertex cover is a vertex \(v\), and it covers some subset of the edges in \(E\). Similarly, a set cover consists of sets \(S_i\), each of which cover some subset of the items in \(S\). Thus, an edge corresponds to a skill, while a vertex corresponds to a worker.
Given a graph \(G = (V, E)\) and a budget \(k\), we define the function \(f\) such it translates the set of edges \(E\) to the set of skills \(S\), and each vertex \(v_i \in V\) to a set \(S_i\) consisting of the edges incident to \(v_i\). Formally, we have \(f(G, k) = (S, S_1, \dots, S_{\abs{V}}, k)\), where \(S = E\) and:
This transformation can be done efficiently, as it is a direct translation of the graph \(G\).
As an example, suppose that the graph \(G\) is as follows:
This graph has a cover of size four consisting of the vertices labeled \(S_2, S_4, S_5, S_6\). The transformation \(f(G, k)\) produces the following sets:
Then \(C = \{2, 4, 5, 6\}\) is a set cover of size four.
We now demonstrate that in the general case, \((G = (V, E), k) \in \VC\) exactly when \(f(G, k) = (S, S_1, \dots, S_{\abs{V}}, k) \in \SC\).
If \((G, k) \in \VC\), then \(G\) has a vertex cover \(C \subseteq V\) of size \(k\) that covers all the edges in \(E\). This means that
\[E = \bigcup_{v_i \in C} \{e \in E : e \text{ is incident to } v_i\}\]Let \(C' = \{i : v_i \in C\}\). Then we have
\[\begin{split}\bigcup_{i\in C'} S_i &= \bigcup_{i\in C'} \{e \in E : e \text{ is incident to } v_i\}\\ &= E = S\end{split}\]Thus, \(C'\) covers \(S\), and \(S\) has a set cover of size \(k\).
If \((G, k) \notin \VC\), then \(G\) does not have a set cover of size \(k\). Either \(k > \abs{V}\), or for any subset of vertices \(C \subseteq V\), of size \(k\), we have
\[E \ne \bigcup_{v_i \in C} \{e \in E : e \text{ is incident to } v_i\}\]Then for any subset \(C' \subseteq \{S_1, \dots, S_{\abs{V}}\}\) of size \(k\), we also have
\[\begin{split}\bigcup_{i\in C'} S_i &= \bigcup_{i\in C'} \{e \in E : e \text{ is incident to } v_i\}\\ \ne E = S\end{split}\]Thus \(S\) does not have a set cover of size \(k\).
This completes our reduction \(\VC \pmr \SC\).
We conclude our discussion of set cover by observing that a vertex cover is actually a special case of a set cover. When we transform an instance of the vertex-cover problem to one of set cover, each edge in a graph is interpreted as a skill and each vertex as worker, which means that each skill is shared by exactly two workers. This is more restrictive then the general case for set cover, where any number of workers (including none) may have a particular skill. The reduction \(\VC \pmr \SC\) just exemplifies that if we can solve the general case of a problem efficiently, we can also solve special cases efficiently. Conversely, if a special case is “hard,” then so is the general case [6].
Since both \(\VC\) and \(\SC\) are \(\NP\)-Complete, they are actually equally hard, so an instance of either can be converted to an instance of the other. However, the conversion of an instance of set cover to vertex cover is not as straightforward as that of vertex cover to set cover.
Hamiltonian Cycle
Given a graph \(G\), a Hamiltonian path from \(s\) to \(t\) is a path that starts at \(s\), ends at \(t\), and visits each vertex exactly once. A Hamiltonian cycle is a path that starts and ends at the same vertex and visits all other vertices exactly once. For example, \(G_1\) below has a Hamiltonian cycle as well as a Hamiltonian path between \(s_1\) and \(t\), but it does not have a Hamiltonian path between \(s_2\) and \(t\). On the right, \(G_2\) has a Hamiltonian path between \(s\) and \(t\), but it does not have a Hamiltonian cycle.
We define the languages corresponding to the existence of a Hamiltonian path or cycle as:
Both languages are in \(\NP\) – we take the actual Hamiltonian path or cycle as a certificate, and a verifier need only follow the path to check that it is valid as in the verifier for TSP. We claim without proof that \(\HP\) is \(\NP\)-Hard – it is possible to reduce \(\TSAT\) to \(\HP\), but the reduction is quite complicated, so we will not consider it here. Instead, we perform the reduction \(\HP \pmr \HC\) to show that \(\HC\) is also \(\NP\)-Hard. We need to demonstrate an efficient function \(f\) such that \((G, s, t) \in \HP\) if and only if \(f(G, s, t)\) has a Hamiltonian cycle.
Our first attempt is to produce a new graph \(G'\) that is the same as \(G\), except that it has an additional edge between \(s\) and \(t\). Then if \(G\) has a Hamiltonian path from \(s\) to \(t\), \(G'\) has a Hamiltonian cycle that follows that same path from \(s\) to \(t\) and returns to \(s\) through the additional edge. While this construction works in mapping yes instances of \(\HP\) to yes instances of \(\HC\), it does not properly map no instances of \(\HP\) to no instances of \(\HC\). The following illustrates the construction on both a no and a yes instance of \(\HP\), with the dotted line representing the added edge:
In the original graph on the left, there is no Hamiltonian path from \(s_2\) to \(t\), yet the new graph does have a Hamiltonian cycle. Since the function maps a no instance to a yes instance, it does not demonstrate a valid mapping reduction.
The key problem with the transformation above is that it does not force a Hamiltonian cycle to go through the additional edge. If the cycle did go through the additional edge, we could remove that edge from the cycle to obtain a Hamiltonian path in the original graph. To force the cycle through the additions to the graph, we need to add a new vertex, not just an edge. Thus, in place of the single edge, we add two edges with a vertex between them, as in the following:
Now the generated graph on the left does not have a Hamiltonian cycle, but the new graph on the right still does. Thus, this procedure looks like it should work to map yes instances to yes instances and no instances to no instances.
Formally, given \((G = (V, E), s, t)\), we have
The function is clearly efficient, as it just adds one edge and two vertices to the input graph. We prove that it is also correct:
Suppose \(G\) has a Hamiltonian path from \(s\) to \(t\). This path visits all the vertices in \(G\), and it has the structure \((s,v_2,\dots,v_{n-1},t)\). The same path exists in \(G'\), and it visits all vertices except the added vertex \(u\). The modified path \((s,v_2,\dots,v_{n-1},t,u,s)\) with the additions of the edges \((t,u)\) and \((u,s)\) visits all the vertices and returns to \(s\), so it is a Hamiltonian cycle, and \(G' \in \HC\).
Suppose \(G'\) has a Hamiltonian cycle. Since it is a cycle, we can start from any vertex in the cycle and return back to it. Consider the specific path that starts at and returns to \(s\). It must also go through \(u\) along the path, so it has the structure \((s,\dots,u,\dots,s)\). Since the only edges incident to \(u\) are \((s,u)\) and \((t,u)\) the path must either enter \(u\) from \(t\) and exit to \(s\), or it must enter from \(s\) and exit to \(t\). We consider the two cases separately.
Case 1: \(t\) precedes \(u\). Then the path must have the form \((s,\dots,t,u,s)\). Since the subpath \((s,\dots,t)\) visits all vertices except \(u\) and only includes edges from \(G\), it constitutes a Hamiltonian path in \(G\) from \(s\) to \(t\).
Case 2: \(t\) follows \(u\). Then the path has the form \((s,u,t,\dots,s)\). Since the subpath \((t,\dots,s)\) visits all vertices except \(u\) and only includes edges from \(G\), it constitutes a Hamiltonian path in \(G\) from \(t\) to \(s\). The graph is undirected, so following the edges in reverse order gives a Hamiltonian path from \(s\) to \(t\).
Thus, \(G\) has a Hamiltonian path from \(s\) to \(t\).
We have shown that \(G\) has a Hamiltonian path from \(s\) to \(t\) exactly when \(G' = f(G, s, t)\) has a Hamiltonian cycle. This demonstrates that \(f\) is a valid mapping from instances of \(\HP\) to instances of \(\HC\), so \(\HP \pmr \HC\). We conclude that \(\HC\) is \(\NP\)-Hard.
Define the language
A simple path is a path without any cycles. Show that \(\mathrm{LONG\text{-}PATH}\) is \(\NP\)-Complete.
Recall the \(\TSP\) language:
Show that \(\TSP\) is \(\NP\)-Hard with the reduction \(\HC \pmr \TSP\).
Subset Sum
The subset-sum problem is the task of finding a subset of set of integers \(S\) that add up to a target value \(k\). The decision version of this problem is as follows:
This language is in \(\NP\) – we can take the actual subset \(S^* \subseteq S\) as a certificate, and a verifier can efficiently check that it is a valid subset of \(S\) and that its elements sum to the value \(k\). We show \(\TSAT \pmr \SSUM\) to demonstrate that \(\SSUM\) is also \(\NP\)-Hard: we define a polytime computable function \(f\) that takes a 3CNF formula \(\phi\) with \(n\) variables and \(m\) clauses and produces a set of integers \(S\) and target value \(k\) such that
A satisfying assignment \(\alpha\) of \(\phi\) has two key properties:
for each variable \(x_i \in \phi\), exactly one of \(\alpha(x_i)\) or \(\alpha(\neg x_i)\) is 1
for each clause \(C_j = (l_{j,1} \vee l_{j,2} \vee l_{j,3}) \in \phi\), at least one of \(\alpha(l_{j,1}), \alpha(l_{j,2}), \alpha(l_{j,3})\) is 1
The set of integers \(S\) and value \(k\) produced by \(f(\phi)\) are carefully designed to correspond to these properties. Specifically, the set \(S\) consists of decimal (base-10) integers that are \(n+m\) digits long – the first \(n\) digits correspond to the \(n\) variables, while the last \(m\) digits correspond to the clauses. For each variable \(x_i\), \(S\) contains two numbers \(v_i\) and \(w_i\):
\(v_i\) is 1 in digit \(i\) and in digit \(n+j\) for each clause \(C_j\) that contains the literal \(x_i\), 0 in all other digits
\(w_i\) is 1 in digit \(i\) and in digit \(n+j\) for each clause \(C_j\) that contains the literal \(\neg x_i\), 0 in all other digits
We also set the \(i\)th digit of the target value \(k\) to be 1 for \(1 \le i \le n\). For a subset \(S^*\) to add up to \(k\), it must contain exactly one of \(v_i\) or \(w_i\) for each \(1 \le i \le n\), since only those two numbers contain a 1 in the \(i\)th digit. (Since we are using decimal numbers, there aren’t enough numbers in \(S\) to get a 1 in the \(i\)th digit through a carry from less-significant digits.) Thus, a subset \(S^*\) that adds up to \(k\) corresponds to an actual assignment of values to the variables, enforcing the first property above.
For a clause digit \(n+j, 1 \le j \le m\), the subset \(S^*\) can produce one of the following values corresponding to the clause \(C_j = (l_{j,1} \vee l_{j,2} \vee l_{j,3})\):
0 if \(S^*\) contains none of the numbers corresponding to \(l_{j,1}, l_{j,2}, l_{j,3}\) being true (e.g. if \(l_{j,1} = x_i\), then \(S^*\) does not contain \(v_i\), and if \(l_{j,1} = \neg x_i\), then \(S^*\) does not contain \(w_i\))
1 if \(S^*\) contains exactly one number corresponding to \(l_{j,1}, l_{j,2}, l_{j,3}\) being true
2 if \(S^*\) contains exactly two numbers corresponding to \(l_{j,1}, l_{j,2}, l_{j,3}\) being true
3 if \(S^*\) contains all three numbers corresponding to \(l_{j,1}, l_{j,2}, l_{j,3}\) being true
For the clause \(C_j\) to be satisfied by the assignment corresponding to \(S^*\), all but the first case work. We need to add more numbers to \(S\) and set digit \(n+j\) of \(k\) such that the three latter cases are all viable. To do so, for each clause \(C_j\):
set the value for digit \(n+j\) to be 3 in the target number \(k\)
add two numbers \(y_j\) and \(z_j\) to \(S\) that have a 1 in digit \(n+j\) and 0 in all other digits
This completes our construction of \(S\) and \(k\). This can be done efficiently – we generate \(2n+2m+1\) numbers, each of which is \(n+m\) digits long, and we can do so in quadratic time with respect to \(\abs{\phi} = \O(n+m)\). For a subset \(S^* \subseteq S\) to add up to \(k\), it must contain:
exactly one of \(v_i\) or \(w_i\) for each variable \(x_i, 1 \le i \le n\), corresponding to an assignment to the variables
none, one, or both of \(y_j\) and \(z_j\) for each clause \(C_j, 1 \le j \le m\), depending on whether the clause \(C_j\) has three, two, or one literal(s) satisfied by the corresponding assignment
As a concrete example, consider the formula
We generate the following numbers, with columns corresponding to digits and unspecified digits implicitly 0:
\(~\) |
\(x_1\) |
\(x_2\) |
\(x_3\) |
\(x_4\) |
\(C_1\) |
\(C_2\) |
\(C_3\) |
---|---|---|---|---|---|---|---|
\(v_1\) |
1 |
1 |
|||||
\(v_2\) |
1 |
1 |
1 |
||||
\(v_3\) |
1 |
1 |
|||||
\(v_4\) |
1 |
1 |
|||||
\(w_1\) |
1 |
1 |
|||||
\(w_2\) |
1 |
1 |
|||||
\(w_3\) |
1 |
1 |
|||||
\(w_4\) |
1 |
1 |
|||||
\(y_1\) |
1 |
||||||
\(y_2\) |
1 |
||||||
\(y_3\) |
1 |
||||||
\(z_1\) |
1 |
||||||
\(z_2\) |
1 |
||||||
\(z_3\) |
1 |
||||||
\(k\) |
1 |
1 |
1 |
1 |
3 |
3 |
3 |
Consider the satisfying assignment \(\alpha(x_1, x_2, x_3, x_4) = (0, 1, 0, 1)\). The corresponding subset \(S^*\) contains \(w_1, v_2, w_3, v_4\), which add up to \(1111113\) – clauses \(C_1\) and \(C_2\) are satisfied by one literal each (\(x_2\) and \(\neg x_3\), respectively), while clause \(C_3\) is satisfied by all three literals. To add up to \(k = 1111333\), \(S^*\) also contains \(y_1, z_1, y_2, z_2\).
Now consider the assignment \(\alpha'(x_1, x_2, x_3, x_4) = (0, 0, 0, 1)\) that does not satisfy the formula. The corresponding subset \(S^*\) contains \(w_1, w_2, w_3, v_4\), which add up to \(1111022\). The highest value we can get in digit 5 (corresponding to clause \(C_1\)) is 2, by adding both \(y_1\) and \(z_1\) to the subset. This does not get us to the target value of 3.
Formally, for an arbitrary formula \(\phi\) with \(n\) variables and \(m\) clauses and resulting \(f(\phi) = (S, k)\), we have:
If \(\phi \in \TSAT\), then it has an assignment \(\alpha\) that satisfies all clauses, meaning that at least one literal in each clause is true. The subset
\[S_a^* = \{v_i : \alpha(x_i) = 1\} \cup \{w_i : \alpha(x_i) = 0\}\]sums to 1 in each of the first \(n\) digits, and at least 1 in each of the last \(m\) digits – each clause \(C_j\) is satisfied by at least one literal, and the digit \(n+j\) is set to 1 in the number corresponding to that literal in the subset \(S_a^*\). Thus, the full subset
\[\begin{split}S^* =~ &S_a^* ~\cup\\ &\{y_j : \phi(\alpha) \text{ sets $\le 2$ literals of $C_j$ to $1$}\} ~\cup\\ &\{z_j : \phi(\alpha) \text{ sets $1$ literal of $C_j$ to $1$}\}\end{split}\]adds up to the target value \(k\), so \(f(\phi) = (S, k) \in \SSUM\).
If \(f(\phi) = (S, k) \in \SSUM\), then there is some subset \(S^*\) of the numbers \(S = \{v_i\} \cup \{w_i\} \cup \{y_j\} \cup \{z_j\}\) that adds up to \(k\). This subset \(S^*\) must contain exactly one of \(v_i\) or \(w_i\) for each \(1 \le i \le n\) so that the \(i\)th digit in the sum is the target 1. This corresponds to the assignment
\[\begin{split}\alpha(x_i) = \begin{cases} 1 &\mbox{ if $v_i \in S^*$}\\ 0 &\mbox{ if $w_i \in S^*$} \end{cases}\end{split}\]For each digit \(n+j, 1 \le j \le m\), there must be some \(v_i\) or \(w_i\) in \(S^*\) that has that digit set to 1 – there are only two numbers among the \(y_j\) and \(z_j\) that have this digit set to 1, so they cannot add up to the target value of 3 on their own. If there is a \(v_i \in S^*\) that has the \((n+j)\)th digit set to 1, then by construction, the clause \(C_j\) contains the literal \(x_i\), so it is satisfied by the assignment \(\alpha\) since \(\alpha(x_i) = 1\). On the other hand, if there is a \(w_{i'} \in S^*\) that has the \((n+j)\)th digit set to 1, then the clause \(C_j\) contains the literal \(\neg x_{i'}\), and it is satisfied by the assignment \(\alpha\) since \(\alpha(x_{i'}) = 0\). In either case, the clause \(C_j\) is satisfied. Since this reasoning applies to all clauses, the assignment \(\alpha\) satisfies the formula as a whole, and \(\phi \in \TSAT\).
We have shown that \(f\) is efficiently computable and that \(\phi \in \TSAT \iff f(\phi) \in \SSUM\), so we have \(\TSAT \pmr \SSUM\). We can conclude that \(\SSUM\) is \(\NP\)-Hard, and since it is also in \(\NP\), that it is \(\NP\)-Complete.
Define the language \(\KNAPSACK\) as follows:
\[\begin{split}\KNAPSACK = \left\{\begin{aligned} (V,W,n,v,C) : ~&\abs{V} = \abs{W} = n \text{ and there exist indices }\\ &i \in 1, \dots, n \text{ such that } \sum_i V[i] \ge v \andt\\ &\sum_i W[i] \le C \end{aligned}\right\}\end{split}\]This corresponds to the following problem:
There are \(n\) items.
Each item \(i\) has a value \(V(i)\) and a weight \(W(i)\), where \(V(i)\) and \(W(i)\) are integers.
We have a target value of at least \(v\), where \(v\) is an integer.
We have a weight capacity limit of at most \(C\), where \(C\) is an integer.
Is it possible to select a subset of the items such that the total value of the subset is at least \(v\) but the total weight of the subset is at most \(C\)?
Show that \(\KNAPSACK\) is \(\NP\)-Complete.
Concluding Remarks
We have explored only the tip of the iceberg when it comes to \(\NP\)-Complete problems. Such problems are everywhere, including constraint satisfaction (\(\SAT\), \(\TSAT\)), covering problems (vertex cover, set cover), resource allocation (knapsack, subset sum), scheduling, graph colorability, model-checking, social networks (clique, maximal cut), routing (\(\HP\), \(\TSP\)), games (Sudoku, Battleship, Super Mario Brothers, Pokémon), and so on. An efficient algorithm for any of these problems admits an efficient solution for all of them.
The skills to reason about a problem and determine whether it is \(\NP\)-Hard are critical. Researchers have been working for decades to find an efficient solution for \(\NP\)-Complete problems to no avail. This makes it exceedingly unlikely that we will be able to do so. Instead, when we encounter such a problem, a better path is to focus on approximation algorithms, which we turn to next.
Search and Approximation
Thus far, we have focused our attention on decision problems, formalized as languages. We now turn to functional or search problems, those that do not have a yes/no answer. Restricting ourselves to decision problems does not leave us any room to find an answer that is “close” to correct, but a larger codomain will enable us to consider approximation algorithms for \(\NP\)-Hard problems. We previously encountered Kolmogorov complexity as an example of an uncomputable functional problem. Here, we consider problems that are computable, with the intent of finding efficient algorithms to solve such problems. Some examples of search problems are:
Given an array of integers, produce a sorted version of that array.
Given a Boolean formula, find a satisfiable assignment.
Given a graph, find a minimal vertex cover.
In general, a search problem may be:
a maximization problem, such as finding a maximal clique in a graph
a minimization problem, such as finding a minimal vertex cover
an exact problem, such as finding a satisfying assignment or a Hamiltonian path
Maximization and minimization problems are also called optimization problems.
For each kind of search problem, we can define a corresponding decision problem [7]. For maximization and minimization, we introduce a limited budget. Specifically, does a solution exist that meets the budget? The following are examples:
For an exact problem, the corresponding decision problem is whether a solution exists that meets the required criteria. The following is an example:
We have seen that the languages \(\CLIQUE\), \(\VC\), and \(\SAT\) are \(\NP\)-Complete. How do the search problems compare in difficulty to the decision problems?
We previously saw that a functional problem has an equivalent formulation as a decision problem. However, this correspondence was based on translating a functional problem to a sequence of decision problems on the binary representation of the answer. This is different than what we are considering now, which is between a search problem and a “natural” formulation of a corresponding decision problem.
It is clear that given an efficient algorithm to compute the answer to a search problem, we can efficiently decide the corresponding language. For instance, a decider for \(\CLIQUE\) can invoke the algorithm to find a maximal clique and then check whether the result has size at least \(k\). Thus, the decision problem is no harder than the search problem.
What if we have an efficient decider \(D\) for \(\CLIQUE\)? Can we then efficiently find a maximal clique? It turns out that we can. Given a graph \(G = (V, E)\), we first find the size of a maximal clique by running \(D\) on \((G, k)\) for each \(k\) from \(\abs{V}\) down to \(0\), stopping on the first \(k\) for which \(D\) accepts:
Since a clique can at most contain all the vertices in a graph, we query the decider whether or not there is a clique of such a size. If not, we query again with a smaller size. We stop as soon as we get an affirmative answer. Thus, the algorithm produces the size of the largest clique. The algorithm is also efficient, as it makes as most \(\abs{V}\) calls to \(D\), and \(D\) is assumed to be efficient.
Note that a linear search is not always efficient to find the size of the optimal object; it depends on how large the search space is compared to the input size. For the maximal-clique problem, the search space is linear with respect to the input size, so a linear search suffices. For other problems, it may be the case that the search space is exponential compared to the input size, in which case a binary search may be necessary.
Once we know the size of the optimal clique, we can look for a clique of that size. There are two common strategies.
The first strategy is a “destructive” one, where we throw away pieces of the input until all we have left is the object we’re looking for. In the case of the clique problem, we can try discarding a vertex (and its incident) edges and then query the decider to see if the graph still has a clique of the required size. If so, the vertex is unnecessary, and we can proceed without it. Otherwise, the discarded vertex is part of the optimal clique, so we need to retain it. We can repeat this for all vertices, and at the end, we will be left with just those that make up the clique. The algorithm is as follows:
This algorithm does a single pass over all the vertices. Each iteration removes a vertex and its incident edges, which can be done efficiently, and invokes the decider \(D\), which is assumed to be efficient. Thus, the algorithm as a whole is efficient.
The second common strategy is a “constructive” one – we build up the solution piece by piece, guessing what is in the solution and checking the guess via a query to the decider. For the clique problem, we can guess that a vertex \(v\) is in a clique of size \(k\). If it is, then there must be a clique of size \(k-1\) among just the vertices that are adjacent to \(v\), i.e. its neighborhood. This is because a clique of size \(k\) is a complete subgraph of size \(k\), and if we remove a vertex from the subgraph, we have a complete subgraph of size \(k-1\). Conversely, if the neighborhood of \(v\) is a clique of size \(k-1\), then adding \(v\) produces a clique of size \(k\), since by definition, there is an edge between \(v\) and each vertex in its neighborhood. Thus, our algorithm iterates over each vertex and checks if its neighborhood has a clique of size one smaller than required. If so, the vertex is part of the larger clique, and we continue the search on just its neighborhood and with the smaller size. Otherwise, the vertex is not part of the clique we’re looking for, so we keep looking for a clique of the larger size. The algorithm is as follows:
This algorithm also does a single pass over the vertices in \(G\). Each iteration computes the neighborhood of a vertex, which can be done efficiently by in turn iterating over all the vertices and edges in the graph. Each iteration of the algorithm also invokes \(D\), which is assumed efficient. Since the algorithm does a linear number of iterations and each is efficient, the algorithm as a whole is also efficient.
Suppose we have an efficient decider \(D\) for \(\VC\). We can efficiently find a minimal vertex cover as follows. Given a graph \(G = (V, E)\), we do the following:
Find the size of a minimal vertex cover by running \(D\) on \((G, k)\) for each \(k\) from \(1\) to \(\abs{V}\), stopping on the first \(k\) for which \(D\) accepts.
We use a “constructive” strategy to deduce which vertices are part of the cover. Pick a vertex \(v\). If it is in a minimal vertex cover of size \(k\), we can remove it and all adjacent edges, and the resulting graph has a vertex cover of size \(k-1\). If \(v\) is not in a minimal vertex cover, the resulting graph will not have a vertex cover of size \(k-1\). Thus, we can use a call to \(D\) on the new graph to determine whether or not \(v\) is in a minimal cover. If it is, we continue our search on the smaller graph, otherwise we restore the previous graph. We continue the search until we’ve checked each vertex for whether it is in the minimal cover.
The full search algorithm is as follows:
The first loop invokes \(D\) at most \(\abs{V}\) times, since a cover is a subset of all the vertices. Since \(D\) is efficient, the loop completes in polynomial time. The second loop invokes \(D\) at most \(\abs{V}\) times, and \(G'\) can be constructed efficiently, so the loop as a whole is efficient. Thus, the algorithm runs in polynomial time with respect to \(\abs{G}\). We conclude that if we can decide \(\VC\) efficiently, we can find an actual minimal vertex cover efficiently.
Given an efficient decider \(D\) for \(\SAT\), demonstrate an efficient algorithm that finds a satisfying assignment for a Boolean formula \(\phi\), if such an assignment exists.
We have shown that the search problems of finding a maximal clique or minimal vertex cover has the same difficulty as the decision problems \(\CLIQUE\) or \(\VC\), respectively. Since \(\CLIQUE\) and \(\VC\) are \(\NP\)-Complete, this means that an efficient solution to either search problem leads to \(\P = \NP\). Informally, we say that the search problems are NP-Hard, much as we did for a decision problem for which an efficient solution implies \(\P = \NP\). (Note, however, that the search problems are not \(\NP\)-Complete, as the class \(\NP\) consists only of decision problems.)
We claim without proof that an \(\NP\)-Hard search problem has an efficient algorithm if and only if its corresponding decision problem is efficiently solvable. Thus, we are unlikely to construct efficient algorithms to find exact solutions to \(\NP\)-Hard search problems. Instead, we focus our attention on designing efficient algorithms to approximate \(\NP\)-Hard optimization problems.
Given an input \(x\), an optimization problem seeks to find an output \(y\) such that \(value(x, y)\) is optimized for some metric \(value\). For a maximization problem, the goal is to find an optimal value \(y^*\) such that:
That is, \(value(x, y^*)\) is at least as large as \(value(x, y)\) for all other outputs \(y\), so that \(y^*\) maximizes the metric for a given input \(x\). Similarly, in a minimization problem, the goal is to find \(y^*\) such that:
Here, \(y^*\) produces the minimal value for the metric \(value(x, y)\) over all \(y\). In both cases, we define \(OPT(x) = value(x, y^*)\) as the optimal value of the metric for an input \(x\). We then define an approximation as follows:
Given an input \(x\), a solution \(y\) is an α-approximation if:
For a maximization problem,
\[\alpha\cdot OPT(x) \le value(x, y) \le OPT(x)\]where \(\alpha < 1\).
For a minimization problem,
\[OPT(x) \le value(x, y) \le \alpha\cdot OPT(x)\]where \(\alpha > 1\).
In both cases, \(\alpha\) is the approximation ratio.
The closer \(\alpha\) is to one, the better the approximation.
An α-approximation algorithm is an algorithm \(A\) such that for all inputs \(x\), the output \(A(x)\) is an α-approximation.
Approximation for Minimal Vertex Cover
Let us return to the problem of finding a minimal vertex cover. Can we design an efficient algorithm to approximate the minimal cover with a good approximation ratio \(\alpha\)? Our first attempt is the cover-and-remove algorithm, which repeatedly finds an edge, removes one of its endpoints, and removes all edges incident to that endpoint:
How well does this algorithm perform? Consider the following star-shaped graph:
On this graph, it is possible for the algorithm to pick the vertices on the outside of the star to remove rather than the one in the middle. The result would be a cover of size four, while the optimal cover has size one (just the vertex in the middle), giving an approximation ratio of 4 for this graph. However, larger star-shaped graphs can give rise to arbitrarily larger approximation ratios – there is no constant \(\alpha\) such that the cover-and-remove algorithm achieves a ratio of \(\alpha\) on all possible input graphs.
Observe that in the star-shaped graph, the vertex in the center has the largest degree, the number of edges that are incident to it. If we pick that vertex first, we cover all edges, achieving the optimal cover of just a single vertex for the star-shaped graph. This motivates a greedy strategy that repeatedly adds the vertex with the highest remaining degree to the cover:
While this algorithm works well for a star-shaped graph, there are other graphs for which it does not achieve an approximation within a constant \(\alpha\). Consider the following bipartite graph:
The optimal cover consists of the six vertices at the top, while the algorithm instead chooses the eight vertices at the bottom. The vertex at the bottom left has a degree of six, compared to the maximum degree of five among the top vertices, so the bottom-left vertex is picked first. Then the second vertex on the bottom has degree five, while the top vertices now have degree no more than four. Once the second vertex is removed, the vertex with highest degree is the third on the bottom, and so on until all the bottom vertices have been chosen for the cover. While the algorithm achieves a ratio of \(4/3\) on this graph, larger graphs can be constructed with a similar structure, with \(k\) vertices at the top and ~\(k \log k\) at the bottom. The algorithm produces a cover with ratio \(\log k\) compared to the optimal, and that ratio can be made arbitrarily large by increasing \(k\).
The problem with both the cover-and-remove and greedy algorithms is the absence of a benchmark, a way of comparing the algorithm against the optimal result. Before we proceed to design another algorithm, we establish a measure for a minimal vertex cover that we can benchmark against, using a set of pair-wise disjoint edges from the input graph.
Given a graph \(G = (V, E)\), a subset of edges \(S \subseteq E\) is pair-wise disjoint if no two edges in \(S\) share a common vertex. Given any pair-wise disjoint set of edges \(S\) from \(G\), any vertex cover \(C\) must have at least \(\abs{S}\) vertices – since no two edges in \(S\) share a common vertex, we must choose a distinct vertex to cover each edge in \(S\). This gives us a way of benchmarking an approximation algorithm – ensure that the algorithm picks no more than a constant number of vertices for each edge in some pair-wise disjoint set of edges \(S\).
In fact, only a small modification to the cover-and-remove algorithm is necessary to meet this benchmark. Rather than choosing one of the two endpoints when we pick an edge, we can include both of them in the cover, removing the two endpoints and all their incident edges before proceeding. The remaining edges do not share a vertex with the edge we picked, so they are each disjoint with respect to that edge. By induction, the full set of edges picked by this double-cover algorithm is pair-wise disjoint. The algorithm is as follows:
The edges picked by the double-cover algorithm are pair-wise disjoint for any input graph.
Let \(S_i\) be the set of edges picked by the double-cover algorithm in the first \(i\) iterations on an arbitrary graph \(G\). We show by induction that \(S_i\) is pair-wise disjoint.
Base case: \(S_1\) contains only a single edge, so it is trivially pair-wise disjoint.
Inductive step: Assume \(S_i\) is pair-wise disjoint. When the algorithm picks an edge \((u,v)\), it removes both \(u\) and \(v\) and all edges incident to either \(u\) or \(v\). Thus, the graph that remains after iteration \(i\) does not contain any of the vertices \(u,v\) for each edge \((u,v) \in S_i\). The edges that remain in the graph all have endpoints that are distinct from those that appear in \(S_i\), so all remaining edges are disjoint from any edge in \(S_i\). Then no matter what edge \(e\) the algorithm picks next, it will be disjoint from any edge in \(S_i\). Since \(S_i\) itself is pair-wise disjoint, \(S_{i+1} = S_i \cup \{e\}\) is also pair-wise disjoint.
How good is this algorithm? Let \(S\) be the full set of edges picked by the algorithm on an input graph \(G\). Since \(S\) is pair-wise disjoint, the minimal vertex cover \(C^*\) of \(G\) must have at least \(\abs{S}\) vertices. The cover \(C\) produced by the double-cover algorithm has \(2\abs{S}\) vertices. We have:
Thus, the resulting cover is a 2-approximation, and double-cover is a 2-approximation algorithm.
Maximal Cut
As another example, we design an approximation algorithm for the \(\NP\)-Hard problem of finding the maximal cut of an undirected graph. A cut of a graph \(G = (V, E)\) is a partition of the vertices \(V\) into two disjoint sets \(S\) and \(T\), so that \(T = V \setminus S\). The size of a cut is the number of edges that cross between the partitions \(S\) and \(T\). More formally, we define the set of edges that cross between \(S\) and \(T\) as:
Then the size of the cut \(S,T\) is \(\abs{C(S, T)}\). The following is an example of a cut, with the crossing edges \(C(S, T)\) represented as dashed lines:
The maximal cut of a graph is the cut \(S,T\) that maximizes the size \(\abs{C(S, T)}\). Finding a maximal cut has important applications to circuit design, combinatorial optimization, and statistical physics.
The limited-budget decision version of this problem is defined as:
This is an \(\NP\)-Complete problem, though we do not prove it here. This implies that the search problem of finding a maximal cut is \(\NP\)-Hard. Rather than trying to find an exact solution to the search problem, we design an approximation algorithm. We start by identifying an appropriate benchmark.
Given a graph \(G = (V, E)\), we define the set \(I_v\) of edges incident to vertex \(v \in V\) as:
In other words, \(I_v\) is the set of edges that have \(v\) as an endpoint. Let \(S^*,T^*\) be a maximal cut of \(G\). We claim the following:
For each vertex \(v \in V\) in a graph \(G = (V, E)\), at least half the edges in \(I_v\) must be in the crossing set \(C(S^*, T^*)\).
Assume for the purposes of contradiction that fewer than half the edges in \(I_v\) are in \(C(S^*, T^*)\) for some vertex \(v \in V\). Without loss of generality, assume that \(v \in S^*\). Then the edges \(I_v\) can be partitioned into two subsets:
\(I_{v,c} \subseteq C(S^*, T^*)\) consists of edges that cross between \(S^*\) and \(T^*\), while \(I_{v,nc}\) consists of those that are internal to \(S^*\). By assumption, we have \(\abs{I_{v,c}} < \abs{I_{v,nc}}\).
If we move \(v\) from \(S^*\) into \(T^*\), the edges in \(I_{v,nc}\) become crossing edges, while those in \(I_{v,c}\) become internal edges, as illustrated below.
Thus, we have a net gain of \(\abs{I_{v,nc}} - \abs{I_{v,c}} > 0\) crossing edges, which means that the cut \(S^* \setminus \{v\}, T^* \cup \{v\}\) has more crossing edges than \(S^*,T^*\). This is a contradiction, since \(S^*,T^*\) is a maximal cut. Thus, our assumption that such a \(v\) exists is false, and \(I_v\) must have at least as many crossing edges as internal edges for all \(v \in V\).
The size of a maximal cut \(S^*,T^*\) of a graph \(G = (V, E)\) is at least \(\frac{1}{2}\abs{E}\), i.e. \(\abs{C(S^*,T^*)} \ge \frac{1}{2}\abs{E}\).
We first observe that
since each edge appears in exactly two sets \(I_v\). Similarly, we have
where \(I_{v,c}\) is the set of crossing edges incident to \(v\). By Claim 117, we have \(\abs{I_{v,c}} \ge \frac{1}{2}\abs{I_v}\), which gives us:
Thus, at least half the edges in \(E\) are in the crossing set \(C(S^*, T^*)\).
We have placed a lower bound on the number of edges in a maximal cut. An obvious upper bound is the full set of edges \(E\); a cut can only contain edges that are in the graph. Thus, we have:
Having established bounds on the number of edges in a maximal cut, we design an algorithm that produces a cut of size at least \(\frac{1}{2}\abs{E}\), giving us a 1/2-approximation of the optimal. We take our inspiration from our argument in Proof 118 – repeatedly find a vertex such that moving it to the opposite partition increases the cut size, until no such vertices exist. The resulting cover will have at least half the edges in \(E\). We refer to this as the local-search algorithm.
The algorithm terminates when there is no vertex such that moving it to the opposite partition increases the cut size. By the reasoning in Proof 118, this means that for every vertex \(v\), at least half the edges in \(I_v\) are edges that cross between \(S\) and \(T\). Thus, \(C(S, T)\) contains at least half the edges in \(E\), and \(S, T\) is a 1/2-approximation.
Since the algorithm makes at least one move in each iteration (except the last) of the outer loop, and each move increases the size of the cut, we can argue by the potential method that it makes at most \(\abs{E}+1\) iterations (a cut can have no more than \(\abs{E}\) edges).
The following illustrates the execution of the local-search algorithm on a sample graph:
Initially, all the vertices are in \(T\), illustrated above as red, solid circles. The algorithm picks the bottom-right vertex to swap, since doing so increases the cut size from zero to three. We represent its membership in \(S\) by depicting the vertex as a blue, open circle. The algorithm then happens to pick the vertex at the middle left – it has three adjacent internal edges and only one adjacent crossing edge, so moving it to \(S\) results in one internal edge and three crossing edges, increasing the cut size to five. At this point, all vertices have at least as many adjacent crossing edges as internal edges, so the algorithm terminates. (The maximal cut for this graph has six edges – can you find it?)
Knapsack
The knapsack problem involves selecting a subset of items that maximizes their total value, subject to a weight constraint. More specifically, we have \(n\) items, and each item \(i\) has a value \(v_i\) and a weight \(w_i\). We have a weight capacity \(C\), and we wish to find a subset \(S\) of the items that maximizes the total value
without exceeding the weight limit:
In this problem formulation, we are not permitted to select the same item more than once; for each item, we either select it to be in \(S\) or we do not. Thus, this formulation is also known as the 0-1 knapsack problem. For simplicity, we assume that each item’s weight is no more than \(C\) – otherwise, we can immediately remove this item from consideration.
A natural dynamic-programming algorithm to compute the maximal set of items does \(O(nC)\) operations, given \(n\) items and a weight limit \(C\). Assuming that the weights and values of each item are numbers that are similar in size to (or smaller than) \(C\), the total input size is \(O(n \log C + \log C) = O(n \log C)\), with all numbers encoded in a non-unary base [8]. Thus, the dynamic-programming solution is not efficient with respect to the input size. In fact, this problem is known to be \(\NP\)-Hard, so we are unlikely to find an efficient solution to solve it exactly.
An algorithm is said to run in pseudopolynomial time if its runtime complexity is polynomial in the value of the integers in the input, as opposed to the size of the integers. Another way of defining pseudopolynomial time is for an algorithm to run in polynomial time with respect to the input when all numbers in the input are encoded in unary. The dynamic-programming algorithm for knapsack takes pseudopolynomial time.
Instead, we attempt to approximate the optimal solution to knapsack with a greedy algorithm. Our first attempt is the relatively greedy algorithm, where we select the items with largest value relative to their weight. In other words, we choose items in order of the ratio \(v_i/w_i\).
This algorithm runs in \(\O(n \log n \log C)\) time, where \(n = \abs{V}\); the time is dominated by sorting, which does \(\O(n \log n)\) comparisons (e.g. using merge sort), and each comparison takes \(\O(\log C)\) time for numbers that are similar in size to \(C\). Thus, this algorithm is efficient with respect to the input size.
Unfortunately, the relatively greedy algorithm is not guaranteed to achieve any approximation ratio \(\alpha\). Consider what happens when we have two items, the first with value \(v_1 = 1\) and weight \(w_1 = 1\), and the second with value \(v_2 = 100\) and weight \(w_2 = 200\). Then item \(1\) has a value-to-weight ratio of \(1\), while item \(2\) has a value-to-weight ratio of \(0.5\). This means that the relatively greedy algorithm will always try to pick item \(1\) before item \(2\). If the weight capacity is \(C = 200\), the algorithm produces \(S = \{1\}\), with a total value of \(value(S) = 1\). However, the optimal solution is \(S^* = \{2\}\), with a total value of \(value(S^*) = 100\). This gives us an approximation ratio of \(0.01\), but we can easily find other examples with arbitrarily smaller ratios.
The main problem in the example above is that the relatively greedy algorithm ignores the most valuable item in favor of one with a better value-to-weight ratio. A dumb-greedy algorithm that just picks the single item with the highest value actually produces the optimal result for the example above. This algorithm is as follows:
This algorithm has the same \(\O(n \log n \log C)\) runtime as the relatively greedy algorithm. While it achieves the optimal result for our previous example, it is straightforward to construct examples where the dumb-greedy algorithm fails to achieve any approximation ratio \(\alpha\). Suppose we have \(201\) items, where all but the last item have value \(1\) and weight \(1\), while the last item has value \(2\) and weight \(200\). In other words, we have
With a weight limit \(C = 200\), the optimal solution is to pick items \(1, \dots, 200\) for a total value of \(200\), but the dumb-greedy algorithm picks the single item \(201\), for a value of \(2\). Thus, it achieves a ratio of just \(0.01\) on this instance of the problem. Moreover, the example can be modified to make the ratio arbitrarily small.
Observe that for this second example, the relatively greedy algorithm actually would produce the optimal solution. It seems like the dumb-greedy algorithm does well where the relatively greedy algorithm fails, and the relatively greedy algorithm performs well when the dumb-greedy algorithm does not. This motivates us to construct a smart-greedy algorithm that runs both the relatively greedy and dumb-greedy algorithm and picks the better of the two solutions. It can be shown that the smart-greedy algorithm 1/2-approximates the optimal solution for any instance of the knapsack problem.
In this exercise, we will prove that the smart-greedy algorithm is a 1/2-approximation algorithm for the 0-1 knapsack problem.
Define the fractional knapsack problem as a variant that allows taking any partial amount of each item. (For example, we can take \(3/7\) of item \(i\), for a value of \((3/7)v_i\), at a weight of \((3/7)w_i\).) Prove that for this fractional variant, the optimal value is no smaller than for the original 0-1 variant (for the same weights, values, and knapsack capacity).
Prove that the sum of the values obtained by the relatively greedy and dumb-greedy algorithm is at least the optimal value for the fractional knapsack problem.
Using the previous parts, prove that the smart-greedy algorithm is a 1/2-approximation algorithm for the 0-1 knapsack problem.
Other Approaches to NP-Hard Problems
While we do not know how to efficiently find exact solutions to \(\NP\)-Hard problems, we can apply heuristics to these problems. There are two general kinds of heuristics:
Those that are guaranteed to get the right answer, but may take a long time on some inputs. However, they may be efficient enough for inputs that we care about. Examples include the dynamic-programming algorithm for 0-1 knapsack where the weight capacity \(C\) is small compared to the number of items \(n\), as well as SAT solvers for Boolean formulas that emerge from software verification.
Those that are guaranteed to run fast, but may produce an incorrect or suboptimal answer. The α-approximation algorithms we considered above are examples of this, as well as greedy algorithms for the traveling salesperson problem, which has no α-approximation algorithm for general instances of the problem.
Some algorithms are only applicable to special cases of a problem. For instance, there are efficient algorithms for finding a maximal cut on planar graphs, which can be depicted on a plane without any crossing edges.
Another general approach is randomized algorithms, which may be able to find good solutions with high probability. We will consider such algorithms in the next unit.