In the previous unit, we considered which computational problems are
solvable by algorithms, without any constraints on the time and memory
used by the algorithm. We saw that even with unlimited resources, many
problems—indeed, “most” problems—are not solvable by any
algorithm.
In this unit on computational complexity, we consider problems that
are solvable by algorithms, and focus on how efficiently they can
be solved. Primarily, we will be concerned with how much time it takes to
solve a problem. [1] We mainly focus on decision problems—i.e.,
languages—then later broaden our treatment to functional—i.e.,
search—problems.
A Turing machine’s running time, also known as time complexity, is
the number of steps it takes until it halts; similarly, its space
complexity is the number of tape cells it uses. We focus primarily on
time complexity, and as previously discussed, we are interested in the asymptotic
complexity with respect to the input size, in the worst case. For any
particular asymptotic bound \(\O(t(n))\), we define the set of languages
that are decidable by Turing machines having time complexity
\(\O(t(n))\), where \(n\) is the input size:
Definition 129(DTIME)
Define
\[\DTIME(t(n)) = \set{L \subseteq \Sigma^* : L \text{ is decided
by some Turing machine with time complexity $\O(t(n))$} } \; \text.\]
The set \(\DTIME(t(n))\) is an example of a complexity class: a set of
languages whose complexity in some metric of interest is bounded in
some specified way. Concrete examples include \(\DTIME(n)\), the class
of languages that are decidable by Turing machines that run in (at
most) linear \(O(n)\) time; and \(\DTIME(n^2)\), the class of languages
decidable by quadratic-time Turing machines.
When we discussed computability, we defined two classes of languages:
the decidable languages (also called recursive), denoted by \(\RD\),
and the recognizable languages (also called recursively enumerable),
denoted by \(\RE\). The definitions of these two classes are actually
model-independent: they contain the same languages, regardless of
whether our computational model is Turing machines, lambda calculus,
or some other sensible model, as long as it is Turing-equivalent.
Unfortunately, this kind of model independence does not extend to
\(\DTIME(t(n))\). Consider the concrete language
\[\PALINDROME = \set{x : x = x^R \text{, i.e., $x$ equals its
reversal}} \; \text.\]
In a computational model with random-access memory, or even in the
two-tape Turing-machine model, \(\PALINDROME\) can be decided in linear
\(\O(n)\) time, where \(n = \abs{x}\): just walk pointers inward from both
ends of the input, comparing character by character.
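To make the two-pointer approach concrete, here is a minimal Python
sketch (Python gives random-access memory, so the linear bound applies;
the function name is ours, for illustration):

def is_palindrome(x: str) -> bool:
    # Walk two pointers inward from both ends of the input,
    # comparing character by character.
    i, j = 0, len(x) - 1
    while i < j:
        if x[i] != x[j]:
            return False  # mismatch, so x does not equal its reversal
        i, j = i + 1, j - 1
    return True  # every pair matched, so x = x^R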
However, in the standard one-tape Turing-machine model, it can be
proved that there is no \(\O(n)\)-time algorithm for \(\PALINDROME\); in
fact, it requires \(\Omega(n^2)\) time to decide. Essentially, the issue
is that an algorithm to decide this language would need to compare the
first and last symbols of \(x\), which requires moving the head
sequentially over the entire string, and do similarly for second and
second-to-last symbols of \(x\), and so on. Because \(n/4\) of the pairs
require moving the head by at least \(n/2\) cells each, this results in
a total running time of \(\Omega(n^2)\). [2] So, \(\PALINDROME \notin
\DTIME(n)\), but in other computational models, \(\PALINDROME\) can be
decided in \(\O(n)\) time.
A model-dependent complexity class is very inconvenient, because we
wish to analyze algorithms at a higher level of abstraction, without
worrying about the details of the underlying computational model, or
how the algorithm would be implemented on it, like the particulars of
data structures (which can affect asymptotic running times).
We wish to define a complexity class that captures all problems that
can be solved “efficiently”, and is also model-independent. As a step
toward this goal, we note that there is an enormous qualitative
difference between the growth of polynomial functions versus
exponential functions. The following illustrates how several
polynomials in \(n\) compare to the exponential function \(2^n\):
The vertical axis in this plot is logarithmic, so that the growth of
\(2^n\) appears as a straight line. Observe that even a polynomial with
a fairly large exponent, like \(n^{10}\), grows much slower than \(2^n\)
(except for small \(n\)), with the latter exceeding the former for all
\(n \geq 60\).
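For instance, at the crossover point \(n = 60\):
\[2^{60} = 1024^{6} \approx 1.15 \times 10^{18}
\qquad \text{versus} \qquad
60^{10} \approx 6.05 \times 10^{17} \; \text.\]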
In addition to the dramatic difference in growth between polynomials
and exponentials, we also observe that polynomials have nice closure
properties: if \(f(n),g(n)\) are polynomially bounded, i.e., \(f(n) =
\O(n^c)\) and \(g(n) = \O(n^{c'})\) for some constants \(c,c'\), then
\[f(n) + g(n) = \O(n^{\max\set{c,c'}}) \; \text, \qquad
f(n) \cdot g(n) = \O(n^{c+c'}) \; \text, \qquad
f(g(n)) = \O(n^{c \cdot c'})\]
are polynomially bounded as well, because \(\max\set{c,c'}\), \(c+c'\),
and \(c \cdot c'\) are constants. These facts can be used to prove that
if we compose a polynomial-time algorithm that uses some subroutine
with a polynomial-time implementation of that subroutine, then the
resulting full algorithm is also polynomial time. This composition
property allows us to obtain polynomial-time algorithms in a modular
way, by designing and analyzing individual components in isolation.
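As a concrete illustration of this composition property, with bounds
chosen just for this example: if an algorithm runs in \(\O(n^2)\) time
when each subroutine call is counted as one step, and each call is to a
subroutine that takes \(\O(m^3)\) time on inputs of size \(m = \O(n)\),
then the total running time is at most
\[\O(n^2) \cdot \O(n^3) = \O(n^5) \; \text,\]
which is still polynomial in \(n\).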
Thanks to the above properties of polynomials, it has been shown that
the notion of polynomial-time computation is “robust” across many
popular computational models, like the many variants of Turing
machines, lambda calculi, etc. That is, any of these models can
simulate any other one with only polynomial “slowdown”. So, any
particular problem is solvable in polynomial time either in all such
models, or in none of them.
The above considerations lead us to define “efficient” to mean
“polynomial time (in the input size)”. From this we get the following
model-independent complexity class of languages that are decidable in
polynomial time (across many models of computation).
Definition 130(Complexity Class P)
The complexity class \(\P\) is defined as
\[\begin{split}\P
&= \bigcup_{k \geq 1} \DTIME(n^k) \\
& = \set{ L : L \text{ is decided by some polynomial-time TM }}
\; \text.\end{split}\]
In other words, a language \(L \in \P\) if (and only if) it is
decided by some Turing machine that runs in time \(O(n^k)\) for some
constant \(k\).
In brief, \(\P\) is the class of “efficiently decidable” languages.
Based on what we saw in the algorithms unit, this class includes many
fundamental problems of interest, like the decision versions of the
greatest common divisor problem, sorting, the longest increasing
subsequence problem, etc. Note that since \(\P\) is a class of
languages, or decision problems, it technically does not include
the search versions of these problems—but see below for the close
relationship between search and decision.
As a final remark, the extended Church-Turing thesis posits that the
notion of polynomial-time computation is “robust” across all
realistic models of computation:
Theorem 131(Extended Church-Turing Thesis)
A problem is solvable in polynomial time on a Turing machine if and
only if it is solvable in polynomial time in any “realistic”
model of computation.
Because “realistic” is not precisely defined, this is just a thesis,
not a statement that can be proved. Indeed, this is a major
strengthening of the standard Church-Turing thesis, and there is no
strong agreement about whether it is even true! In particular, the
model of quantum computation may pose a serious challenge to this
thesis: it is known that polynomial-time quantum algorithms exist for
certain problems, like factoring integers into their prime divisors,
that we do not know how to solve in polynomial time on Turing
machines! So, if quantum computation is “realistic”, and if there
really is no polynomial-time factoring algorithm in the TM model, then
the extended Church-Turing thesis is false. While both of these
hypotheses seem plausible, they are still uncertain at this time: real
devices that can implement the full model of quantum computation have
not yet been built, and we do not have any proof that factoring
integers (or any other problem that quantum computers can solve in
polynomial time) requires more than polynomial time on a Turing
machine.
For some computational problems, it is possible to efficiently
verify whether a claimed solution is actually correct—regardless
of whether computing a correct solution “from scratch” can be done
efficiently. Examples of this phenomenon are abundant in everyday
life, and include:
Logic and word puzzles like mazes, Sudoku, or crosswords: these
come in various degrees of difficulty to solve, some quite
challenging. But given a proposed solution, it is straightforward to
check whether it satisfies the “rules” of the puzzle. (See below for
a detailed example with mazes.)
Homework problems that ask for correct software code (with
justification) or mathematical proofs: producing a good solution
might require a lot of effort and creativity to discover the right
insights. But given a candidate solution, one can check relatively
easily whether it is clear and correct, simply by applying the rules
of logic. For example, to verify a claimed mathematical proof, we
just need to check whether each step follows logically from the
hypotheses and the previous steps, and that the proof reaches the
desired conclusion.
Music, writing, video, and other media: creating a high-quality
song, book, movie, etc. might require a lot of creativity, effort,
and expertise. But given such a piece of media, it is relatively
easy for even a non-expert to decide whether it is engaging and
worthwhile to them (even though this is a subjective judgment that
may vary from person to person). For example, even though the
authors of this text could not come close to writing a beautiful
symphony or a hit pop song, we easily know one when we hear one.
At first glance, it is not clear whether this maze has a solution.
However, suppose that someone—perhaps a very good solver of mazes,
or the person who constructed the maze—were to claim that this maze
is indeed solvable. Is there some way that they could convince us of
this fact?
A natural idea is simply to provide us with a path through the maze as
“proof” of the claim. It is easy for us to check that this proof is
valid, by verifying that the path goes from start to finish without
crossing any “wall” of the maze. By definition, any solvable maze has
such a path, so it is possible to convince us that a solvable maze
really is solvable.
On the other hand, suppose that a given maze does not have a
solution. Then no matter what claimed “proof” someone might give us,
the checks we perform will cause us to reject it: because the maze is
not solvable, any path must either fail to go from start to finish, or
cross a wall somewhere (or both).
In summary: for any given maze, checking that a claimed solution goes
from start to finish without crossing any wall is an efficient
verification procedure:
The checks can be performed efficiently, i.e., in polynomial time in
the size of the maze.
If the maze is solvable, then there exists a “proof”—namely, a
path through the maze from start to finish—that the procedure will
accept. How to find such a proof is not the verifier’s concern;
all that matters is that it exists. [3]
If the maze is not solvable, then no claimed “proof” will satisfy
the procedure.
Observe that we need both of the last two conditions in order for
the procedure to be considered a correct verifier:
An “overly skeptical” verifier that cannot be “convinced” by
anything, even though the maze is actually solvable, would not be
correct.
Similarly, an “overly credulous” verifier that can be “fooled” into
accepting, even when the maze is not solvable, would also be
incorrect.
As already argued, our verification procedure has the right balance of
“skepticism” and “credulity”.
As another example, the traveling salesperson problem (TSP) is the
task of finding a minimum-weight tour of a given weighted graph. In
this text, a “tour” of a graph is a cycle that visits every vertex
exactly once, i.e., it is a path that visits every vertex and then
immediately returns to its starting vertex. (Beware that some texts
define “tour” differently.) Without loss of generality, for TSP we can
assume that the graph is complete—i.e., there is an edge between
every pair of vertices—by filling in any missing edges with edges of
enormous (or infinite) weight. This does not affect the solution(s),
because any tour that uses any of these new edges will have larger
weight than any tour that does not use any of them.
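As a small illustrative sketch of this padding trick (representing the
weights as a dict keyed by frozensets is our own choice, not part of the
formal development):

import math

def complete_with_padding(n, weights):
    # Given an n-vertex graph whose existing edges are recorded in
    # `weights` (mapping frozenset({u, v}) to edge weight), return
    # weights for the complete graph, filling in each missing edge
    # with infinite weight so that no minimum-weight tour uses it.
    full = {}
    for u in range(n):
        for v in range(u + 1, n):
            e = frozenset({u, v})
            full[e] = weights.get(e, math.inf)
    return full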
Suppose we are interested in a minimum-weight tour of the following graph:
If someone were to claim that the path \(A \to B \to D \to C \to A\) is
a minimum-weight tour, could we efficiently verify that this is true?
There doesn’t seem to be an obvious way to do so, apart from
considering all other tours and checking whether any of them have
smaller weight. While this particular graph only has three distinct
tours (ignoring reversals and different starting points on the same
cycle), in general, the number of tours in a graph is exponential in
the number of vertices, so this approach would not be efficient. So,
it is not clear whether we can efficiently verify that a claimed
minimum-weight tour really is one.
However, let’s modify the problem to be a decision problem, which
simply asks whether there is a tour whose total weight is within some
specified “budget”:
Given a weighted graph \(G\) and a budget \(k\), does \(G\) have a tour of
weight at most \(k\)?
Can the existence of such a tour be proved to an efficient, suitably
skeptical verifier? Yes: simply provide such a tour as the proof. The
verifier, given a path in \(G\)—i.e., a list of vertices—that is
claimed to be such a tour, would check that all of the following
hold:
the path starts and ends at the same vertex,
the path visits every vertex in the graph exactly once (except for
the repeated start vertex at the end), and
the sum of the weights of the edges on the path is at most \(k\).
(All of these tests can be performed efficiently; see the
formalization in Algorithm 137 below for
details.)
For example, consider the following claimed “proofs” that the above
graph has a tour that is within budget \(k=60\):
The path \(A \to B \to D \to C\) does not start and end at the same
vertex, so the verifier’s first check would reject it as invalid.
The path \(A \to B \to D \to A\) starts and ends at the same vertex,
but it does not visit vertex \(C\), so the verifier’s second check
would reject it.
The path \(A \to B \to D \to C \to A\) satisfies the first two checks,
but it has total weight \(61\), which is not within the budget, so the
verifier’s third check would reject it.
The path \(A \to B \to C \to D \to A\) satisfies the first two checks,
and its total weight of \(58\) is within the budget, so the verifier
would accept it.
These examples illustrate that when the graph has a tour that is
within the budget, there are still “proofs” that are “unconvincing” to
the verifier—but there will also be a “convincing” proof, which is
what matters to us.
On the other hand, it turns out that the above graph does not have a
tour that is within a budget of \(k=57\). And for this graph and budget,
the verifier will reject no matter what claimed “proof” it is given.
This is because every tour in the graph—i.e., every path that would
pass the verifier’s first two checks—has total weight at least \(58\).
In general, for any given graph and budget, the above-described
verification procedure is correct:
If there is a tour of the graph that is within the budget, then
there exists a “proof”—namely, such a tour itself—that the
procedure will accept. (As before, how to find such a proof is not
the verifier’s concern.)
If there is no such tour, then no claimed “proof” will satisfy the
procedure, i.e., it cannot be “fooled” into accepting.
We now generalize the above examples to formally define the notion of
“efficient verification” for an arbitrary decision problem, i.e., a
language. This definition captures the notion of an efficient
procedure that can be “convinced”, by a suitable “proof”, into
accepting any string in the language, but cannot be “fooled” into
accepting a string that is not in the language.
Definition 132(Efficient Verifier)
An efficient verifier for a language \(L\) is a Turing machine
\(V(x,c)\) that takes two inputs, an instance \(x\) and a
certificate (or “claimed proof”) \(c\), and satisfies the following
properties:
Efficiency: \(V(x,c)\) runs in time polynomial in the size
\(\abs{x}\) of the instance \(x\), for any \(x\) and \(c\).
Completeness: if \(x \in L\), then there exists some \(c\) for
which \(V(x,c)\) accepts.
Soundness: if \(x \notin L\), then for all \(c\), \(V(x,c)\)
rejects.
Alternatively, completeness and soundness together are equivalent
to:
Correctness: \(x \in L\) if and only if there exists some \(c\)
for which \(V(x,c)\) accepts.
We say that a language is efficiently verifiable if there is an
efficient verifier for it.
The claimed equivalence can be seen by taking the contrapositive of
the soundness condition, which is: if there exists some \(c\) for which
\(V(x,c)\) accepts, then \(x \in L\). (Recall that when negating a
predicate, “for all” becomes “there exists”, and vice-versa.) This
contrapositive statement and completeness are, respectively, the “if”
and “only if” parts of correctness.
We sometimes say that a certificate \(c\) is “valid” or “invalid” for a
given instance \(x\) if \(V(x,c)\) accepts or rejects, respectively. Note
that the decision of the verifier is what determines whether a
certificate is “(in)valid”—not the other way around—and that the
validity of a certificate depends on both the instance and the
verifier. There can be many different verifiers for the same
language, which in general can make different decisions on the same
\(x,c\) (though they all must reject when \(x \notin L\)).
In general, the instance \(x\) and certificate \(c\) are arbitrary
strings over the verifier’s input alphabet \(\Sigma\) (and they are
separated on the tape by some special character that is not in
\(\Sigma\), but is in the tape alphabet \(\Gamma\)). However, for specific
languages, \(x\) and \(c\) will typically represent various mathematical
objects like integers, arrays, graphs, vertices, etc. As in the
computability unit, this is done via appropriate encodings of the
objects, denoted by \(\inner{\cdot}\), where without loss of generality
every string decodes as some object of the desired “type” (e.g.,
integer, list of vertices, etc.). Therefore, when we write the
pseudocode for a verifier, we can treat the instance and certificate
as already having the desired types.
We point out some of the important aspects of completeness and
soundness (or together, correctness) in Definition 132.
Notice some similarities and differences between the notions of
verifier and decider (Definition 54):
When the input \(x \in L\), both kinds of machine must accept, but
a verifier need only accept for some value of the certificate
\(c\), and may reject for others. By contrast, a decider does not
get any other input besides \(x\), and simply must accept it.
When the input \(x \notin L\), both kinds of machine must reject,
and moreover, a verifier must reject for all values of \(c\).
There is an asymmetry between the definitions of completeness and
soundness:
If a (sound) verifier for language \(L\) accepts some input \(x,c\),
then we can correctly conclude that \(x \in L\), by the
contrapositive of soundness. (A sound verifier cannot be “fooled”
into accepting an instance that is not in the language.)
However, if a (complete) verifier for \(L\) rejects some input
\(x,c\), then in general we cannot reach any conclusion about
whether \(x \in L\): it might be that \(x \notin L\), or it might be
that \(x \in L\) but \(c\) is just not a valid certificate for \(x\).
(In the latter case, by completeness, some other certificate \(c'\)
is valid for \(x\).)
In summary, while every string \(x \in L\) has a “convincing proof”
of its membership in \(L\), a string \(x \notin L\) does not
necessarily have a proof of its non-membership in \(L\).
We also highlight some important but subtle aspects of the notion of
efficiency from Definition 132.
First, efficiency is defined as polynomial time with respect to the
instance \(x\) alone, not the entire input \(x,c\) together. If we
used the latter, then the verifier would be allowed to run in
exponential time in \(\abs{x}\), simply by giving it an exponentially
long certificate \(c\). This does not fit the spirit of “efficiency”,
and it causes some technical problems as well. [4]
Since a verifier must run in time polynomial in the instance size,
it can read only polynomially many of the initial symbols of \(c\),
and no others. (This is because reading each symbol and moving the
head takes a computational step.) So, the rest of the certificate
is irrelevant, and can be truncated without affecting the
verifier’s behavior. There are two immediate consequences:
Given an efficient verifier, we can assume without loss of
generality that the size of the input certificate is bounded by
some polynomial.
Conversely, when designing a verifier and analyzing its running
time, we can implicitly assume that it reads just enough of the
certificate to make its accept/reject decision. More
specifically, if every member of the language has a certificate
of polynomially bounded size that convinces the verifier, then
we can implicitly assume that the verifier immediately rejects
any certificate that is longer than this bound, without writing
it explicitly in the pseudocode. This does not affect
completeness or soundness, and in the runtime analysis it allows
us to ignore the annoying case where the verifier is given an
enormous certificate.
With the notion of an efficient verifier in hand, analogously to how
we defined \(\P\) as the class of efficiently decidable languages, we
define \(\NP\) as the class of efficiently verifiable languages.
Definition 133(Complexity Class NP)
The complexity class \(\NP\) is defined as the set of efficiently
verifiable languages:
\[\NP = \set{L : L \text{ is efficiently verifiable} } \; \text.\]
In other words, a language \(L \in \NP\) if (and only if) there is an
efficient verifier for it, according to Definition
132. [5]
Let us now formalize our first example of efficient verification from
above: the decision problem of determining whether a given maze has a
solution. A maze can be represented as an undirected graph, with a
vertex for each “intersection” in the maze (along with the start and
end points), and edges between adjacent positions. So, the decision
problem is to determine whether, given such a graph, there exists a
path between the start vertex \(s\) and the end vertex \(t\). This can be
represented as the language [6]
\[\MAZE = \set{(G, s, t) : G \text{ is an undirected graph that has a
path from vertices $s$ to $t$}} \; \text.\]
We define an efficient verifier for \(\MAZE\) as follows; the precise
pseudocode is given in Algorithm 134. An
instance is a graph \(G = (V,E)\), a start vertex \(s\), and a target
vertex \(t\). A certificate is (decoded as) a sequence of up to
\(\abs{V}\) vertices in the graph. (This limit on the number of vertices
in the certificate can be enforced by the decoding, and it simplifies
the efficiency analysis by ruling out huge certificates, as explained
in the Discussion of Efficiency.) The verifier checks that the
sequence describes a valid path in the graph from \(s\) to \(t\), i.e.,
that the first and last vertices in the sequence are \(s\) and \(t\)
(respectively), and that there is an edge between every pair of
adjacent vertices in the sequence.
Algorithm 134(Efficient Verifier for \(\MAZE\))
\begin{algorithmic}
\INPUT instance: a graph and start/end vertices; certificate: a
list of up to $\abs{V}$ vertices
\OUTPUT whether the certificate represents a path in the graph from
the start to the end
\FUNCTION{VerifyMAZE}{$(G=(V,E), s, t), c = (v_1, \ldots, v_m)$}
\IF{$v_1 \neq s$ or $v_m \neq t$} \REJECT \ENDIF
\FOR{$i=1$ to $m-1$}
\IF{$(v_i,v_{i+1}) \notin E$} \REJECT \ENDIF
\ENDFOR
\STATE \ACCEPT
\ENDFUNCTION
\end{algorithmic}
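For readers who prefer executable code, Algorithm 134 might be
transcribed into Python roughly as follows (a sketch, assuming the
undirected edge set is given as a set of frozensets; this representation
is ours, not part of the formal model):

def verify_maze(V, E, s, t, c):
    # Certificate c is a list of vertices, decoded to length <= |V|.
    if not (1 <= len(c) <= len(V)):
        return False  # malformed certificate (cf. the decoding bound)
    if c[0] != s or c[-1] != t:
        return False  # path must start at s and end at t
    for u, v in zip(c, c[1:]):
        if frozenset({u, v}) not in E:
            return False  # adjacent vertices in c must share an edge
    return True  # c is a path from s to t in G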
Lemma 135
VerifyMAZE is an efficient verifier for \(\MAZE\), so \(\MAZE
\in \NP\).
Proof 136
We show that VerifyMAZE satisfies Definition
132 by showing that it is efficient and
correct.
First, VerifyMAZE runs in time polynomial in the size
\(\abs{G} \geq \abs{V}\) of the graph: it compares two vertices from
the certificate against the start and end vertices in the instance,
and checks whether there is an edge between up to \(\abs{V}-1\) pairs
of adjacent vertices in the certificate. Each pair of vertices can
be checked in polynomial time. The exact polynomial depends on the
representation of \(G\) and the underlying computational model, but
it is polynomial in any reasonable representation (e.g., adjacency
lists, adjacency matrix), which is all that matters for our
purposes here.
As for correctness:
If \((G,s,t) \in \MAZE\), then by definition there is some path \(s
\to v_2 \to \dots \to v_{m-1} \to t\) in \(G\) that visits \(m \leq
\abs{V}\) vertices in total, because any cycle in the path can be
removed. [7] Then by inspection of the pseudocode,
VerifyMAZE\(((G, s, t), c = (s, v_2, \dots, v_{m-1},t))\)
accepts.
Conversely, if VerifyMAZE\(((G,s,t), c = (v_1, \ldots,
v_m))\) accepts for some certificate \(c\), then by inspection of
the pseudocode, \(c\) represents a path between \(v_1=s\) and \(v_m=t\)
in \(G\), so \((G,s,t) \in \MAZE\) by definition.
Alternatively, instead of showing the “conversely” part of correctness
as above, we could have argued that VerifyMAZE is sound, as
follows: if \((G,s,t) \notin \MAZE\), then by definition there is no
path between \(s\) and \(t\) in \(G\), so any certificate \(c\) will either
not start at \(s\), not end at \(t\), or it will have some pair of
adjacent vertices with no edge between them. Thus, by inspection of
the pseudocode, VerifyMAZE\(((G, s, t), c)\) will reject.
This kind of reasoning is correct, but it is more cumbersome and error
prone, since it involves more cases and argues about the
non-existence of certain objects. Usually, and especially for more
complex verifiers, it is easier and more natural to directly prove the
correctness condition (\(x \in L\) if and only if there exists \(c\) such
that \(V(x,c)\) accepts) instead of soundness.
Next, returning to the “limited-budget” TSP example, we define the
corresponding language
\[\TSP = \set{(G, k) : G \text{ is a weighted graph that has
a tour of total weight at most $k$}} \; \text.\]
A certificate for a given instance \((G=(V,E),k)\) is a sequence of up
to \(\abs{V}\) vertices in the graph. The verifier simply checks that
the vertices form a tour whose cost is at most \(k\). The precise
pseudocode is given in Algorithm 137.
Algorithm 137(Efficient Verifier for \(\TSP\))
\begin{algorithmic}
\INPUT instance: weighted graph and weight budget; certificate:
a list of up to $\abs{V}$ vertices in the graph
\OUTPUT whether the certificate represents a tour within the budget
\FUNCTION{VerifyTSP}{$(G=(V,E,w), k), c = (v_0, v_1, \ldots, v_m)$}
\IF{$m \neq \abs{V}$ or $v_0 \neq v_m$} \REJECT \ENDIF
\IF{$v_i = v_j$ for some distinct $1 \leq i,j \leq m$} \REJECT \ENDIF
\STATE $t = 0$
\FOR{$i=0$ to $m-1$}
\STATE $t = t + w(v_i, v_{i+1})$
\ENDFOR
\IF{$t > k$} \REJECT \ENDIF
\STATE \ACCEPT
\ENDFUNCTION
\end{algorithmic}
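As with \(\MAZE\), Algorithm 137 can be transcribed into Python as an
illustrative sketch (assuming edge weights are given as a dict keyed by
frozensets; again, the representation is our own choice):

def verify_tsp(V, w, k, c):
    # Certificate c = [v_0, v_1, ..., v_m]; reject unless it is a
    # tour of all |V| vertices with total weight at most k.
    m = len(c) - 1
    if m != len(V) or c[0] != c[m]:
        return False  # must have m = |V| and return to the start
    if len(set(c[1:])) != m:
        return False  # v_1, ..., v_m must be distinct
    total = sum(w[frozenset({c[i], c[i + 1]})] for i in range(m))
    return total <= k  # accept iff the tour is within budget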
Lemma 138
VerifyTSP is an efficient verifier for \(\TSP\), so \(\TSP
\in \NP\).
Proof 139
We show that VerifyTSP satisfies Definition
132 by showing that it is efficient and
correct.
First, VerifyTSP runs in time polynomial in the size
\(\abs{G,k}\) of the instance: checking for duplicate vertices \(v_i,
v_j\) can be done in polynomial time, e.g., by checking all \(O(m^2)
= O(\abs{V}^2)\) pairs of distinct \(1 \leq i,j \leq m\). Then the
algorithm loops over \(\abs{V}\) edges, summing their weights. These
weights are included in the input instance, so they can be summed
in polynomial time in the instance size. Finally, the algorithm
compares the sum against \(k\), which can be done in polynomial time.
We now argue correctness:
If \((G, k) \in \TSP\), then by definition, \(G\) has a tour of
weight at most \(k\). Letting \(c\) be the sequence of vertices in
such a tour (starting and ending at the same arbitrary vertex),
we have that VerifyTSP\(((G,k),c)\) accepts, because all of
\(V\)’s checks are satisfied by this \(c\).
Conversely, if VerifyTSP\(((G,k),c)\) accepts for some \(c =
(v_0, \ldots, v_m)\), then because all of \(V\)’s checks are
satisfied, this \(c\) starts and ends at the same vertex
\(v_0=v_{m}\), it visits all \(\abs{V}\) vertices exactly once
(because there are no duplicate vertices among \(v_1, \ldots,
v_m\)), and the total weight of all \(m\) edges between adjacent
vertices in \(c\) is at most \(k\). Therefore, \(c\) is a tour of \(G\)
having total weight at most \(k\), hence \((G,k) \in \TSP\), as
needed.
\(\P\) is the class of languages that can be decided efficiently.
\(\NP\) is the class of languages that can be verified efficiently.
More precisely, a language \(L \in \P\) if there exists a
polynomial-time algorithm \(D\) such that:
if \(x \in L\), then \(D(x)\) accepts;
if \(x \notin L\), then \(D(x)\) rejects.
Similarly, \(L \in \NP\) if there exists a polynomial-time (with respect
to its first input) algorithm \(V\) such that:
if \(x \in L\), then \(V(x, c)\) accepts for at least one certificate
\(c\);
if \(x \notin L\), then \(V(x, c)\) rejects for all certificates \(c\).
How are these two classes related? First, if a language is efficiently
decidable, then it is also efficiently verifiable, trivially: the
verifier can just ignore the certificate, and determine on its own
whether the input is in the language, using the given efficient
decider. (As an exercise, formalize this argument according to the
definitions.) This gives us the following result.
Lemma 140
\(\P \subseteq \NP\).
The above relationship allows for two possibilities:
\(\P \subsetneq \NP\), i.e., \(\P\) is a proper subset of (hence not
equal to) \(\NP\); or
\(\P = \NP\).
The latter possibility would mean that every efficiently verifiable
problem is also efficiently decidable. Is that the case? What is the
answer to the question
\[\large
\text{Does } \P = \NP \text{ ?}\]
We do not know the answer to this question! Indeed, the “\(\P\) versus
\(\NP\)” question is perhaps the greatest open problem in Computer
Science—and even one of the most important problems in all of
Mathematics, as judged by the Clay Mathematics Institute, which has offered a
$1 million prize for its resolution.
Consider the two example languages from above, \(\MAZE\) and \(\TSP\). We
saw that both are in \(\NP\), and indeed, they have similar definitions,
and their verifiers also have much in common. However, we know that
\(\MAZE \in \P\): it can be decided efficiently simply by checking
whether a breadth-first search from vertex \(s\) reaches vertex \(t\). On
the other hand, we do not know whether \(\TSP\) is in \(\P\): we do not
know of any efficient algorithm that decides \(\TSP\), and we do not
know how to prove that no such algorithm exists. Most experts
believe that there is no efficient algorithm for \(\TSP\), which would
imply that \(\P \neq \NP\), but the community has no idea how to prove
this.
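To make the contrast concrete, here is a minimal Python sketch of the
BFS-based decider for \(\MAZE\) mentioned above (adjacency given as a
dict from each vertex to its set of neighbors; the representation is our
choice):

from collections import deque

def decide_maze(adj, s, t):
    # Breadth-first search from s; accept iff t is reachable.
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False

No analogous efficient search is known for \(\TSP\).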
How could we hope to resolve the \(\P\)-versus-\(\NP\) question? To show
that the two classes are not equal, as most experts believe, it would
suffice to demonstrate that some single language is in \(\NP\), but is
not in \(\P\). However, this seems exceedingly difficult to do: we would
need to somehow prove that, of all the infinitely many efficient
algorithms, including many we have not yet discovered and cannot even
imagine, none of them decides the language in question. [8] On the
other hand, showing that \(\P = \NP\) also seems very difficult: we
would need to demonstrate that every one of the infinitely many
languages in \(\NP\) can be decided efficiently, i.e., by some
polynomial-time algorithm.
Fortunately, a rich theory has been developed that will make the
resolution of \(\P\) versus \(\NP\) (somewhat) simpler. As we will see
below, it is possible to prove that some languages in \(\NP\) are the
“hardest” ones in that class, in the following sense:
an efficient algorithm for any one of these “hardest” languages
would imply an efficient algorithm for every language in \(\NP\)!
So, to prove that \(\P = \NP\), it would suffice to prove that just
one of these “hardest” languages is in \(\P\). And in the other
direction, the most promising route to prove \(\P \neq \NP\) is to show
that one of these “hardest” languages is not in \(\P\)—because if any
\(\NP\) language is not in \(\P\), then that is also the case for these
“hardest” ones.
In summary, the resolution of the \(\P\)-versus-\(\NP\) question lies
entirely with the common fate of these “hardest” languages:
Any one of them has an efficient algorithm, if and only if all
of them do, if and only if \(\P=\NP\).
Conversely, any one of them does not have an efficient algorithm,
if and only if none of them do, if and only if \(\P \neq \NP\).
It turns out that there are thousands of known “hardest” languages in
\(\NP\). In fact, as we will see, \(\TSP\) is one of them! We will prove
this via a series of results, starting with the historically first
language that was shown to be one of the “hardest” in \(\NP\), next.
Our first example of a “hardest” problem in \(\NP\) is the
satisfiability problem for Boolean formulas. Given as input a
Boolean formula like
\[\phi = (y \vee (\neg x \wedge z) \vee \neg z) \wedge (\neg x
\vee z) \; \text,\]
we wish to determine whether there is a true/false assignment to its
variables that makes the formula evaluate to true. Let us define
the relevant terms:
a (Boolean) variable like \(x\) or \(y\) or \(x_{42}\) can be assigned
the value “true” (often represented as 1) or “false” (represented as
0);
a literal is either a variable or its negation, e.g., \(x\) or \(\neg
y\);
an operator is either conjunction (AND), represented as \(\wedge\);
disjunction (OR), represented as \(\vee\); or negation (NOT),
represented either as \(\neg\) or with an overline, like \(\neg(x
\wedge y)\) or, equivalently, \(\overline{x \wedge y}\).
A Boolean formula is a well-formed mathematical expression involving
literals combined with operators, and following the usual rules of
parentheses for grouping.
An initial observation is that using the rules of logic like De
Morgan’s laws, we
can eliminate any double-negations, and we can iteratively move all
negations “inward” to the literals, e.g., \(\neg (x \wedge \neg y) =
(\neg x \vee y)\). So, from now on we assume without loss of generality
that the negation operator appears only in literals.
We define the size of a formula to be the number of literals it has,
counting duplicate appearances of the same literal. [9] For example,
the formula \(\phi\) above has size 6. Note that the size of a formula
is at least the number of distinct variables that appear in the
formula.
An assignment is a mapping of the variables in a formula to truth
values. We can represent an assignment over \(n\) variables (with some
fixed order) as an \(n\)-tuple, such as \((a_1, \ldots, a_n)\). For the
formula above, the assignment \((0, 1, 0)\) maps \(x\) to the value 0
(false), \(y\) to the value 1 (true), and \(z\) to the value 0 (false).
The notation \(\phi(a_1, \dots, a_n)\) denotes the value of \(\phi\) when
evaluated on the assignment \((a_1, \dots, a_n)\), i.e., with each
variable substituted by its assigned value.
A satisfying assignment for a formula is an assignment that makes
the formula evaluate to true. In the example above, \((0, 1, 0)\) is a
satisfying assignment:
\[\phi(0,1,0) = (1 \vee (\neg 0 \wedge 0) \vee \neg 0) \wedge
(\neg 0 \vee 0) = (1 \vee 0 \vee 1) \wedge (1 \vee 0) = 1 \; \text.\]
A formula \(\phi\) is satisfiable if it has at least one satisfying
assignment. The decision problem of determining whether a given
formula is satisfiable corresponds to the following language.
Definition 141(Satisfiability Language)
The (Boolean) satisfiability language is defined as
\[\SAT = \set{\phi : \phi \text{ is a satisfiable Boolean formula}} \; \text.\]
A first observation is that \(\SAT\) is decidable. A formula \(\phi\) of
size \(n\) has at most \(n\) variables, so there are at most \(2^n\)
possible assignments. Therefore, we can decide \(\SAT\) using a
brute-force algorithm that simply iterates over all of the assignments
of its input formula, accepting if any of them satisfies the formula,
and rejecting otherwise. Although this is not efficient, it is
apparent that it does decide \(\SAT\).
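To make the brute-force idea concrete, here is a small Python sketch,
assuming (as our own illustrative representation) that formulas are
nested tuples such as ('and', ('or', 'y', ('not', 'x')), 'z'):

from itertools import product

def evaluate(phi, assignment):
    # A formula is a variable name (string), or a tuple
    # ('not', f), ('and', f, g, ...), or ('or', f, g, ...).
    if isinstance(phi, str):
        return assignment[phi]
    op, *args = phi
    if op == 'not':
        return not evaluate(args[0], assignment)
    if op == 'and':
        return all(evaluate(a, assignment) for a in args)
    if op == 'or':
        return any(evaluate(a, assignment) for a in args)
    raise ValueError('unknown operator: ' + op)

def variables(phi):
    # Collect the set of variable names appearing in phi.
    if isinstance(phi, str):
        return {phi}
    return set().union(*(variables(a) for a in phi[1:]))

def brute_force_sat(phi):
    # Try all 2^n assignments; accept iff some assignment satisfies phi.
    vs = sorted(variables(phi))
    return any(evaluate(phi, dict(zip(vs, a)))
               for a in product([False, True], repeat=len(vs)))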
We also observe that \(\SAT\) has an efficient verifier.
Lemma 142
\(\SAT \in \NP\).
Proof 143
A certificate, or “claimed proof”, for a formula \(\phi\) is just an
assignment for its variables. The following efficient verifier
simply evaluates the given formula on the given assignment and
accepts if the value is true, otherwise it rejects.
\begin{algorithmic}
\INPUT instance: a Boolean formula; certificate: an
assignment for its variables
\OUTPUT whether the assignment satisfies the formula
\FUNCTION{$V_{\SAT{}}$}{$\phi, \alpha$}
\IF{$\phi(\alpha)=1$} \ACCEPT \ENDIF
\STATE \REJECT
\ENDFUNCTION
\end{algorithmic}
This verifier runs in linear time in the size of its input formula
\(\phi\), because evaluating each AND/OR operator in the formula
reduces the number of terms in the expression by one.
For correctness, by the definitions of \(\SAT\) and \(V_{\SAT}\),
\[\begin{split}\phi \in \SAT
&\iff \phi \text{ is satisfiable} \\
&\iff \text{there exists an assignment $\alpha$ s.t. } \phi(\alpha) = 1 \\
&\iff \text{there exists $\alpha$ s.t. } V_{\SAT}(\phi,\alpha) \accepts
\; \text,\end{split}\]
so by Definition 132,
\(V_{\SAT}\) is indeed an efficient verifier for \(\SAT\), as claimed.
Is \(\SAT\) efficiently decidable?—i.e., is \(\SAT \in \P\)? The
decider we described above is not efficient: it takes time
exponential in the number of variables in the formula, and the
number of variables may be as large as the size of the formula
(i.e., the number of literals in it), so in the worst case the
algorithm runs in exponential time in its input size.
However, the above does not prove that \(\SAT \notin \P\); it just shows
that one specific (and naïve) algorithm for \(\SAT\) is inefficient.
Conceivably, there could be a more sophisticated efficient algorithm
that cleverly analyzes an arbitrary input formula in some way to
determine whether it is satisfiable. Indeed, there are regular
conferences and competitions to which researchers submit their
best ideas, algorithms, and software for solving \(\SAT\). Although many
algorithms have been developed that perform very impressively on large
\(\SAT\) instances of interest, none of them is believed to run in
polynomial time and to be correct on all instances.
To date, we actually do not know whether \(\SAT\) is efficiently
decidable—we do not know an efficient algorithm for it, and we do
not know how to prove that no such algorithm exists. Yet although the
question of whether \(\SAT \in \P\) is unresolved, the Cook-Levin
Theorem says that \(\SAT\) is a “hardest” problem in \(\NP\). [10]
Theorem 144(Cook-Levin)
If \(\SAT \in \P\), then every \(\NP\) language is in \(\P\), i.e.,
\(\P=\NP\).
The full proof of the Cook-Levin Theorem is ingenious and rather
intricate, but the main idea is fairly easy to describe. Let \(L\) be an
arbitrary language in \(\NP\); this means there is an efficient
verifier \(V\) for \(L\). Using the hypothesis that \(\SAT \in \P\), we
will construct an efficient decider for \(L\), which implies that \(L
\in \P\), as claimed.
The key idea behind the efficient decider for \(L\) is that, using just
the fact that \(V\) is an efficient verifier for \(L\), we can efficiently
transform any instance of \(L\) into an instance of \(\SAT\) that has
the same “yes/no answer”: either both instances are in their
respective languages, or neither is. More precisely, there is an
efficient procedure that maps any instance \(x\) of \(L\) to a
corresponding Boolean formula \(\phi_{V,x}\), such that
\[x \in L \iff \phi_{V,x} \in \SAT \; \text.\]
In fact, the above equivalence follows from a stronger property: by
the careful design of \(\phi_{V,x}\), there is a two-way correspondence
between the valid certificate(s) \(c\) for \(x\) (if any), and the
satisfying assignment(s) for \(\phi_{V,x}\) (if any). That is, any
certificate \(c\) that makes \(V(x,c)\) accept directly yields a
corresponding satisfying assignment for \(\phi_{V,x}\), and any
satisfying assignment for \(\phi_{V,x}\) yields a corresponding
certificate \(c\) that makes \(V(x,c)\) accept.
By the hypothesis that \(\SAT \in \P\), there is an efficient decider
\(D_{\SAT}\) for \(\SAT\), so from all this we get the following efficient
decider for \(L\):
\begin{algorithmic}
\FUNCTION{$D_L$}{$x$}
\STATE \CONSTRUCT $\phi_{V,x}$ as described below
\STATE \RETURN \CALL{$D_{\SAT{}}$}{$\phi_{V,x}$}
\ENDFUNCTION
\end{algorithmic}
Since \(\phi_{V,x}\) can be constructed in time polynomial in the size
of \(x\), and \(D_\SAT\) runs in time polynomial in the size of
\(\phi_{V,x}\), by composition, \(D_L\) as a whole runs in time polynomial
in the size of \(x\). And the fact that \(D_L\) is correct (i.e., it
decides \(L\)) follows directly from the correctness of \(D_{\SAT}\) and
the above-stated relationship between \(x\) and \(\phi_{V,x}\).
This completes the high-level description of the proof strategy. All
that remains is to show how to efficiently construct \(\phi_{V,x}\) from
\(x\) so that they satisfy the above-stated relationship, which we do in
what follows.
In order to describe the construction of \(\phi_{V,x}\), we first need to
recall the notion of a configuration of a Turing machine and its
representation as a sequence, which we previously discussed in
Wang Tiling. A configuration encodes a “snapshot”
of a Turing machine’s execution: the contents of the machine’s tape,
the active state \(q \in Q\), and the position of the head. We represent
these as an infinite sequence over the alphabet \(\Gamma \cup Q\) (the
union of the finite tape alphabet and the TM’s finite set of states),
which is simply:
the contents of the tape, in order from the leftmost cell,
with the active state \(q \in Q\) inserted directly to the left of the
symbol corresponding to the cell on which the head is positioned.
For example, if the input is \(x_1 x_2 \cdots x_n\) and the initial
state is \(q_0\), then the following sequence represents the machine’s
starting configuration:
\[q_0 x_1 x_2 \cdots x_n \bot \bot \ldots\]
Since the head is at the leftmost cell and the state is \(q_0\), the
string has \(q_0\) inserted to the left of the leftmost tape symbol. If
the transition function gives \(\delta(q_0, x_1) = (q', x', R)\), then
the next configuration is represented by
\[x' q' x_2 \cdots x_n \bot \bot \ldots\]
The first cell has been modified to \(x'\), the machine is in state
\(q'\), and the head is at the second cell, represented here by writing
\(q'\) to the left of that cell’s symbol.
As stated above in the proof overview, for any language \(L \in \NP\)
there exists an efficient verifier \(V\) that runs in time polynomial in
the size of its first argument. In other words, there is some constant
\(k\) such that \(V(x, c)\) runs for at most \(\abs{x}^k\) steps before
halting, for any \(x,c\). [11] Since the head starts at the leftmost
cell and can move only one position in each step, this implies that
\(V(x, c)\) can read or write only the first \(\abs{x}^k\) cells of the
tape. Thus, instead of using an infinite sequence, we can represent
any configuration during the execution of \(V(x,c)\) using just a
finite string of length about \(\abs{x}^k\), ignoring the rest since
it cannot affect the execution.
For an instance \(x\) of size \(n=\abs{x}\), we can represent the sequence
of all the configurations of the machine, over its (at most) \(n^k\)
computational steps, as an \(n^k\)-by-\(n^k\) tableau, with one
configuration per row; for convenience later on, we also place a
special \(\#\) symbol at the start and end of each row. [12] So, each
cell of the tableau contains a symbol from the set \(S = \Gamma \cup Q
\cup \set{\#}\), where \(\Gamma\) is the finite tape alphabet of \(V\); \(Q\)
is the finite set of states in \(V\); and \(\# \notin \Gamma \cup Q\) is a
special extra symbol.
Observe that, since \(V\) is deterministic, the contents of the first
row of the tableau completely determine the contents of all the
rows, via the “code” of \(V\). Moreover, adjacent rows represent a
single computational step of the machine, and hence are identical to
each other, except in the vicinity of the symbols representing the
active states before and after the transition. Finally, \(V(x,c)\)
accepts if and only if the accept-state symbol \(\qacc \in Q\) appears
somewhere in its tableau. These are important features of tableaus that
are exploited in the construction of the formula \(\phi_{V,x}\), which
we describe next.
With the notion of a computational tableau in hand, we now describe
the structure of the formula \(\phi_{V,x}\) that is constructed from the
instance \(x\) (also using the fixed code of \(V\)).
The variables of the formula represent the contents of all the
cells in a potential tableau for \(V\). That is, assigning Boolean
values to all the variables fully specifies the contents of a
claimed tableau, including the value of a certificate \(c\) in the
first row.
Conceptually, we can think of an assignment to the variables, and
the potential tableau that they represent, as a claimed execution
transcript for \(V\).
The formula \(\phi_{V,x}\) is carefully designed to evaluate whether
the claimed tableau (as specified by the variables’ values) meets
two conditions:
it is the actual execution tableau of \(V(x,c)\), for the
specific given instance \(x\), and whatever certificate \(c\)
appears in the first row, and
it is an accepting tableau, i.e., \(V(x,c)\) accepts.
Conceptually, we can think of \(\phi_{V,x}\) as checking whether the
claimed execution transcript for \(V\) is genuine, and results in \(V\)
accepting the specific instance \(x\).
In summary, the formula \(\phi_{V,x}\) evaluates to true if and only
if its variables are set so that they specify the full and accepting
execution tableau of \(V(x,c)\), for the certificate \(c\) specified by
the variables. Therefore, as needed,
\[\begin{split}\phi_{V,x} \in \SAT
&\iff \phi_{V,x} \text{ is satisfiable} \\
&\iff \text{there exists $c$ s.t. } V(x,c) \accepts \\
&\iff x \in L \; \text.\end{split}\]
We now give the details. The variables of the formula are as follows:
For each cell position \(i,j\) of the tableau, and each symbol \(s \in
S\), there is a Boolean variable \(t_{i,j,s}\).
So, there are \(\abs{S} \cdot n^{2k} = O(n^{2k})\) variables in total,
which is polynomial in \(n = \abs{x}\) because \(\abs{S}\) and \(2k\) are
constants. (Recall that \(S\) is a fixed finite alphabet that does not
vary with \(n\).)
Assigning Boolean values to these variables specifies the contents of
a claimed tableau, as follows:
Assigning \(t_{i,j,s} = \text{true}\) specifies that cell \(i,j\) of the
claimed tableau has symbol \(s\) in it.
Observe that the variables can be assigned arbitrary Boolean values,
so for the same cell location \(i,j\), we could potentially assign
\(t_{i,j,s} = \text{true}\) for multiple different values of \(s\), or
none at all. This would make the contents of cell \(i,j\) undefined. The
formula \(\phi_{V,x}\) is designed to “check” for this and evaluate to
false in such a case, and also to check for all the other needed
properties on the claimed tableau, as we now describe.
The formula \(\phi_{V,x}\) is defined as the conjunction (AND) of four
subformulas:
\[\phi_{V,x} = \phi_{\text{cell}} \wedge \phi_{\text{accept}} \wedge
\phi_{\text{start},x} \wedge \phi_{\text{move},V} \; \text.\]
Each subformula “checks” (or “enforces”) a certain condition on the
variables and the claimed tableau they specify, evaluating to true if
the condition holds, and false if it does not. The subformulas check
the following conditions:
\(\phi_{\text{cell}}\) checks that each cell of the claimed
tableau is well defined, i.e., it contains exactly one symbol;
\(\phi_{\text{accept}}\) checks that the claimed tableau is an
accepting tableau, i.e., that \(\qacc\) appears somewhere in it;
\(\phi_{\text{start},x}\) checks that the first row of the claimed
tableau is valid, i.e., it represents the initial configuration of
the verifier running on the given instance \(x\) and some
certificate \(c\) (as specified by the variables);
\(\phi_{\text{move},V}\) checks that each non-starting row of the
claimed tableau follows from the previous one, according to the
transition function of \(V\).
Observe that, as needed above, all these conditions hold if and only
if the claimed tableau is indeed the actual, accepting execution
tableau of \(V(x,c)\), for the certificate \(c\) that appears in the first
row. So, it just remains to show how to design the subformulas to
correctly check their respective conditions, which we do next.
To check that a given cell \(i,j\) has a well-defined symbol, we need
exactly one of the variables \(t_{i,j,s}\) to be true, over all \(s \in
S\). The formula
\[\large
\bigvee_{s \in S} t_{i,j,s}\]
checks that at least one of the variables is true, and the formula
\[\phi_{\text{cell}} = \bigwedge_{i,j} \bigg( \Big( \bigvee_{s \in S}
t_{i,j,s} \Big) \wedge \Big( \bigwedge_{s \neq s' \in S}
\big( \neg t_{i,j,s} \vee \neg t_{i,j,s'} \big) \Big) \bigg)\]
checks that for all \(i,j\), exactly one of \(t_{i,j,s}\) is true, as
needed.
The formula has \(\O(\abs{S}^2) = O(1)\) literals for each cell (because
\(S\) is a fixed alphabet that does not vary with \(n=\abs{x}\)), so
\(\phi_{\text{cell}}\) has size \(\O(n^{2k})\), which is polynomial in
\(n=\abs{x}\) because \(2k\) is a constant.
To check that the claimed tableau is an accepting one, we just need to
check that at least one cell has \(\qacc\) as its symbol, which is done
by the following subformula:
\[\phi_{\text{accept}} = \bigvee_{i,j} t_{i,j,\qacc} \; \text.\]
For the top row of the tableau, which represents the starting
configuration, we suppose that the encoding of an input pair \((x, c)\)
separates \(x\) from \(c\) on the tape using a special symbol \(\$
\in \Gamma \setminus \Sigma\) that is in the tape alphabet \(\Gamma\) but
not in the input alphabet \(\Sigma\). (Recall that \(x\) and \(c\) are
strings over \(\Sigma\), so \(\$\) unambiguously separates them.)
Letting \(m=\abs{c}\), the top row of the tableau is therefore as
follows:
\[\# \; \qst \; x_1 \, x_2 \cdots x_n \; \$ \; c_1 \, c_2 \cdots c_m \;
\bot \cdots \bot \; \#\]
The row starts with the \(\#\) symbol. Since the machine is in active
state \(\qst\), and the head is at the leftmost cell of the tape, \(\qst\)
is the second symbol. The contents of the tape follow, which are the
symbols of the input \(x\), then the \(\$\) separator, then the
symbols of the certificate \(c\), then blanks until we have \(n^k\)
symbols, except for the last symbol, which is \(\#\).
We now describe the subformula \(\phi_{\text{start},x}\) that checks
that the first row of the claimed tableau has the above form. We
require the row to have the specific, given instance \(x\) in its
proper position of the starting configuration. This is checked by the
formula
\[\bigwedge_{j=1}^{n} t_{1,j+2,x_j} \; \text,\]
since the symbols of \(x\) occupy the cells just after the \(\#\) and
\(\qst\) symbols.
For the tableau cells corresponding to the certificate, our
construction does not know what certificate(s) \(c\), if any, would
cause \(V(x,c)\) to accept, or even what their sizes are (other than
that they are less than \(n^k\)). Indeed, a main idea of this
construction is that any satisfying assignment for \(\phi_{V,x}\), if
one exists, specifies a valid certificate\(c\) for \(x\) (and vice
versa).
So, our formula allows any symbol \(s \in \Sigma \cup \set{\bot}\),
i.e., any input-alphabet symbol or blank, to appear in the positions
after the separator \(\$\). We just need to ensure that any blanks
appear only after the certificate string, i.e., not before any
symbol from \(\Sigma\). Overall, the formula that checks that the
portion of the first row dedicated to the certificate is well-formed
is
\[\bigwedge_{j=n+4}^{n^k-1} \Big( t_{1,j,\bot} \vee \Big( \bigvee_{s
\in \Sigma} t_{1,j,s} \wedge \neg t_{1,j-1,\bot} \Big) \Big) \; \text.\]
In words, this says that each cell in the zone dedicated to the
certificate either has a blank, or has an input-alphabet symbol and
the preceding symbol is not blank.
Putting these pieces together with the few other fixed cell values,
the subformula that checks the first row of the claimed tableau is:
\[\begin{split}\phi_{\text{start},x} ={}& t_{1,1,\#} \wedge t_{1,2,\qst}
\wedge \bigwedge_{j=1}^{n} t_{1,j+2,x_j} \wedge t_{1,n+3,\$} \\
&\wedge \bigwedge_{j=n+4}^{n^k-1} \Big( t_{1,j,\bot} \vee \Big(
\bigvee_{s \in \Sigma} t_{1,j,s} \wedge \neg t_{1,j-1,\bot} \Big) \Big)
\wedge t_{1,n^k,\#} \; \text.\end{split}\]
This subformula has one literal for each symbol in the instance \(x\),
the start state, and the special \(\#\) and \(\$\) symbols. It has
\(O(\abs{\Sigma}) = O(1)\) literals for each cell in the zone dedicated
to the certificate (recall that the tape alphabet \(\Gamma\) is fixed
and does not vary with \(n=\abs{x}\)). Thus, this subformula has size
\(\O(n^k)\), which again is polynomial in \(n=\abs{x}\).
Finally, we describe the subformula that checks whether every
non-starting row in the claimed tableau follows from the previous one.
This is the most intricate part of the proof, and the only one that
relies on the transition function, or “code”, of \(V\).
As a warmup first attempt, recall that any syntactically legal
configuration string \(r\) (regardless of whether \(V\) can actually reach
that configuration) determines the next configuration string \(r'\),
according to the code of \(V\). So, letting \(i,i+1\) denote a pair of
adjacent row indices, and \(r\) denote a legal configuration string, we
could write:
the formula \(\phi_{i,r}\) that checks whether row \(i\) equals \(r\) and
row \(i+1\) equals the corresponding \(r'\), namely,
\[\phi_{i,r} = \bigwedge_{j} \big( t_{i,j,r_j} \wedge t_{i+1,j,r'_j}
\big) \; \text,\]
where \(r_j, r'_j\) denote the \(j\)th symbols of \(r, r'\) (respectively);
the formula \(\phi_i = \bigvee_{\text{legal } r} \phi_{i,r}\), which
takes the OR over all legal configuration strings \(r\), and hence
checks whether row \(i+1\) validly follows row \(i\); and
the formula \(\phi_{\text{move},V} = \bigwedge_{i} \phi_i\), which
checks this for every pair of adjacent rows.
Because \(\phi_{\text{start},x}\) checks that the first row of the
claimed tableau is a valid starting configuration,
\(\phi_{\text{start},x} \wedge \phi_{\text{move},V}\) does indeed check
that the entire claimed tableau is valid.
This approach gives a correct formula, but unfortunately, it is not
efficient—the formula does not have size polynomial in
\(n=\abs{x}\). The only problem is in the second step: because there are
multiple valid choices for each symbol of a legal \(r\), and \(r\) has
length \(n^k\), there are exponentially many such \(r\). So, taking the
OR over all of them results in a formula of exponential size.
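To quantify this: a legal configuration string can have any of the
\(\abs{\Gamma}\) tape symbols in nearly every one of its \(\approx n^k\)
positions, so the number of legal strings \(r\) is at least about
\[\abs{\Gamma}^{n^k - O(1)} \; \text,\]
which is exponential in \(n\) (for \(\abs{\Gamma} \geq 2\)).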
To overcome this efficiency problem, we proceed similarly as above,
but using a finer-grained breakdown of adjacent rows than in
\(\phi_{i,r}\) above. The main idea is to consider the 2-by-3
“windows” (subrectangles) of adjacent rows, and check that every
window is “valid” according to the code of \(V\). It can be proved that
if all the overlapping 2-by-3 windows of a claimed pair of rows are
valid, then the pair as a whole is valid. This is essentially
because valid adjacent rows are almost identical (except around the
cells with the active states), and the windows overlap sufficiently to
guarantee that any invalidity in the claimed rows will be exposed in
some 2-by-3 window. [13]
There are \(\abs{S}^6 = O(1)\) distinct possibilities for a 2-by-3
window, but not all of them are valid. Below we describe all the valid
windows, and based on this we construct the subformula
\(\phi_{\text{move},V}\) as follows.
For any particular valid window \(w\), let \(s_1, \ldots, s_6\) be its six
symbols, going left-to-right across its first then second row.
Similarly to \(\phi_{i,r}\) above, the following simple formula checks
whether the window of the claimed tableau with top-left corner at
\(i,j\) matches \(w\):
\[\phi_{i,j,w} = t_{i,j,s_1} \wedge t_{i,j+1,s_2} \wedge t_{i,j+2,s_3}
\wedge t_{i+1,j,s_4} \wedge t_{i+1,j+1,s_5} \wedge t_{i+1,j+2,s_6}
\; \text.\]
Now, letting \(W\) be the set of all valid windows, similarly to
\(\phi_i\) above, the following formula checks whether the window at
\(i,j\) of the claimed tableau is valid:
\[\phi_{i,j} = \bigvee_{w \in W} \phi_{i,j,w} \; \text,\]
and \(\phi_{\text{move},V} = \bigwedge_{i,j} \phi_{i,j}\) takes the AND
of these checks over all window positions \(i,j\).
The set \(W\) of valid windows has size \(\abs{W} \leq \abs{S}^6 = O(1)\),
so each \(\phi_{i,j}\) has \(O(1)\) literals. Because there are
\(\O(n^{2k})\) windows to check, the subformula \(\phi_{\text{move},V}\)
has size \(\O(n^{2k})\), which again is polynomial in \(n=\abs{x}\).
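Operationally, this check amounts to the following Python sketch
(representing the tableau as a list of rows and the valid windows as a
set of 6-tuples; both representations are ours, for illustration only):

def all_windows_valid(tableau, valid_windows):
    # The window with top-left corner (i, j) is the 6-tuple of symbols
    # read left to right across row i, then across row i+1.
    rows, cols = len(tableau), len(tableau[0])
    for i in range(rows - 1):
        for j in range(cols - 2):
            window = (tuple(tableau[i][j:j + 3])
                      + tuple(tableau[i + 1][j:j + 3]))
            if window not in valid_windows:
                return False
    return True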
We now describe all the valid windows. The precise details of this are
not so essential, because all we used above were the facts that the
valid windows are well defined, and that there are \(O(1)\) of them. We
use the following notation in the descriptions and diagrams:
\(\gamma\) for an arbitrary element of \(\Gamma\),
\(q\) for an arbitrary element of \(Q\),
\(\rho\) for an arbitrary element of \(\Gamma \cup Q\),
\(\sigma\) for an arbitrary element of \(\Gamma \cup \set{\#}\).
First, a window having the \(\#\) symbol somewhere in it is valid only
if \(\#\) is in the first position of both rows, or is in the third
position of both rows. So, every valid window has one of the following
forms, where the symbols \(\rho_1, \dots, \rho_6\) must additionally be
valid according to the rules below:
\[\begin{array}{|c|c|c|}\hline
\# & \rho_2 & \rho_3 \\\hline
\# & \rho_5 & \rho_6 \\\hline
\end{array}
\qquad\qquad
\begin{array}{|c|c|c|}\hline
\rho_1 & \rho_2 & \# \\\hline
\rho_4 & \rho_5 & \# \\\hline
\end{array}\]
If a window is not in the vicinity of a state symbol \(q \in Q\), then
the machine’s transition does not affect the portion of the
configuration corresponding to the window, so the top and bottom rows
match:
\[\begin{array}{|c|c|c|}\hline
\gamma_1 & \gamma_2 & \gamma_3 \\\hline
\gamma_1 & \gamma_2 & \gamma_3 \\\hline
\end{array}\]
To reason about what happens in a window that is in the vicinity of a
state symbol, we consider how the configuration changes as a result of
a right-moving transition \(\delta(q, \gamma) = (q', \gamma', R)\):
\[\begin{array}{cccccc}
\cdots & \gamma_1 & q & \gamma & \gamma_2 & \cdots \\
\cdots & \gamma_1 & \gamma' & q' & \gamma_2 & \cdots
\end{array}\]
The head moves to the right, so \(q'\) appears in the bottom row at the
column to the right of where \(q\) appears in the top row. The symbol
\(\gamma\) is replaced by \(\gamma'\), though its position moves to the
left to compensate for the rightward movement of the state symbol.
There are four windows affected by the transition, labeled above by
their leftmost columns. We thus have the following four kinds of valid
windows corresponding to a rightward transition \(\delta(q, \gamma) =
(q', \gamma', R)\):
Note that we have used \(\sigma\) for the edges of some windows, as
there can be either a tape symbol or the tableau-edge marker \(\#\) in
these positions.
We now consider how the configuration changes as a result of a
left-moving transition \(\delta(q, \gamma) = (q', \gamma', L)\):
The head moves to the left, so \(q'\) appears in the bottom row at the
column to the left of where \(q\) appears in the top row. As with a
rightward transition, we replace the old symbol \(\gamma\) with the new
symbol \(\gamma'\). This time, however, the symbol to the left of the
original position of the head moves right to compensate for the
leftward movement of the head. So here we have the following five
kinds of valid windows corresponding to the leftward transition
\(\delta(q, \gamma) = (q', \gamma', L)\):
We also need to account for the case where the head is at the leftmost
cell on the tape, in which case the head does not move:
Here we have three kinds of valid windows, but the last one actually
has the same structure as the last window in the normal case for a
leftward transition. Thus, we have only two more kinds of valid
windows:
Finally, we need one last set of valid windows to account for the
machine reaching the accept state before the last row of the tableau.
As stated above, the valid tableau is “padded” with copies of the
final configuration, so the valid windows for this case look like:
We summarize the main claims about what we have just done.
Using an efficient verifier \(V\) for an arbitrary language \(L \in \NP\),
we have demonstrated how to efficiently construct a corresponding
formula \(\phi_{V,x}\) from a given instance \(x\) of \(L\). The formula can
be constructed in time polynomial in \(\abs{x}\); in particular, it
has \(\O(\abs{x}^{2k})\) literals in total.
Then, we (almost entirely) proved that \(x \in L \iff \phi_{V,x} \in
\SAT\). In the \(\implies\) direction: by correctness of \(V\), any \(x \in
L\) has at least one certificate \(c\) that makes \(V(x,c)\) accept, so the
computation tableau for \(V(x,c)\) is accepting. In turn, by design of
the formula \(\phi_{V,x}\), this tableau defines a satisfying assignment
for the formula (i.e., an assignment to its variables that makes
\(\phi_{V,x}\) evaluate to true). So, \(x \in L\) implies that \(\phi_{V,x}
\in \SAT\).
In the \(\impliedby\) direction: if the constructed formula \(\phi_{V,x}
\in \SAT\), then by definition it has a satisfying assignment. Again by
the special design of the formula, any satisfying assignment defines
the contents of an actual, accepting computation tableau for \(V(x,c)\),
for the \(c\) appearing in the top row of the tableau. Since \(V(x,c)\)
accepts for this \(c\), we conclude that \(x \in L\) by the correctness of
the verifier. So, \(\phi_{V,x} \in \SAT\) implies that \(x \in L\).
Finally, if \(\SAT \in \P\), then an efficient decider \(D_\SAT\) for
\(\SAT\) exists, and we can use it to decide \(L\) efficiently: given an
instance \(x\), simply construct \(\phi_{V,x}\) and output
\(D_\SAT(\phi_{V,x})\); by the above property, this correctly determines
whether \(x \in L\). As a result, \(\SAT \in \P\) implies that \(\NP
\subseteq \P\) and hence \(\P = \NP\), as claimed.
With the example of \(\SAT\) and the Cook-Levin theorem in hand, we now
formally define what it means to be a “hardest” problem in \(\NP\), and
demonstrate several examples of such problems.
The heart of the Cook-Levin theorem is an efficient algorithm for
converting an instance of an (arbitrary) language \(L \in \NP\) to an
instance \(\phi_{V,x}\) of \(\SAT\), such that
\[x \in L \iff \phi_{V,x} \in \SAT \; \text.\]
Thus, any (hypothetical) efficient decider for \(\SAT\) yields an
efficient decider for \(L\), which simply converts its input \(x\) to
\(\phi_{V,x}\) and invokes the decider for \(\SAT\) on it.
This kind of efficient transformation, from an arbitrary instance of
one problem to a corresponding instance of another problem having the
same “yes/no answer”, is known as a polynomial-time mapping
reduction, which we now formally define.
Definition 145(Polynomial-Time Mapping Reduction)
A polynomial-time mapping reduction—also known as a Karp
reduction—from a language \(A\) to a language \(B\) is a
function \(f \colon \Sigma^* \to \Sigma^*\) having the following
properties: [14]
Efficiency: \(f\) is polynomial-time computable: there is an
algorithm that, given any input \(x \in \Sigma^*\), outputs \(f(x)\)
in time polynomial in \(\abs{x}\).
Correctness: for all \(x \in \Sigma^*\),
\[x \in A \iff f(x) \in B \; \text.\]
Equivalently, if \(x \in A\) then \(f(x) \in B\), and if \(x \notin
A\) then \(f(x) \notin B\). [15]
If such a reduction exists, we say that \(A\) polynomial-time
mapping-reduces (or Karp-reduces) to \(B\), and write \(A \leq_p B\).
We first observe that a polynomial-time mapping reduction is
essentially a special case of (or more precisely, implies the
existence of) a Turing reduction: we can
decide \(A\) by converting the input instance of \(A\) to an instance of
\(B\) using the efficient transformation function, then querying an
oracle that decides \(B\). The following makes this precise.
Lemma 146
If \(A \leq_p B\), then \(A \leq_T B\).
Proof 147
Because \(A \leq_p B\), there is an efficiently computable function
\(f\) such that \(x \in A \iff f(x) \in B\). To show that \(A \leq_T B\),
we give a Turing machine \(M_A\) that decides \(A\) using an oracle
\(M_B\) that decides \(B\):
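\begin{algorithmic}
\INPUT a string $x \in \Sigma^*$
\OUTPUT whether $x \in A$
\FUNCTION{$M_A$}{$x$}
\STATE $y = f(x)$
\IF{$M_B(y)$ accepts} \ACCEPT \ENDIF
\STATE \REJECT
\ENDFUNCTION
\end{algorithmic}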
Clearly, \(M_A\) halts on any input, because \(f\) is polynomial-time
computable and \(M_B\) halts on any input. And by the correctness of
\(M_B\) and the code of \(M_A\),
\[\begin{split}x \in A
&\iff y=f(x) \in B \\
&\iff M_B(y) \accepts \\
&\iff M_A(x) \accepts .\end{split}\]
Observe that the Turing reduction given in Proof 147 simply:
applies an efficient transformation to its input,
invokes its oracle once, on the resulting value, and
immediately outputs the oracle’s answer.
In fact, several (but not all!) of the Turing reductions we have seen
for proving undecidability (e.g., for the halts-on-empty
language and other undecidable languages) are effectively polynomial-time mapping
reductions, because they have this exact form. (To be precise, the
efficient transformation in such a Turing reduction is the mapping
reduction, and its use of its oracle is just fixed “boilerplate”.)
Although a polynomial-time mapping reduction is effectively a Turing
reduction, the reverse does not hold in general, due to some important
differences:
A Turing reduction has no efficiency constraints, apart from the
requirement that it halts: the reduction may run for arbitrary
finite time, both before and after its oracle call(s).
By contrast, in a polynomial-time mapping reduction, the output of
the conversion function must be computable in time polynomial in the
input size.
A Turing reduction is a Turing machine that decides one language
given an oracle (“black box”) that decides another language, which
it may use arbitrarily. In particular, it may invoke the oracle
multiple times (or not at all), and perform arbitrary
“post-processing” on the results, e.g., negate the oracle’s answer.
By contrast, a polynomial-time mapping reduction does not involve
any explicit oracle; it is merely a conversion function that must
“preserve the yes/no answer”. Therefore, there is no way for it to
“make multiple oracle calls,” nor for it to “post-process” (e.g.,
negate) the yes/no answer for its constructed instance. Implicitly,
a polynomial-time mapping reduction corresponds to making just one
oracle call and then immediately outputting the oracle’s answer.
[16]
In Lemma 91 we saw that if \(A
\leq_T B\) and \(B\) is decidable, then \(A\) is also decidable. Denoting the
class of decidable languages by \(\RD\), this result can be restated as:
If \(A \leq_T B\) and \(B \in \RD\), then \(A \in \RD\).
Polynomial-time mapping reductions give us an analogous result
with respect to membership in the class \(\P\).
Lemma 148
If \(A \leq_p B\) and \(B \in \P\), then \(A \in \P\).
Proof 149
Because \(B \in \P\), there is a polynomial-time Turing machine \(M_B\)
that decides \(B\). And because \(A \leq_p B\), the Turing machine
\(M_A\) defined in Proof 147,
with the machine \(M_B\) as its “oracle”, decides \(A\).
In addition, \(M_A(x)\) runs in time polynomial in its input size
\(\abs{x}\), by composition of polynomial-time algorithms.
Specifically, by the hypothesis \(A \leq_p B\), computing \(f(x)\) runs
in time polynomial in \(\abs{x}\), and as already noted, \(M_B\) runs
in time polynomial in its input length, so the composed algorithm
\(M_A\) runs in polynomial time.
Since \(M_A\) is a polynomial-time Turing machine that decides \(A\),
we conclude that \(A \in \P\), as claimed.
Analogously to how Lemma 93
immediately follows from Lemma 91,
the following corollary is merely the contrapositive of Lemma
148.
Lemma 150
If \(A \leq_p B\) and \(A \notin \P\), then \(B \notin \P\).
With the notion of a polynomial-time mapping (or Karp) reduction in
hand, the heart of the Cook-Levin theorem can be restated as saying that
\(A \leq_p \SAT\) for every language \(A \in \NP\).
Combining this with Lemma 148, as an immediate
corollary we get the statement of Theorem 144: if \(\SAT \in \P\), then every \(\NP\) language is in
\(\P\), i.e., \(\P=\NP\).
Informally, the above says that \(\SAT\) is “at least as hard as” every
language in \(\NP\) (under Karp reductions). This is a very important
property that turns out to be shared by many languages, so we define a
special name for it.
Definition 151(NP-Hard)
A language \(L\) is \(\NP\)-hard if \(A \leq_p L\) for every language
\(A \in \NP\).
We define \(\NPH = \set{L : L \text{ is $\NP$-hard}}\) to be the
class of all such languages.
We stress that a language need not be in \(\NP\), or even be decidable,
to be \(\NP\)-hard. For example, it can be shown that the undecidable
language \(\atm\) is \(\NP\)-hard (see Exercise 160).
With the notion of \(\NP\)-hardness in hand, the core of the Cook-Levin
theorem is as follows.
Theorem 152(Cook-Levin core)
\(\SAT\) is \(\NP\)-hard.
Previously, we mentioned the existence of languages that are,
informally, the “hardest” ones in \(\NP\), and that \(\SAT\) was one of
them. We now formally define this notion.
Definition 153(NP-Complete)
A language \(L\) is \(\NP\)-complete if:
\(L \in \NP\), and
\(L\) is \(\NP\)-hard.
We define \(\NPC = \set{L : L \text{ is $\NP$-complete}}\) to be the
class of all such languages.
Since \(\SAT \in \NP\) (by Lemma 142) and
\(\SAT\) is \(\NP\)-hard (by Cook-Levin), \(\SAT\) is indeed \(\NP\)-complete.
To show that some language \(L\) of interest is \(\NP\)-hard, do we need to
repeat and adapt all the work of the Cook-Levin theorem, with \(L\) in
place of \(\SAT\)? Thankfully, we do not! Analogously to Lemma
93—which lets us prove undecidability
by giving a Turing reduction from a known-undecidable language—the
following lemma shows that we can establish \(\NP\)-hardness by giving
a Karp reduction from a known-\(\NP\)-hard language. (Below we do this
for several concrete problems of interest, in More
NP-complete Problems.)
Lemma 154
If \(A \leq_p B\) and \(A\) is \(\NP\)-hard, then \(B\) is \(\NP\)-hard.
Proof 155
This follows from the fact that \(\leq_p\) is a transitive
relation; see Exercise 158 below.
By the two hypotheses and Definition 151,
\(L \leq_p A\) for every \(L \in \NP\), and \(A \leq_p B\), so \(L \leq_p
B\) by transitivity. Since this holds for every \(L \in \NP\), by
definition we have that \(B\) is \(\NP\)-hard, as claimed.
Combining Lemma 148, Lemma 150, and Lemma 154, the fact
that \(A \leq_p B\) has the following implications in various scenarios.
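If \(B \in \P\), then \(A \in \P\) as well (Lemma 148).
If \(A \notin \P\), then \(B \notin \P\) as well (Lemma 150).
If \(A\) is \(\NP\)-hard, then \(B\) is \(\NP\)-hard as well (Lemma 154).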
The concepts of \(\NP\)-hardness and \(\NP\)-completeness are powerful tools
for making sense of, and potentially resolving, the \(\P\)-versus-\(\NP\)
question. As the following theorem shows, the two possibilities \(\P =
\NP\) and \(\P \neq \NP\) each come with a variety of equivalent
“syntactically weaker” conditions. So, resolving \(\P\) versus \(\NP\)
comes down to establishing any one of these conditions (which is
still a very challenging task!). In addition, the theorem establishes
strict relationships between the various classes of problems we have
defined, under each of the two possibilities \(\P = \NP\) and \(\P \neq
\NP\).
Theorem 156(\(\P\) versus \(\NP\))
The following statements are equivalent, i.e., if any one of them
holds, then all of them hold.
Some \(\NP\)-hard language is in \(\P\).
(In set notation: \(\NPH \cap \P \neq \emptyset\).)
Every \(\NP\)-complete language is in \(\P\).
(In set notation: \(\NPC \subseteq \P\).)
\(\P = \NP\).
(In words: every language in \(\NP\) is also in \(\P\), and
vice-versa.)
Every language in \(\NP\), except the “trivial” languages
\(\Sigma^*\) and \(\emptyset\), is \(\NP\)-hard (and hence
\(\NP\)-complete). [17]
(In set notation: \(\NP \setminus \set{\Sigma^*, \emptyset}
\subseteq \NPH\).)
It follows that \(\P \neq \NP\) if and only if no \(\NP\)-hard language
is in \(\P\) (i.e., \(\NPH \cap \P = \emptyset\)), which holds if and
only if some nontrivial language in \(\NP\) is not \(\NP\)-hard.
Before giving the proof of Theorem 156, we discuss its main consequences for attacking
the \(\P\)-versus-\(\NP\) question. First, by the equivalence of
Statements 1 and 2, all \(\NP\)-complete problems have the same
“status”: either all of them are efficiently solvable—in which
case \(\P=\NP\), by the equivalence with Statement 3—or none of them
is, in which case \(\P \neq \NP\).
So, to prove that \(\P=\NP\), it would be sufficient (and necessary) to
have an efficient algorithm for any single \(\NP\)-complete (or even
just \(\NP\)-hard) problem. This is by the equivalence of Statements 1
and 3.
To prove that \(\P \neq \NP\), it would trivially suffice to prove that
any single problem in \(\NP\) is not in \(\P\). For this we might as
well focus our efforts on showing this for some \(\NP\)-complete
problem, because by the equivalence of Statements 3 and 1, some \(\NP\) problem lacks an efficient algorithm if and only if every \(\NP\)-complete problem does.
Alternatively, to prove that \(\P \neq \NP\), by the equivalence of
Statements 3 and 4 it would suffice to show that some nontrivial
language in \(\NP\) is not \(\NP\)-hard. For this we might as well focus
our efforts on showing this for some language in \(\P\), because by the
equivalence of Statements 4 and 1, if any nontrivial language in
\(\NP\) is not \(\NP\)-hard, then every nontrivial language in \(\P\) is not
\(\NP\)-hard.
Based on Theorem 156, the
following Venn diagram shows the necessary and sufficient
relationships between the classes \(\P\), \(\NP\), \(\NP\)-hard (\(\NPH\)), and
\(\NP\)-complete (\(\NPC\)), for each of the two possibilities \(\P \neq \NP\)
and \(\P = \NP\).
Proof 157
We refer to Statement 1 as “S1”, and similarly for the others. The
following implications hold immediately by the definitions (and set
operations):
\(\text{S2} \implies \text{S1}\) because an \(\NP\)-complete, and hence
\(\NP\)-hard, language exists (e.g., \(\SAT\)).
(In set notation: if \(\NPH \cap \NP = \NPC \subseteq \P\), then by
intersecting with \(\P\), we get that \(\NPH \cap \P = \NPC \neq
\emptyset\).)
\(\text{S3} \implies \text{S2}\) because by definition, every
\(\NP\)-complete language is in \(\NP\).
(In set notation: if \(\P = \NP\), then \(\NPC \subseteq \NP
\subseteq \P\).)
\(\text{S4} \implies \text{S1}\) because there is a nontrivial
language \(L \in \P \subseteq \NP\) (e.g., \(L=\MAZE\)), and by S4,
\(L\) is \(\NP\)-complete and hence \(\NP\)-hard.
(In set notation: \(\emptyset \neq \P \setminus \set{\Sigma^*,
\emptyset} \subseteq \NP \setminus \set{\Sigma^*, \emptyset}
\subseteq \NPH\), where the last inclusion is by S4, so by
intersecting with \(\P\), we get that \(\emptyset \neq \P \setminus
\set{\Sigma^*, \emptyset} \subseteq \NPH \cap \P\) and hence \(\NPH
\cap \P \neq \emptyset\).)
So, to prove that each statement implies every other one, it
suffices to show that \(\text{S1} \implies \text{S3} \implies
\text{S4}\).
For \(\text{S1} \implies \text{S3}\), suppose that some \(\NP\)-hard
language \(L\) is in \(\P\). By definition, \(A \leq_p L\) for all \(A \in
\NP\), so by Lemma 148, \(A \in \P\).
Therefore, \(\NP \subseteq \P\), and this is an equality because \(\P
\subseteq \NP\), as we have already seen.
For \(\text{S3} \implies \text{S4}\), suppose that \(\P = \NP\), and let
\(L \in \NP\) be arbitrary but nontrivial; we show that \(L\) is \(\NP\)-hard.
By nontriviality, there exist some fixed, distinct strings \(y \in
L\) and \(z \notin L\). Let \(A \in \NP = \P\) be arbitrary, so there is
an efficient algorithm \(M_A\) that decides \(A\). We show that \(A
\leq_p L\) via the function \(f\) computed by the following
pseudocode:
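\begin{algorithmic}
\INPUT a string $x \in \Sigma^*$
\OUTPUT $y$ if $x \in A$, and $z$ otherwise
\FUNCTION{$F$}{$x$}
\IF{$M_A(x)$ accepts} \RETURN $y$ \ENDIF
\RETURN $z$
\ENDFUNCTION
\end{algorithmic}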
It is apparent that \(F\) is efficient, because \(M_A\) is. And for
correctness, by the correctness of \(M_A\), the properties of \(y \neq
z\), and the code of \(F\),
\[\begin{split}x \in A
&\iff M_A(x) \accepts \\
&\iff f(x)=y \\
&\iff f(x) \in L \; \text.\end{split}\]
Since \(A \leq_p L\) for all \(A \in \NP\), we have shown that \(L\) is
\(\NP\)-hard, as needed.
Exercise 158
Show that polynomial-time mapping reductions are transitive. That
is, if \(A \leq_p B\) and \(B \leq_p C\), then \(A \leq_p C\).
Exercise 159
Show that no language Karp-reduces to \(\Sigma^*\) or to
\(\emptyset\). In particular, these languages are not \(\NP\)-hard,
regardless of whether \(\P = \NP\).
The Cook-Levin Theorem (Theorem 152,
showing that \(\SAT\) is \(\NP\)-hard), the concept of polynomial-time
mapping reductions, and Lemma 154 are powerful tools for establishing
the \(\NP\)-hardness of a wide variety of other natural problems. We do
so for several such problems next.
As a second example of an \(\NP\)-complete problem, we consider
satisfiability of Boolean formulas with a specific structure. First,
we introduce a few more terms that apply to Boolean formulas:
A clause is a disjunction of literals, such as \((a \vee b \vee
\neg c \vee d)\).
A Boolean formula is in conjunctive normal form (CNF) if it consists
of a conjunction of clauses. An example of a CNF formula is:
\[\phi = (a \vee b \vee \neg c \vee d) \wedge (\neg a) \wedge
(a \vee \neg b) \wedge (c \vee d)\]
A 3CNF formula is a Boolean formula in conjunctive normal form
where each clause has exactly three literals, such as the
following:
\[\phi = (a \vee b \vee \neg c) \wedge (\neg a \vee \neg a \vee d)
\wedge (a \vee \neg b \vee d) \wedge (c \vee d \vee d)\]
We then define the language \(\TSAT\) as the set of satisfiable 3CNF
formulas:
\[\TSAT = \{\phi : \phi \text{ is a satisfiable 3CNF formula}\}\]
Observe that \(\TSAT\) contains strictly fewer formulas than \(\SAT\),
i.e. \(\TSAT \subset \SAT\). Despite this, we will demonstrate that
\(\TSAT\) is \(\NP\)-complete.
First, we must show that \(\TSAT \in \NP\). The verifier is the same as
the one for \(\SAT\), except that it is specialized to 3CNF formulas:
\begin{algorithmic}
\INPUT instance: a 3CNF formula; certificate: assignment to its
variables
\OUTPUT whether the assignment satisfies the formula
\FUNCTION{$V$}{$\phi, \alpha$}
\IF{$\phi(\alpha) = $ true} \ACCEPT \ENDIF
\STATE \REJECT
\ENDFUNCTION
\end{algorithmic}
The value of \(\phi(\alpha)\) can be computed in linear time, and we can
also check that \(\phi\) is in 3CNF and that \(\alpha\) is a valid
assignment in linear time, so \(V\) is efficient in \(\abs{\phi}\).
Furthermore, if \(\phi\) is in 3CNF and is satisfiable, there is some
certificate \(\alpha\) such that \(V(\phi, \alpha)\) accepts, but if
\(\phi\) is not in 3CNF or is unsatisfiable, \(V(\phi, \alpha)\) rejects
for all \(\alpha\). Thus, \(V\) efficiently verifies \(\TSAT\).
We now show that \(\TSAT\) is \(\NP\)-hard with the reduction \(\SAT \leq_p
\TSAT\). Given a Boolean formula \(\phi\), we need to efficiently compute
a new formula \(f(\phi)\) such that \(\phi \in \SAT \iff f(\phi) \in
\TSAT\). We can do so as follows:
First, convert \(\phi\) into a formula \(\phi'\) in conjunctive normal
form. We can do so efficiently, as described in detail below.
Second, convert the CNF formula \(\phi'\) into the 3CNF formula
\(f(\phi)\). We can do so as follows:
For each clause consisting of fewer than three literals, repeat
the first literal to produce a clause with three literals. For
example, \((a)\) becomes \((a \vee a \vee a)\), and \((\neg b \vee
c)\) becomes \((\neg b \vee \neg b \vee c)\).
For each clause with more than three literals, subdivide it into
two clauses by introducing a new dummy variable. In particular,
convert the clause
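\[(l_1 \vee l_2 \vee \dots \vee l_m)\]
into the subformula
\[(l_1 \vee \dots \vee l_{\floor{m/2}} \vee z) \wedge (\neg z \vee
l_{\floor{m/2}+1} \vee \dots \vee l_m) \; \text,\]
where \(z\) is the new dummy variable.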
Observe that if all \(l_i\) are false, then the original clause is
not satisfied. The transformed subformula is not satisfied for
any value of \(z\), since it would require \(z \wedge \neg z\),
which is impossible. On the other hand, if at least one \(l_i\) is
true, then the original clause is satisfied. The transformed
subformula can be satisfied by choosing \(z\) to be true if \(i >
\floor{m/2}\) or \(z\) to be false if \(i \le \floor{m/2}\). Thus,
this transformation preserves the satisfiability of the original
clause.
We repeat the process on the clauses of the transformed
subformula, until all clauses have size exactly three. Each
transformation introduces \(\O(1)\) literals, giving us the
(approximate) recurrence
\[T(m) = 2T\parens{\frac{m}{2}} + \O(1)\]
By the Master Theorem, we conclude that \(T(m) = \O(m)\), so the
recursive transformation increases the size of the formula by
only a linear amount.
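For concreteness, the following is a minimal Python sketch of the two
clause transformations, assuming a clause is a nonempty list of
literals encoded as nonzero integers whose sign indicates negation,
and assuming fresh() yields variable numbers not already in use:

import itertools

def to_three_literals(clause, fresh):
    # Convert one clause into equivalent clauses of exactly three
    # literals; a literal is a nonzero int, negated by its sign.
    m = len(clause)
    if m < 3:
        # Pad a short clause by repeating its first literal,
        # e.g., (a) becomes (a OR a OR a).
        return [[clause[0]] * (3 - m) + clause]
    if m == 3:
        return [clause]
    # Split a long clause around a new dummy variable z:
    # (l_1 OR ... OR l_m) becomes
    # (l_1 OR ... OR l_{m//2} OR z) AND (NOT z OR l_{m//2+1} OR ... OR l_m).
    z = fresh()
    left = clause[: m // 2] + [z]
    right = [-z] + clause[m // 2:]
    return to_three_literals(left, fresh) + to_three_literals(right, fresh)

counter = itertools.count(5)        # variables 1-4 assumed in use
fresh = lambda: next(counter)
print(to_three_literals([1, -2, 3, 4], fresh))  # [[1, -2, 5], [-5, 3, 4]]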
Applying both transformations gives us a new formula \(f(\phi)\) that is
polynomial in size with respect to \(\abs{\phi}\), and the
transformations can be done in polynomial time. The result \(f(\phi)\)
is in 3CNF, and it is satisfiable exactly when \(\phi\) is. Thus, we
have \(\phi \in \SAT \iff f(\phi) \in \TSAT\), and \(f\) is computable in
polynomial time, so we have \(\SAT \leq_p \TSAT\). Since \(\SAT\) is
\(\NP\)-hard, so is \(\TSAT\).
Having shown that \(\TSAT \in \NP\) and \(\TSAT\) is \(\NP\)-hard, we conclude
that \(\TSAT\) is \(\NP\)-complete.
\(\NP\)-completeness is not exclusive to Boolean satisfiability problems.
Rather, there is a wide range of problems across all fields that are
\(\NP\)-complete. As an example, we take a look at the clique problem.
Given an undirected graph \(G = (V, E)\), a clique is a subset of the
vertices \(C \subseteq V\) such that there is an edge in \(G\) between
every pair of vertices in the clique. Equivalently, the subgraph
induced by the clique \(C\), meaning the vertices in \(C\) and all edges
between them, is a complete graph.
The following is an example of a clique of size four, with the
highlighted vertices in the clique:
We are often interested in finding a clique of maximum size in a
graph, called a maximum clique for short. For instance, if a graph
represents a group of people and their individual friend
relationships, we might want to find the largest set of mutual friends
(hence the name “clique”). However, to obtain a decision problem, we
introduce a limited budget as we did for the traveling
salesperson problem:
\[\CLIQUE = \{(G, k) : G \text{ is an undirected graph that has a
clique of size $k$}\}\]
Here, we have defined \(\CLIQUE\) with an exact budget \(k\) rather than a
lower bound – if a graph \(G = (V, E)\) has a clique \(C\) of size \(m\),
then it has one for any size \(0 \le k \le m\) by removing arbitrary
vertices from \(C\). In other words, a subset of a clique is always
itself a clique.
To show that \(\CLIQUE\) is \(\NP\)-complete, we start by demonstrating that
it is in \(\NP\). A verifier for \(\CLIQUE\) takes a graph \(G\) and a
budget \(k\) as the instance, a set of vertices as the certificate,
and checks that the vertices in the set indeed comprise a clique of
size \(k\). We
define the verifier as follows. The input instance is a graph and a
non-negative integer \(k\). The certificate is a set of exactly \(k\)
vertices in the graph. The verifier simply checks that these vertices
form a clique, i.e., that there is an edge between each pair of
(distinct) vertices in the certificate.
\begin{algorithmic}
\INPUT instance: an undirected graph and non-negative integer $k$;
certificate: a set of $k$ vertices in the graph
\OUTPUT whether the vertices form a clique in the graph
\FUNCTION{VerifyCLIQUE}{$(G=(V,E),k), c = \set{v_1,\ldots,v_k}$}
\IF{$c \not\subseteq V$ or $\abs{c} \neq k$} \REJECT \ENDIF
\FORALL{distinct $i,j \in \set{1,\ldots, k}$}
\IF{$(v_i,v_j) \notin E$} \REJECT \ENDIF
\ENDFOR
\STATE \ACCEPT
\ENDFUNCTION
\end{algorithmic}
We first analyze the runtime of the verifier. The first step can be
done efficiently – checking whether \(c\) is a subset of \(V\) can be
done naïvely in quadratic time via nested loops. (Furthermore, by
checking that \(k \le \abs{V}\) and that \(\abs{c} = k\), we limit the
loop bounds to the size of the input graph.) The loop in the second
step does a quadratic number of iterations, each of which can be done
efficiently, so the total time is still polynomial.
We now analyze correctness. If \((G, k) \in \CLIQUE\), then there is
some clique \(C = \{v_1, \dots, v_k\} \subseteq V\) of size \(0 \le k \le
\abs{V}\). If we pass the set \(C\) as the certificate to the verifier,
we have \(\abs{C} = k\) and \(C \subseteq V\), so the certificate is not
rejected in the first step. Then since \(C\) is a clique, there is an
edge between every pair of vertices in \(C\), so the loop in the second
step does not reject. Thus, the verifier accepts \((G, k), C\).
On the other hand, if \((G = (V, E), k) \notin \CLIQUE\), then there is
no set of \(k\) vertices \(C\) such that \(C\) comprises a clique of \(G\). If
\(k\) is not in the range \([0, \abs{V}]\), the verifier rejects
immediately. In addition, for any certificate that is not a subset of
\(V\) whose size is \(k\), the verifier rejects in the first step. If the
verifier gets to the second step, then the certificate \(c\) must be a
set of vertices \(c = \{v_1, \dots, v_k\} \subseteq V\). By assumption,
\(G\) does not have a clique of size \(k\), so any set of \(k\) vertices
from \(G\) must have a missing edge between some pair of vertices within
that set. Since the verifier checks for an edge between all pairs of
vertices from the given set, it will find such a missing edge and
reject. Thus, when \((G, k) \notin \CLIQUE\), the verifier rejects all
certificates.
Since the verifier is correct and efficient, we have demonstrated that
\(\CLIQUE\) is efficiently verifiable and thus in \(\NP\).
We now show that \(\CLIQUE\) is \(\NP\)-hard with a polytime reduction from
the known \(\NP\)-hard \(\TSAT\), i.e. \(\TSAT \leq_p \CLIQUE\). To do so, we
must define a polynomial-time computable function \(f\) that converts a
3CNF formula \(\phi\) into an instance \((G, k)\) of the clique problem.
More specifically, we require:
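\(\phi \in \TSAT \implies f(\phi) \in \CLIQUE\)
\(\phi \notin \TSAT \implies f(\phi) \notin \CLIQUE\)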
Before considering how we can map formulas to graphs, let’s review the
definition of \(\TSAT\). It is the set of satisfiable 3CNF formulas,
where a formula has \(m\) clauses of three literals each:
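\[\phi = (l_{1,1} \vee l_{1,2} \vee l_{1,3}) \wedge (l_{2,1} \vee
l_{2,2} \vee l_{2,3}) \wedge \dots \wedge (l_{m,1} \vee l_{m,2} \vee
l_{m,3}) \; \text.\]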
Each literal is either a variable itself (e.g., \(v\)) or its negation
(e.g., \(\neg v\)).
A satisfying assignment simultaneously satisfies each clause, so that
each clause has at least one literal that is true. For a formula with
\(m\) clauses to be satisfiable, there must be some set of \(m\) literals,
one from each clause, that can simultaneously be true. If no such set
exists, the formula is unsatisfiable.
We can now relate 3CNF-satisfiability to the clique problem. Given a
formula \(\phi\) of \(m\) clauses, we can construct a graph \(G\) such that:
Each literal \(\ell_i \in \phi\) has a corresponding vertex
\(v_i\) in \(G\).
A pair of vertices \(v_i\) and \(v_j\) have an edge in \(G\) if the
corresponding literals \(\ell_i\) and \(\ell_j\) are in
different clauses of \(\phi\), and the two literals can be
simultaneously true. This is exactly when the two literals are not
negations of each other (i.e., it is not the case that one is \(x\)
and the other is \(\neg x\)): either they are the same (e.g., they are
both \(x\) or are both \(\neg y\)), or they refer to different
variables (e.g., one is \(x\) and the other is \(\neg y\)).
The following illustrates the graph constructed for two different
formulas with three clauses each, with one formula satisfiable and the
other unsatisfiable:
Observe that the satisfiable formula on the left does have a clique of
size three (one such clique is highlighted), while the unsatisfiable
formula on the right does not have a clique of size three.
The formal conversion algorithm that computes the mapping reduction is
as follows:
\begin{algorithmic}
\INPUT a 3CNF formula $\phi$
\OUTPUT a graph $G$ and integer $m$, such that $\phi \in \TSAT{} \iff
(G,m) \in \CLIQUE{}$
\FUNCTION{3SATtoClique}{$\phi$}
\STATE $m = $ number of clauses in $\phi$
\STATE $G = $ an empty graph
\FORALL{literals $\ell_i$ in $\phi$}
\STATE add a vertex $v_i$ to $G$
\ENDFOR
\FORALL{pairs of literals $\ell_i, \ell_j$ in different clauses of
$\phi$}
\IF{$\ell_i \neq \neg \ell_j$}
\STATE add an edge $(v_i,v_j)$ to $G$
\ENDIF
\ENDFOR
\RETURN $(G,m)$
\ENDFUNCTION
\end{algorithmic}
We first analyze the efficiency of this algorithm. The first loop
takes linear time with respect to the size of the formula (i.e., the
number of literals in the formula), and the second loop takes
quadratic time. Thus, the entire conversion takes polynomial time with
respect to the input size.
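For concreteness, the following is a minimal Python sketch of this
conversion, assuming a 3CNF formula is encoded as a list of
three-element lists of nonzero integers, with negation indicated by
sign; a vertex is represented as a (clause index, position, literal)
triple:

from itertools import combinations

def tsat_to_clique(clauses):
    # One vertex per literal occurrence in the formula.
    vertices = [(ci, pi, lit)
                for ci, clause in enumerate(clauses)
                for pi, lit in enumerate(clause)]
    edges = set()
    for u, v in combinations(vertices, 2):
        in_different_clauses = u[0] != v[0]
        not_negations = u[2] != -v[2]   # can be simultaneously true
        if in_different_clauses and not_negations:
            edges.add((u, v))
    return (vertices, edges), len(clauses)

# (a OR b OR NOT c) AND (NOT a OR b OR c):
G, m = tsat_to_clique([[1, 2, -3], [-1, 2, 3]])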
We now analyze correctness. First, we show that \(\phi \in \TSAT
\implies f(\phi) \in \CLIQUE\). Since \(\phi \in \TSAT\), it is
satisfiable, which means there is some satisfying assignment \(\alpha\)
such that each clause in \(\phi\) has at least one literal that is true.
Let \(S = \{s_1, s_2, \dots, s_m\}\) refer to a set comprised of one
true literal from each clause. Then:
Each pair of literals \(s_i, s_j \in S\) can be true simultaneously,
since they are all true under the assignment \(\alpha\).
For each \(s_i, s_j \in S\), there is an edge in \(G\) between their
corresponding vertices, since the literals are from different
clauses and can be simultaneously true.
Since there is an edge in \(G\) between the vertices corresponding to
all pairs \(s_i, s_j \in S\), those vertices constitute a clique in
\(G\). Since \(\abs{S} = m\), \(G\) has a clique of size \(m\).
We now show the converse, that \(f(\phi) \in \CLIQUE \implies \phi \in
\TSAT\). It is important to note that we are not inverting the
function \(f\) here. Rather, this says that if the graph/budget pair
produced as the output of \(f\) is in \(\CLIQUE\), then the input
formula must be in \(\TSAT\). Since \(f(\phi) = (G, m) \in \CLIQUE\), the
output graph \(G\) has a clique of size \(m\). Denote the set of vertices
comprising the clique as \(C = \set{c_1, \dots, c_m}\). Then:
Since \(C\) is a clique, there is an edge between any pair of vertices
\(c_i, c_j\) in \(C\).
The algorithm \(f\) only places an edge between vertices if the
corresponding literals are from different clauses. Thus, the \(m\)
vertices in \(C\) must come from the \(m\) different clauses of the
input formula \(\phi\).
The algorithm \(f\) only places an edge between vertices if their
corresponding literals can be true simultaneously. Since there is an
edge between every pair of vertices in \(C\), all of their
corresponding literals can be true at the same time.
We can define an assignment \(\alpha\) of \(\phi\)’s variables by
setting them such that each literal corresponding to the vertices in
\(C\) becomes true. Since all the literals can be true simultaneously,
there is no conflict when defining this assignment. (Any variables
that remain unset after this process can be set arbitrarily.)
Then \(\alpha\) is a satisfying assignment – each literal
corresponding to the vertices in \(C\) is from a different clause, and
they are all true, so each of the \(m\) clauses has a true literal and
is therefore satisfied. Thus, \(\phi\) is satisfiable.
Since \(\phi \in \TSAT \iff f(\phi) \in \CLIQUE\) and \(f\) can be
computed efficiently, we have demonstrated that \(\TSAT \leq_p \CLIQUE\).
Since \(\TSAT\) is \(\NP\)-hard, we conclude that \(\CLIQUE\) is \(\NP\)-hard, and
since \(\CLIQUE \in \NP\), it is also \(\NP\)-complete.
Given an undirected graph \(G = (V, E)\), a vertex cover is a subset
of the vertices \(C \subseteq V\) such that every edge in \(E\) is covered
by a vertex in \(C\). An edge \(e = (u, v)\) is covered by \(C\) if at
least one of the endpoints \(u,v\) is in \(C\). In other words, \(C\) is a
vertex cover for \(G = (V, E)\) if \(C \subseteq V\) and
\[\forall e = (u, v) \in E.~ u \in C \vee v \in C\]
The following are two vertex covers for the same graph:
The cover on the left has a size of five vertices, while the cover on
the right has a size of four.
As a motivating example of the vertex-cover problem, consider a museum
structured as a set of interconnected hallways, with exhibits on the
walls. The museum needs to hire guards to ensure the security of the
exhibits, but guards are expensive, so it seeks to minimize the number
of guards hired while still ensuring that each hallway is covered by a
guard. The museum layout can be represented as a graph with the
hallways as edges and the vertices as hallway endpoints. Then the
museum just needs to figure out what the minimum vertex cover is to
determine how many guards to hire (or security cameras/motion
detectors to install) and where to place them.
As with \(\CLIQUE\), we obtain a decision problem by introducing a
budget \(k\):
\[\VC = \{(G, k) : G \text{ is an undirected graph that has a vertex
cover of size $k$}\}\]
Here, we have defined \(\VC\) with an exact budget \(k\) rather than an
upper bound – observe that if a graph \(G = (V, E)\) has a vertex cover
\(C\) of size \(m\), then it has one for any size \(m \le k \le \abs{V}\) by
adding arbitrary vertices from \(V \setminus C\) into \(C\). Put in museum
terms, if the museum’s budget allows, it can hire more guards than are
strictly necessary to cover all hallways.
We now demonstrate that \(\VC\) is \(\NP\)-hard, leaving the proof that \(\VC
\in \NP\) as an exercise. Combining the two, we will be able to
conclude that \(\VC\) is \(\NP\)-complete.
Exercise 161
Show that \(\VC \in \NP\).
We show that \(\VC\) is \(\NP\)-hard via the reduction \(\TSAT \leq_p \VC\). To
do so, we must define a polynomial-time computable function \(f\) that
converts a 3CNF formula \(\phi\) into an instance \((G, k)\) of the
vertex-cover problem. More specifically, we require:
\(\phi \in \TSAT \implies f(\phi) \in \VC\)
\(\phi \notin \TSAT \implies f(\phi) \notin \VC\)
Given a 3CNF formula \(\phi\) with \(n\) variables and \(m\) clauses, we
construct a graph \(G\) corresponding to that formula. The graph is
comprised of gadgets that are subgraphs, representing individual
variables and clauses. We will then connect the gadgets together such
that the resulting graph \(G\) has a vertex cover of size \(n + 2m\)
exactly when \(\phi\) is satisfiable.
The following are our gadgets for variables and clauses:
A gadget for variable \(x\) is a “barbell,” consisting of two vertices
labeled by \(x\) and \(\neg x\) connected by an edge. A gadget for a
clause \(x \vee y \vee z\) is a triangle, consisting of a vertex for
each literal, with the three vertices all connected together.
As a concrete example, consider the formula
\[\phi = (x \vee y \vee z) \wedge (\neg x \vee \neg y \vee \neg z)
\wedge (x \vee \neg y \vee z) \wedge (\neg x \vee y \vee \neg z)\]
We start building the corresponding graph \(G\) by incorporating gadgets
for each clause and variable:
Here, we have placed the variable gadgets at the top of the figure and
the clause gadgets at the bottom.
The next step in the construction is to connect each vertex in a
variable gadget to the vertices in the clause gadgets that have the
same literal as the label. For instance, we connect \(x\) in the
respective variable gadget to each vertex labeled by \(x\) in the clause
gadgets, \(\neg x\) in the variable gadget to each vertex labeled by
\(\neg x\) in the clause gadgets, and so on. The result is as follows:
This completes the construction of \(G\), and we have \(f(\phi) = (G,
n+2m)\). There are \(3m + 2n\) vertices in \(G\), \(n\) edges within the
variable gadgets, \(3m\) edges within the clause gadgets, and \(3m\) edges
that cross between the clause and variable gadgets, for a total of
\(\O(m + n)\) edges. Thus, \(G\) has size \(\O(n + m)\), and it can be
constructed efficiently.
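A minimal Python sketch of this construction, reusing the integer
encoding of literals from before, with variables numbered 1 through
\(n\) and each tuple standing for an undirected edge:

def tsat_to_vc(num_vars, clauses):
    # Vertices: ('var', lit) in the barbell gadgets, and
    # ('cl', j, i) for position i of clause j's triangle gadget.
    V, E = set(), set()
    for x in range(1, num_vars + 1):
        V |= {('var', x), ('var', -x)}
        E.add((('var', x), ('var', -x)))           # barbell edge
    for j, clause in enumerate(clauses):
        tri = [('cl', j, i) for i in range(3)]
        V |= set(tri)
        E |= {(tri[0], tri[1]), (tri[1], tri[2]), (tri[0], tri[2])}
        for i, lit in enumerate(clause):
            E.add((('var', lit), ('cl', j, i)))    # crossing edge
    return (V, E), num_vars + 2 * len(clauses)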
We proceed to show that if \(\phi \in \TSAT\), then \(f(\phi) = (G, n+2m)
\in \VC\). Since \(\phi \in \TSAT\), there is some satisfying assignment
\(\alpha\) such that \(\phi(\alpha)\) is true. Then \(C\) is a vertex cover
for \(G\), where:
For the variable gadget corresponding to each variable \(x\), \(x\) is
in \(C\) if \(\alpha(x) = 1\), otherwise \(\neg x\) is in \(C\) (i.e. if
\(\alpha(x) = 0\)).
Observe that every edge internal to a variable gadget is covered.
Furthermore, since \(\alpha\) is a satisfying assignment, each clause
has at least one literal that is true. Then the edge connecting the
vertex in the corresponding clause gadget to the same literal in a
variable gadget is already covered – the vertex corresponding to
that literal in the variable gadget is in \(C\).
Thus, every clause gadget has at least one out of its three
“crossing” edges covered by a vertex in a variable gadget, and so
far, we have \(n\) vertices in \(C\).
For each uncovered crossing edge \((u, v)\), where \(u\) is in a
variable gadget and \(v\) is in a clause gadget, we put \(v\) in \(C\).
There are at most two such crossing edges per clause. If a clause
has only one uncovered crossing edge, then pick another arbitrary
vertex in the clause to be in \(C\). Similarly, if a clause has no
uncovered crossing edges, then pick two of its vertices arbitrarily
to be in \(C\).
Now all crossing edges are covered by a vertex in a variable gadget,
or one in a clause gadget, or both. Furthermore, since we pick
exactly two vertices out of each clause gadget, the edges internal
to each clause gadget are also covered. Thus, all edges are covered,
and \(C\) is a vertex cover. This second step adds two vertices per
clause, or \(2m\) vertices. Combined with the previous vertices, \(C\)
has \(n + 2m\) vertices, and we have shown that \(G\) has a vertex cover
of size \(n + 2m\).
We now show that \(\phi \notin \TSAT \implies (G, n+2m) \notin \VC\).
More precisely, we demonstrate the contrapositive \((G, n+2m) \in \VC
\implies \phi \in \TSAT\). In other words, if \(G\) has a vertex cover of
size \(n + 2m\), then \(\phi\) has a satisfying assignment.
A vertex cover of \(G\) of size \(n + 2m\) must include the following:
exactly one vertex from each variable gadget to cover the edge
internal to the variable gadget
exactly two vertices from each clause gadget to cover the edges
internal to the clause gadget
This brings us to our total of \(n + 2m\) vertices in the cover. While
all internal edges are covered, we must also consider the edges that
cross between clause and variable gadgets. With the two vertices from
each clause gadget, only two of the three crossing edges are covered
by those vertices. The remaining crossing edge must be covered by a
vertex from a variable gadget. Thus, if \(G\) has a vertex cover \(C\) of
size \(n + 2m\), each clause \(C_j\) has a crossing edge \((u_l, v_j)\) where:
\(u_l\) is a variable-gadget vertex labeled by some literal \(l\)
\(v_j\) is the vertex in the gadget for clause \(C_j\) with the same label \(l\)
\(u_l \in C\), so that the crossing edge is covered
Then the assignment \(\alpha\) that makes each such literal \(l\) true
(i.e., that sets \(\alpha(l) = 1\) whenever \(u_l \in C\)) is a
satisfying assignment – for each clause \(C_j\), the literal \(l\)
corresponding to the vertex \(v_j\) is true. Since all clauses are
satisfied, \(\phi\) as a whole is satisfied.
This completes our proof that \(\TSAT \leq_p \VC\). We conclude from the
facts that \(\TSAT\) is \(\NP\)-hard and \(\VC \in \NP\) that \(\VC\) is
\(\NP\)-complete.
Exercise 162
Define the language
\[\IS = \{(G, k) : G \text{ is an undirected graph with an
independent set of size } k\}\]
An independent set of a graph \(G = (V, E)\) is a subset of the
vertices \(I \subseteq V\) such that no two vertices in \(I\) share an
edge. Show that \(\VC \leq_p \IS\).
Hint: If \(G\) has a vertex cover of size \(m\), what can be
said about the vertices that are not in the cover?
Recall the language
\[\CLIQUE = \{(G, k) : G \text{ is an undirected graph that has
a clique of size $k$}\}\]
Consider another problem, that of hiring workers for a project. To
successfully complete the project, we need workers with some combined
set of skills \(S\). For instance, we might need an architect, a general
contractor, an engineer, an electrician, a plumber, and so on for a
construction project. We have a candidate pool of \(n\) workers, where
worker \(i\) has a set of skills \(S_i\). As an example, there may be a
single worker who is proficient in plumbing and electrical work. Our
goal is to hire a minimum-size team of workers to cover all the
required skills.
We formalize this problem as follows. We are given a set of elements
\(S\), representing the required skills. We are also given \(n\) subsets
\(S_i \subseteq S\), representing the set of skills that each worker \(i\)
has. We want to find the smallest collection of \(S_i\)’s that cover the
set \(S\). In other words, we wish to find the smallest set of indices
\(C\) such that
\[S = \bigcup_{i\in C} S_i\]
As a concrete example, let \(S\) and \(S_1, \dots, S_6\) be as follows:
We proceed to define a language corresponding to the set-cover
problem. As with vertex cover, we include a budget \(k\) for the size of
the cover \(C\).
\[\begin{split}\SC = \{(S, S_1, S_2, \dots, S_n, k) :~ &S_i \subseteq S \text{ for
each $i$, and there is a collection}\\
&\text{of $k$ sets $S_i$ that cover $S$}\}\end{split}\]
It is clear that \(\SC \in \NP\) – we can use as a certificate a set of
indices \(C\), and the verifier need only check that \(C\) is a size-\(k\)
subset of \([1, n]\), and that \(S = \cup_{i\in C} S_i\). This can be done
efficiently.
We now perform the reduction \(\VC \leq_p \SC\) to demonstrate that \(\SC\)
is \(\NP\)-hard. We need to convert an instance \((G = (V, E), k)\) to an
instance \((S, S_1, \dots, S_n, k')\) such that \(G\) has a vertex cover
of size \(k\) exactly when \(S\) has a set cover of size \(k'\). Observe
that an element of a vertex cover is a vertex \(v\), and it covers some
subset of the edges in \(E\). Similarly, a set cover consists of sets
\(S_i\), each of which cover some subset of the items in \(S\). Thus, an
edge corresponds to a skill, while a vertex corresponds to a worker.
Given a graph \(G = (V, E)\) and a budget \(k\), we define the function
\(f\) such that it translates the set of edges \(E\) to the set of skills \(S\),
and each vertex \(v_i \in V\) to a set \(S_i\) consisting of the edges
incident to \(v_i\). Formally, we have \(f(G, k) = (S, S_1, \dots,
S_{\abs{V}}, k)\), where \(S = E\) and:
\[S_i = \{e \in E : e \text{ is incident to $v_i$, i.e. $e = (u,
v_i)$ for some $u \in V$}\}\]
This transformation can be done efficiently, as it is a direct
translation of the graph \(G\).
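The translation is immediate to implement; a minimal Python sketch,
assuming \(V\) and \(E\) are given as Python sets, with each edge a
pair of vertices:

def vc_to_sc(V, E, k):
    # Skills are edges; the set for vertex v holds the edges it covers.
    S = set(E)
    subsets = {v: {e for e in E if v in e} for v in V}
    return S, subsets, k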
As an example, suppose that the graph \(G\) is as follows:
This graph has a cover of size four consisting of the vertices labeled
\(S_2, S_4, S_5, S_6\). The transformation \(f(G, k)\) produces the
following sets:
Then \(C = \{2, 4, 5, 6\}\) is a set cover of size four.
We now demonstrate that in the general case, \((G = (V, E), k) \in \VC\)
exactly when \(f(G, k) = (S, S_1, \dots, S_{\abs{V}}, k) \in \SC\).
If \((G, k) \in \VC\), then \(G\) has a vertex cover \(C \subseteq V\) of
size \(k\) that covers all the edges in \(E\). This means that
\[E = \bigcup_{v_i \in C} \{e \in E : e \text{ is incident to }
v_i\}\]
Let \(C' = \{i : v_i \in C\}\). Then we have
\[\begin{split}\bigcup_{i\in C'} S_i &= \bigcup_{i\in C'} \{e \in E : e \text{
is incident to } v_i\}\\
&= E = S\end{split}\]
Thus, \(C'\) covers \(S\), and \(S\) has a set cover of size \(k\).
If \((G, k) \notin \VC\), then \(G\) does not have a vertex cover of
size \(k\). Either \(k > \abs{V}\), or for any subset of vertices \(C
\subseteq V\) of size \(k\), we have
\[E \ne \bigcup_{v_i \in C} \{e \in E : e \text{ is incident to }
v_i\}\]
Then for any size-\(k\) subset of indices \(C' \subseteq \{1, \dots,
\abs{V}\}\), we also have
\[\begin{split}\bigcup_{i\in C'} S_i &= \bigcup_{i\in C'} \{e \in E : e \text{
is incident to } v_i\}\\
&\ne E = S\end{split}\]
Thus \(S\) does not have a set cover of size \(k\).
This completes our reduction \(\VC \leq_p \SC\).
We conclude our discussion of set cover by observing that a vertex
cover is actually a special case of a set cover. When we transform an
instance of the vertex-cover problem to one of set cover, each edge in
a graph is interpreted as a skill and each vertex as worker, which
means that each skill is shared by exactly two workers. This is more
restrictive than the general case of set cover, where any number of
workers (including none) may have a particular skill. The reduction
\(\VC \leq_p \SC\) just exemplifies that if we can solve the general
case of a problem efficiently, we can also solve special cases
efficiently. Conversely, if a special case is “hard,” then so is the
general case. [18]
Given a graph \(G\), a Hamiltonian path from \(s\) to \(t\) is a path that
starts at \(s\), ends at \(t\), and visits each vertex exactly once. A
Hamiltonian cycle is a path that starts and ends at the same vertex
and visits all other vertices exactly once. For example, \(G_1\) below
has a Hamiltonian cycle as well as a Hamiltonian path between \(s_1\)
and \(t\), but it does not have a Hamiltonian path between \(s_2\) and
\(t\). On the right, \(G_2\) has a Hamiltonian path between \(s\) and \(t\),
but it does not have a Hamiltonian cycle.
We define the languages corresponding to the existence of a
Hamiltonian path or cycle as:
\[\begin{split}&\HP = \{(G, s, t) : G \text{ is an undirected graph with a
Hamiltonian path between $s$ and $t$}\}\\
&\HC = \{G : G \text{ is an undirected graph with a Hamiltonian
cycle}\}\end{split}\]
Both languages are in \(\NP\) – we take the actual Hamiltonian path or
cycle as a certificate, and a verifier need only follow the path to
check that it is valid as in the verifier for TSP. We
claim without proof that \(\HP\) is \(\NP\)-hard – it is possible to reduce
\(\TSAT\) to \(\HP\), but the reduction is quite complicated, so we will
not consider it here. Instead, we perform the reduction \(\HP \leq_p \HC\)
to show that \(\HC\) is also \(\NP\)-hard. We need to demonstrate an
efficient function \(f\) such that \((G, s, t) \in \HP\) if and only if
\(f(G, s, t)\) has a Hamiltonian cycle.
Our first attempt is to produce a new graph \(G'\) that is the same as
\(G\), except that it has an additional edge between \(s\) and \(t\). Then
if \(G\) has a Hamiltonian path from \(s\) to \(t\), \(G'\) has a Hamiltonian
cycle that follows that same path from \(s\) to \(t\) and returns to \(s\)
through the additional edge. While this construction works in mapping
yes instances of \(\HP\) to yes instances of \(\HC\), it does not properly
map no instances of \(\HP\) to no instances of \(\HC\). The following
illustrates the construction on both a no and a yes instance of \(\HP\),
with the dotted line representing the added edge:
In the original graph on the left, there is no Hamiltonian path from
\(s_2\) to \(t\), yet the new graph does have a Hamiltonian cycle. Since
the function maps a no instance to a yes instance, it does not
demonstrate a valid mapping reduction.
The key problem with the transformation above is that it does not
force a Hamiltonian cycle to go through the additional edge. If the
cycle did go through the additional edge, we could remove that edge
from the cycle to obtain a Hamiltonian path in the original graph. To
force the cycle through the additions to the graph, we need to add a
new vertex, not just an edge. Thus, in place of the single edge, we
add two edges with a vertex between them, as in the following:
Now the graph generated from the no instance, on the left, does not
have a Hamiltonian cycle, while the graph generated from the yes
instance, on the right, still does. Thus, this procedure looks like it
should work, mapping yes instances to yes instances and no instances
to no instances.
Formally, given \((G = (V, E), s, t)\), we have
\[f(G, s, t) = G' = (V \cup \{u\}, E \cup \{(s, u), (t, u)\})\]
The function is clearly efficient, as it just adds one vertex and two
edges to the input graph. We prove that it is also correct:
Suppose \(G\) has a Hamiltonian path from \(s\) to \(t\). This path visits
all the vertices in \(G\), and it has the structure
\((s,v_2,\dots,v_{n-1},t)\). The same path exists in \(G'\), and it
visits all vertices except the added vertex \(u\). The modified path
\((s,v_2,\dots,v_{n-1},t,u,s)\) with the additions of the edges
\((t,u)\) and \((u,s)\) visits all the vertices and returns to \(s\), so
it is a Hamiltonian cycle, and \(G' \in \HC\).
Suppose \(G'\) has a Hamiltonian cycle. Since it is a cycle, we can
start from any vertex in the cycle and return back to it. Consider
the specific path that starts at and returns to \(s\). It must also go
through \(u\) along the path, so it has the structure
\((s,\dots,u,\dots,s)\). Since the only edges incident to \(u\) are
\((s,u)\) and \((t,u)\), the path must either enter \(u\) from \(t\) and exit
to \(s\), or it must enter from \(s\) and exit to \(t\). We consider the
two cases separately.
Case 1: \(t\) precedes \(u\). Then the path must have the form
\((s,\dots,t,u,s)\). Since the subpath \((s,\dots,t)\) visits all
vertices except \(u\) and only includes edges from \(G\), it
constitutes a Hamiltonian path in \(G\) from \(s\) to \(t\).
Case 2: \(t\) follows \(u\). Then the path has the form
\((s,u,t,\dots,s)\). Since the subpath \((t,\dots,s)\) visits all
vertices except \(u\) and only includes edges from \(G\), it
constitutes a Hamiltonian path in \(G\) from \(t\) to \(s\). The graph
is undirected, so following the edges in reverse order gives a
Hamiltonian path from \(s\) to \(t\).
Thus, \(G\) has a Hamiltonian path from \(s\) to \(t\).
We have shown that \(G\) has a Hamiltonian path from \(s\) to \(t\) exactly
when \(G' = f(G, s, t)\) has a Hamiltonian cycle. This demonstrates that
\(f\) is a valid mapping from instances of \(\HP\) to instances of \(\HC\),
so \(\HP \leq_p \HC\). We conclude that \(\HC\) is \(\NP\)-hard.
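The transformation itself is a one-liner in code; a minimal Python
sketch, assuming the graph is given as Python sets \(V\) and \(E\) of
vertices and edge pairs, and that the name chosen for the new vertex
does not collide with an existing one:

def hp_to_hc(V, E, s, t):
    # Add a fresh vertex u adjacent only to s and t, per the
    # construction f(G, s, t) above.
    u = 'u_new'   # assumed not to clash with existing vertex names
    return set(V) | {u}, set(E) | {(s, u), (t, u)}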
Exercise 163
Define the language
\[\cproblem{LONG-PATH} = \{(G, k) : G \text{ is an undirected
graph with a simple path of length } k\}\]
A simple path is a path without any cycles. Show that
\(\cproblem{LONG-PATH}\) is \(\NP\)-complete.
Exercise 164
Recall the \(\TSP\) language:
\[\begin{split}\TSP = \left\{\begin{aligned}
(G, k) : ~&G \text{ is a weighted, complete graph that has
a tour that visits}\\
&\text{all vertices in $G$ and has a total weight at most $k$}
\end{aligned}\right\}\end{split}\]
Show that \(\TSP\) is \(\NP\)-hard with the reduction \(\HC \leq_p \TSP\).
The subset-sum problem is the task of finding a subset of a set of
integers \(S\) that adds up to a target value \(k\). The decision version
of this problem is as follows:
\[\SSUM = \left\{(S, k) : S \text{ is a set of integers that has a
subset } S^* \subseteq S \text{ such that } \sum_{s \in S^*}s =
k\right\}\]
This language is in \(\NP\) – we can take the actual subset \(S^*
\subseteq S\) as a certificate, and a verifier can efficiently check
that it is a valid subset of \(S\) and that its elements sum to the
value \(k\). We show \(\TSAT \leq_p \SSUM\) to demonstrate that \(\SSUM\) is
also \(\NP\)-hard: we define a polytime computable function \(f\) that takes
a 3CNF formula \(\phi\) with \(n\) variables and \(m\) clauses and produces
a set of integers \(S\) and target value \(k\) such that
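\[\phi \in \TSAT \iff f(\phi) = (S, k) \in \SSUM \; \text.\]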
A satisfying assignment \(\alpha\) of \(\phi\) has two key properties:
for each variable \(x_i \in \phi\), exactly one of \(\alpha(x_i)\) or
\(\alpha(\neg x_i)\) is 1
for each clause \(C_j = (l_{j,1} \vee l_{j,2} \vee l_{j,3}) \in
\phi\), at least one of \(\alpha(l_{j,1}), \alpha(l_{j,2}),
\alpha(l_{j,3})\) is 1
The set of integers \(S\) and value \(k\) produced by \(f(\phi)\) are
carefully designed to correspond to these properties. Specifically,
the set \(S\) consists of decimal (base-10) integers that are \(n+m\)
digits long – the first \(n\) digits correspond to the \(n\) variables,
while the last \(m\) digits correspond to the clauses. For each variable
\(x_i\), \(S\) contains two numbers \(v_i\) and \(w_i\):
\(v_i\) is 1 in digit \(i\) and in digit \(n+j\) for each clause \(C_j\) that
contains the literal \(x_i\), 0 in all other digits
\(w_i\) is 1 in digit \(i\) and in digit \(n+j\) for each clause \(C_j\) that
contains the literal \(\neg x_i\), 0 in all other digits
We also set the \(i\)th digit of the target value \(k\) to be 1 for \(1 \le
i \le n\). For a subset \(S^*\) to add up to \(k\), it must contain exactly
one of \(v_i\) or \(w_i\) for each \(1 \le i \le n\), since only those two
numbers contain a 1 in the \(i\)th digit. (Since we are using decimal
numbers, there aren’t enough numbers in \(S\) to get a 1 in the \(i\)th
digit through a carry from less-significant digits.) Thus, a subset
\(S^*\) that adds up to \(k\) corresponds to an actual assignment of
values to the variables, enforcing the first property above.
For a clause digit \(n+j, 1 \le j \le m\), the subset \(S^*\) can produce
one of the following values corresponding to the clause \(C_j =
(l_{j,1} \vee l_{j,2} \vee l_{j,3})\):
0 if \(S^*\) contains none of the numbers corresponding to \(l_{j,1},
l_{j,2}, l_{j,3}\) being true (e.g. if \(l_{j,1} = x_i\), then \(S^*\)
does not contain \(v_i\), and if \(l_{j,1} = \neg x_i\), then \(S^*\) does
not contain \(w_i\))
1 if \(S^*\) contains exactly one number corresponding to \(l_{j,1},
l_{j,2}, l_{j,3}\) being true
2 if \(S^*\) contains exactly two numbers corresponding to \(l_{j,1},
l_{j,2}, l_{j,3}\) being true
3 if \(S^*\) contains all three numbers corresponding to \(l_{j,1},
l_{j,2}, l_{j,3}\) being true
For the clause \(C_j\) to be satisfied by the assignment corresponding
to \(S^*\), all but the first case work. We need to add more numbers to
\(S\) and set digit \(n+j\) of \(k\) such that the three latter cases are
all viable. To do so, for each clause \(C_j\):
set the value for digit \(n+j\) to be 3 in the target number \(k\)
add two numbers \(y_j\) and \(z_j\) to \(S\) that have a 1 in digit \(n+j\)
and 0 in all other digits
This completes our construction of \(S\) and \(k\). This can be done
efficiently – we generate \(2n+2m+1\) numbers, each of which is \(n+m\)
digits long, and we can do so in quadratic time with respect to
\(\abs{\phi} = \O(n+m)\). For a subset \(S^* \subseteq S\) to add up to
\(k\), it must contain:
exactly one of \(v_i\) or \(w_i\) for each variable \(x_i, 1 \le i \le n\),
corresponding to an assignment to the variables
none, one, or both of \(y_j\) and \(z_j\) for each clause \(C_j, 1 \le j
\le m\), depending on whether the clause \(C_j\) has three, two, or one
literal(s) satisfied by the corresponding assignment
As a concrete example, consider the formula
\[\phi = (x_1 \vee x_2 \vee x_3) \wedge (\neg x_2 \vee \neg x_3 \vee
\neg x_4) \wedge (\neg x_1 \vee x_2 \vee x_4) \; \text,\]
which has \(n = 4\) variables and \(m = 3\) clauses. We generate the
following numbers, writing each as its string of \(n + m = 7\) digits,
ordered \(x_1, x_2, x_3, x_4, C_1, C_2, C_3\) from most to least
significant:
\(v_1 = 1000100\)
\(v_2 = 0100101\)
\(v_3 = 0010100\)
\(v_4 = 0001001\)
\(w_1 = 1000001\)
\(w_2 = 0100010\)
\(w_3 = 0010010\)
\(w_4 = 0001010\)
\(y_1 = z_1 = 0000100\)
\(y_2 = z_2 = 0000010\)
\(y_3 = z_3 = 0000001\)
\(k = 1111333\)
Consider the satisfying assignment \(\alpha(x_1, x_2, x_3, x_4) = (0,
1, 0, 1)\). The corresponding subset \(S^*\) contains \(w_1, v_2, w_3,
v_4\), which add up to \(1111113\) – clauses \(C_1\) and \(C_2\) are
satisfied by one literal each (\(x_2\) and \(\neg x_3\), respectively),
while clause \(C_3\) is satisfied by all three literals. To add up to \(k
= 1111333\), \(S^*\) also contains \(y_1, z_1, y_2, z_2\).
Now consider the assignment \(\alpha'(x_1, x_2, x_3, x_4) = (0, 0, 0,
1)\) that does not satisfy the formula. The corresponding subset \(S^*\)
contains \(w_1, w_2, w_3, v_4\), which add up to \(1111022\). The highest
value we can get in digit 5 (corresponding to clause \(C_1\)) is 2, by
adding both \(y_1\) and \(z_1\) to the subset. This does not get us to the
target value of 3.
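A minimal Python sketch of the construction of \(S\) and \(k\), again
encoding literals as nonzero integers; \(S\) is returned as a list,
since each pair \(y_j, z_j\) is numerically equal:

def tsat_to_ssum(num_vars, clauses):
    n, m = num_vars, len(clauses)
    def digit(pos):
        # Decimal weight of digit `pos`; digits 0 (most significant)
        # through n-1 are the variable digits, then the clause digits.
        return 10 ** (n + m - 1 - pos)
    S = []
    for x in range(1, n + 1):
        v = digit(x - 1) + sum(digit(n + j)
                               for j, c in enumerate(clauses) if x in c)
        w = digit(x - 1) + sum(digit(n + j)
                               for j, c in enumerate(clauses) if -x in c)
        S += [v, w]
    for j in range(m):
        S += [digit(n + j), digit(n + j)]          # y_j and z_j
    k = (sum(digit(i) for i in range(n))
         + 3 * sum(digit(n + j) for j in range(m)))
    return S, k

# The example formula above: k == 1111333
S, k = tsat_to_ssum(4, [[1, 2, 3], [-2, -3, -4], [-1, 2, 4]])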
Formally, for an arbitrary formula \(\phi\) with \(n\) variables and \(m\)
clauses and resulting \(f(\phi) = (S, k)\), we have:
If \(\phi \in \TSAT\), then it has an assignment \(\alpha\) that
satisfies all clauses, meaning that at least one literal in each
clause is true. The subset
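\[S_a^* = \set{v_i : \alpha(x_i) = 1} \cup \set{w_i : \alpha(x_i) = 0}\]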
sums to 1 in each of the first \(n\) digits, and at least 1 in each of
the last \(m\) digits – each clause \(C_j\) is satisfied by at least
one literal, and the digit \(n+j\) is set to 1 in the number
corresponding to that literal in the subset \(S_a^*\). Thus, the full
subset
\[\begin{split}S^* =~ &S_a^* ~\cup\\
&\{y_j : \phi(\alpha) \text{ sets $\le 2$ literals of $C_j$ to
$1$}\} ~\cup\\
&\{z_j : \phi(\alpha) \text{ sets $1$ literal of $C_j$ to $1$}\}\end{split}\]
adds up to the target value \(k\), so \(f(\phi) = (S, k) \in \SSUM\).
If \(f(\phi) = (S, k) \in \SSUM\), then there is some subset \(S^*\) of
the numbers \(S = \{v_i\} \cup \{w_i\} \cup \{y_j\} \cup \{z_j\}\)
that adds up to \(k\). This subset \(S^*\) must contain exactly one of
\(v_i\) or \(w_i\) for each \(1 \le i \le n\) so that the \(i\)th digit in
the sum is the target 1. This corresponds to the assignment
\[\begin{split}\alpha(x_i) = \begin{cases}
1 &\mbox{ if $v_i \in S^*$}\\
0 &\mbox{ if $w_i \in S^*$}
\end{cases}\end{split}\]
For each digit \(n+j, 1 \le j \le m\), there must be some \(v_i\) or
\(w_i\) in \(S^*\) that has that digit set to 1 – there are only two
numbers among the \(y_j\) and \(z_j\) that have this digit set to 1, so
they cannot add up to the target value of 3 on their own. If there
is a \(v_i \in S^*\) that has the \((n+j)\)th digit set to 1, then by
construction, the clause \(C_j\) contains the literal \(x_i\), so it is
satisfied by the assignment \(\alpha\) since \(\alpha(x_i) = 1\). On the
other hand, if there is a \(w_{i'} \in S^*\) that has the \((n+j)\)th
digit set to 1, then the clause \(C_j\) contains the literal \(\neg
x_{i'}\), and it is satisfied by the assignment \(\alpha\) since
\(\alpha(x_{i'}) = 0\). In either case, the clause \(C_j\) is satisfied.
Since this reasoning applies to all clauses, the assignment \(\alpha\)
satisfies the formula as a whole, and \(\phi \in \TSAT\).
We have shown that \(f\) is efficiently computable and that \(\phi \in
\TSAT \iff f(\phi) \in \SSUM\), so we have \(\TSAT \leq_p \SSUM\). We can
conclude that \(\SSUM\) is \(\NP\)-hard, and since it is also in \(\NP\), that
it is \(\NP\)-complete.
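To make the construction concrete, the following is a minimal Python
sketch of the reduction \(f\). It assumes the formula is given as a
list of clauses, each a list of three nonzero integers in the DIMACS
style (\(i\) stands for \(x_i\) and \(-i\) for \(\neg x_i\)); this
encoding and the function name are illustrative choices, not part of
the construction above.

def threesat_to_subset_sum(clauses, n):
    # clauses: list of clauses, each a list of three nonzero ints,
    # where i stands for x_i and -i for NOT x_i (an assumed encoding)
    m = len(clauses)
    digits = n + m                 # digits 1..n per variable, n+1..n+m per clause

    def number(ones):
        # base-10 number with a 1 in each digit position listed in `ones`
        return sum(10 ** (digits - d) for d in ones)

    S = []
    for i in range(1, n + 1):
        # v_i: digit i, plus digit n+j for each clause j containing x_i
        S.append(number([i] + [n + j for j, c in enumerate(clauses, 1) if i in c]))
        # w_i: digit i, plus digit n+j for each clause j containing NOT x_i
        S.append(number([i] + [n + j for j, c in enumerate(clauses, 1) if -i in c]))
    for j in range(1, m + 1):
        S.append(number([n + j]))  # y_j
        S.append(number([n + j]))  # z_j

    # target: 1 in each variable digit, 3 in each clause digit
    k = number(range(1, n + 1)) + 3 * number(range(n + 1, digits + 1))
    return S, k

For the example above, threesat_to_subset_sum([[1, 2, 3], [-2, -3, -4],
[-1, 2, 4]], 4) produces the numbers in the table and the target \(k =
1111333\). Base 10 suffices because each digit column sums to at most 5
(three literal numbers plus \(y_j\) and \(z_j\)), so no carries can
occur.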
Exercise 165
Define the language \(\KNAPSACK\) as follows:
\[\begin{split}\KNAPSACK = \left\{\begin{aligned}
(V,W,n,v,C) : ~&\abs{V} = \abs{W} = n
\text{ and there exists a set of indices }\\
&S \subseteq \set{1, \dots, n} \text{ such that } \sum_{i \in S} V[i] \ge v \andt\\
&\sum_{i \in S} W[i] \le C
\end{aligned}\right\}\end{split}\]
This corresponds to the following problem:
There are \(n\) items.
Each item \(i\) has a value \(V(i)\) and a weight \(W(i)\), where
\(V(i)\) and \(W(i)\) are integers.
We have a target value of at least \(v\), where \(v\) is an integer.
We have a weight capacity limit of at most \(C\), where \(C\) is an
integer.
Is it possible to select a subset of the items such that the
total value of the subset is at least \(v\) but the total weight
of the subset is at most \(C\)?
We have explored only the tip of the iceberg when it comes to
\(\NP\)-complete problems. Such problems are everywhere, including
constraint satisfaction (\(\SAT\), \(\TSAT\)), covering problems (vertex
cover, set cover), resource allocation (knapsack, subset sum),
scheduling, graph colorability, model-checking, social networks
(clique, maximum cut), routing (\(\HP\), \(\TSP\)), games (Sudoku,
Battleship, Super Mario Brothers, Pokémon), and so on. An efficient
algorithm for any one of these problems would yield an efficient
algorithm for all of them.
The skills to reason about a problem and determine whether it is
\(\NP\)-hard are critical. Researchers have worked for decades to find
efficient solutions to \(\NP\)-complete problems, to no avail, which
makes it exceedingly unlikely that we will be able to do so. Instead,
when we encounter such a problem, a better path is to focus on
approximation algorithms, which we turn to next.
Thus far, we have focused our attention on decision problems,
formalized as languages. We now turn to functional or search
problems, those that do not have a yes/no answer. Restricting
ourselves to decision problems does not leave us any room to find an
answer that is “close” to correct, but a larger codomain will enable
us to consider approximation algorithms for \(\NP\)-hard problems. We
previously encountered Kolmogorov complexity as an
example of an uncomputable functional problem. Here, we consider
problems that are computable, with the intent of finding efficient
algorithms to solve such problems. Some examples of search problems
are:
Given an array of integers, produce a sorted version of that array.
Given a Boolean formula, find a satisfying assignment.
Given a graph, find a minimum vertex cover.
In general, a search problem may be:
a maximization problem, such as finding a maximum clique in a
graph
a minimization problem, such as finding a minimum vertex cover
an exact problem, such as finding a satisfying assignment or a
Hamiltonian path
Maximization and minimization problems are also called optimization
problems.
For each kind of search problem, we can define a corresponding
decision problem. [19] For maximization and minimization, we introduce
a limited budget. Specifically, does a solution exist that meets the
budget? The following are examples:
\[\begin{split}&\CLIQUE = \{(G, k) : G \text{ is an undirected graph that has a
clique of size $k$}\}\\
&\VC = \{(G, k) : G \text{ is an undirected graph that has a vertex
cover of size $k$}\}\end{split}\]
For an exact problem, the corresponding decision problem is whether a
solution exists that meets the required criteria. The following is an
example:
\[\SAT = \{\phi : \phi \text{ is a satisfiable Boolean formula}\}\]
We have seen that the languages \(\CLIQUE\), \(\VC\), and \(\SAT\) are
\(\NP\)-complete. How do the search problems compare in difficulty to the
decision problems?
It is clear that given an efficient algorithm to compute the answer to
a search problem, we can efficiently decide the corresponding
language. For instance, a decider for \(\CLIQUE\) can invoke the
algorithm to find a maximum clique and then check whether the result
has size at least \(k\). Thus, the decision problem is no harder than
the search problem.
What if we have an efficient decider \(D\) for \(\CLIQUE\)? Can we then
efficiently find a clique of maximum size? It turns out that we can.
Given a graph \(G = (V, E)\), we first find the size of a largest clique
by running \(D\) on \((G, k)\) for each \(k\) from \(\abs{V}\) down to \(0\),
stopping at the first \(k\) for which \(D\) accepts:
\begin{algorithmic}
\INPUT an undirected graph
\OUTPUT the size of (number of vertices in) its largest clique
\FUNCTION{$S_1$}{$G=(V,E)$}
\FOR{$k=\abs{V}$ down to $0$}
\IF{$D(G,k)$ accepts} \RETURN $k$ \ENDIF
\ENDFOR
\ENDFUNCTION
\end{algorithmic}
Since a clique is a subset of the vertices, its size cannot exceed
that of the vertex set, so we query the decider to determine whether
there is a clique of this size. If so, we are done; if not, we
repeatedly query again with the next-smaller possible size. We stop as
soon as we get an affirmative answer. Thus, the algorithm produces the
size of a largest clique. The algorithm is also efficient, since it
makes at most \(\abs{V}\) calls to \(D\), and \(D\) is efficient by
hypothesis.
Note that in general for other optimization problems, a linear search
like this to find the optimal size/weight/etc. may not be efficient;
it depends on how the number of possible answers relates to the input
size. For the max-clique problem, the size of the search space is
linear in the input size, so a linear search suffices. For other
problems, the size of the search space might be exponential in the
input size, in which case a binary search may be needed.
Once we know the size of a largest clique, we can actually search for
a clique of that size. This can be done using either of two common
strategies.
The first strategy is a “destructive” one, where we remove pieces of
the input until all we have left is an object we’re looking for. In
the case of the clique problem, we temporarily discard a vertex and
all its incident edges, then query the decider to see if the graph
still has a clique of the required size. If so, the vertex is
unnecessary—there is some largest clique that does not use the
vertex—so we remove it permanently. Otherwise, the discarded vertex
is part of every largest clique, so we restore it. We repeat this
process for all vertices, and at the end, we will be left with just
those that make up a largest clique. The algorithm is as follows:
\begin{algorithmic}
\INPUT an undirected graph $G$ whose largest clique has size $k$
\OUTPUT a largest clique in $G$
\FUNCTION{FindMaxClique}{$G=(V,E), k$}
\FORALL{$v \in V$}
\STATE let $G'$ be $G$ with $v$ and all its incident edges
removed
\IF{$D(G',k)$ accepts}
\STATE $G=G'$
\ENDIF
\ENDFOR
\RETURN $V(G)$, the set of (remaining) vertices in $G$
\ENDFUNCTION
\end{algorithmic}
This algorithm does a single pass over the vertices. Each iteration
removes a vertex and its incident edges, which can be done
efficiently, and invokes the decider \(D\), which is efficient by
hypothesis. Thus, the entire algorithm is efficient.
The second common strategy is a “constructive” one: we build up a
solution piece by piece, guessing a piece of the solution and checking
the guess by querying the decider. For the clique problem, we can
guess that a vertex \(v\) is in some clique of size \(k\). If it is, then
there must be a clique of size \(k-1\) among just the vertices that are
adjacent to \(v\), i.e., its neighborhood. This is because a clique of
size \(k\) is a complete subgraph of size \(k\), and if we ignore one
vertex in this subgraph, the rest is a complete subgraph of size
\(k-1\). Conversely, if the neighborhood of \(v\) has a clique of size
\(k-1\), then adding \(v\) to it produces a clique of size \(k\), since
there is an edge between \(v\) and each vertex in its neighborhood. So,
our algorithm iterates over each vertex in the graph and checks if its
neighborhood has a clique of size one smaller than currently desired.
If so, we include \(v\) in our final output and continue to search just
within the neighborhood for a clique of the smaller size. If not, the
vertex is not part of any clique of the desired size, and we just move
on to the next candidate vertex. The formal algorithm is as follows:
\begin{algorithmic}
\INPUT an undirected graph $G$ whose largest clique has size $k$
\OUTPUT a largest clique in $G$
\FUNCTION{FindMaxClique}{$G=(V,E), k$}
\STATE $C = \emptyset$
\FORALL{$v \in V$}
\STATE $G' = $ \CALL{Neighborhood}{$G,v$}
\IF{$D(G',k-1)$ accepts}
\STATE $C = C \cup \set{v}$
\STATE $G = G'$
\STATE $V = V(G')$
\STATE $k = k-1$
\ENDIF
\ENDFOR
\STATE \RETURN $C$
\ENDFUNCTION
\FUNCTION{Neighborhood}{$G=(V,E), v$}
\STATE $V' = \set{u \in V : u \neq v, (u,v) \in E}$
\COMMENT{all neighbors of $v$ (other than $v$ itself)}
\STATE $E' = \set{(u,w) \in E : u,w \in V'}$
\COMMENT{all edges between $v$'s neighbors}
\STATE \RETURN $(V',E')$
\ENDFUNCTION
\end{algorithmic}
This algorithm also does a single pass over the vertices in \(G\). Each
iteration computes the neighborhood of a vertex, which can be done
efficiently by in turn iterating over all the vertices and edges in
the graph. Each iteration of the algorithm also invokes \(D\), which is
efficient by hypothesis. Since the algorithm does a linear number of
iterations and each is efficient, the algorithm as a whole is also
efficient.
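The following is a self-contained Python sketch of both steps – finding
the size of a largest clique, then constructing one – under the
assumption that we are handed an efficient decider \(D\). Since no such
decider is known, the sketch stands in a brute-force check for \(D\);
the brute force is of course not efficient and only plays the role of
the hypothetical decider. Graphs are pairs of a vertex set and a set of
two-element frozenset edges.

from itertools import combinations

def D(G, k):
    # stand-in for the hypothetical efficient decider for CLIQUE:
    # brute-force check whether G has a clique of size k
    V, E = G
    return any(all(frozenset((u, w)) in E for u, w in combinations(c, 2))
               for c in combinations(V, k))

def neighborhood(G, v):
    # the subgraph induced on the neighbors of v (excluding v itself)
    V, E = G
    Vp = {u for u in V if u != v and frozenset((u, v)) in E}
    Ep = {e for e in E if e <= Vp}
    return (Vp, Ep)

def find_max_clique(G):
    V, E = G
    # step 1: find the size of a largest clique, as in S_1 above
    k = next(k for k in range(len(V), -1, -1) if D(G, k))
    # step 2: "constructive" strategy, growing one confirmed vertex at a time
    C = set()
    for v in sorted(V):
        if k == 0:
            break
        Gp = neighborhood(G, v)
        if D(Gp, k - 1):         # v is in some clique of the current size
            C.add(v)
            G, k = Gp, k - 1     # continue the search inside the neighborhood
    return C

For instance, on the graph with vertices \(\{1,2,3,4\}\) and edges
\(\{1,2\}, \{1,3\}, \{2,3\}, \{3,4\}\), find_max_clique returns
\(\{1,2,3\}\).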
Example 166
Suppose we have an efficient decider \(D\) for \(\VC\). We can
efficiently find a minimum vertex cover as follows. Given a graph
\(G = (V, E)\), we do the following:
Find the size of a minimum vertex cover by running \(D\) on \((G, k)\)
for each \(k\) from \(1\) to \(\abs{V}\), stopping on the first \(k\) for
which \(D\) accepts.
We use a “constructive” strategy to deduce which vertices are
part of the cover. Pick a vertex \(v\). If it is in a minimum
vertex cover of size \(k\), we can remove it and all adjacent
edges, and the resulting graph has a vertex cover of size \(k-1\).
If \(v\) is not in a minimum vertex cover, the resulting graph will
not have a vertex cover of size \(k-1\). Thus, we can use a call to
\(D\) on the new graph to determine whether or not \(v\) is in a
minimum cover. If it is, we continue our search on the smaller
graph, otherwise we restore the previous graph. We continue the
search until we’ve checked each vertex for whether it is in the
minimum cover.
The full search algorithm is as follows:
\begin{algorithmic}
\INPUT an undirected graph
\OUTPUT a minimum vertex cover of the graph
\FUNCTION{MinVertexCover}{$G=(V,E)$}
\STATE $k=0$
\WHILE{$D(G,k)$ rejects}
\STATE $k=k+1$
\ENDWHILE
\STATE $C = \emptyset$
\FORALL{$v \in V$}
\STATE let $G'$ be $G$ with $v$ and all its incident edges
removed
\IF{$D(G',k-1)$ accepts}
\STATE $C = C \cup \set{v}$
\STATE $G = G'$
\STATE $k=k-1$
\ENDIF
\ENDFOR
\RETURN $C$
\ENDFUNCTION
\end{algorithmic}
The first loop invokes \(D\) at most \(\abs{V}\) times, since a cover
is a subset of all the vertices. Since \(D\) is efficient, the loop
completes in polynomial time. The second loop invokes \(D\) at most
\(\abs{V}\) times, and \(G'\) can be constructed efficiently, so the
loop as a whole is efficient. Thus, the algorithm runs in
polynomial time with respect to \(\abs{G}\). We conclude that if we
can decide \(\VC\) efficiently, we can find an actual minimum vertex
cover efficiently.
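The following is a short Python sketch of this example, again standing
in a brute-force check for the hypothetical efficient decider \(D\) for
\(\VC\); graphs are pairs of a vertex set and a set of two-element
frozenset edges.

from itertools import combinations

def D(G, k):
    # stand-in for the hypothetical efficient decider: brute-force
    # check whether G has a vertex cover of size k
    V, E = G
    return any(all(e & set(c) for e in E) for c in combinations(V, k))

def min_vertex_cover(G):
    V, E = set(G[0]), set(G[1])
    # first loop: find the size of a minimum cover
    k = next(k for k in range(len(V) + 1) if D((V, E), k))
    C = set()
    for v in sorted(V):
        if k == 0:
            break
        Vp = V - {v}
        Ep = {e for e in E if v not in e}
        if D((Vp, Ep), k - 1):   # v is in some minimum cover
            C.add(v)
            V, E, k = Vp, Ep, k - 1
    return C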
Exercise 167
Given an efficient decider \(D\) for \(\SAT\), demonstrate an efficient
algorithm that finds a satisfying assignment for a Boolean formula
\(\phi\), if such an assignment exists.
We have shown that the search problems of finding a maximum clique or a
minimum vertex cover have the same difficulty as the decision problems
\(\CLIQUE\) and \(\VC\), respectively. Since \(\CLIQUE\) and \(\VC\) are
\(\NP\)-complete, an efficient solution to either search problem would
imply \(\P = \NP\). Informally, we say that the search problems are
\(\NP\)-hard, much as we did for a decision problem for which an
efficient solution implies \(\P = \NP\). (Note, however, that the
search problems are not \(\NP\)-complete, as the class \(\NP\)
consists only of decision problems.)
We claim without proof that an \(\NP\)-hard search problem has an
efficient algorithm if and only if its corresponding decision problem
is efficiently solvable. Thus, we are unlikely to construct efficient
algorithms to find exact solutions to \(\NP\)-hard search problems.
Instead, we focus our attention on designing efficient algorithms to
approximate \(\NP\)-hard optimization problems.
Given an input \(x\), an optimization problem seeks to find an output
\(y\) such that \(\val(x, y)\) is optimized, for some metric \(\val\). For
a maximization problem, the goal is to find an optimal value \(y^*\)
such that:
\[\forall y.~ \val(x, y^*) \ge \val(x, y)\]
That is, \(\val(x, y^*)\) is at least as large as \(\val(x, y)\) for all
other outputs \(y\), so that \(y^*\) maximizes the metric for a given
input \(x\). Similarly, in a minimization problem, the goal is to find
\(y^*\) such that:
\[\forall y.~ \val(x, y^*) \le \val(x, y)\]
Here, \(y^*\) produces the minimum value for the metric \(\val(x, y)\)
over all \(y\). In both cases, we define \(OPT(x) = \val(x, y^*)\) as the
optimal value of the metric for an input \(x\). We then define an
approximation as follows:
Definition 168(α-approximation)
Given an input \(x\), a solution \(y\) is an α-approximation if:
For a maximization problem,
\[\alpha\cdot OPT(x) \le \val(x, y) \le OPT(x)\]
where \(\alpha < 1\).
For a minimization problem,
\[OPT(x) \le \val(x, y) \le \alpha\cdot OPT(x)\]
where \(\alpha > 1\).
In both cases, \(\alpha\) is the approximation ratio.
The closer \(\alpha\) is to one, the better the approximation.
Definition 169(α-approximation Algorithm)
An α-approximation algorithm is an algorithm \(A\) such that for
all inputs \(x\), the output \(A(x)\) is an α-approximation.
Let us return to the problem of finding a minimum vertex cover. Can we
design an efficient algorithm to approximate a minimum cover with a
good approximation ratio \(\alpha\)? Our first attempt is the
cover-and-remove algorithm, which repeatedly finds an edge, removes
one of its endpoints, and removes all edges incident to that endpoint:
\begin{algorithmic}
\INPUT an undirected graph
\OUTPUT a vertex cover of the graph
\FUNCTION{CoverAndRemove}{$G$}
\STATE $C = \emptyset$
\WHILE{$G$ has at least one edge}
\STATE choose an arbitrary edge $e = (u,v) \in E$
\STATE remove $v$ and all its incident edges from $G$
\STATE $C = C \cup \set{v}$
\ENDWHILE
\RETURN $C$
\ENDFUNCTION
\end{algorithmic}
How well does this algorithm perform? Consider the following
star-shaped graph:
On this graph, it is possible for the algorithm to pick the vertices
on the outside of the star to remove rather than the one in the
middle. The result would be a cover of size four, while the optimal
cover has size one (just the vertex in the middle), giving an
approximation ratio of 4 for this graph. However, larger star-shaped
graphs give rise to arbitrarily large approximation ratios – there is
no constant \(\alpha\) such that the cover-and-remove algorithm
achieves a ratio of \(\alpha\) on all possible input graphs.
Observe that in the star-shaped graph, the vertex in the center has
the largest degree, the number of edges that are incident to it. If
we pick that vertex first, we cover all edges, achieving the optimal
cover of just a single vertex for the star-shaped graph. This
motivates a greedy strategy that repeatedly adds a vertex with the
largest remaining degree to the cover and removes it, along with its
incident edges, from the graph:
\begin{algorithmic}
\INPUT an undirected graph
\OUTPUT a vertex cover of the graph
\FUNCTION{GreedyCover}{$G$}
\STATE $C = \emptyset$
\WHILE{$G$ has at least one edge}
\STATE choose a vertex $v \in V$ having the most remaining incident edges
\STATE remove $v$ and all its incident edges from $G$
\STATE $C = C \cup \set{v}$
\ENDWHILE
\RETURN $C$
\ENDFUNCTION
\end{algorithmic}
While this algorithm works well for a star-shaped graph, there are
other graphs for which it does not achieve an approximation within a
constant \(\alpha\). Consider the following bipartite graph:
The optimal cover consists of the six vertices at the top, while the
algorithm instead chooses the eight vertices at the bottom. The vertex
at the bottom left has a degree of six, compared to the maximum degree
of five among the top vertices, so the bottom-left vertex is picked
first. Then the second vertex on the bottom has degree five, while the
top vertices now have degree no more than four. Once the second vertex
is removed, the vertex with highest degree is the third on the bottom,
and so on until all the bottom vertices have been chosen for the
cover. While the algorithm achieves a ratio of \(4/3\) on this graph,
larger graphs can be constructed with a similar structure, with \(k\)
vertices at the top and about \(k \log k\) at the bottom. On such
graphs, the algorithm produces a cover that is a factor of about
\(\log k\) larger than the optimal one, and that ratio can be made
arbitrarily large by increasing \(k\).
The problem with both the cover-and-remove and greedy algorithms is
the absence of a benchmark, a way of comparing the algorithm against
the optimal result. Before we proceed to design another algorithm, we
establish a measure for a minimum vertex cover that we can benchmark
against, using a set of pair-wise disjoint edges from the input graph.
Given a graph \(G = (V, E)\), a subset of edges \(S \subseteq E\) is
pair-wise disjoint if no two edges in \(S\) share a common vertex.
Given any pair-wise disjoint set of edges \(S\) from \(G\), any vertex
cover \(C\) must have at least \(\abs{S}\) vertices – since no two edges
in \(S\) share a common vertex, we must choose a distinct vertex to
cover each edge in \(S\). This gives us a way of benchmarking an
approximation algorithm – ensure that the algorithm picks no more
than a constant number of vertices for each edge in some pair-wise
disjoint set of edges \(S\).
In fact, only a small modification to the cover-and-remove algorithm
is necessary to meet this benchmark. Rather than selecting just one of
the two endpoints of each chosen edge, we include both of them in
the cover, and remove them and all their incident edges. The remaining
edges do not share a vertex with the chosen edge, so each of them is
disjoint from it. By induction, the full set of edges chosen by this
double-cover algorithm is pairwise disjoint.
The algorithm is as follows:
\begin{algorithmic}
\INPUT an undirected graph
\OUTPUT a 2-approximate minimum vertex cover of the graph
\FUNCTION{DoubleCover}{$G$}
\STATE $C = \emptyset$
\WHILE{$G$ has at least one edge}
\STATE choose an arbitrary edge $e = (u,v) \in E$
\STATE remove $u,v$ and all their incident edges from $G$
\STATE $C = C \cup \set{u,v}$
\ENDWHILE
\RETURN $C$
\ENDFUNCTION
\end{algorithmic}
Claim 170
The edges chosen by the double-cover algorithm are pairwise
disjoint for any input graph.
Proof 171
Let \(S_i\) be the set of edges chosen by the double-cover algorithm
in the first \(i\) iterations on an arbitrary graph \(G\). We show by
induction that \(S_i\) is pair-wise disjoint.
Base case: \(S_1\) contains only a single edge, so it is
trivially pair-wise disjoint.
Inductive step: Assume \(S_i\) is pair-wise disjoint. When the
algorithm chooses an edge \((u,v)\), it removes both \(u\) and \(v\) and
all edges incident to either \(u\) or \(v\). Thus, the graph that
remains after iteration \(i\) does not contain any of the vertices
\(u,v\) for each edge \((u,v) \in S_i\). The edges that remain in the
graph all have endpoints that are distinct from those that appear
in \(S_i\), so all remaining edges are disjoint from any edge in
\(S_i\). Then no matter what edge \(e\) the algorithm chooses next, it
will be disjoint from any edge in \(S_i\). Since \(S_i\) itself is
pair-wise disjoint, \(S_{i+1} = S_i \cup \{e\}\) is also pair-wise
disjoint.
How good is this algorithm? Let \(S\) be the full set of edges chosen by
the algorithm on an input graph \(G\). Since \(S\) is pair-wise disjoint,
the minimum vertex cover \(C^*\) of \(G\) must have at least \(\abs{S}\)
vertices. The cover \(C\) produced by the double-cover algorithm has
\(2\abs{S}\) vertices. We have:
\[\val(G, C) = 2\abs{S} \le 2\cdot \val(G, C^*)\]
Thus, the resulting cover is a 2-approximation, and double-cover is a
2-approximation algorithm.
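The algorithm is simple enough to state in a few lines of Python; the
following sketch represents a graph by its edge set, with each edge a
two-element frozenset.

def double_cover(E):
    # E: a set of edges, each a frozenset of two vertices
    E = set(E)
    C = set()
    while E:
        u, v = next(iter(E))      # choose an arbitrary edge (u, v)
        C |= {u, v}               # include BOTH endpoints in the cover
        E = {e for e in E if u not in e and v not in e}
    return C

On the star-shaped graph from before, this returns the center plus one
outer vertex – a cover of size two against the optimal one – matching
the guaranteed ratio of 2.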
As another example, we design an approximation algorithm for the
\(\NP\)-hard problem of finding a maximum cut in an undirected graph. A
cut in a graph \(G = (V, E)\) is just a subset \(S \subseteq V\) of the
vertices; this partitions \(V\) into two disjoint sets, \(S\) and its
complement \(\overline{S} = V \setminus S\). The size of a cut \(S\) is
the number of edges that cross between \(S\) and \(\overline{S}\). More
formally, we define the set of crossing edges of a cut \(S\):
\[C(S) = \set{(u, v) \in E : u \in S \andt v \in \overline{S}} \; \text.\]
Then the size of the cut \(S\) is \(\abs{C(S)}\). Observe that for any cut
\(S\), its complement cut \(\overline{S}\) has the same crossing edges and
hence the same size, because the edges \((u,v)\) and \((v,u)\) are the
same.
The following is an example of a cut \(S\), with the crossing edges
\(C(S)\) represented as dashed lines:
A maximum cut of a graph (or max-cut, for short) is a cut that has
maximum size among all cuts in the graph. Maximum cuts have important
applications to circuit design, combinatorial optimization, and
statistical physics.
The decision version of the maximum-cut problem is defined as:
\[\MAXCUT = \set{(G, k) : G \text{ has a cut of size } k} \; \text.\]
This is an \(\NP\)-complete problem, though we do not prove it here. This
implies that the search problem of finding a maximum cut is \(\NP\)-hard.
So, rather than trying to find an exact solution to the search
problem, we design an approximation algorithm. We start by identifying
an appropriate benchmark.
Given a graph \(G = (V, E)\), we define the set \(I_v \subseteq E\) to be
the edges incident to vertex \(v \in V\):
\[I_v = \set{(u, v) \in E : u \in V} \; \text.\]
In other words, \(I_v\) is the set of edges that have \(v\) as an
endpoint. We claim the following:
Claim 172
Let \(S^*\) be a maximum cut in a graph \(G=(V,E)\). For every vertex
\(v \in V\), at least half the edges of \(I_v\) are in the crossing set
\(C(S^*)\).
Proof 173
We prove the contrapositive: if for some vertex \(v \in V\), fewer
than half the edges of \(I_v\) are in \(C(S^*)\), then \(S^*\) is not a
maximum cut. Without loss of generality, assume that \(v \in S^*\)
(otherwise, we can replace \(S^*\) with its complement). Then the
edges \(I_v\) can be partitioned into two subsets:
\[\begin{split}I_{v,c} &= \set{(u,v) \in I_v : u \in \overline{S^*}}\\
I_{v,nc} &= \set{(u,v) \in I_v : u \in S^*}\end{split}\]
In words, \(I_{v,c} \subseteq C(S^*)\) consists of the edges incident
to \(v\) that cross the cut, while \(I_{v,nc}\) consists of those that
do not (both endpoints are in \(S^*\)). By assumption, \(\abs{I_{v,c}}
< \abs{I_{v,nc}}\).
If we remove \(v\) from \(S^*\), then the edges in \(I_{v,nc}\) become
crossing edges, while those in \(I_{v,c}\) are no longer crossing
edges, as illustrated below.
Thus, removing \(v\) from \(S^*\) increases the number of crossing
edges by \(\abs{I_{v,nc}} - \abs{I_{v,c}} > 0\), i.e., \(\abs{C(S^*
\setminus \set{v})} > \abs{C(S^*)}\). Because there is a cut with
more crossing edges than those of \(S^*\), we conclude that \(S^*\) is
not a maximum cut, as claimed.
Corollary 174
The size of any maximum cut \(S^*\) in a graph \(G = (V, E)\) is at
least \(\abs{E}/2\), i.e., \(\abs{C(S^*)} \geq \abs{E}/2\).
Proof 175
We first observe that
\[\abs{E} = \frac{1}{2} \sum_{v\in V} \abs{I_v}\]
since each edge appears in exactly two sets \(I_v\). Similarly, each
crossing edge of \(S^*\) appears in exactly two of the sets \(I_v \cap
C(S^*)\), so by Claim 172,
\[\abs{C(S^*)} = \frac{1}{2} \sum_{v\in V} \abs{I_v \cap C(S^*)} \ge
\frac{1}{2} \sum_{v\in V} \frac{\abs{I_v}}{2} = \frac{\abs{E}}{2} \;
\text.\]
We have proved a lower bound on the number of edges in any maximum
cut. A trivial upper bound is the total number of edges \(\abs{E}\),
because the crossing edges are a subset of \(E\). In summary, for any
maximum cut \(S^*\),
\[\frac{\abs{E}}{2} \le \abs{C(S^*)} \le \abs{E} \; \text.\]
Having established bounds on the size of a maximum cut, we now design
an algorithm that outputs a cut of size at least \(\abs{E}/2\), which is
a 1/2-approximation of the optimal. We take inspiration from the
argument in Proof 173: the algorithm
repeatedly finds a vertex for which moving it to the opposite side of
the cut increases the cut size, until no such vertex exists. We call
this the local-search algorithm for max-cut.
\begin{algorithmic}
\INPUT an undirected graph
\OUTPUT a $1/2$-approximate maximum cut in the graph
\FUNCTION{LocalSearchCut}{$G=(V,E)$}
\STATE $S = \emptyset$
\REPEAT
\FORALL{$v \in S$}
\IF{$\abs{C(S \setminus \set{v})} > \abs{C(S)}$}
\STATE $S = S \setminus \set{v}$
\ENDIF
\ENDFOR
\FORALL{$u \notin S$}
\IF{$\abs{C(S \cup \set{u})} > \abs{C(S)}$}
\STATE $S = S \cup \set{u}$
\ENDIF
\ENDFOR
\UNTIL{no change was made to $S$ in the most recent iteration}
\RETURN $S$
\ENDFUNCTION
\end{algorithmic}
The algorithm halts when there is no vertex for which moving it to the
opposite side of the cut \(S\) increases the cut size. By the reasoning
in Proof 173, this means that for
every vertex \(v\), at least half the edges in \(I_v\) are crossing edges
of \(S\). Thus, \(C(S)\) has at least half the edges in \(E\), i.e.,
\(\abs{C(S)} \geq \abs{E}/2 \geq \abs{C(S^*)}/2\) for any cut \(S^*\).
Thus, \(S\) is a 1/2-approximation to a maximum cut in the graph.
Since the algorithm makes at least one change in each iteration
(except the last one) of the outer loop, and each move increases the
size of the cut, we can argue by the potential method that the algorithm halts within \(\abs{E}+1\)
iterations (because \(S\) starts with zero crossing edges, and can have
no more than \(\abs{E}\) crossing edges). Each iteration loops over all
the vertices and tests whether moving each one to the other side of
the cut increases the cut size, which can be done efficiently.
The following illustrates the execution of the local-search algorithm
on a sample graph:
Initially, all of the vertices are in \(\overline{S}\), illustrated
above as solid circles. The algorithm chooses the bottom-right vertex
to move to the other side of the cut, since doing so increases the cut
size from zero to three. We represent its membership in \(S\) by
depicting the vertex as an open circle. Then, the algorithm happens to
choose the vertex at the middle left, because doing so increases the
cut size to five. At this point, there is no single vertex that can be
moved to the other side of the cut to increase the cut size, so the
algorithm terminates. (Incidentally, the maximum cut for this graph
has six edges; can you find it?)
The knapsack problem involves selecting a subset of items that
maximizes their total value, subject to a weight constraint. More
specifically, we have \(n\) items, and each item \(i\) has a value \(v_i\)
and a weight \(w_i\). We have a weight capacity \(C\), and we wish to find
a subset \(S\) of the items that maximizes the total value
\[\val(S) = \sum_{i\in S} v_i\]
without exceeding the weight limit:
\[\sum_{i\in S} w_i \le C\]
In this problem formulation, we are not permitted to select the same
item more than once; for each item, we either select it to be in \(S\)
or we do not. Thus, this formulation is also known as the 0-1
knapsack problem. For simplicity, we assume that each item’s weight
is no more than \(C\) – otherwise, we can immediately remove this item
from consideration.
A natural dynamic-programming algorithm to compute an optimal set of
items does \(O(nC)\) operations, given \(n\) items and a weight limit \(C\).
Assuming that the weights and values of each item are numbers that are
similar in size to (or smaller than) \(C\), the total input size is \(O(n
\log C + \log C) = O(n \log C)\), with all numbers encoded in a
non-unary base. [20] Thus, the dynamic-programming solution is not
efficient with respect to the input size. In fact, this problem is
known to be \(\NP\)-hard, so we are unlikely to find an efficient
algorithm that solves it exactly.
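For reference, the following is a minimal sketch of that standard
dynamic program in Python (a common formulation, not spelled out in the
text above); it performs \(\Theta(nC)\) operations, which is
exponential in the \(\O(\log C)\) bits used to encode \(C\).

def knapsack_dp(values, weights, C):
    # best[c] = maximum total value achievable with total weight at most c
    best = [0] * (C + 1)
    for v, w in zip(values, weights):
        # scan capacities downward so each item is used at most once (0-1)
        for c in range(C, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[C]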
Instead, we attempt to approximate the optimal solution to knapsack
with a greedy algorithm. Our first attempt is the relatively greedy
algorithm, in which we select items by decreasing relative value,
until we cannot take any more. In other words, we choose items in
decreasing order by the ratio \(v_i/w_i\).
\begin{algorithmic}
\INPUT arrays of item values and weights, and a knapsack capacity
\OUTPUT a valid selection of items
\FUNCTION{RelativelyGreedy}{$V[1,\ldots,n],W[1,\ldots,n],C$}
\STATE let $D[1,\ldots,n]$ be indices $1,\ldots,n$, sorted so
that $V[D[i]]/W[D[i]]$ are non-increasing
\STATE $S = \emptyset$, $\weight = 0$
\FOR{$i=1$ to $n$}
\IF{$\weight + W[D[i]] \leq C$}
\STATE $S = S \cup \set{D[i]}$
\STATE $\weight = \weight + W[D[i]]$
\ENDIF
\ENDFOR
\STATE \RETURN $S$
\ENDFUNCTION
\end{algorithmic}
This algorithm runs in \(\O(n \log n \log C)\) time, where \(n =
\abs{V}\); the time is dominated by sorting, which does \(\O(n \log n)\)
comparisons (e.g. using merge sort), and each
comparison takes \(\O(\log C)\) time for numbers that are similar in
size to \(C\). Thus, this algorithm is efficient with respect to the
input size.
Unfortunately, the relatively greedy algorithm is not guaranteed to
achieve any approximation ratio \(\alpha\). Consider what happens when
we have two items, the first with value \(v_1 = 1\) and weight \(w_1 =
1\), and the second with value \(v_2 = 100\) and weight \(w_2 = 200\). Then
item \(1\) has a value-to-weight ratio of \(1\), while item \(2\) has a
value-to-weight ratio of \(0.5\). This means that the relatively greedy
algorithm will always try to pick item \(1\) before item \(2\). If the
weight capacity is \(C = 200\), the algorithm produces \(S = \{1\}\), with
a total value of \(\val(S) = 1\). However, the optimal solution is \(S^*
= \{2\}\), with a total value of \(\val(S^*) = 100\). This gives us an
approximation ratio of \(0.01\), but we can easily find other examples
with arbitrarily smaller ratios.
The main problem in the example above is that the relatively greedy
algorithm ignores the most valuable item in favor of one with a better
value-to-weight ratio. A “single-greedy” algorithm that takes just
one item that has the largest value actually produces an optimal
result for the above example. This algorithm is as follows:
\begin{algorithmic}
\INPUT arrays of item values and weights
\OUTPUT a selection consisting of a single item of maximum value
\FUNCTION{SingleGreedy}{$V,W$}
\RETURN an index $i$ that maximizes $V[i]$
\ENDFUNCTION
\end{algorithmic}
This algorithm runs in time \(\O(n \log C)\), because it makes a single
pass over the array of values, keeping track of the largest value (and
its index) found so far. While it achieves the optimal result for our
previous example, it is straightforward to construct examples where
the single-greedy algorithm fails to achieve any approximation ratio
\(\alpha\). Suppose we have \(201\) items, where all but the last item
have value \(1\) and weight \(1\), while the last item has value \(2\) and
weight \(200\). In other words, we have
\[\begin{split}&v_1 = \dots = v_{200} = 1, \qquad w_1 = \dots = w_{200} = 1,\\
&v_{201} = 2, \qquad w_{201} = 200 \; \text.\end{split}\]
With a weight limit \(C = 200\), the optimal solution is to pick items
\(1, \dots, 200\) for a total value of \(200\), but the single-greedy
algorithm picks the single item \(201\), for a value of \(2\). Thus, it
achieves a ratio of just \(0.01\) on this instance of the problem.
Moreover, the example can be modified to make the ratio arbitrarily
small.
Observe that for this second example, the relatively greedy algorithm
actually would produce the optimal solution. It seems like the
single-greedy algorithm does well where the relatively greedy
algorithm fails, and the relatively greedy algorithm performs well
when the single-greedy algorithm does not. This motivates us to
construct a combined-greedy algorithm that runs both the relatively
greedy and single-greedy algorithm and picks the better of the two
solutions. It can be shown that the combined-greedy algorithm
1/2-approximates the optimal solution for any instance of the knapsack
problem.
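The following Python sketch combines the two pseudocode algorithms
above; the “better of the two” step simply compares the total values of
the two selections.

def relatively_greedy(values, weights, C):
    # consider items in non-increasing order of value-to-weight ratio
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    S, total_weight = [], 0
    for i in order:
        if total_weight + weights[i] <= C:
            S.append(i)
            total_weight += weights[i]
    return S

def single_greedy(values, weights, C):
    # take just one item of maximum value (each item fits, since w_i <= C)
    return [max(range(len(values)), key=lambda i: values[i])]

def combined_greedy(values, weights, C):
    candidates = [relatively_greedy(values, weights, C),
                  single_greedy(values, weights, C)]
    return max(candidates, key=lambda S: sum(values[i] for i in S))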
Exercise 176
In this exercise, we will prove that the combined-greedy algorithm
is a 1/2-approximation algorithm for the 0-1 knapsack problem.
Define the fractional knapsack problem as a variant that
allows taking any partial amount of each item. (For example, we
can take \(3/7\) of item \(i\), for a value of \((3/7)v_i\), at a
weight of \((3/7)w_i\).) Prove that for this fractional variant,
the optimal value is no smaller than for the original 0-1
variant (for the same weights, values, and knapsack capacity).
Prove that the sum of the values obtained by the relatively
greedy and single-greedy algorithm is at least the optimal value
for the fractional knapsack problem.
Using the previous parts, prove that the combined-greedy
algorithm is a 1/2-approximation algorithm for the 0-1 knapsack
problem.
While we do not know how to efficiently find exact solutions to
\(\NP\)-hard problems, we can apply heuristics to these problems. There
are two general kinds of heuristics:
Those that are guaranteed to get the right answer, but may take a
long time on some inputs. However, they may be efficient enough for
inputs that we care about. Examples include the dynamic-programming
algorithm for 0-1 knapsack where the weight capacity \(C\) is small
compared to the number of items \(n\), as well as SAT solvers for
Boolean formulas that emerge from software verification.
Those that are guaranteed to run fast, but may produce an incorrect
or suboptimal answer. The α-approximation algorithms we considered
above are examples of this, as are greedy algorithms for the
traveling salesperson problem, which has no efficient
α-approximation algorithm for general instances unless \(\P = \NP\).
Some algorithms are only applicable to special cases of a problem. For
instance, there are efficient algorithms for finding a maximum cut on
planar graphs, which can be depicted on a plane without any crossing
edges.
Another general approach is randomized algorithms, which may be able
to find good solutions with high probability. We will consider such
algorithms in the next unit.