What You Should Retain About the Theory of Computation
The theory course in computer science is often cited by students as the one they like least and dread most. Part of the reason is that instructors often get buried in the details of the course (for example, the pumping lemma for context-free languages) and neglect the big picture of theory: why it is important for the general computer science student, what it adds to the worth of a computer science bachelor's degree (as opposed to a two-year technical degree in programming or a four-year degree in some related area, such as management information systems or web-centric computing), and how it might apply to the practice of programming.
This section summarizes the theory course in order to remind us all of these issues. There are many aspects of the theory course that don't seem relevant to the practice of programming, and indeed many of them are not directly relevant. However, many are, and even the ones that aren't have quite a bit of indirect relevance to the practice.
Here we look briefly at the main things about the theory of computing that you should retain over your career: things that will help you with the practice of computing, with reading and understanding issues regarding computing, and even with understanding and perhaps contributing to local, national, or international policies regarding computing.
In general, there are eight major aspects of the theory you should retain:
- a definitional framework for computer science and the theory of computation by way of problems and their solutions
- what the theory of computation is, based on these definitions
- the notion of an algorithm as the solution to a problem
- what the theoretical limits of algorithms are
- what the practical limits of algorithms are
- context-free languages and pushdown automata as a simpler, yet very important, class of problems that occur in the everyday life of a computer scientist
- regular languages, regular expressions, and finite state automata as an even simpler, but also very important, class of problems that occur in the everyday life of a computer scientist
- the role of theoretical definitions, lemmas, theorems, and proofs in the shaping of the well-rounded computer scientist.
1. Computation
A Definition
Definition.
Computation (computer science) is the study of problems and their solutions as algorithms.
Problems and Solutions
- a problem must be well defined so that any reasonable person would be able to understand the problem and recognize a solution to the problem when one is presented;
- a problem must have an infinite number of instances; otherwise the problem would not need an algorithm, because the answers to the finite number of instances could be calculated once, stored in a table, and from then on simply looked up in the table (for practical computing, we accept "a large number of instances" as well, because if the finite number of instances is very large, keeping a table around would be impractical);
- an instance of a problem is any valid input data for the problem. That is, it is a string in a specified format that represents valid input data for an algorithm that solves the problem;
- a procedure purported to solve a problem is an algorithm (i.e., a solution) for the problem if and only if it takes any valid instance as input, processes that input, and halts after a finite number of processing steps with the correct answer for that instance.
The Distinction Between Problems and Algorithms
It is important to carefully note the distinction between "problem" and "algorithm." As with any programming assignment, different people will present different programs (algorithms) as solutions to the problem. Two programs may both solve the problem but may have different time complexities; one may have worst case time complexity O(n^2) and the other may have worst case time complexity O(n log n). Theoreticians may have shown, however, that the inherent time complexity of the problem (as opposed to the time complexities of various algorithms that solve it) is O(n) in the worst case. That means it has been proven both that there are algorithms with O(n) worst case running time that solve this problem and that no algorithm solving it can have a worst case running time of lower order.
For example, the problem may be to compute the smallest value in an unordered list. Programmer 1 may develop a program that sorts the list into ascending order using a bubble sort algorithm and then outputs the first element. Programmer 2 may develop a program that sorts the list into descending order using a merge sort algorithm and then outputs the last element of the sorted list. The first program would have O(n^2) worst case time complexity and the second O(n log n) worst case time complexity. However, it is known that there are algorithms that solve this problem with O(n) worst case time complexity, but none that run with a worst case time complexity less than O(n).
Thus, the problem is categorized as being in the class of problems with O(n) worst case time complexity.
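To make the distinction concrete, here is a minimal sketch in Python of the two kinds of solution; the function names are ours, chosen only for illustration.

```python
def smallest_by_sorting(values):
    # Sort first (O(n log n) with Python's built-in sort), then take the first element.
    return sorted(values)[0]

def smallest_by_scanning(values):
    # A single pass over the list: O(n) worst case, matching the problem's
    # inherent time complexity.
    best = values[0]
    for v in values[1:]:
        if v < best:
            best = v
    return best

print(smallest_by_sorting([7, 3, 9, 1, 4]))   # 1
print(smallest_by_scanning([7, 3, 9, 1, 4]))  # 1
```

Both are solutions to the same problem; only their time complexities differ.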
What You Should Retain
You should know by heart
- the definition of computation (computer science)
and the meaning of the terms:
- problem (e.g., determine the shortest path through a directed, weighted graph from a starting vertex to an ending vertex)
- problem instance (e.g., a directed, weighted graph given in adjacency matrix form along with a starting vertex and an ending vertex)
- algorithm (e.g., a solution that takes any directed, weighted graph along with a starting vertex and an ending vertex as input and produces the shortest path through the graph as output; a brief sketch of one such algorithm follows this list).
and the difference between the terms:
- problem
- solution
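As one concrete illustration of these terms, here is a short Python sketch of a well-known algorithm for the shortest-path problem (Dijkstra's algorithm). The graph encoding and function name are ours, chosen only for illustration; the small graph at the bottom is one instance of the problem.

```python
import heapq

def dijkstra_shortest_path(graph, start, end):
    # One algorithm (of many) that solves the shortest-path problem.
    # `graph` maps each vertex to a list of (neighbor, edge_weight) pairs;
    # a particular graph with a start and end vertex is one *instance*.
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == end:
            break
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if end not in dist:
        return None, float("inf")   # no path exists in this instance
    path = [end]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return list(reversed(path)), dist[end]

# One problem instance: a small directed, weighted graph with start "a" and end "c".
graph = {"a": [("b", 2), ("c", 5)], "b": [("c", 1)], "c": []}
print(dijkstra_shortest_path(graph, "a", "c"))  # (['a', 'b', 'c'], 3)
```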
2. The Theory of Computation
Given the definition of computation above, the theory of computation is the body of knowledge developed about problems and the algorithms that solve them.
The Big Questions
The theory of computation thus answers (or attempts to answer) such questions as
- what constitutes a solution, or algorithm, for a problem
- whether there are unsolvable problems (problems that can be formulated so that they are readily understandable and qualify as problems according to the definition above, but for which no algorithm exists that solves them)
- what the inherent time complexity of a given problem is (i.e., the best worst case time complexity over all algorithms that solve the given problem)
- what the time complexity of a given algorithm is (i.e., the worst case time complexity of the given algorithm)
Carving up the Universe of Problems
With this knowledge, the theory of computation categorizes the universe of all problems in two general ways:
- Those that are solvable (have algorithms) and those that are not (have no algorithms)
- Those that are solvable but intractable (i.e., problems that have algorithms, but for which none of the algorithms have a time complexity of small enough order to warrant actually using any of them as a practical solution, because the time to obtain an answer for larger and larger instances of the problem is prohibitive) and those that are tractable (have algorithms whose time complexity is of small enough order to warrant using such an algorithm as a practical solution to the problem).
What You Should Retain
You should know by heart
- The two main ways that the theory has categorized problems (solvable vs unsolvable, and tractable vs intractable)
- What the terms solvable, unsolvable, tractable, and intractable mean
- The difference between the time complexity of a problem and the time complexity of an algorithm
3. Algorithms and Turing Machines
In order to make the study of problems and their solutions as algorithms as easy as possible, a formal definition of "algorithm" is needed. The one usually chosen for this definition is the Turing machine.
History
Until the 1930s, mathematicians and scientists generally believed that all problems were solvable. The problem was that there was no formal definition of what it meant for a problem to be solvable, which made precise discussion of the whole concept impossible. In that decade a number of things happened to help formalize the idea of "solvability."
Mathematician Kurt Gödel proved that in any sufficiently powerful, consistent mathematical system, true statements can be posed that cannot be proved within that system. This was a rather earth-shaking finding that began to make the foundational belief in the invincibility of mathematics crumble.
Mathematician Alan Turing defined a simple machine model, called the Turing machine, and postulated that the model represented what we intuitively understand to be "computable." That is, if a Turing machine could be constructed for a problem that halted on all of its inputs with the correct answer on its tape, the problem was solvable. If no such Turing machine could be constructed, the problem was unsolvable. It could readily be recognized (by most) that Turing machines were simple enough (and mechanical to boot) that a Turing machine did indeed constitute a solution to a problem. The question was whether it captured all possible solutions to problems.
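To make the model concrete, here is a minimal sketch of a deterministic single-tape Turing machine simulator in Python, together with a toy machine. The encoding of states, symbols, and transitions is ours, chosen only for illustration.

```python
def run_turing_machine(transitions, accept_states, tape_input, blank="_", max_steps=10_000):
    # Simulate a deterministic single-tape Turing machine.
    # `transitions` maps (state, symbol) -> (new_state, write_symbol, move),
    # where move is 'L' or 'R'. Halting in an accept state means "yes";
    # halting because no transition applies means "no".
    tape = dict(enumerate(tape_input))
    state, head = "q0", 0
    for _ in range(max_steps):
        if state in accept_states:
            return True
        key = (state, tape.get(head, blank))
        if key not in transitions:
            return False
        state, write, move = transitions[key]
        tape[head] = write
        head += 1 if move == "R" else -1
    raise RuntimeError("machine did not halt within the step limit")

# A toy machine recognizing {0^n 1^n : n >= 0} by repeatedly matching the
# leftmost unmarked 0 (marked X) with the leftmost unmarked 1 (marked Y).
tm = {
    ("q0", "0"): ("q1", "X", "R"),   # mark a 0, go find a matching 1
    ("q0", "Y"): ("q3", "Y", "R"),   # all 0s matched; check only Ys remain
    ("q0", "_"): ("qacc", "_", "R"),
    ("q1", "0"): ("q1", "0", "R"),
    ("q1", "Y"): ("q1", "Y", "R"),
    ("q1", "1"): ("q2", "Y", "L"),   # mark the matching 1, head back left
    ("q2", "0"): ("q2", "0", "L"),
    ("q2", "Y"): ("q2", "Y", "L"),
    ("q2", "X"): ("q0", "X", "R"),
    ("q3", "Y"): ("q3", "Y", "R"),
    ("q3", "_"): ("qacc", "_", "R"),
}
print(run_turing_machine(tm, {"qacc"}, "000111"))  # True
print(run_turing_machine(tm, {"qacc"}, "0101"))    # False
```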
Logician Alonzo Church described a way of calculating (a calculus) based on simple base functions from which more complex functions could be constructed. It became known as the λ-calculus. It was shown that the Turing machine and the λ-calculus solve the same set of problems (i.e., have the same expressive power).
The Church-Turing Thesis
The Church-Turing Thesis essentially states that the Turing machine, and all of the models of computation that have been shown to be equivalent to the Turing machine, represent what we intuitively understand by the notion of "algorithm".
The thesis is believable, because many have attempted to find more powerful models of computation only to be thwarted:
- Turing machines have been extended in numerous ways through the addition of multiple tapes, multi-dimensional tapes, nondeterminism, and others. All have been shown to be equivalent to the original in computational power.
- Many different models of computation have been devised, including the λ-calculus, partial recursive functions, random access machine models, modern programming languages, and so forth. All have been shown to be equivalent to the Turing machine in power.
Thus, the theoretical implication of the Church-Turing thesis is that no model of computing will extend the class of solvable problems beyond that defined by the Turing machine and equivalent models.
The practical implication of the Church-Turing thesis for practicing programmers is that programs written in any comprehensive programming language are equivalent to Turing machines: both solve the same class of problems. Thus, if a program can be written in any of these programming languages to solve a particular problem, it is (of course) solvable. Also, if no program can be written in any of these programming languages to solve a particular problem, it is not solvable by any other model of computation either.
What You Should Retain
- the basic history of computation
- that the Turing machine was devised by Alan Turing as a simple model of what we now generally refer to as "algorithm," or "solution to a problem"
- what the Church-Turing thesis means to the practicing programmer
- why the Church-Turing thesis is believable
- that programming languages have ultimate computational power equivalent to a Turing machine
4. The Theoretical Limits of Computation
A Theoretical View Based on Turing Machines
Once the Turing machine was accepted as a complete model of computation, theoreticians could finally begin to ask and answer the one big question:
Are there any problems that are not Turing computable?
This is the same as asking whether there are any problems for which no Turing machine can be constructed as a solution.
The answer to the question is, "Yes."
The most famous example of an unsolvable problem is the halting problem: Can a Turing machine H be constructed that takes as input a description of an arbitrary Turing machine M along with a string w that is a valid instance of the problem M solves, and determines whether M would halt while processing w? No such Turing machine H exists.
A Practical View Based on Programming Languages
There are many practical problems for which no program can be written as a solution. For example, just as for Turing machines (no surprise, since both have the same computational power) it is impossible to write a program that determines, given an arbitrary program with data as input, whether that input program would halt while processing its data.
This means that one could never write a student program loop checker that determines whether a student program would go into an infinite loop and refuses to run it if so. This is an example of an easy-to-understand, interesting problem that has no solution. It also means that we cannot write a universal debugger or a compiler that catches all possible run time errors in the programs it compiles.
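The standard argument can be sketched in code. Suppose someone handed us a function halts(program, data) that always answered correctly; such a function is entirely hypothetical, and the whole point of the sketch is that it cannot exist.

```python
def halts(program, data):
    # HYPOTHETICAL: supposedly returns True if program(data) eventually halts,
    # and False if it runs forever. No correct implementation can exist.
    ...

def troublemaker(program):
    # Do the opposite of whatever `halts` predicts the program does on itself.
    if halts(program, program):
        while True:      # predicted to halt, so loop forever
            pass
    else:
        return           # predicted to loop forever, so halt immediately

# Now ask: does troublemaker(troublemaker) halt?
# If halts answers "yes", troublemaker loops forever; if it answers "no",
# troublemaker halts. Either way halts is wrong on this input, so no
# always-correct halts function can exist.
```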
What You Should Retain
- There are an infinite number of problems that have no Turing machine solution (a Turing machine that accepts any instance of such a problem as input and always halts in either its accept state (q_accept) or its reject state (q_reject)).
- Some of these problems are easy to understand and ones we really would like to have solutions for.
- It all boils down to this fact: no algorithm (Turing machine) can be written that determines any nontrivial property of what an arbitrary input Turing machine does or does not compute (a result known as Rice's Theorem).
- An idea of the kinds of problems that are likely unsolvable, including problems that require a determination of what arbitrary programs compute (Do they have infinite loops? Will they ever reach this point in the program while running? And so forth.).
- An understanding that you don't just give up in the face of an unsolvable problem. It is possible to determine some things about an arbitrary program in some situations; it is just that you cannot solve the problem in general. It may also be possible to reformulate the problem in order to make it workable. For example, a "a possible infinite loop has been detected here" warning might be considered valuable even if your program cannot uncover all such possibilities for all possible programs.
5. The Practical Limits of Computation
The fact that a problem has a solution (program) does not mean that the solution is practical. We are aware that the time complexity of a program can be so high as to make the program useless as a practical solution even if it provably solves the problem. That is, if a problem has no efficient solution, the problem is beyond the limits of practical computing even if it is solvable.
In order to make some general judgments about problems that have practical solutions and those that do not, computer scientists have carved the world of problems up according to their inherent time complexities. This carving has resulted in four subclasses of all solvable problems that are of interest to us as practical programmers and to theoreticians:
- P - The set of all problems that have programs with polynomial worst case time complexity (i.e., O(n^k) for some positive integer k) is declared to be the set of tractable, or practically solvable, problems.
- P complement - The set of all problems that do not have an O(n^k) solution for any positive integer k is declared to be the set of intractable, or practically unsolvable, problems. For example, problems whose inherent time complexity is O(2^n) or worse are said to be intractable.
- NP - The set of all problems for which there is a nondeterministic Turing machine that solves them with worst case time complexity that is a polynomial. Thus NP is a superset of P, although it is not known whether it is a proper superset.
- NP-complete - The set of all of the "hardest" problems in NP, for which, if it could be shown that just one of these problems is also in P, the result would be a proof that P = NP.
The only two classes that are of importance to practical programming are P and P complement: Either a problem has a polynomial time bounded program that solves it or it doesn't. That means it either has a practical solution or it doesn't.
However, the class of problems labeled NP continues to bedevil theoreticians. It contains all of the problems in P, and it appears to contain some problems in P complement as well, but no one has yet been able to prove or disprove this.
In an attempt to try to answer the question of whether P = NP, theoreticians discovered a subset of the NP problems which have been labeled NP-complete problems. These are the "hardest" (in terms of their time complexities) problems in NP. They do have O(2^n) solutions and thus appear to be in P complement, but no one has been able to prove that they do not have O(n^k) solutions. They are called "complete" for the class of NP problems in the sense that if one could find a polynomial time solution for just one of these identified hardest problems it would prove that P = NP, since there are no harder problems in NP.
One famous example of an NP-complete problem is the traveling salesman problem (TSP): Given an input string that is the encoding of a map with cities and highways with distances between connected cities, determine whether there is a tour that starts in a certain city, winds up back at that same city, and passes through all of the other cities exactly once within some specified distance bound B.
TSP is in NP, because a nondeterministic Turing machine can guess a permutation of the cities that starts and ends with the same city, and then tally up the distances along this ordering of the cities to compare with the bound B. If the tour's total distance is less than or equal to the bound B, the nondeterministic Turing machine accepts the input; otherwise it rejects it. It can easily do this within a polynomial number of steps in the size of the input string. So TSP is in NP. It does not appear to be in P, because many people have tried to find a polynomial time algorithm for it without success. It appears that the only way to solve this problem deterministically is essentially to simulate the nondeterministic solution by generating different orderings of the cities and trying them one after the other (a brute-force approach).
TSP is also NP-complete, because it has been shown that all problems in NP can be reduced to TSP in polynomial time. That is, there exists a deterministic polynomial time algorithm that will take the input instances of any other problem in NP and convert them to input instances of TSP, such that a TSP algorithm will accept the converted string if and only if the original string is in the language of the original problem. Thus, a deterministic polynomial time solution to TSP implies a deterministic polynomial time solution to any problem in NP: just convert each incoming instance of the original problem to an equivalent instance of TSP in polynomial time, then run the purported deterministic polynomial time algorithm for TSP on the result. The entire process would take only polynomial time.
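To see what the brute-force approach looks like, here is a minimal Python sketch of the decision version of TSP; the distance-matrix encoding and function name are ours, chosen only for illustration. The loop over all (n-1)! orderings of the cities is exactly what makes this approach hopeless for large instances.

```python
from itertools import permutations

def tsp_within_bound(dist, bound):
    # Decision version of TSP by brute force: is there a tour that starts and
    # ends at city 0, visits every other city exactly once, and has total
    # length <= bound?  The loop examines (n-1)! orderings, so the running
    # time is exponential in the number of cities n.
    n = len(dist)
    for ordering in permutations(range(1, n)):
        tour = (0,) + ordering + (0,)
        length = sum(dist[tour[i]][tour[i + 1]] for i in range(n))
        if length <= bound:
            return True
    return False

# A tiny instance: 4 cities with symmetric distances.
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
print(tsp_within_bound(dist, 21))  # True  (the tour 0-1-3-2-0 has length 2+4+3+9 = 18)
print(tsp_within_bound(dist, 15))  # False (the best possible tour has length 18)
```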
To show that any other problem is NP-complete, a theoretician would just need to show that there is a polynomial time algorithm that reduces a known NP-complete problem to the new problem. This algorithm would have to take any instance of the known NP-complete problem and convert that instance to an equivalent instance for the new problem such that an algorithm for the new problem would accept the converted instance if and only if an algorithm for the original problem would accept the original, unconverted instance. This means that the new problem is at least as hard as the original problem in that if you could find a polynomial time solution to it, you would have a polynomial time solution to the original problem by way of the reduction.
A vast number of problems have been shown to be NP-complete. None has been shown to be in P. No problem in NP, including the NP-complete problems, has been proved to not also be in P. That is the conundrum.
What You Should Retain
As a practicing programmer you will always be interested in finding the most efficient solutions to the problems you tackle. This means that, at worst, you must find a solution that has polynomial time complexity. You will also likely run into problems that have no practical (i.e., worst case polynomial time) solution. Things to retain are these:
- you need to be on the watch for problems that appear to be NP-complete. A hint that you might be onto such a problem is that the only apparent way to solve it is to try all of an exponential number of possibilities to determine whether one of them is an answer to a given instance. A web search might verify your suspicions, because other people may have already thought about this problem.
- If a problem is NP-complete, even though it has not been proven that there is no polynomial time solution to it, you as a practicing programmer will treat the problem as intractable, because you would almost certainly be wasting your time searching for an efficient exact solution.
- If a problem is intractable (you know that you cannot find a polynomial time solution for it), you don't give up. You can
- work with your supervisor to try to recast the problem into a simpler problem whose solution would still meet the needs of the supervisor
- come up with (e.g., on your own or by a literature and web search) an approximation algorithm that does run in polynomial time and that provides answers that aren't optimal for every input instance but are usable. For example, one may not be able to find the shortest tour for the traveling salesman with a polynomial time algorithm for every input instance, but may be able to come up with an answer that is not far off the optimal in most cases (see the sketch after this list).
- analyze the program with expected input sets. In some cases the average case time complexity of a program might be polynomial even though the worst case is known to be exponential.
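As an illustration of the approximation idea, here is a minimal Python sketch of the classic nearest-neighbor heuristic for TSP; the distance-matrix encoding and function name are ours, chosen only for illustration. It runs in polynomial time, but the tour it finds is not guaranteed to be optimal (or even near-optimal) on every instance.

```python
def nearest_neighbor_tour(dist):
    # A polynomial-time heuristic for TSP: from the current city, always move
    # to the nearest city not yet visited, then return home. Runs in O(n^2)
    # time, but offers no guarantee of optimality in general.
    n = len(dist)
    unvisited = set(range(1, n))
    tour, current = [0], 0
    while unvisited:
        nxt = min(unvisited, key=lambda city: dist[current][city])
        unvisited.remove(nxt)
        tour.append(nxt)
        current = nxt
    tour.append(0)
    length = sum(dist[tour[i]][tour[i + 1]] for i in range(len(tour) - 1))
    return tour, length

dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
print(nearest_neighbor_tour(dist))  # ([0, 1, 3, 2, 0], 18) -- optimal here, but only by luck
```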
In addition, you should remember by heart what these terms mean:
- P
- NP
- NP-complete
- tractable
- intractable
and be able to provide examples for each term.
6. Context Free Languages, Context Free Grammars, and Pushdown Automata
There are lots of problem classes that clearly have efficient solutions, and there are less powerful models of computation (than Turing machines) that solve such problems. Some of these classes are important and useful enough that they have an entire body of theory devoted just to them. The context-free languages are one such class.
What You Should Retain
Among the things you should retain are
- the form of a context free grammar.
- the form of a nondeterministic pushdown automaton
- the fact that context free grammars generate exactly the set of languages that nondeterministic pushdown automata can recognize (the context free languages)
- deterministic pushdown automata can only recognize a subset of the context free languages (i.e., nondeterministic pushdown automata are more powerful than deterministic pushdown automata).
- ways to determine intuitively whether a language is context free or not by considering how it could be recognized by a pushdown automaton.
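To keep that intuition concrete, here is a minimal Python sketch for one classic context-free language, the language of balanced parentheses. The grammar appears in the comment, and the explicit stack mirrors the way a pushdown automaton would use its stack; the function name is ours, chosen only for illustration.

```python
def balanced_parentheses(s):
    # Recognize the context-free language of balanced parentheses,
    # generated by the grammar  S -> ( S ) S | epsilon.
    # The list below plays the role of a pushdown automaton's stack.
    stack = []
    for ch in s:
        if ch == "(":
            stack.append(ch)        # push
        elif ch == ")":
            if not stack:           # nothing to match: reject
                return False
            stack.pop()             # pop the matching "("
        else:
            return False            # symbol outside the alphabet
    return not stack                # accept only if the stack ends up empty

print(balanced_parentheses("(()())"))  # True
print(balanced_parentheses("(()"))     # False
```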
You should also retain some examples of the way context-free languages and grammars are used in the everyday life of a computer scientist.
- context free grammars are used to specify many important objects in computer science, including
- programming languages
- web XML documents, by way of the document type definition (DTD) that accompanies XML files
- some aspects of query languages
- context free grammars are equivalent to EBNF, the scheme most often used to specify the syntax of a programming language.
7. Regular Languages, Regular Expressions, Regular Grammars, and Finite State Automata
Finite state automata represent the simplest model of computation normally studied. Everyone would agree that an FSA represents an algorithm. It is also clear that not every algorithm can be represented by an FSA, because the FSA is woefully less powerful than a Turing machine. The only difference between them, though, is that the Turing machine can read and write on its tape while moving left and right (and, of course, can have as much tape as necessary).
Nonetheless, finite state automata, and their equivalent formulations as regular expressions, are incredibly useful. Thus, it is a good idea for all computer scientists to have a grasp of these concepts.
What You Should Retain
Among the things you should retain are
- the form of a finite state automaton as a recognizer of a regular language
- the makeup of regular expressions as a shorthand for the pattern that all strings in a regular language match
- the form of regular grammars as generators of regular languages
- the fact that all three of these are equivalent representations of regular languages
- examples of regular languages
- examples of languages that are not regular
- ways to intuitively determine whether a language is regular or not based on how a finite state automaton would or would not be able to recognize the language.
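To keep that intuition concrete as well, here is a minimal Python sketch of a table-driven finite state automaton; the encoding of states and transitions is ours, chosen only for illustration.

```python
def run_dfa(transitions, start, accepting, s):
    # Run a deterministic finite automaton given as a transition table.
    state = start
    for ch in s:
        if (state, ch) not in transitions:
            return False            # symbol outside the alphabet
        state = transitions[(state, ch)]
    return state in accepting

# DFA for the regular language of binary strings with an even number of 1s,
# equivalently described by the regular expression (0*10*1)*0*.
even_ones = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd",   ("odd", "1"): "even",
}
print(run_dfa(even_ones, "even", {"even"}, "101101"))  # True  (four 1s)
print(run_dfa(even_ones, "even", {"even"}, "10110"))   # False (three 1s)
```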
In addition, you should keep in mind some examples of where these concepts show up in the everyday life of a programmer.
- many programming languages, especially those designed to perform operations on strings, allow the programmer to define regular expressions for matching patterns within strings. Perl and JavaScript are just two of these.
- nearly all search tools made available by operating systems to search for files in a file system allow the user to specify a regular expression to help with the search
- the scanner portion of a compiler that turns a file of individual characters into the component parts (tokens) to be used by the parser is based entirely on regular expressions.
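As one everyday example, here is a short Python sketch (using the standard re module) of the kind of token scanning described above; the token names and the pattern are ours, chosen only for illustration.

```python
import re

# A toy scanner: classify pieces of an input line into tokens using regular
# expressions, much as the scanner phase of a compiler does.
TOKEN_PATTERN = re.compile(r"""
    (?P<NUMBER>     \d+(\.\d+)? )   # integer or decimal literal
  | (?P<IDENTIFIER> [A-Za-z_]\w* )  # names
  | (?P<OPERATOR>   [+\-*/=] )      # single-character operators
  | (?P<SKIP>       \s+ )           # whitespace, discarded
""", re.VERBOSE)

def tokenize(text):
    for match in TOKEN_PATTERN.finditer(text):
        kind = match.lastgroup
        if kind != "SKIP":
            yield kind, match.group()

print(list(tokenize("rate = rate + 4.5")))
# [('IDENTIFIER', 'rate'), ('OPERATOR', '='), ('IDENTIFIER', 'rate'),
#  ('OPERATOR', '+'), ('NUMBER', '4.5')]
```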
8. Definitions, Theorems, and Proof
Of course, any theory is full of formal definitions, theorems, lemmas, and proofs. Should you retain these? Not really. In fact, you probably would have a difficult time doing this. You DO need to learn these things once for many reasons, but they do not represent the kind of knowledge you need to retain to function well in any career...unless your career is to be a theoretician.
What You Should Retain
So what should you retain from the formal definitions, theorems, and proofs of the theory of computation?
- An appreciation for the work that theoreticians have done so that we actually do know, with mathematical certainty, which things about the field we practice in every day do and do not hold.
- A general intuitive understanding of the theory so that if questions do come up in practice about the theory we know where to look for answers
- An ability to read and understand formal descriptions of concepts. These will show up in practice at all levels, from specifications for a program to be written to document type definitions for XML files, and on, and on.
- A confidence that you can reason about problems you face and recognize when you do or don't understand them well enough to solve them.
To give you an example, except for some references to it in the compiler class, you will most likely never again need to know or be asked to apply the pumping lemma for context free languages. So you can safely forget its details. However, having worked through this lemma and applied it, you have gained some of the reasoning skills that will help you deal with formal definitions and what they imply when you run into formal specifications of any kind. So even indirectly, learning these concepts will help you. And if you ever do need to know more about them, as we said, you will know where to look them up.