3 Turing Machines I
3.1 A machine that computes
Your intuition might be that a computer is a small box that sits on a desk, in a server rack, or even in your pocket, making use of electronics, digital circuits, and millions of bits and bytes to perform what is sufficiently indistinguishable from magic 1.
In 1936, people had a very different picture. The closest equivalents were mechanical devices that used levers, linkages, and gears to produce work, similar to a train. Instead of the work being motion, it would be the evaluation of a series of steps, generally finding a solution to some form of arithmetic. Mechanical calculators were probably the most well known instances of a “computer”.
The first general purpose computer was actually proposed about 100 years before Turing in 1837. Charles Babbage designed a machine he called the “Analytical Engine”.
This machine was a digital-mechanical hybrid that made use of components that would not be out of place in a contemporary digital computer. Had it been finished, it would have been the first known instance of a device that is Turing Complete 2.
Ada Lovelace is often called the first programmer because she wrote the first published program for this device. 3 This program, called Note G, calculated a sequence of Bernoulli Numbers.
When we build our mathematical model of a computer and algorithm, I want you to keep in mind the ‘mechanical’ nature of it.
For our model to be correct, there are important ideas that need to be captured:
- It should be as simple as possible
- What is the smallest amount of ‘work’ we can get away with for each step in our device?
- We will use a string as the only data structure
- It should be able to do everything an algorithm can do - For loops, if statements, variables etc. should all be implementable on the device
3.2 Turing Machines
3.2.1 Some intuition
Imagine a physical device that has three components:
- A space to read and write symbols - a tape of sequential cells
- A mechanism to perform the read and write - a head that can read or write on a given cell on our tape
- A control unit to tell the head what symbol it should be writing - a box that keeps track of our algorithm steps and where along that algorithm we are
- On our tape, every cell holds exactly one symbol from a fixed set (\(\sqcup, 0, 1, 2, ...\)) - even if that symbol is the “blank” (\(\sqcup\)) symbol that represents nothing.
- We will also assume that our tape has as much space as it needs - i.e. an infinite tape.
- Our head mechanism can move between the different cells, going either left or right along the tape.
- The control unit is composed of various ‘device states’ (\(q_1, q_2, q_3, ...\)) that hold the information about what has happened so far (gears in different positions, or electronic signals in specific patterns).
This device has only one instruction: A conditional statement based off the current device state \(q_i\), and the symbol currently on the tape \(s_n\) that tells us what state to go to next \(q_j\), what symbol should be written on the tape \(s_m\), and whether we should go left or right \(D \in \{ L, R \}\).
\[ q_i, s_n \rightarrow q_j, s_m, D \]
If we tried to build this device in the real world, these instructions are just the series of levers and gears that interact to create our desired effect.
- Write this symbol - rotate a ball with all the symbols on it
- Change to the 5th state - rotate this gear by 5 teeth from this notch
- Go left - rotate this other gear counter-clockwise
We will add the stipulation that if our machine enters specific states, the computation will stop: For a Yes / No (Accept / Reject) machine we will have the special states \(q_\text{accept}, q_\text{reject}\) indicating the corresponding answer.
Additionally, we need to start somewhere, so we designate \(q_0\) as the ‘start state’. If we need to run this device, we must first reset it so that we are in \(q_0\).
Our programming language for this device will be all the possible permutations of these instructions that direct the device:
- In \(q_0\),
- reading \(\sqcup\), go to \(\{ q_0, q_1, q_\text{accept}, q_\text{reject} \}\), writing symbol \(\{ \sqcup, 0, 1 \}\) and moving \(\{ L, R \}\).
- reading \(0\), go to \(\{ q_0, q_1, q_\text{accept}, q_\text{reject} \}\), writing symbol \(\{ \sqcup, 0, 1 \}\) and moving \(\{ L, R \}\).
- reading \(1\), go to \(\{ q_0, q_1, q_\text{accept}, q_\text{reject} \}\), writing symbol \(\{ \sqcup, 0, 1 \}\) and moving \(\{ L, R \}\).
- In \(q_1\),
- reading \(\sqcup\), go to \(\{ q_0, q_1, q_\text{accept}, q_\text{reject} \}\), writing symbol \(\{ \sqcup, 0, 1 \}\) and moving \(\{ L, R \}\).
- reading \(0\), go to \(\{ q_0, q_1, q_\text{accept}, q_\text{reject} \}\), writing symbol \(\{ \sqcup, 0, 1 \}\) and moving \(\{ L, R \}\).
- reading \(1\), go to \(\{ q_0, q_1, q_\text{accept}, q_\text{reject} \}\), writing symbol \(\{ \sqcup, 0, 1 \}\) and moving \(\{ L, R \}\).
Our program \(\delta\) is the subset of these permutations that describes the particular behaviour we want.
- In \(q_0\)
- reading \(\sqcup\), go to \(q_1\), writing \(\sqcup\) and moving \(L\)
- reading \(0\), go to \(q_0\), writing \(0\) and moving \(R\)
- reading \(1\), go to \(q_0\), writing \(1\) and moving \(R\)
- In \(q_1\)
- reading \(\sqcup\), go to \(q_\text{reject}\), writing \(\sqcup\) and moving \(R\)
- reading \(0\), go to \(q_\text{reject}\), writing \(0\) and moving \(R\)
- reading \(1\), go to \(q_\text{accept}\), writing \(1\) and moving \(R\)
What do these instructions actually do? Try to describe the above algorithm in English.
If we use the binary representation of integers, \(\left< z \right>_{\mathbb{Z}}\), what is happening to the number \(z \in \mathbb{Z}\)?
A computation is the series of changes that have occurred in our device and tape.
- Set the tape to be the input string \(0111\),
- Set the device to be in \(q_0\) and the head to be at the first symbol
- We always start at the first symbol of our input
- Follow each instruction until \(q_\text{accept}\) or \(q_\text{reject}\)
- Step \(0\): In \(q_0\), scanning cell \(0\), reading symbol \(0\), go to \(q_0\), write \(0\), move right to cell \(1\)
- \(q_0 0111\)
- Step \(1\): In \(q_0\), scanning cell \(1\), reading symbol \(1\), go to \(q_0\), write \(1\), move right to cell \(2\)
- \(0 q_0 111\)
- Step \(2\): In \(q_0\), scanning cell \(2\), reading symbol \(1\), go to \(q_0\), write \(1\), move right to cell \(3\)
- \(01 q_0 11\)
- Step \(3\): In \(q_0\), scanning cell \(3\), reading symbol \(1\), go to \(q_0\), write \(1\), move right to cell \(4\)
- \(011 q_0 1\)
- Step \(4\): In \(q_0\), scanning cell \(4\), reading symbol \(\sqcup\), go to \(q_1\), write \(\sqcup\), move left to cell \(3\)
- We have moved off the end of our string into a “blank” cell
- \(0111 q_0 \sqcup\)
- Step \(5\): In \(q_1\), scanning cell \(3\), reading symbol \(1\), go to \(q_\text{accept}\), write \(1\), move right to cell \(4\)
- \(011 q_1 1\)
- Step \(6\): In \(q_\text{accept}\) scanning cell \(4\) \(\rightarrow\) halt
- \(0111 q_\text{accept} \sqcup\)
- Read the result as \(1\) as we are in \(q_\text{accept}\)
Congratulations, we have just built a physical device that evaluates some algorithm. If we use \(\left< z \right>_{\mathbb{Z}}\), then this algorithm determines if the given integer is even or odd.
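The walkthrough above can also be replayed in software. The following is a minimal sketch, not a definitive implementation: the names `run`, `DELTA`, `HALTING` and the use of `"_"` for the blank symbol are assumptions made for illustration, and the infinite tape is approximated by a dictionary whose missing cells read as blank.

```python
BLANK = "_"  # stands in for the blank symbol (an assumption of this sketch)
HALTING = ("q_accept", "q_reject")

# The program from the text: (state, symbol read) -> (next state, symbol written, direction)
DELTA = {
    ("q0", BLANK): ("q1", BLANK, "L"),
    ("q0", "0"): ("q0", "0", "R"),
    ("q0", "1"): ("q0", "1", "R"),
    ("q1", BLANK): ("q_reject", BLANK, "R"),
    ("q1", "0"): ("q_reject", "0", "R"),
    ("q1", "1"): ("q_accept", "1", "R"),
}


def run(delta, word, start="q0", max_steps=10_000):
    """Run the machine on `word`; return the halting state and the (state, cell) trace."""
    tape = dict(enumerate(word))  # sparse tape: cells that are absent read as blank
    state, head = start, 0
    trace = [(state, head)]
    while state not in HALTING:
        if len(trace) > max_steps:
            raise RuntimeError("step budget exhausted; the machine may be looping")
        symbol = tape.get(head, BLANK)
        state, written, move = delta[(state, symbol)]
        tape[head] = written
        head += 1 if move == "R" else -1
        trace.append((state, head))
    return state, trace


if __name__ == "__main__":
    final_state, trace = run(DELTA, "0111")
    for step, (state, cell) in enumerate(trace):
        print(f"Step {step}: in {state}, scanning cell {cell}")
    print("result:", 1 if final_state == "q_accept" else 0)
```

The `max_steps` guard is only a practical safety net for the sketch; the mechanical device itself has no such limit.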
3.2.2 Formal Definition
The previous device is essentially an implementation of a Turing Machine as a mechanical machine.
Mathematically, a Turing Machine for Decision Problems is defined as follows.
A Decision Turing Machine \(M\) is the tuple \((Q, \Gamma, \Delta, \delta, q_\text{init}, q_\text{accept}, q_\text{reject})\) where
- \(Q\) is a finite non-empty set of states
- \(\Gamma\) is an input alphabet such that \(\sqcup \notin \Gamma\)
- \(\Delta\) is a tape alphabet, satisfying the following conditions:
- \(\Gamma \subseteq \Delta\)
- \(\sqcup \in \Delta\)
- \(Q \cap \Delta = \emptyset\)
- \(q_\text{init}, q_\text{accept}, q_\text{reject} \in Q\)
- \(\delta\) (called the transition function or program) is a finite set of instructions in the form \(q_i, s_n \rightarrow q_j, s_m, D\)
- \(q_i, q_j \in Q\)
- \(s_n, s_m \in \Delta\)
- \(D \in \{ L, R \}\)
- and for all \(q \in Q \backslash \{q_\text{accept}, q_\text{reject}\}\) and all \(s \in \Delta\) there exists exactly one instruction in \(\delta\) that has the form \(q, s \rightarrow q^\prime, s^\prime, D\)
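As a rough illustration, the conditions of this definition can be checked mechanically. The sketch below is one possible encoding under stated assumptions: the class `DecisionTM`, the function `check`, the field names, and `"_"` for the blank are all hypothetical. Storing \(\delta\) as a dictionary keyed by (state, symbol) means two instructions can never share a left-hand side, so only missing instructions need to be detected. States are compared against the whole tape alphabet, which also covers the input alphabet.

```python
from dataclasses import dataclass

BLANK = "_"  # stands in for the blank symbol (an assumption of this sketch)


@dataclass(frozen=True)
class DecisionTM:
    states: frozenset
    input_alphabet: frozenset
    tape_alphabet: frozenset
    delta: dict  # (state, symbol) -> (state, symbol, "L" or "R")
    q_init: str
    q_accept: str
    q_reject: str


def check(m):
    """Return a list of violated conditions; an empty list means m is well formed."""
    problems = []
    if not m.states:
        problems.append("Q is empty")
    if BLANK in m.input_alphabet:
        problems.append("blank is in the input alphabet")
    if not m.input_alphabet <= m.tape_alphabet:
        problems.append("input alphabet is not a subset of the tape alphabet")
    if BLANK not in m.tape_alphabet:
        problems.append("blank is missing from the tape alphabet")
    if m.states & m.tape_alphabet:
        problems.append("states and symbols overlap")
    for q in (m.q_init, m.q_accept, m.q_reject):
        if q not in m.states:
            problems.append(f"{q} is not in Q")
    # Every non-halting state needs an instruction for every tape symbol.
    for q in sorted(m.states - {m.q_accept, m.q_reject}):
        for s in sorted(m.tape_alphabet):
            if (q, s) not in m.delta:
                problems.append(f"no instruction for ({q}, {s})")
    # Every instruction may only mention known states, symbols and directions.
    for (q, s), (q2, s2, d) in m.delta.items():
        ok = q in m.states and q2 in m.states and d in ("L", "R")
        ok = ok and s in m.tape_alphabet and s2 in m.tape_alphabet
        if not ok:
            problems.append(f"malformed instruction for ({q}, {s})")
    return problems


EXAMPLE = DecisionTM(
    states=frozenset({"q0", "q1", "q_accept", "q_reject"}),
    input_alphabet=frozenset({"0", "1"}),
    tape_alphabet=frozenset({"0", "1", BLANK}),
    delta={
        ("q0", BLANK): ("q1", BLANK, "L"),
        ("q0", "0"): ("q0", "0", "R"),
        ("q0", "1"): ("q0", "1", "R"),
        ("q1", BLANK): ("q_reject", BLANK, "R"),
        ("q1", "0"): ("q_reject", "0", "R"),
        ("q1", "1"): ("q_accept", "1", "R"),
    },
    q_init="q0",
    q_accept="q_accept",
    q_reject="q_reject",
)
```

Calling `check(EXAMPLE)` returns an empty list for the machine from 3.2.1; deleting any one instruction from `delta` makes the corresponding "no instruction" message appear.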
We have seen previously that Decision Problems and Computational Problems are equivalent: if an algorithm exists for one, then an algorithm must exist for the other. Try to rewrite this proof using a Turing Machine (see Computational Problems for a definition of Computational Turing Machines) instead of a generic algorithm.
Decision and Computation equivalence means that the exact specifics of our Turing Machine are not that important. We stick to a Decision Turing Machine because it is a stronger continuation from Regular Languages and Context-Free Grammars. However, there is no reason you cannot tackle this entire branch using Computational problems instead.
3.2.3 Computing
A configuration is the current state of our Turing Machine, where the head is positioned, and what symbols are on the tape.
Let \(M\) be a Turing Machine,
- A configuration of \(M\) and tape is a word \(w\) in the alphabet \(Q \cup \Delta\) satisfying the following:
- \(w\) contains exactly one symbol from \(Q\)
- \(w\) does not begin with \(\sqcup\)
- If the last symbol of \(w\) is a \(\sqcup\), then it is preceded by a symbol from \(Q\)
- A configuration is halting if it contains \(q_\text{accept}\) or \(q_\text{reject}\)
- A configuration is accepting if it contains \(q_\text{accept}\)
- A configuration is rejecting if it contains \(q_\text{reject}\)
Our machine starts in \(q_\text{init}\) with the input string written on its tape.
Let \(M\) be a Turing Machine, and \(x \in \Gamma^\ast\),
The initial configuration of \(M\) on \(x\) is \[ q_\text{init}x \]
Computation is the process of moving between different configurations of our machine and tape. In fact, a computation is the sequence of configurations starting from the initial configuration and continuing until the machine is in a halting configuration.
Let \(M\) be a Turing Machine, and \(w\) and \(v\) are configurations of \(M\),
We say that \(w\) yields to \(v\) (\(w \vdash v\)) if one of the following conditions hold:
- \(q, s \rightarrow q^\prime, s^\prime, L \in \delta\) and
- either \(w = xtqsy\) and \(v = xq^\prime t s^\prime y\) for \(t \in \Delta\) and \(x, y \in \Delta^\ast\)
- or \(w = qsx\) and \(v = q^\prime\sqcup s^\prime x\) for \(x \in \Delta^\ast\)
- \(q, s \rightarrow q^\prime, s^\prime, R \in \delta\) and
- either \(w = xqsy\) and \(v = x s^\prime q^\prime y\) for \(x, y \in \Delta^\ast\)
- or \(w = xqs\) and \(v = x s^\prime q^\prime \sqcup\) for \(x \in \Delta^\ast\)
Our example in 3.2.1 produces the sequence of configurations \[ q_0 0111 \vdash 0 q_0 111 \vdash 01 q_0 11 \vdash 011 q_0 1 \vdash 0111 q_0 \sqcup \vdash 011 q_1 1 \vdash 0111 q_\text{accept} \sqcup \]
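The yields relation can be sketched as a function on configurations. This is an illustrative sketch only: a configuration is stored as a hypothetical triple `(left, state, right)` where `right` begins with the scanned symbol, `"_"` stands for the blank, and the names `step` and `computation` are assumptions. After each move the sketch trims blanks so that the word still satisfies the configuration conditions (it does not begin with a blank, and a trailing blank only appears directly after the state); an empty input is rendered with the head on a single blank.

```python
BLANK = "_"  # stands in for the blank symbol (an assumption of this sketch)
HALTING = ("q_accept", "q_reject")

# The program from 3.2.1: (state, symbol read) -> (next state, symbol written, direction)
DELTA = {
    ("q0", BLANK): ("q1", BLANK, "L"),
    ("q0", "0"): ("q0", "0", "R"),
    ("q0", "1"): ("q0", "1", "R"),
    ("q1", BLANK): ("q_reject", BLANK, "R"),
    ("q1", "0"): ("q_reject", "0", "R"),
    ("q1", "1"): ("q_accept", "1", "R"),
}


def step(delta, config):
    """Return the configuration that `config` yields."""
    left, state, right = config
    scanned = right[0] if right else BLANK
    rest = right[1:]
    new_state, written, move = delta[(state, scanned)]
    if move == "R":
        # w = x q s y  yields  v = x s' q' y
        left, right = left + written, rest
    else:
        # w = x t q s y  yields  v = x q' t s' y (t is blank when x is empty)
        moved = left[-1] if left else BLANK
        left, right = left[:-1], moved + written + rest
    # Trim blanks that carry no information, keeping one under the head if needed.
    left = left.lstrip(BLANK)
    right = right.rstrip(BLANK) or BLANK
    return (left, new_state, right)


def computation(delta, word, start="q0", max_steps=10_000):
    """List the configurations from the initial one up to a halting one."""
    configs = [("", start, word or BLANK)]
    while configs[-1][1] not in HALTING:
        if len(configs) > max_steps:
            raise RuntimeError("step budget exhausted; the machine may be looping")
        configs.append(step(delta, configs[-1]))
    return configs


if __name__ == "__main__":
    print(" |- ".join(" ".join(part for part in c if part) for c in computation(DELTA, "0111")))
```

On the input `0111` this produces seven configurations, ending with the state `q_accept` scanning a blank to the right of `0111`.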
Let \(M\) be a Turing Machine and \(x \in \Gamma^\ast\),
The computation of \(M\) on \(x\) is the sequence of configurations \(c_0, c_1, c_2, ...\) such that
- \(c_0\) is the initial configuration of \(M\) on \(x\)
- \(c_i \vdash c_{i + 1}\) for all \(i \geq 0\)
- The sequence has at most one halting configuration.
- If the sequence has a halting configuration \(c_n\), then \(c_n\) is the last configuration in the sequence.
We will use the symbol \(C(M,x)\) to refer to the computation of \(M\) on \(x\).
The length of a computation \(|C(M,x)|\) is the number of configurations in the sequence.
We want to identify the status of our computation.
Let \(M\) be a Turing Machine and \(x \in \Gamma^\ast\)
- \(M\) halts on \(x\) if the computation \(C(M,x)\) (sequence \(c_0, c_1, c_2, ...\)) contains a halting configuration
- \(M\) loops on \(x\) if the sequence \(c_0, c_1, c_2, ...\) does not contain a halting configuration
- \(M\) accepts \(x\) if the sequence \(c_0, c_1, c_2, ...\) contains an accepting configuration
- \(M\) rejects \(x\) if the sequence \(c_0, c_1, c_2, ...\) contains a rejecting configuration
\[ M(x) = \begin{cases} 1 & \text{if $M$ accepts $x$} \\ 0 & \text{if $M$ rejects $x$} \\ \infty & \text{otherwise} \end{cases} \]
Lastly, we want a way to classify TMs based on whether they accept, reject or loop on all inputs.
Let \(M\) be a Turing Machine
- \(M\) is a decider if it accepts or rejects \(x\) for all \(x \in \Gamma^\ast\)
- \(M\) decides / solves a decision problem \(P: \Gamma^\ast \mapsto \{ 0, 1 \}\) if \(\forall x \in \Gamma^\ast, M(x) = P(x)\)
Let \(M\) be a Turing Machine
- If \(M\) is a decider, does it solve some decision problem?
- If \(M\) solves a decision problem, is it a decider?
We say that a Turing Machine \(M\) decides a language \(L \subseteq \Gamma^\ast\) if \(M\) halts on every input and \[ M(x) = 1 \iff x \in L \]
Two Turing Machines \(M\) and \(N\) on the same input alphabet \(\Gamma\) are equivalent if they have the same input and output behaviour \[ \forall x \in \Gamma^\ast, M(x) = N(x)\]
How many equivalent Turing Machines are there?
Proof Sketch: Take some Turing Machine \(M\), add a new state \(q\), and call the new machine \(M^\prime\). We place \(q\) between \(q_\text{accept}\) and every state \(q_i\) that has some transition to \(q_\text{accept}\).
- If \(q_i\) has a transition to \(q_\text{accept}\), make it instead transition to \(q\) (leaving the rest of the transition unmodified)
- \(q\) transitions to \(q_\text{accept}\) on every symbol, leaving the symbol unmodified.
\(M^\prime\) has not changed anything about any computation besides adding a new configuration \(c_{n + 1}\) to each accepting sequence. It has not changed the outcome, nor has it changed the contents of the tape in any way. Thus \(M^\prime\) is equivalent to \(M\).
We can then create \(M^{\prime\prime}\) in the exact same fashion from \(M^\prime\). Repeat this ad-infinitum, and we have an infinite number of equivalent Turing Machines.
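The construction in this proof sketch can be tried out on the machine from 3.2.1. The sketch below is an illustration under assumptions: the helper names `insert_state` and `outcome`, the state names `p1` and `p2`, and the choice to move right out of the new state are all made up for the example. Checking finitely many inputs only illustrates the claim; it does not prove equivalence.

```python
from itertools import product

BLANK = "_"  # stands in for the blank symbol (an assumption of this sketch)
SYMBOLS = ("0", "1", BLANK)  # the tape alphabet of the example machine
HALTING = ("q_accept", "q_reject")

DELTA = {
    ("q0", BLANK): ("q1", BLANK, "L"),
    ("q0", "0"): ("q0", "0", "R"),
    ("q0", "1"): ("q0", "1", "R"),
    ("q1", BLANK): ("q_reject", BLANK, "R"),
    ("q1", "0"): ("q_reject", "0", "R"),
    ("q1", "1"): ("q_accept", "1", "R"),
}


def outcome(delta, word, start="q0", max_steps=10_000):
    """Return 1 if the machine accepts, 0 if it rejects, None if the budget runs out."""
    tape, state, head, steps = dict(enumerate(word)), start, 0, 0
    while state not in HALTING:
        if steps == max_steps:
            return None
        state, written, move = delta[(state, tape.get(head, BLANK))]
        tape[head] = written
        head += 1 if move == "R" else -1
        steps += 1
    return 1 if state == "q_accept" else 0


def insert_state(delta, fresh):
    """Build M' by routing every transition into q_accept through the new state `fresh`."""
    assert all(q != fresh for q, _ in delta), "the new state must not already exist"
    padded = {}
    for key, (target, written, move) in delta.items():
        padded[key] = (fresh if target == "q_accept" else target, written, move)
    for symbol in SYMBOLS:
        # The new state rewrites whatever it reads, then accepts.
        padded[(fresh, symbol)] = ("q_accept", symbol, "R")
    return padded


if __name__ == "__main__":
    m1 = insert_state(DELTA, "p1")
    m2 = insert_state(m1, "p2")
    words = ["".join(p) for n in range(5) for p in product("01", repeat=n)]
    print(all(outcome(DELTA, w) == outcome(m1, w) == outcome(m2, w) for w in words))
```

Each call to `insert_state` adds one state and three instructions while leaving every tested outcome unchanged, which mirrors the step from \(M\) to \(M^\prime\) to \(M^{\prime\prime}\).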
3.2.4 Computational Problems
A Turing Machine for computational problems is very similar to the standard definition. The only difference is that we have a single notable state \(q_\text{halt}\) instead of \(q_\text{accept}\) and \(q_\text{reject}\).
A Computational Turing Machine \(M\) is the tuple \((Q, \Gamma, \Delta, \delta, q_\text{init}, q_\text{halt})\) where
- \(Q\) is a finite non-empty set of states
- \(\Gamma\) is an input alphabet such that \(\sqcup \notin \Gamma\)
- \(\Delta\) is a tape alphabet, satisfying the following conditions:
- \(\Gamma \subseteq \Delta\) - Our input alphabet can be written on the tape
- \(\sqcup \in \Delta\) - Our tape has a blank symbol
- \(Q \cap \Delta = \emptyset\) - The symbols and states are different
- \(q_\text{init}, q_\text{halt} \in Q\)
- \(\delta\) (called the transition function or program) is a finite set of instructions in the form \(q_i, s_n \rightarrow q_j, s_m, D\)
- \(q_i, q_j \in Q\)
- \(s_n, s_m \in \Delta\)
- \(D \in \{ L, R \}\)
- and for all \(q \in Q \backslash \{q_\text{halt}\}\) and all \(s \in \Delta\) there exists exactly one instruction in \(\delta\) that has the form \(q, s \rightarrow q^\prime, s^\prime, D\)
The definitions of computation and status of a Computational TM remain the same as a Decision TM. However, we will use a slightly different configuration to define these objects:
- A configuration is halting if it contains \(q_\text{halt}\)
Computational Turing Machines need to provide some form of output string that we can read off the tape. To be consistent, we say that our resultant string is the string of input symbols from \(\Gamma\) that runs from the tape head up to the next blank symbol.
Let \(M\) be a Computational Turing Machine.
We define the result of a computation of \(M\) on \(x\) as the string \(w\) satisfying the following properties
- \(w \in \Gamma^\ast\)
- \(w\) is written on the tape
- \(w\) is immediately followed by \(\sqcup\)
- \(M\) is scanning the first symbol of \(w\)
\[ M(x) = \begin{cases} w & \text{if } C(M,x) \text{ halts} \\ \infty & \text{otherwise} \\ \end{cases} \]
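To make the output convention concrete, here is a small sketch of a Computational Turing Machine that flips every bit of its input. The machine itself (`FLIP`, with a helper state `q_back`), the function name `compute`, and `"_"` for the blank are assumptions invented for this illustration. The machine walks right flipping bits, walks back to the left edge, and halts scanning the first symbol so that the result can be read from the head up to the next blank.

```python
BLANK = "_"  # stands in for the blank symbol (an assumption of this sketch)

# A hypothetical program: flip every bit, then return the head to the first symbol.
FLIP = {
    ("q_init", "0"): ("q_init", "1", "R"),
    ("q_init", "1"): ("q_init", "0", "R"),
    ("q_init", BLANK): ("q_back", BLANK, "L"),
    ("q_back", "0"): ("q_back", "0", "L"),
    ("q_back", "1"): ("q_back", "1", "L"),
    ("q_back", BLANK): ("q_halt", BLANK, "R"),
}


def compute(delta, word, max_steps=10_000):
    """Return the result string of the computation, or None if the budget runs out."""
    tape, state, head, steps = dict(enumerate(word)), "q_init", 0, 0
    while state != "q_halt":
        if steps == max_steps:
            return None  # stands in for the "otherwise" case; looping cannot be detected in general
        state, written, move = delta[(state, tape.get(head, BLANK))]
        tape[head] = written
        head += 1 if move == "R" else -1
        steps += 1
    # Read the result: from the scanned cell up to (not including) the next blank.
    result = []
    while tape.get(head, BLANK) != BLANK:
        result.append(tape[head])
        head += 1
    return "".join(result)


if __name__ == "__main__":
    print(compute(FLIP, "0111"))
```

Because the only non-input symbol on this machine's tape is the blank, the string read off the tape is automatically a word over the input alphabet.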
Computational TMs can also solve some Computational Problem \(P: \Gamma^\ast \mapsto \Gamma^\ast\). However, we say that \(M\) computes \(P\) to make clear that it is a computational problem.
Let \(M\) be a Computational Turing Machine and \(P: \Gamma^\ast \mapsto \Gamma^\ast\) a computational problem
- \(M\) computes \(P\) if \[ \forall x \in \Gamma^\ast, M(x) = P(x)\]
This gives us a fairly strong definition for when something is computable or not.
Let \(f:\Sigma^\ast \mapsto \Sigma^\ast\) be a function.
- \(f\) is computable if there exists some Turing Machine that computes \(f\)
3.3 Turing Machines as Algorithms
3.3.1 Analysis of Turing Machines
When we perform an analysis of an algorithm, we typically focus on two resources available to our algorithm:
- Time Complexity - How long will it take to run our algorithm
- Space Complexity - How much memory do we need for our algorithm
If we want to use Turing Machines as a mathematical model for our algorithms, we need to define these same tools.
3.3.1.1 Time
Our Turing Machine has only the single instruction. If we build this machine in reality, then it will take a roughly fixed amount of time to execute this instruction, no matter how many times we do it.
- If we built this device using mechanical components, this time is how long it takes to rotate the gears, manipulate the linkages and so on.
- If we built it using electronic signals, it would be the time to change the voltage of our circuit.
This means we can discard the actual unit and instead count the number of times an instruction is executed. We can recover the actual execution time by taking the average time to execute a single instruction and multiplying it by the number of instructions.
A single instruction moves us from one configuration of our machine to the next, so counting the configurations in a computation counts the instructions executed (plus one for the initial configuration).
Let \(M\) be a Turing Machine and \(x \in \Gamma^\ast\),
The computation time of \(M\) on \(x\) is the length of the computation.
\[ T(M,x) = |C(M,x)| \]
\(T(M,x)\) allows us to analyse the execution of a single input, but we want to understand how it runs on all inputs. However, if two strings are of equal length, then the machine will do roughly the same amount of work, even if the result of that work is different. Instead of the individual strings, we analyse the length of the strings.
Let \(M\) be a Turing Machine that will halt on every input,
The time function \(t: \mathbb{N} \mapsto \mathbb{N}\) of \(M\) is \[ t(n) = \max \{ T(M,x) = |C(M,x)| : x \in \Gamma^n \} \]
In English, our time function for a given input length \(n\) is the maximum computation time across all strings of length \(n\).
We use the maximum time because some inputs of the same length may occasionally do less work.
Consider the following Python script

    x = input()
    # Scan the input from left to right, stopping at the first "0"
    for i in range(0, len(x)):
        if x[i] == "0":
            break

If we input “111”, the for loop will be executed 3 times. If we instead input “101” it will only be executed twice, even though the strings are the same length.
When analysing this algorithm, we want to know the worst it will do at a given length, not the best. So we take \(t(3) = \max\{ 1, 2, 3, 3 \} = 3\) (Inputs: \(\{ 0xx, 10x, 110, 111 \}\) where \(x \in \Gamma\))
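The worst-case idea can be checked by brute force over every string of a given length. The following sketch makes some assumptions for illustration: the helper names `loop_iterations`, `tm_time` and `worst_case` are hypothetical, the input alphabet is taken to be \(\{0, 1\}\), and the Turing Machine is the one from 3.2.1 with `"_"` standing for the blank.

```python
from itertools import product

BLANK = "_"  # stands in for the blank symbol (an assumption of this sketch)
HALTING = ("q_accept", "q_reject")

DELTA = {
    ("q0", BLANK): ("q1", BLANK, "L"),
    ("q0", "0"): ("q0", "0", "R"),
    ("q0", "1"): ("q0", "1", "R"),
    ("q1", BLANK): ("q_reject", BLANK, "R"),
    ("q1", "0"): ("q_reject", "0", "R"),
    ("q1", "1"): ("q_accept", "1", "R"),
}


def loop_iterations(x):
    """Count how many times the body of the for loop in the script runs."""
    count = 0
    for symbol in x:
        count += 1
        if symbol == "0":
            break
    return count


def tm_time(word):
    """T(M, x): the number of configurations in the computation of the example machine."""
    tape, state, head, configurations = dict(enumerate(word)), "q0", 0, 1
    while state not in HALTING:  # this particular machine halts on every input
        state, written, move = DELTA[(state, tape.get(head, BLANK))]
        tape[head] = written
        head += 1 if move == "R" else -1
        configurations += 1
    return configurations


def worst_case(cost, n, alphabet="01"):
    """The maximum of cost(x) over all strings x of length n."""
    return max(cost("".join(p)) for p in product(alphabet, repeat=n))


if __name__ == "__main__":
    print(worst_case(loop_iterations, 3))
    print([worst_case(tm_time, n) for n in range(5)])
```

For the example machine every input of length \(n\) produces the same number of configurations, so the maximum is not doing much work there; the loop example is where inputs of equal length genuinely differ.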
3.3.1.2 Space
In a digital computer, space is measured in the number of bits used. We can do the same here, but instead of counting bits stored in transistors we will count the cells of our tape.
Let \(M\) be a Turing Machine and \(x \in \Gamma^\ast\),
The computation space \(S(M,x)\) of \(M\) on \(x\) is the number of unique tape cells accessed by our machine.
It is more difficult to pin down an exact value for this, but we can define some bounds.
The machine can only move left or right on an instruction, it cannot stay in place. Even if we use the empty string, the machine scans a cell in the initial configuration, and scans the next cell after the instruction that may let it halt.
\[ q_\text{init}, \sqcup \rightarrow q_\text{halt / accept / reject}, s, D \]
For any other string, we can do the same, or just move back and forth between these two cells infinitely. So the smallest number of unique cells we can use is 2. And if we never re-use a cell i.e. every instruction accesses a new cell, then we can only access as many cells as there are instructions.
\[ 2 \leq S(M,x) \leq T(M,x) \]
For the same reasons as time, we want to study the lengths of a string and how the machine behaves, so we define our space function in a similar manner.
Let \(M\) be a Turing Machine that will halt on every input,
The space function \(s: \mathbb{N} \mapsto \mathbb{N}\) of \(M\) is \[ s(n) = \max \{ S(M,x) : x \in \Gamma^n \} \]
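The bound \(2 \leq S(M,x) \leq T(M,x)\) and the space function can be explored numerically for the machine from 3.2.1. This is a sketch under assumptions: `space_and_time` and `space_function` are hypothetical helper names, the input alphabet is taken to be \(\{0, 1\}\), and `"_"` stands for the blank. A cell counts as accessed as soon as the head scans it.

```python
from itertools import product

BLANK = "_"  # stands in for the blank symbol (an assumption of this sketch)
HALTING = ("q_accept", "q_reject")

DELTA = {
    ("q0", BLANK): ("q1", BLANK, "L"),
    ("q0", "0"): ("q0", "0", "R"),
    ("q0", "1"): ("q0", "1", "R"),
    ("q1", BLANK): ("q_reject", BLANK, "R"),
    ("q1", "0"): ("q_reject", "0", "R"),
    ("q1", "1"): ("q_accept", "1", "R"),
}


def space_and_time(word):
    """Return (S(M, x), T(M, x)) for the example machine on `word`."""
    tape, state, head = dict(enumerate(word)), "q0", 0
    visited = {head}  # cells scanned so far, including the initial one
    configurations = 1
    while state not in HALTING:  # this particular machine halts on every input
        state, written, move = DELTA[(state, tape.get(head, BLANK))]
        tape[head] = written
        head += 1 if move == "R" else -1
        visited.add(head)
        configurations += 1
    return len(visited), configurations


def space_function(n, alphabet="01"):
    """s(n): the maximum number of cells accessed over all inputs of length n."""
    return max(space_and_time("".join(p))[0] for p in product(alphabet, repeat=n))


if __name__ == "__main__":
    for n in range(5):
        for p in product("01", repeat=n):
            space, time = space_and_time("".join(p))
            assert 2 <= space <= time
    print([space_function(n) for n in range(5)])
```

Tracking visited cells in a set is what makes the count one of unique cells: moving back over a cell that was already scanned does not increase the space used.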
3.3.2 The Church-Turing Thesis
At this point, you should have an understanding that Turing Machines are a fairly strong model for algorithms. We can analyse them the same way as algorithms, we can program them using instructions, we can build them in the real world (this is a surprisingly important idea), and there is more yet to come.
However, we have not really discussed if a Turing Machine is an exact model of an algorithm. To jump the gun, yes. Turing Machines are an incredibly robust model, and every definition of computer that has been developed in the last 100 years has been shown to be equivalent to at least some form of Turing Machine. Even digital computers with Random-Access Memory are just slightly faster versions of a Turing Machine. In Turing Equivalency, we will study this idea more concretely.
To formulate this, we have developed a simple thesis.
Every computational problem solvable by an algorithm can also be solved by a Turing Machine.
This is the Church-Turing Thesis. It allows us to assert that an algorithm is a Turing Machine. If we can’t write a Turing Machine, then we can’t write an algorithm and vice versa.
This is not a mathematical statement, however. It takes an intuitive concept (an algorithm - a sequence of steps) and converts it into a mathematical object (a TM - a sequence of instructions). We cannot really prove this assertion; we only take it as true so long as it appears to be true.