University of Nevada Las Vegas
Howard R. Hughes College of Engineering
School of Computer Science
My Home Page

CSC 456/656 Binary Encoding of a Turing Machine

Revised October 30, 2005

In class, I frequently write <M> and say "the binary encoding of M." An encoding of TM (or, for that matter, an encoding of any object) Is a string that fully describes that TM, in the sense that a machine that emulates M need only read that string to know everything it needs to know about M.
There are many encoding schemes for Turing machines, but they all must have the following properties:
  1. Given a TM M, there is some encoding <M> of M using that scheme. (There may be more than one choice of <M> for the same M, though.)
  2. There is a machine that can emulate any Turing machine by reading its encoding. We call this machine the "emulator."
  3. If x and y are both encodings of Turing machines, then x cannot be a proper prefix of y. (This is to ensure that the machine that emulates TMs by reading their encodings will not become confused if the encoding is followed by another string.)
We will restrict our universe of Turing machines to those whose input alphabet Sigma is the binary alphabet {0,1}. We still allow the tape alphabet Gamma to include other symbols. A TM with any input alphabet is equivalent to one with the binary input alphabet, provided we provide conversion of the input string to binary.
A Humanly Readable Encoding Scheme
Our encoding alphabet will consist of commas, digits, letters, an end of line symbol, and an end of file symbol. Each line of the humanly readable encoding contains the encoding of one transition.
Each state has a unique name, which can be any string of digits and letters. The start state has the name "0". The accept state has the name "accept" and the reject state has the name "reject".
Each tape symbol has a unique name, which can be any string of digits and letters. The name of the blank is the empty string. The name of the binary symbol "0" is "0". The name of the binary symbol "1" is "1".
The name of the command to move right is "R". The name of the command to move right is "L". The name of the command to stay is "S".
The encoding of a transition is the name of the a state, followed by a comma, followed by the name of a tape symbol, followed by a comma, followed by the name of a tape symbol, followed by a comma, followed by the name of a move command, followed by a comma, followed by the name of a state, followed by the end of line symbol.
The humanly readable encoding of M consists of a number of lines, one for each transition of M, followed by the end of file symbol.
Here is a humanly readable encoding of the Turing machine shown in Figure 3.8, on page 144 of the second edition of Introduction to the Theory of Computation, by Michael Sipser. Although this encoding lists the transitions in lexical order, that is unnecessary. If I permute the rows, I simply get a different encoding for the same machine.
A Binary Encoding Scheme
Finally, <M> is the encoding of the humanly readable encoding into binary, using the Ascii code.
Note that <M> always ends with the Ascii encoding of the end of file symbol, and that a machine reading <M> will not entounter this symbol before eaching the end of <M>.
How the Emulator Works
It is simpler to describe the emulator E as a machine with one "regular" tape and two "special" tapes, the "code tape" and the "configuration tape."
The input string of the emulator is <M>w, where w is an arbitrary binary string. (Note that w could contain any number of instances of the Ascii code for end of file.)
U first loads <M> onto its code tape. It then loads #0#w onto its configuration tape.
E then emulates the steps of M, one by one. For each emulated step, the current state is encoded after the first octotroph on the configuration tape, while the current symbol is encoded after the second octotroph on the configuration tape. E searches for the first octotroph, then searches the code tape for a transition which begins with the name of the current state, then a comma, then the name of the current symbol. If it does not find a match, E hangs. If it does find a match, E uses the rest of the transition line to alter the configuration tape accordingly. If the new state is a halt state, E halts.
The Halting Problem and the Universal Turing Machine
Let Lhalt = {<M>w | w is accepted by M}. This is the same language as ATM as defined on page 174 of the second edition of Introduction to the Theory of Computation, by Michael Sipser, and is also known as the halting problem. By the Church Turing Thesis, we can replace the emulator E defined above by a Turing machine, U, which is then called a universal Turing machine.