While at a Christmas party, I met a Computer Science undergrad currently earning his degree at UNM. He was having trouble grappling Quine programs. A Quine program is a computer program that takes in no input, and produces a copy of its own source code as the only output. That is — self-replicating code. It is not as easy to explain as one might think, and my poor crude attempts at doing so only confused this poor CS student further. To my defense, I used examples you can find online, but many of these examples are not very intuitive to understand. Take for example this Quine written in Python:
s='s=%r;print s%%s,';print s%s,
Fire up a Python interpreter and try it out. It simply prints that exact line you typed in. Unless you are very familiar with Python strings and the string formatting codes, it is hard to see how this program works. But it is essentially, defining a string s, then using that string itself to replicate. Here is a Quine written in the old Kernigan and Ritchie C style:
main(){char*s="main(){char*s=%c%s%c;printf(s,34,s,34);}";printf(s,34,s,34);}
I don’t believe it will compile with today’s C99 or better compilers, which don’t allow programmers to implicitly call the printf function with out using:
#include <stdio.h>
Again, the C program is not any more intuitive to understand than the Python example. So today, my goal is to write a Quine program in C that is easier for me to explain by example. Instead of using printf format strings, I am going to leverage the power of a computer’s ability to represent C source code as data. The goal is to take the data representation of the source code and print it, then translate the data into the ASCII high-level source code and print that. After some trial and error, here is what I came up with:
#include <stdio.h>
int
main (void)
{
unsigned int i;
printf("const unsigned char data[] = {");
for (i = 0; i<sizeof(data); i++)
{
if (i%8 == 0)
printf("\n");
printf("%0#4x,", data[i]);
}
printf("\n};\n\n");
for (i = 0; i<sizeof(data); i++)
putchar(data[i]);
return 0;
}
The above is my partial program so far. There are two for-loops. The first loop will iterate through a byte array called data, which I have not defined yet because this data array will contain the hexadecimal representation of this ASCII code (i.e. everything from the #include to the last right curly-brace at the bottom).
The second for-loop takes that same data array and uses the C library putchar that will translate the hexadecimal representation into an ASCII character, which will give us all that C code from the #include to the last right curly-brace at the bottom. In other words: this source code will print an exact copy of the data array and our source code when executed.
Next, we need to translate our C code into the byte array data and place it at the top of our program (before the #include statement in the source above). It should look like this:
const unsigned char data[] = {
0x23,0x69,0x6e,0x63,0x6c,0x75,0x64,0x65,
0x20,0x3c,0x73,0x74,0x64,0x69,0x6f,0x2e,
0x68,0x3e,0x0a,0x0a,0x69,0x6e,0x74,0x0a,
0x6d,0x61,0x69,0x6e,0x20,0x28,0x76,0x6f,
0x69,0x64,0x29,0x0a,0x7b,0x0a,0x20,0x20,
0x20,0x20,0x75,0x6e,0x73,0x69,0x67,0x6e,
0x65,0x64,0x20,0x69,0x6e,0x74,0x20,0x69,
0x3b,0x0a,0x20,0x20,0x20,0x20,0x0a,0x20,
0x20,0x20,0x20,0x70,0x72,0x69,0x6e,0x74,
0x66,0x28,0x22,0x63,0x6f,0x6e,0x73,0x74,
0x20,0x75,0x6e,0x73,0x69,0x67,0x6e,0x65,
0x64,0x20,0x63,0x68,0x61,0x72,0x20,0x64,
0x61,0x74,0x61,0x5b,0x5d,0x20,0x3d,0x20,
0x7b,0x22,0x29,0x3b,0x0a,0x20,0x20,0x20,
0x20,0x66,0x6f,0x72,0x20,0x28,0x69,0x20,
0x3d,0x20,0x30,0x3b,0x20,0x69,0x3c,0x73,
0x69,0x7a,0x65,0x6f,0x66,0x28,0x64,0x61,
0x74,0x61,0x29,0x3b,0x20,0x69,0x2b,0x2b,
0x29,0x0a,0x20,0x20,0x20,0x20,0x7b,0x0a,
0x20,0x20,0x20,0x20,0x20,0x20,0x20,0x20,
0x69,0x66,0x20,0x28,0x69,0x25,0x38,0x20,
0x3d,0x3d,0x20,0x30,0x29,0x0a,0x20,0x20,
0x20,0x20,0x20,0x20,0x20,0x20,0x20,0x20,
0x20,0x20,0x70,0x72,0x69,0x6e,0x74,0x66,
0x28,0x22,0x5c,0x6e,0x22,0x29,0x3b,0x0a,
0x20,0x20,0x20,0x20,0x20,0x20,0x20,0x20,
0x70,0x72,0x69,0x6e,0x74,0x66,0x28,0x22,
0x25,0x30,0x23,0x34,0x78,0x2c,0x22,0x20,
0x64,0x61,0x74,0x61,0x5b,0x69,0x5d,0x29,
0x3b,0x20,0x20,0x20,0x20,0x0a,0x20,0x20,
0x20,0x20,0x7d,0x0a,0x20,0x20,0x20,0x20,
0x70,0x72,0x69,0x6e,0x74,0x66,0x28,0x22,
0x5c,0x6e,0x22,0x7d,0x3b,0x5c,0x6e,0x5c,
0x6e,0x22,0x29,0x3b,0x0a,0x20,0x20,0x20,
0x20,0x66,0x6f,0x72,0x20,0x28,0x69,0x20,
0x3d,0x20,0x30,0x3b,0x20,0x69,0x3c,0x73,
0x69,0x7a,0x65,0x6f,0x66,0x28,0x64,0x61,
0x74,0x61,0x29,0x3b,0x20,0x69,0x2b,0x2b,
0x29,0x0a,0x20,0x20,0x20,0x20,0x20,0x20,
0x20,0x20,0x70,0x75,0x74,0x63,0x68,0x61,
0x72,0x28,0x64,0x61,0x74,0x61,0x5b,0x69,
0x5d,0x29,0x3b,0x0a,0x20,0x20,0x20,0x20,
0x72,0x65,0x74,0x75,0x72,0x6e,0x20,0x30,
0x3b,0x0a,0x7d,0x0a,
};
Now, how did I get the above? Well, we can use an ASCII table and transcribe our C source above by hand… # = 0x23, i = 0x69, n = 0x6E, c = 0x63, l = 0x6C, u = 0x75, d = 0x64, e = 0x65, spaces = 0x20, newlines = 0x0A, etc. That is a lot of work. Being a lazy computer scientist, I wrote the following python script to do it for me:
import sys
f = open(sys.argv[1], 'r')
s = ''
for line in f:
for l in line:
s += '0x%02x' % ord(l)
s += ','
s += '0x0a,'
step = 40
for i in range(0, len(s), step):
line = s[i:i+step]
print line
In the above, you pass in the C file of our first code listing above, and it prints to the screen:
0x23,0x69,0x6e,0x63,0x6c,0x75,0x64,0x65,
0x20,0x3c,0x73,0x74,0x64,0x69,0x6f,0x2e,
0x68,0x3e,0x0a,0x0a,0x69,0x6e,0x74,0x0a,
0x6d,0x61,0x69,0x6e,0x20,0x28,0x76,0x6f,
0x69,0x64,0x29,0x0a,0x7b,0x0a,0x20,0x20,
0x20,0x20,0x75,0x6e,0x73,0x69,0x67,0x6e,
0x65,0x64,0x20,0x69,0x6e,0x74,0x20,0x69,
0x3b,0x0a,0x20,0x20,0x20,0x20,0x0a,0x20,
<-------snipped----------------------->
0x28,0x64,0x61,0x74,0x61,0x29,0x3b,0x20,
0x69,0x2b,0x2b,0x29,0x0a,0x20,0x20,0x20,
0x20,0x20,0x20,0x20,0x20,0x70,0x75,0x74,
0x63,0x68,0x61,0x72,0x28,0x64,0x61,0x74,
0x61,0x5b,0x69,0x5d,0x29,0x3b,0x0a,0x20,
0x20,0x20,0x20,0x72,0x65,0x74,0x75,0x72,
0x6e,0x20,0x30,0x3b,0x0a,0x7d,0x0a,
You can copy and paste that into your
const unsigned char data[] = {
},
block and put that above the #include of the first code listing above. I used Eclipse-C++ to code and run my program. Upon execution the program should print to the console:
const unsigned char data[] = {
0x23,0x69,0x6e,0x63,0x6c,0x75,0x64,0x65,
0x20,0x3c,0x73,0x74,0x64,0x69,0x6f,0x2e,
0x68,0x3e,0x0a,0x0a,0x69,0x6e,0x74,0x0a,
0x6d,0x61,0x69,0x6e,0x20,0x28,0x76,0x6f,
0x69,0x64,0x29,0x0a,0x7b,0x0a,0x20,0x20,
0x20,0x20,0x75,0x6e,0x73,0x69,0x67,0x6e,
0x65,0x64,0x20,0x69,0x6e,0x74,0x20,0x69,
0x3b,0x0a,0x20,0x20,0x20,0x20,0x0a,0x20,
0x20,0x20,0x20,0x70,0x72,0x69,0x6e,0x74,
0x66,0x28,0x22,0x63,0x6f,0x6e,0x73,0x74,
0x20,0x75,0x6e,0x73,0x69,0x67,0x6e,0x65,
0x64,0x20,0x63,0x68,0x61,0x72,0x20,0x64,
0x61,0x74,0x61,0x5b,0x5d,0x20,0x3d,0x20,
0x7b,0x22,0x29,0x3b,0x0a,0x20,0x20,0x20,
0x20,0x66,0x6f,0x72,0x20,0x28,0x69,0x20,
0x3d,0x20,0x30,0x3b,0x20,0x69,0x3c,0x73,
0x69,0x7a,0x65,0x6f,0x66,0x28,0x64,0x61,
0x74,0x61,0x29,0x3b,0x20,0x69,0x2b,0x2b,
0x29,0x0a,0x20,0x20,0x20,0x20,0x7b,0x0a,
0x20,0x20,0x20,0x20,0x20,0x20,0x20,0x20,
0x69,0x66,0x20,0x28,0x69,0x25,0x38,0x20,
0x3d,0x3d,0x20,0x30,0x29,0x0a,0x20,0x20,
0x20,0x20,0x20,0x20,0x20,0x20,0x20,0x20,
0x20,0x20,0x70,0x72,0x69,0x6e,0x74,0x66,
0x28,0x22,0x5c,0x6e,0x22,0x29,0x3b,0x0a,
0x20,0x20,0x20,0x20,0x20,0x20,0x20,0x20,
0x70,0x72,0x69,0x6e,0x74,0x66,0x28,0x22,
0x25,0x30,0x23,0x34,0x78,0x2c,0x22,0x20,
0x64,0x61,0x74,0x61,0x5b,0x69,0x5d,0x29,
0x3b,0x20,0x20,0x20,0x20,0x0a,0x20,0x20,
0x20,0x20,0x7d,0x0a,0x20,0x20,0x20,0x20,
0x70,0x72,0x69,0x6e,0x74,0x66,0x28,0x22,
0x5c,0x6e,0x22,0x7d,0x3b,0x5c,0x6e,0x5c,
0x6e,0x22,0x29,0x3b,0x0a,0x20,0x20,0x20,
0x20,0x66,0x6f,0x72,0x20,0x28,0x69,0x20,
0x3d,0x20,0x30,0x3b,0x20,0x69,0x3c,0x73,
0x69,0x7a,0x65,0x6f,0x66,0x28,0x64,0x61,
0x74,0x61,0x29,0x3b,0x20,0x69,0x2b,0x2b,
0x29,0x0a,0x20,0x20,0x20,0x20,0x20,0x20,
0x20,0x20,0x70,0x75,0x74,0x63,0x68,0x61,
0x72,0x28,0x64,0x61,0x74,0x61,0x5b,0x69,
0x5d,0x29,0x3b,0x0a,0x20,0x20,0x20,0x20,
0x72,0x65,0x74,0x75,0x72,0x6e,0x20,0x30,
0x3b,0x0a,0x7d,0x0a,
};
#include
int
main (void)
{
unsigned int i;printf("const unsigned char data[] = {");
for (i = 0; i<sizeof(data); i++)
{
if (i%8 == 0)
printf("\n");
printf("%0#4x," data[i]);
}
printf("\n"};\n\n");
for (i = 0; i<sizeof(data); i++)
putchar(data[i]);
return 0;
}
You should be able to copy and paste the print out into an IDE, compile and run it. Quines are pretty neat. I enjoyed this little exercise. It made me think about levels of meaning; about values and their representations. Computers and programs can’t differentiate between data and code. It is the context that determines when data is meant to be interpreted as code instead of data, and when code is meant to be data (e.g. when downloading a binary file from the Internet). This is an important concept in computer security. It is the fact that we can use data as code that makes shellcode exploits possible.
“Yields falsehood when preceded by its quotation” yields falsehood when preceded by its quotation.
–Quine’s paradox