Frequently Asked Questions

John Lawler
Linguistics Department
University of Michigan

Programming and Web assistance:
Kevin McGowan

What the hell is this, anyway?

About fifteen years ago a former student sent me a program called foggy that had been circulating underground at IBM which artfully made fun of pompous administrators and their jargon. It was in an internal IBM language, and I translated it into BASIC and Pascal. It was a cute program, but essentially very simple.

Somewhat later, I chanced on the Folklore Paper Construction Kit as quoted in Dwight Bolinger's prize-winning Language: The Loaded Weapon, and added its vocabulary of phrases as another option.

Finally, I met Anthony Aristar and he shared with me a similar program (in Lisp) he'd written with phrases collected from the syntactic works of Noam Chomsky. That was the origin of The Chomskybot. It was easy to add these phrases, too.

Anthony says he was given these phrases in another program, and didn't know who actually collected them. However, since the Chomskybot has become semi-famous, we have been able to identify the original author as John F. Sowa, who admits putting the phrases together at IBM in the 1980s from several of Chomsky's books, including Syntactic Structures, Aspects, and Government and Binding. Since the real ingenuity in the Chomskybot is the way the phrases fit together, both Anthony and I are delighted to acknowledge Mr. Sowa as The Onlie Begetter of the Chomskybot. (Though we do notice that he hasn't owned up to it on his Web site yet :-)

Eventually I got around to making a Macintosh version using HyperCard, to supplement the DOS Version with source code in Pascal.

The script that runs the Chomskybot on the Web was written in Perl by Kevin McGowan. It appears to have been reproduced (for satiric purposes, one hopes) in the form of some clones of our script, all of which still point at this file (or, more accurately, at where this file used to be, so that you can't get here from there any more) to explain What It's All About, Anyway.

There used to be quite a few online, but the ones I know about have all evaporated over the years, leaving only the true, the blushful Hippocrene Chomskybot, none genuine without this signature. If you've clicked a link from one of these clones and are still wondering what it's all about, be aware that Kevin and I are only responsible for the original, and not for the clones. Though imitation is the sincerest form of flattery, and we're very flattered.

A couple of clones are still available:

That's all. The Chomskybot is a demonstration of a peculiarly primitive variety of computational linguistics. Once you've seen how it works -- if you care, and if you haven't recognized already how it's done -- you are unlikely to be interested in the details, I expect. They're boring. The operation, however, can be amusing.

What I find interesting about it is how it just hovers at the edge of understandability, a sort of semantic mumbling, a fog for the mind's eye. Like Eliza (a much cleverer program), Julia (Eliza's great-great-granddaughter), and the other chatterboxes you can explore on Simon Laven's AI-NLP page, or Peter Suber's Minds and Machines philosophy class home page, foggy's most interesting effects are in the mind of the beholder, especially since its output not infrequently induces a strong feeling of inferiority in the unsuspecting, a sense of "I just don't get it, so I must be dumber than I'd thought." This is the Turing Test in reverse, and humans should resist allowing themselves to fail.

If it amuses anybody -- and it's the only cheap thrill I have to offer for Web surfers -- I'll be pleased.
Though you might want to ask yourself why it's amusing.

How does it work?

By the "American Chinese Menu" principle, viz. One from Column A, One from Column B.
There are four sets of phrases:
Initiating Phrases     Subject Phrases     Verbal Phrases     Terminating Phrases

Foggy simply selects one of each, at (pseudo-)random, and then strings them together into a sentence. Five sentences make a paragraph. Foggy never even gets down to the word level; everything is phrases, and most of the phrases don't mean much. In this foggy resembles a large proportion of real language.
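
The selection step can be sketched in a few lines. What follows is a minimal illustration in Python, not Kevin's Perl script: the phrase tables are placeholder strings (the real phrases live in four files), the names COLUMNS, SIZES and make_paragraph are invented for the example, and drawing without repeats inside a paragraph is an assumption taken from the counting argument further down this page.

```python
import random

# Hypothetical stand-ins for the four phrase files; the sizes match the
# counts given in this FAQ (35, 18, 17, 17), but the text is placeholder.
SIZES = {"initiating": 35, "subject": 18, "verbal": 17, "terminating": 17}
COLUMNS = {
    name: [f"<{name} phrase {i}>" for i in range(1, size + 1)]
    for name, size in SIZES.items()
}


def make_paragraph(columns, sentences=5, rng=random):
    """Build one paragraph: one phrase from each column per sentence."""
    # Sample without replacement so no phrase repeats within a paragraph
    # (an assumption taken from the counting argument, not from the script).
    picks = [rng.sample(phrases, sentences) for phrases in columns.values()]
    # zip(*picks) regroups the per-column picks into per-sentence tuples.
    return " ".join(" ".join(parts) for parts in zip(*picks))


if __name__ == "__main__":
    print(make_paragraph(COLUMNS))
```

Passing a seeded generator, for example make_paragraph(COLUMNS, rng=random.Random(0)), makes the output repeatable, which is handy for testing but rather defeats the purpose.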

The Chomskybot is a Perl script, written by Kevin McGowan, with design and assurance testing by me. Kevin hasn't even looked at my previous code, I think -- the description of what it should do was easy enough for him to write it in Perl in one sitting.

Perl, incidentally, was developed by a linguist. Here are a few pointers to how it works:

How many Chomskybot paragraphs are there?

I'm glad you asked that question. The correct answer is that I don't know, exactly, because it's a very large number. But it's somewhere around what I would call 22 septillion in American English. Here's how I figure it.
  % wc chomsky.*
      35     179    1146 chomsky.1
      18     127     888 chomsky.2
      17      88     505 chomsky.3
      17     141     969 chomsky.4
      87     535    3508 total
These are the lines, words and bytes in the 87 phrases that make up the Chomskybot. There are 35 intro phrases, 18 subject phrases, 17 verb phrases, and 17 concluding phrases. Each sentence has one of each phrase.

Since they're independent and equally possible, subject to the whims of Perl's rand function, the product rule obtains, and that gives the probability for the first sentence in the paragraph (there are five in each) as 1 in
35 * 18 * 17 * 17 = 182,070 sentences. The second sentence has
34 * 17 * 16 * 16 = 147,968 variations, the third has
33 * 16 * 15 * 15 = 118,800, the fourth has
32 * 15 * 14 * 14 = 94,080, and the last has
31 * 14 * 13 * 13 = 73,346.

Once again, the product rule holds, since they're independently generated, except for removing the used phrases, which is what reduces the probability pool each time. The product of these 5 large numbers is a very large number:

2.208494791946 X 10^25
(according to the Windows desk calculator.
For mere humans, that means something like:
"22,084,947,919,460,000,000,000,000, and then some."
I am informed by Martin Jansche that the exact value is precisely
22,084,947,919,456,858,275,840,000
which shows that the Windows desk calculator rounded up.
But then, what's a mere
3 trillion, 141 billion, 724 million, 160 thousand,
among friends?)
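
Anyone who distrusts desk calculators can redo the arithmetic with exact integers. This is a small sketch in Python (chosen only because its integers don't round; it has nothing to do with the Perl script), using the four phrase counts from the wc listing. The names COUNTS and sentence_choices are made up for the example.

```python
from math import prod

# Phrase counts per column, taken from the wc output (lines per file).
COUNTS = (35, 18, 17, 17)


def sentence_choices(k):
    """Choices for sentence k (0-based) once k phrases per column are used up."""
    return prod(n - k for n in COUNTS)


per_sentence = [sentence_choices(k) for k in range(5)]
total = prod(per_sentence)

print(per_sentence)     # [182070, 147968, 118800, 94080, 73346]
print(f"{total:,}")     # 22,084,947,919,456,858,275,840,000
print(f"{total:.12e}")  # 2.208494791946e+25
```

The last line rounds to thirteen significant digits, which reproduces the "and then some" figure quoted above.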

You're on your own about what to call that very large number. If you go with the American "million, billion, trillion" convention, this is 22 septillion. That's close enough for most people. If, on the other hand, you double up in the European English fashion: "million, milliard (or thousand million), billion, ..." then it'd be 22 quadrillion, assuming the lexicon of "..." is equivalent. If not, all bets are off.

By any measurement, though, that's a lot of wisdom. At present rates, the Chomskybot will continue to provide new wisdoms for about 4.416989583892 X 10^22 years, which is about ten percent of the time left until the heat death of the universe.

Last change 11/30/02   John Lawler