Source of the famous “Now you have two problems” quote

As I mentioned in my previous post, my Mastering Regular Expressions book was just reviewed on Slashdot. One thing that struck me in reading all the resulting comments was an apparently famous quote, cited several times in slightly different versions, that goes something like:

Some people, when confronted with a problem, think
“I know, I'll use regular expressions.”   Now they have two problems.

It's apparently quite well known, so it floors me that this is the first I've seen it. Although it's a manifestation of the ignorance discussed in my previous post, I can certainly appreciate it for its wit.

This quote is generally attributed to Jamie Zawinski (an early Netscape engineer) from a post on the comp.lang.emacs Usenet newsgroup. Unfortunately, there's never been a comp.lang.emacs newsgroup, which renders the whole attribution suspect. Oddly curious about it, I did some digging...

It turns out that the source of the mistaken comp.lang.emacs reference was a May 1998 comp.lang.python post by Fredrik Lundh, who used it as a cute quote in his sig. He's a prolific writer, so the sig appeared far and wide; many people picked up on it and started using it themselves, and the fame of the quote (along with an incorrect attribution) spread.

But it was indeed Jamie Zawinski who first said it, in a Usenet post on August 12, 1997. Unfortunately, it seems that Google Groups, which holds a repository of Usenet postings going back a thousand years, does not have this particular post in its database. If it did, the post would be at the end of this broken link.

[UPDATE Feb 2013: it seems now that the link is no longer broken; Jamie's original post is there]

[UPDATE Jan 2007: Jamie Zawinski added a comment to this post]

Actually, it seems that none of Jamie Zawinski's posts in the threads that spawned from this initial post are in Google's database. This is quite odd. I have been able to find parts of his posts quoted in the replies of others. It was a heated thread, so there's plenty to go on.

There was a thread in comp.emacs.xemacs and alt.religion.emacs in which the idea of embedding Perl into Emacs was proposed. (My first thought on this idea is that it's fairly silly, since Emacs already has a powerful lisp interpreter in it. Lisp is odd, though, in that it's a vastly more regular language than Perl, yet arguably less readable. But I digress...)

The main goal of the guy making the original suggestion was to get better regular-expression handling into Emacs. Perl treats regular expressions as first-class language features, making them a breeze to work with. They're not that hard to work with in Emacs Lisp, but in any case, Emacs's regular expressions are much less powerful and have a syntax that's even less readable than Perl's, if that's possible. (If you look up “toothpicks, scattered” in the index of my book, it brings you to a page about Emacs regular-expression syntax. :-))
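Here's a small sketch to make the “toothpicks” concrete. It's written in Python purely for illustration, so it is neither Perl nor Emacs Lisp. The sample pattern and the count_backslashes helper are hypothetical, and the Emacs Lisp version is shown only as source text, not executed.

In Perl-style syntax, ( ) and | are operators, and a literal parenthesis needs a backslash. Emacs syntax inverts that: grouping and alternation are written \( \) \|, and a bare ( is literal. A Lisp string literal then doubles every one of those backslashes.

```python
import re

# Hypothetical example: capture "foo" or "bar" when followed by a literal "(".
# Perl-style syntax (which Python's re module follows here): ( ) and | are
# operators by default, so only the literal parenthesis needs a backslash.
perl_style = r"(foo|bar)\("

# The same match written for Emacs, as the text a programmer would type in an
# Emacs Lisp source file. Emacs spells grouping and alternation \( \) \| and
# treats a bare ( as literal; the Lisp string literal doubles each backslash.
# (Shown as plain text only; nothing here runs Emacs.)
emacs_lisp_source = r'"\\(foo\\|bar\\)("'

def count_backslashes(text: str) -> int:
    """Count the backslash characters a reader has to wade through."""
    return text.count("\\")

m = re.search(perl_style, "call foo(1)")
print(m.group(1))                            # foo
print(count_backslashes(perl_style))         # 1
print(count_backslashes(emacs_lisp_source))  # 6
```

The Perl-style pattern carries a single backslash. The Emacs Lisp source for the same match carries six, which is presumably the sort of thing that index entry is poking fun at.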

Two things must be understood about Jamie Zawinski when evaluating his comments about this idea: he despised Perl, and he spent a nontrivial chunk of his life tending to Emacs and its lisp system, to the point that he considered it a religion. I can understand and appreciate having that kind of passion about something. In any case, talk about embedding Perl into Emacs would be heresy of the highest order.

It seems that during the course of this thread, Jamie referenced a three-month-old post by Kelly Murray in which Kelly sarcastically suggests something even more outlandishly silly (treating all data simply as a stream of bytes). Apparently, Jamie didn't realize that it was meant to be humorous or sarcastic. That, combined with the idea of embedding Perl into Emacs just for its regular-expression handling, was enough to put him over the edge, and Jamie lashed out:

Jamie Zawinski <jwz@netscape.com> wrote on Tue, 12 Aug 1997 13:16:22 -0700:
You are trying to shoehorn your existing preconceptions of how one
should program onto a vastly different (and older, and more internally
consistent) model.  I suggest your time would be better spent learning
and understanding that other model, and learn to use it properly, and
learn what it can and cannot do, rather than infecting it with this new
cancer out of ignorance.

The notion that everything is a stream of bytes is utterly braindead.
The notion that regexps are the solution to all problems is equally
braindead.

Just like Perl.

Some people, when confronted with a problem, think “I know,
I'll use regular expressions.”  Now they have two problems.

Jamie really disliked Perl, and in the ensuing discussion had a few other comments about it. In this snippet he responds to a “what's wrong with Perl?” question:

> What's wrong with perl?

It combines all the worst aspects of C and Lisp: a billion different
sublanguages in one monolithic executable.  It combines the power of
C with the readability of PostScript.

(I also appreciate that last sentence for its wit.)

A few days later it became clear that he's upset not only with Perl itself, but with how he perceives it's often used:

Perl's nature encourages the use of regular expressions almost to the
exclusion of all other techniques; they are far and away the most
“obvious” (at least, to people who don't know any better) way to get
from point A to point B.

Mind you, he does try to be fair, allowing that Perl has some merit:

Perl is not *all* bad; just mostly

I find the next statement quite telling:

Maybe Java will save the day, once someone straps a Java front end onto
the gcc back end.

Later, he says:

The heavy use of regexps in Perl is due to them being far and
away the most obvious hammer in the box.

The heavy use of regexps in Emacs is due almost entirely to
performance issues: because of implementation details, Emacs
code that uses regexps will almost always run faster than
code that uses more traditional control structures.

Based solely on how lame the syntax is, and how generally
unmaintainable regexp-based code is, Perl would be very close
to the bottom of my list of choices for most tasks.

I'd agree with that first paragraph if the “most obvious hammer” phrase were changed to “most appropriate hammer”, because Perl is often used for advanced text processing, and that's exactly where regular expressions shine. That being said, I've written plenty of system tools in Perl that are mostly or completely devoid of regular expressions. I use them when they're the best tool, and don't when they're not.

Anyway, it was a colossal waste of time for me to track this all down (and for you to read this far :-)), but once I got on the trail it was hard to get off.

As cute as the “now you have two problems” quote is, it seems that Jamie wasn't the first to come up with the idea. The same quote (but with AWK rather than regular expressions as the punch line) shows up in the sig of a 1988 post by John Myers, where he credits a “D. Tilbrook” for it:

“Whenever faced with a problem, some people say `Lets use AWK.'
  Now, they have two problems.” -- D. Tilbrook

I've also seen the AWK quote credited to a “Zalman Stern” (in this 1993 post of quotations on alt.quotations). As Mark Bessey notes in a post on his blog, it's an all-purpose joke.

I can imagine that it was first used by servicemen during WW2, along the lines of “Some people think `Let's ask the officers'....”.

UPDATES:

January 10th, 2007: this post made it high enough on reddit that my pageviews jumped by a factor of 10, and in doing so, it brought in comments with more details on the history of the phrase than I had been able to unearth myself. Excellent! See the comment section for details.

January 15th, 2007: Jamie Zawinski himself commented on this post.


Comments so far....

I think all the fuss, misunderstanding and/or over-expectation around regular expressions comes from the absence of an important chapter or appendix in MRE. The chapter could be titled “Things Regular Expressions Are Not Good At,” or “Limitations of the Regexp.” For example, they are not good at handling negated matches, conditionals, nesting, etc. In part, that absence has a bad effect on those who lack what I might call an authentic background in formal language theory. These days, 99% of regexp users belong to that group; yes, I am one of them. JF should lend them a helping hand in the fourth edition.

— comment by Hiroshi Iwatani on September 15th, 2006 at 3:17pm JST

I resent the comment about Perl and PostScript; PS is actually a very nice language – I once wrote a program combining both. Shortly afterwards, I ditched Perl for Python. That was in 1994, and I haven’t looked back since.

JWZ is a high priest of the church of XEmacs, by the way. The X is very important.

— comment by Fazal Majid on September 21st, 2006 at 5:24pm JST

JWZ used the expression earlier than 1997. He used it in 1992, on the UNIX-HATERS mailing list. It was published in The UNIX-HATERS Handbook, with “sed” as the punchline.

http://research.microsoft.com/~daniel/unix-haters.html

The email makes it clear that JWZ didn’t create the quote (though that’s obvious, now). He placed it in quotes and made reference to it as a remembered truism in the 1992 email.

— comment by Derek on January 10th, 2007 at 1:10am JST

“D. Tilbrook” would undoubtedly be the David Tilbrook described at http://tlug.ss.org/wiki/David_Tilbrook, an early UNIX user and qed developer.

— comment by Kelvin on January 10th, 2007 at 1:39am JST

In the Unix Haters Handbook (http://research.microsoft.com/~daniel/uhh-download.html), page 206 (per the PDF) has a quote from jwz dated 12 Dec 1992:

Now at this point I should have remembered that profound truism:
“Some people, when confronted with a Unix problem, think ‘I know,
I’ll use sed.’ Now they have two problems.”

It was from the Unix Haters Handbook that I recognised that quote.

— comment by Ian J Cottee on January 10th, 2007 at 1:43am JST

Google Groups has had that post available in the past. I have an overhead slide of it that I use for lectures in my Theory of Computation class, on the last day of regexes. I got it from them.

— comment by Jim Hefferon on January 10th, 2007 at 3:03am JST

The actual quote from David Tilbrook is

“If you have a problem and you think awk(1) is the solution,
then you have two problems.”

The earliest citation I’ve found is from a paper he wrote for the 1989 Usenix Software Management Workshop called “Under 10 Flags (not always smooth sailing)” (see footnote 12).

larry

— comment by Larry Hastings on January 10th, 2007 at 4:39am JST

Pages 168-171 of “The UNIX Haters Handbook” have a post from jwz@lucid.com, dated 12 Dec 1992, in which Jamie says:

“Some people, when confronted with a Unix problem, think ‘I know, I’ll use sed.’ Now they have two problems.”

— comment by jck on January 10th, 2007 at 5:38am JST

jwz probably used the X-No-Archive header.

— comment by ... on January 10th, 2007 at 5:55pm JST

“Give a man a regular expression and he’ll match a string…
teach him to make his own regular expressions and you’ve got a man with problems.”
–me_da_clever_one

— comment by yakugo on January 11th, 2007 at 12:30am JST

Wow, I can’t believe you went to all this trouble over a .sig file! Anyway, a few comments:

First, I don’t know why none of that stuff is archived; I’ve never used X-No-Archive or anything like it.

Second: yeah, I was repurposing the older “sed” quote, which I didn’t come up with myself, but that seemed appropriate as “sed” is where I learned regexps in the first place. (And I will pay Perl another backhanded compliment: “it’s not as bad as sed.”)

Third: obviously I got Kelly’s joke about “streams of bytes”, uh, that’s why I quoted it. It’s funny, and it makes the point (which I fully agree with) that the decades-old Unix “pipe” model is just plain dumb, because it forces you to think of everything as serializable text, even things that are fundamentally not text, or that are not sensibly serializable.

Fourth: I like PostScript, and it’s a safe bet that I’ve written more and hairier PostScript by hand than anyone reading this… but the syntax is as close to “write-only” as in any language I’ve ever used. Anyone who defends PostScript as being “readable” is a monster raving loony.

Fifth: these days I use Perl a lot, but I still don’t like it. So there.

— comment by Jamie Zawinski on January 15th, 2007 at 5:58pm JST

Soooo, how about writing a PostScript pretty printer … in Perl? :-)))) Or even better a more general un-obfuscator with modules for languages – starting with JavaScript :-) First test — un-obfuscate Google maps “api” :-)))

The point being that one can write trash code in any language, so the more rational approach would be to go write un-obfuscators. And then a “condenser” too – to remove the verbosity blown to infinity – like 10 “namespaces” and 5 classes chained just to call a single function, or variable and function names that spell out whole sentences.

So compared to something like …

System.Xml.Serialization.SoapIncludeAttribute.GetCustomAttribute(System.Type.GetType("System.Text.RegularExpressions")).ToString();

… I’ll take @" \((?>[^()]+|\( (?<DEPTH>)|\)(?<-DEPTH>))*(?(DEPTH)(?!))\)" any time of the day, even if not spelled out with spaces and newlines – as long as there’s no VB in sight :-))) or I have a Reflector to autoconvert it to C#.

And that regexp is not exactly quoted by chance either – it could potentially untangle the mess that the first quoted code creates :-))). If I had to define the regexp in one sentence it would be: the one that untangles the mess :-), while running fast :-))

People can go bitching about the “stream of bytes” till hell freezes over, but as long as different architectures and software need to talk to each other and we don’t fancy becoming slaves of one HUGE company and its proprietary formats, we’ll be parsing, and parsing, and parsing again. And not in Lisp, since then we’d have 3 more problems :-). C’est la vie – pardon my French. So, I’d love to have universally available lex-and-yacc-combined parsers, in a form writable in a few lines and without callbacks, but until such time, a lil regexp will do … :-)))

— comment by Zarko on January 24th, 2007 at 8:13pm JST

“It combines the power of C with the readability of PostScript.”

As a PostScript programmer, I resent that. PostScript is not inherently unreadable; some PostScript is unreadable because a choice has been made for it to be that way, but otherwise it is highly readable.

— comment by PostScript_programmer on October 2nd, 2007 at 3:07pm JST

Erik Naggum’s infamous “Perl treatise” starts off in a similar manner:

the unemployed programmer had a problem. “I know”, said the programmer, “I’ll just learn perl.” the unemployed programmer now had two problems.

It’s from 2000.

— comment by Marius Andersen on November 4th, 2007 at 7:08am JST

BTW, a preview button would have been nice.

— comment by Marius Andersen on November 4th, 2007 at 7:08am JST

Following Yakugo’s lead…

“Give a man a regular expression and he’ll match a string…
teach him to make his own regular expressions and you’ve got a man with problems.”
–me_da_clever_one

“Give a man a regular expression and he’ll match a string… but by teaching him how to create them, you’ve given him enough rope to hang himself” – Andy Hood

— comment by Andy on January 23rd, 2008 at 5:36pm JST

Maybe the people who are claiming PostScript can be readable can post links to their readable PostScript? I agree that most PostScript you run across is less readable than it could be, and I admit I’ve probably written less than a thousand lines of PostScript in my life, but even the stuff in the Blue Book doesn’t seem all that readable to me, compared to C or Python or elisp or, yes, even Perl. The postfix syntax (what is this block for again? is this a loop or a conditional or what? let me scan down to the end of the block — oh, there’s another block there, what’s after that? Oh, ifelse, okay, where was I again?) and the point-free style (how many things does this function foo want on the stack? Well, apparently one less than this bar that it calls; where is bar again?), the pervasive use of higher-order programming both because it’s easy and because the built-in control features are a little lame (where was that function defined again? grep can’t find it…), and certain “noisy” idioms like explicit manipulation of symbol tables in order to get named local variables and fixed-size aggregate data structures… it’s all kind of a mess.

However, it does have some big readability advantages over, say, C, at least sometimes. The graphics drawing API is to die for; as long as you don’t care about the error handling, it’s ideal for defining embedded DSLs, precisely because of its point-free higher-order nature; it has built-in arrays and dicts (even if they are fixed-size); and so on. But these are mostly a help to readability in the large, not in the small.

Anyway, this is already too long for an off-topic comment, so I’ll stop.

I’ve written full-fledged applications in PostScript – it can be done – but it’s important to remember that PostScript has been designed for machine-generated scripts. A human does not normally code in PostScript directly, but rather, they write a program in another language that produces PostScript to do what they want. (I realized this after having written said applications :-)) —Jeffrey

— comment by Kragen Javier Sitaker on May 14th, 2008 at 5:37am JST

OK, I think I’ve written more PostScript by hand than Jamie, so I assume he thinks I’m not reading this. Back in the old days, I designed a system that used incredible amounts of PostScript. One thing that made it easier for us was a compiler from a C-like syntax to PS, done by a fellow at the Turing Institute. We licensed it and used it heavily, and I extended it a bit to be able to handle uneven stack-armed IF, and added varieties of inheritance. The project was called PdB, and eventually it folded; the author left and went to First Person Software, where he wrote a very similar language syntax for something called Oak, and it compiled to bytecodes instead of PostScript. Oak got renamed Java.

So there.

And yes, we did have two problems…

— comment by Leigh L. Klotz, Jr. on June 7th, 2008 at 3:22am JST

And for the record, I can confirm that I got the quotation from that thread. As we all know, “All good quotes come from jwz, or are slightly paraphrased versions of something he’s said.” (use google if you want the source of *that* quotation ;-)

As for the mistaken attribution, it’s just a silly mistake. But it has been fun to see how it’s spread over the net over the years.

— comment by Fredrik on July 26th, 2008 at 9:39pm JST

There’s a nicer alternative to the ugly model “use regular expressions to match text patterns and print out replacements as Perl actions”: Xerox has defined a language that allows name abstraction, i.e. sub-expressions can be named for comprehension, modularity and re-use.
Furthermore, their language is symmetric between input and output, i.e. extended regular expressions can be run forwards and backwards (which means, if you specify a converter, you get the back converter for free).

http://www.cis.upenn.edu/~cis639/docs/xfst.html
http://www.stanford.edu/~laurik/fsmbook/home.html
(e.g. http://www.stanford.edu/~laurik/fsmbook/examples/NumbersToNumerals.html shows a converter for English numerals: 15 -> “fifteen” or back “two hundred” -> 200)

Perl regex R.I.P.

There are lots of languages and systems that all sort of do the same thing, and Perl was certainly not the first nor the best, but whatever gave it its staying power for the last 20 years will not vanish overnight. Given that the language you cite has been around for seven years and had no apparent impact, your “R.I.P.” comment seems a touch comical. Still, I’m all for anything useful, so if it is, let’s hope it gets some traction. —Jeffrey

— comment by Dr. Jochen L. Leidner on March 17th, 2010 at 12:23am JST

Perl regex R.I.P.

“My name is Ozymandias, King of Kings / Look on my works, ye Mighty, and despair!”

Still, it would be nice to have a friendlier way to do every single thing that Perl regexps do…

— comment by Eric TF Bat on March 17th, 2010 at 7:49am JST
— comment by Peter J. Hart on March 17th, 2010 at 9:04am JST

I know why regular expressions are a problem.

It’s quite simple: It’s hard to distinguish between the data and operators.

It’s even more difficult when you add in another layer of escaping, such as emacs or perl.

For example, I’m continually wondering if a parenthesis will match literally or if it will be interpreted as a grouping operator. How many backslashes will I need to get it right? In xemacs, it’s always a little confusing because sometimes you’re prompted for a regular expression, and sometimes you code it in lisp.

I vaguely remember using a language called REXX on IBM mainframes many years back. It had an interesting instruction called ‘parse’ that did only a fraction of what regular expressions do, but there always seemed to be a clear idea of what pattern you were matching.

— comment by MikeP on March 17th, 2010 at 12:21pm JST

I know this post is now very old, but since it pertains to the origins of a quote equating a solution to a problem, I thought you might be interested in this quote, which was said sometime before 1832:
“The solution of every problem is another problem.” -Johann Wolfgang von Goethe

— comment by DavidY on November 26th, 2011 at 6:49am JST

I’ve added this phrase to the queue at snowclones.org under the name “Zawinski’s Snowclone”. (It may not show up there for a while due to comment moderation.) A snowclone is a well-known phrase with one or more gaps in it that you can fill in according to circumstances, like “X is the new Y”, which can be instantiated by anything from “Pink is the new black” to “Thursday is the new Friday.”

— comment by John Cowan on April 26th, 2012 at 12:07am JST

Interesting to see the emotions this article stirred up. Personally, I believe it is quite a clever comment, and funny if nothing else. Whether it holds any ground or not is up for debate, but I would like to point out that this is someone’s opinion, not the truth. Everyone has issues with one form of technology or another, and no matter how much you argue, you will never change their minds. Use the right tool for the job is my stance on all this. Also, let’s learn to take ourselves a little less seriously :) Here in South Africa, we do whatever it takes to get the job done, even if it is a combination of platforms. As long as sound principles are used, there is nothing wrong with mixing and matching.

— comment by Kenneth Clark on October 16th, 2012 at 9:00pm JST

The working Google Groups link to Jamie’s post:

https://groups.google.com/d/msg/alt.religion.emacs/DR057Srw5-c/Co-2L2BKn7UJ

Cool, thanks! It seems that the “broken link” I included in the original post actually does work now (redirecting to the URL you included). Maybe it was just offline when I happened to check all those years ago? —Jeffrey

— comment by Andrew G Shebanow on February 9th, 2013 at 3:56am JST

I can confirm that David Tilbrook made the equivalent comment about Awk. I once mentioned the “now you have two problems” comment about regular expressions to him, and he got confused in pretty much exactly the fashion you would expect, as he thought I was modifying his quote.

Tilbrook is notable for taking the QED text editor pretty much to its limit, and he is still using QED, 40 years later. He also knows a considerable amount of Unix trivia.

— comment by Christopher Browne on February 9th, 2013 at 2:28pm JST

I almost felt obliged to jump in and defend PostScript towards the top of the page, but further down I see it has already been vindicated. I’m a … a little obsessed with PostScript, to say the least. There are of course readability issues that arise in any sort of encoding; see the Hieroglyphics of Horapollon, f’rinstance. But reverse Polish is really not that bad. It resolves the ambiguity of precedence without the need for parentheses. It is, I daresay, as powerful as Lisp, without the dreaded parentheses. So come, my lovelies, come to the dark side. :)

(Incidentally, I’m writing an open-source PostScript interpreter. Jeffrey, I’d love to hear whether it can handle your PostScript application, or what trouble it runs into if not.)
a.k.a. luser droog

— comment by M. Joshua Ryan on January 11th, 2014 at 6:34pm JST

I wonder why, despite the popularity of recent sugar-coating languages like CoffeeScript, no alternative regex syntax has taken off. Maybe it’s because copy-and-paste just works in many instances, or maybe this quote only represents a loud minority that includes MikeP and me.

I wouldn’t say the minority is loud. It’s a witty quote that can be used in many situations, so deserves its popularity. If it weren’t for the wit, no one would pay attention to its use in this context. —Jeffrey

— comment by Tom on March 16th, 2014 at 9:35pm JST