.
Back to the Future: On the Terms “NFA,” “DFA,” and “Regular Expression”

In my previous post I dug up messages someone wrote almost 10 years ago, and in some places challenged what they wrote. In writing the post, I mostly wanted to document the thread that created the somewhat-famous “now you have two problems” quote, and the challenges and rebuttals I included are a bit unfair of me (being 10 years later, with little chance that the author would care to engage now in a debate about writings and technology 10 years old).

So, with that in mind I'd like to refute a comment made yesterday on the Slashdot review of my book:

On Sep 13, 2006, someone commented on Slashdot
Don't believe the “computer science” here.

“Nondeterministic finite automata” is well defined in comp-sci and Friedl has it wrong. The set of languages accepted by NFA's is exactly the same as the set accepted by DFA's.

Perl's engine and its brethren use search-and-backtrack. They accept a lot things that are not regular expressions. Such engines don't have much theory behind them, and it's hard to reason generally about what they do and how fast they do it.

In the context of what my book covers, this is unreasonable on many levels, but I won't respond to it now, I'll respond to it 10 years ago. I wrote the following in 1997, in a post on comp.lang.perl.misc:

On March 20, 1997, Jeffrey Friedl wrote:
Re: term 'regular expressions' considered undesirable
|> Perhaps it's now too late, and we need to find a new
|> term for what we used to call regular expressions.

Perhaps it's best to differentiate between the calculus phrase “regular expression”, coined by Dr. Stephen Kleene in the 40s, and the use of the same term in modern computer science. There are many situations in language where the meanings of words have changed. Some examples:

  • When I was growing up, “How are you doing?” was not a statement.
  • A Japanese futon has no relationship to what is called a “futon” in American English.
  • Tennis shoes are rarely used for tennis.
  • The little eraser-head pointing devices on laptops are called mice.

Ah, well, I could go on forever. But basically, the term “regular expression” has no implied meaning to most people that first come across it in Perl or whatnot, so little confusion arises. It would have been nice had some other word been used from the start (perhaps one giving more of a hint as to what the thing is all about), but for the most part we're stuck with history. I suppose you could start a crusade to use a different phrase. How about 'whachamacallit'? :-)

Jeffrey

Whoever wrote the comment also wrote a similar comment in March, and believes that the reality of how things work is somehow dictated by the name. I tend to believe that the reality of how things work is dictated by, well, reality, often despite the name. (Names/functions of government offices, like “Internal Revenue Service”, are an excellent example of this :-)

My book is concerned with the reality of how things work, and I make it clear in the book that the reality is different than what the linguistic historical-baggage might imply to some small subset of people.

To continue the special theme of this post of bringing up 10-year-old writings to rebut current ignorance, here's the sidebar on page 104 of the first edition of my book (the poster having said that he has the first edition), the master source having been last updated on November 4, 1996:

From Mastering Regular Expressions, First Edition, page 104
(A similar sidebar appears on page 180 of later editions)
NFA: Theory vs. Reality

The true mathematical and computational meaning of “NFA” is different from what is commonly called an “NFA regex engine.” In theory, NFA and DFA engines should match exactly the same text and have exactly the same features. In practice, the desire for richer, more expressive regular expressions has caused their semantics to diverge. We'll see several examples later in this chapter, but one right off the top is support for backreferences.

As a programmer, if you have a true (mathematically speaking) NFA regex engine, it is a relatively small task to add support for backreferences. A DFA's engine's design precludes the adding of this support, but an NFA's common implementation makes it trivial. In doing so, you create a more powerful tool, but you also make it decidedly nonregular (mathematically speaking). What does this mean? At most, that you should probably stop calling it an NFA, and start using the phrase “nonregular expressions,” since that describes (mathematically speaking) the new situation. No one has actually done this, so the nameNFA” has lingered, even though the implementation is no longer (mathematically speaking) an NFA.

What does all this mean to you, as a user? Absolutely nothing. As a user, you don't care if it's regular, nonregular, unregular, irregular, or incontinent. So long as you know what you can expect from it (something this chapter will show you), you know all you need to care about.

For those wishing to learn more about the theory of regular expressions, the classic computer-science text is chapter 3 of Aho, Sethi, and Ullman's Compilers — Principles, Techniques, and Tools (Addison-Wesley, 1986), commonly called “The Dragon Book” due to the cover design. More specifically, this is the “red dragon”. The “green dragon” is its predecessor, Aho and Ullman's Principles of Compiler Design.

In particular, that third paragraph really sums it up.

(Okay, all done, I'll go back to posts about photography and Anthony...)


Comments so far....

Interesting words from one of Perl 6 documentations:

[quote]
This document summarizes Apocalypse 5, which is about the new regex syntax. We now try to call them regex rather than “regular expressions” because they haven’t been regular expressions for a long time,
[/quote]

If you want to read more, see:
http://dev.perl.org/perl6/doc/design/syn/S05.html

Forum thread on which the documentation is mentioned:
http://forum.java.sun.com/thread.jspa?threadID=768745

I don’t know whether it is a good news or bad news.

— comment by Hiroshi Iwatani on September 18th, 2006 at 3:46am JST (3 years, 5 months ago) comment permalink

Man, was that a boring post. Remember that some of your readers have half of your IQ . Please go back to fun with Anthony. :)

— comment by Big mama on September 19th, 2006 at 10:59pm JST (3 years, 5 months ago) comment permalink

Some people, confronted with a problem of language abuse, say “I know what, I’ll teach them the facts.” Then they have two problems.

— comment by 匿名 on February 15th, 2008 at 6:46pm JST (2 years ago) comment permalink
— comment by Xeno on September 25th, 2009 at 3:55am JST (4 months, 16 days ago) comment permalink
Leave a comment...


All comments are invisible to others until Jeffrey approves them. Spam is never approved, and never makes it to the live site.

Please mention what part of the world you're writing from, if you don't mind. It's always interesting to see where people are visiting from.

More or less plain text — see below for allowed markup

You can use the following tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Subscribe without commenting