In my previous post I dug up messages someone wrote almost 10 years ago, and in some places challenged what they wrote. In writing the post, I mostly wanted to document the thread that created the somewhat-famous “now you have two problems” quote, and the challenges and rebuttals I included are a bit unfair of me (being 10 years later, with little chance that the author would care to engage now in a debate about writings and technology 10 years old).
So, with that in mind I'd like to refute a comment made yesterday on the Slashdot review of my book:
“Nondeterministic finite automata” is well defined in comp-sci and Friedl has it wrong. The set of languages accepted by NFA's is exactly the same as the set accepted by DFA's.
Perl's engine and its brethren use search-and-backtrack. They accept a lot things that are not regular expressions. Such engines don't have much theory behind them, and it's hard to reason generally about what they do and how fast they do it.
In the context of what my book covers, this is unreasonable on many levels, but I won't respond to it now, I'll respond to it 10 years ago. I wrote the following in 1997, in a post on comp.lang.perl.misc:
|> Perhaps it's now too late, and we need to find a new |> term for what we used to call regular expressions.
Perhaps it's best to differentiate between the calculus phrase “regular expression”, coined by Dr. Stephen Kleene in the 40s, and the use of the same term in modern computer science. There are many situations in language where the meanings of words have changed. Some examples:
- When I was growing up, “How are you doing?” was not a statement.
- A Japanese futon has no relationship to what is called a “futon” in American English.
- Tennis shoes are rarely used for tennis.
- The little eraser-head pointing devices on laptops are called mice.
Ah, well, I could go on forever. But basically, the term “regular
expression” has no implied meaning to most people that first come across
it in Perl or whatnot, so little confusion arises. It would have been nice
had some other word been used from the start (perhaps one giving more of a
hint as to what the thing is all about), but for the most part we're stuck
with history. I suppose you could start a crusade to use a different
phrase. How about 'whachamacallit'?
Jeffrey
Whoever wrote the comment also wrote a similar
comment in March, and believes that the reality of how things work is
somehow dictated by the name. I tend to believe that the reality of how
things work is dictated by, well, reality, often despite the
name. (Names/functions of government offices, like “Internal Revenue
Service”, are an excellent example of this
My book is concerned with the reality of how things work, and I make it clear in the book that the reality is different than what the linguistic historical-baggage might imply to some small subset of people.
To continue the special theme of this post of bringing up 10-year-old writings to rebut current ignorance, here's the sidebar on page 104 of the first edition of my book (the poster having said that he has the first edition), the master source having been last updated on November 4, 1996:
(A similar sidebar appears on page 180 of later editions)
The true mathematical and computational meaning of “NFA” is different from what is commonly called an “NFA regex engine.” In theory, NFA and DFA engines should match exactly the same text and have exactly the same features. In practice, the desire for richer, more expressive regular expressions has caused their semantics to diverge. We'll see several examples later in this chapter, but one right off the top is support for backreferences.
As a programmer, if you have a true (mathematically speaking) NFA regex engine, it is a relatively small task to add support for backreferences. A DFA's engine's design precludes the adding of this support, but an NFA's common implementation makes it trivial. In doing so, you create a more powerful tool, but you also make it decidedly nonregular (mathematically speaking). What does this mean? At most, that you should probably stop calling it an NFA, and start using the phrase “nonregular expressions,” since that describes (mathematically speaking) the new situation. No one has actually done this, so the name “NFA” has lingered, even though the implementation is no longer (mathematically speaking) an NFA.
What does all this mean to you, as a user? Absolutely nothing. As a user, you don't care if it's regular, nonregular, unregular, irregular, or incontinent. So long as you know what you can expect from it (something this chapter will show you), you know all you need to care about.
For those wishing to learn more about the theory of regular expressions, the classic computer-science text is chapter 3 of Aho, Sethi, and Ullman's Compilers — Principles, Techniques, and Tools (Addison-Wesley, 1986), commonly called “The Dragon Book” due to the cover design. More specifically, this is the “red dragon”. The “green dragon” is its predecessor, Aho and Ullman's Principles of Compiler Design.
In particular, that third paragraph really sums it up.
(Okay, all done, I'll go back to posts about photography and Anthony...)
Interesting words from one of Perl 6 documentations:
[quote]
This document summarizes Apocalypse 5, which is about the new regex syntax. We now try to call them regex rather than “regular expressions” because they haven’t been regular expressions for a long time,
[/quote]
If you want to read more, see:
http://dev.perl.org/perl6/doc/design/syn/S05.html
Forum thread on which the documentation is mentioned:
http://forum.java.sun.com/thread.jspa?threadID=768745
I don’t know whether it is a good news or bad news.
Man, was that a boring post. Remember that some of your readers have half of your IQ . Please go back to fun with Anthony.
Some people, confronted with a problem of language abuse, say “I know what, I’ll teach them the facts.” Then they have two problems.
Hmmmm…
http://swtch.com/~rsc/regexp/regexp1.html