In my previous post I
dug up messages someone wrote almost 10 years ago, and in some places
challenged what they wrote. In writing the post, I mostly wanted to
document the thread that created the somewhat-famous “now you have two
problems” quote, and the challenges and rebuttals I included are
a bit unfair of me (coming 10 years later, with little chance that the
author would care to engage now in a debate about 10-year-old writings
and technology).
So, with that in mind, I'd like to refute a comment
made yesterday on the Slashdot review of my book:
On Sep 13, 2006, someone commented on Slashdot
Don't believe the “computer science” here.
“Nondeterministic finite automata” is well defined in comp-sci
and Friedl has it wrong. The set of languages accepted by NFA's
is exactly the same as the set accepted by DFA's.
Perl's engine and its brethren use search-and-backtrack. They
accept a lot of things that are not regular expressions. Such
engines don't have much theory behind them, and it's hard to
reason generally about what they do and how fast they do it.
In the context of what my book covers, this is unreasonable on many
levels, but I won't respond to it now; I'll respond to it 10 years ago. I wrote the following in 1997, in a post:
On March 20, 1997, Jeffrey Friedl wrote:
Re: term 'regular expressions' considered undesirable
|> Perhaps it's now too late, and we need to find a new
|> term for what we used to call regular expressions.
Perhaps it's best to differentiate between the calculus phrase “regular
expression”, coined by Dr. Stephen Kleene in the 40s, and the use of the
same term in modern computer science. There are many situations in language
where the meanings of words have changed. Some examples:
- When I was growing up, “How are you doing?” was not a statement.
- A Japanese futon has no relationship to what is called a “futon” in
  the West.
- Tennis shoes are rarely used for tennis.
- The little eraser-head pointing devices on laptops are called mice.
Ah, well, I could go on forever. But basically, the term “regular
expression” has no implied meaning to most people that first come across
it in Perl or whatnot, so little confusion arises. It would have been nice
had some other word been used from the start (perhaps one giving more of a
hint as to what the thing is all about), but for the most part we're stuck
with history. I suppose you could start a crusade to use a different
phrase. How about 'whachamacallit'? 🙂
Whoever wrote the comment also wrote a similar
comment in March, and believes that the reality of how things work is
somehow dictated by the name. I tend to believe that the reality of how
things work is dictated by, well, reality, often despite the
name. (Names/functions of government offices, like “Internal Revenue
Service”, are an excellent example of this 🙂)
My book is concerned with the reality of how things work, and I make it
clear in the book that this reality is different from what the term's
linguistic and historical baggage might imply to some small subset of people.
To continue the special theme of this post of bringing up 10-year-old
writings to rebut current ignorance, here's the sidebar on page 104 of the
first edition of my book (the poster having said that he has the first
edition), the master source having been last updated on November 4, 1996:
From Mastering Regular Expressions, First Edition, page 104
(A similar sidebar appears on page 180 of later editions)
NFA: Theory vs. Reality
The true mathematical and computational meaning of “NFA” is
different from what is commonly called an “NFA regex engine.” In
theory, NFA and DFA engines should match exactly the same text and have
exactly the same features. In practice, the desire for richer, more
expressive regular expressions has caused their semantics to diverge. We'll
see several examples later in this chapter, but one right off the top is
support for backreferences.
As a programmer, if you have a true (mathematically speaking) NFA regex
engine, it is a relatively small task to add support for backreferences. A
DFA engine's design precludes the adding of this support, but an NFA's
common implementation makes it trivial. In doing so, you create a more
powerful tool, but you also make it decidedly nonregular
(mathematically speaking). What does this mean? At most, that you should
probably stop calling it an NFA, and start using the phrase “nonregular
expressions,” since that describes (mathematically speaking) the new
situation. No one has actually done this, so the name “NFA” has lingered, even
though the implementation is no longer (mathematically speaking) an NFA.
What does all this mean to you, as a user? Absolutely nothing. As a user,
you don't care if it's regular,
nonregular, unregular, irregular,
or incontinent. So long as you know what you can expect from it
(something this chapter will show you), you know all you need to care about.
For those wishing to learn more about the theory of regular expressions,
the classic computer-science text is chapter 3 of Aho, Sethi, and Ullman's
Compilers — Principles, Techniques, and Tools (Addison-Wesley,
1986), commonly called “The Dragon Book” due to the cover design.
More specifically, this is the “red dragon”. The “green dragon”
is its predecessor, Aho and Ullman's Principles of Compiler Design.
In particular, that third paragraph really sums it up.
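To make the backreference point concrete, here's a minimal sketch using Python's re module (a backtracking engine of the same family the sidebar calls an “NFA engine”; the pattern and test strings are my own illustration, not from the book). A backreference lets one pattern match “some word, a space, then that same word again” — a language that is provably not regular, so no DFA (and no mathematically true NFA) can recognize it, yet a backtracking implementation handles it trivially:

```python
import re

# \1 must match exactly the text that group 1 captured earlier.
# The language { w + " " + w } is not regular, so no DFA can accept it,
# but a backtracking engine checks the backreference directly.
pattern = re.compile(r"^(\w+) \1$")

print(bool(pattern.match("hello hello")))  # same word twice
print(bool(pattern.match("hello world")))  # different words
```

The engine simply records what group 1 matched and compares it against the input when it reaches `\1`, backtracking if needed — easy to implement, but no longer describable by the finite-state theory the term “NFA” comes from.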
(Okay, all done, I'll go back to posts about photography and Anthony...)