{"id":248,"date":"2006-09-15T15:11:23","date_gmt":"2006-09-15T06:11:23","guid":{"rendered":"https:\/\/regex.info\/blog\/2006-09-15\/248"},"modified":"2006-09-15T15:11:23","modified_gmt":"2006-09-15T06:11:23","slug":"back-to-the-future-on-the-terms-nfa-dfa-and-regular-expression","status":"publish","type":"post","link":"https:\/\/regex.info\/blog\/2006-09-15\/248","title":{"rendered":"Back to the Future: On the Terms &#8220;NFA,&#8221; &#8220;DFA,&#8221; and &#8220;Regular Expression&#8221;"},"content":{"rendered":"\n<p>\n\nIn my <a href=\"\/blog\/2006-09-15\/247\">previous post<\/a> <span class='nobr'>I\ndug<\/span> up messages someone wrote almost 10 years ago, and in some places\nchallenged what they wrote. <span class='nobr'>In writing<\/span> the post, <span class='nobr'>I mostly<\/span> wanted to\ndocument the thread that created the somewhat-famous &#8220;<a\nhref=\"\/blog\/2006-09-15\/247\" class='quiet'>now you have two\nproblems<\/a>&#8221; quote, and the challenges and rebuttals <span class='nobr'>I included<\/span> are\n<span class='nobr'>a bit<\/span> unfair of me (being 10 years later, with little chance that the\nauthor would care to engage now in <span class='nobr'>a debate<\/span> about writings and technology\n10 years old).\n\n<\/p><p>\n\nSo, with that in mind I'd like to refute a <a\nhref=\"http:\/\/books.slashdot.org\/comments.pl?sid=196462&amp;cid=16100425\">comment<\/a>\nmade <b>yesterday<\/b> on the Slashdot review of my book:<\/p>\n\n<div style='margin-left:70px'>\n<small>On Sep 13, 2006, someone commented on Slashdot<\/small>\n<div style='padding-left:25px; margin-right: 50px; border-width:0; border-left: #338 solid medium;padding-top: 3px' class='bg-B'>\n<b>Don't believe the &#8220;computer science&#8221; here.<\/b>\n<p>&#8220;Nondeterministic finite automata&#8221; is well defined in comp-sci\nand Friedl has it wrong. <span class='nobr'>The set of<\/span> languages accepted by NFA's\nis exactly the same as the set accepted by DFA's.<\/p>\n\n<p>Perl's engine and its brethren use search-and-backtrack. They\naccept <span class='nobr'>a lot<\/span> things that are not regular expressions. Such\nengines don't have much theory behind them, and it's hard to\nreason generally about what they do and how fast they do it.<\/p>\n<\/div><\/div>\n\n<p>In the context of what my book covers, this is unreasonable on many\nlevels, but <span class='nobr'>I won't<\/span> respond to it now, I'll respond to it 10 years ago. <span class='nobr'>I wrote<\/span> the following in 1997, in a <a\nhref=\"http:\/\/groups.google.com\/group\/comp.lang.perl.misc\/msg\/ee87acdcf5e9f087?hl=en&\">post<\/a>\non comp.lang.perl.misc:<\/p>\n\n\n<div style='margin-left:70px; margin-top: 20px'>\n<small>On <span class='nobr'>March 20,<\/span> 1997, Jeffrey Friedl wrote:<\/small>\n<div style='padding-left:25px; margin-right: 50px; border-width:0; border-left: #338 solid medium;padding-top: 3px' class='bg-B'>\n<b>Re: term 'regular expressions' considered undesirable<\/b>\n<pre>|&gt; Perhaps it's now too late, and we need to find a new\n|&gt; term for what we used to call regular expressions.<\/pre>\n\n<p>\n\nPerhaps it's best to differentiate between the calculus phrase &#8220;regular\nexpression&#8221;, coined by Dr. Stephen Kleene in the 40s, and the use of the\nsame term in modern computer science. There are many situations in language\nwhere the meanings of words have changed. Some examples:<\/p>\n\n<ul>\n  <li>When <span class='nobr'>I was<\/span> growing up, &#8220;How are you doing?&#8221; was not <span class='nobr'>a statement.<\/span><\/li>\n\n  <li><span class='nobr'>A Japanese<\/span> futon has no relationship to what is called a &#8220;futon&#8221; in\n  American English.<\/li>\n\n  <li>Tennis shoes are rarely used for tennis.<\/li>\n\n  <li>The little eraser-head pointing devices on laptops are called mice.<\/li>\n<\/ul>\n\n<p>Ah, well, <span class='nobr'>I could<\/span> go on forever. But basically, the term &#8220;regular\nexpression&#8221; has no implied meaning to most people that first come across\nit in Perl or whatnot, so little confusion arises. <span class='nobr'>It would<\/span> have been nice\nhad some other word been used from the start (perhaps one giving more of <span class='nobr'>a\nhint<\/span> as to what the thing is all about), but for the most part we're stuck\nwith history. <span class='nobr'>I suppose<\/span> you could start <span class='nobr'>a crusade<\/span> to use <span class='nobr'>a different<\/span>\nphrase. How about 'whachamacallit'? \ud83d\ude42\n<\/p>\n\n<p><span style='margin-left: 100px'>Jeffrey<\/span><\/p>\n\n<\/div><\/div>\n\n<p>Whoever wrote the comment also wrote <a\nhref=\"http:\/\/slashdot.org\/comments.pl?sid=143298&amp;cid=12016815\"><span class='nobr'>a similar<\/span>\ncomment in March<\/a>, and believes that the reality of how things work is\nsomehow dictated by the name. <span class='nobr'>I tend to<\/span> believe that the reality of how\nthings work is dictated by, well, <b>reality<\/b>, often <i>despite<\/i> the\nname. (Names\/functions of government offices, like &#8220;Internal Revenue\nService&#8221;, are an excellent example <span class='nobr'>of this <b>\ud83d\ude42<\/b><\/span>\n\n<\/p><p>\n\nMy book is concerned with the reality of how things work, and <span class='nobr'>I make<\/span> it\nclear in the book that the reality is different than what the linguistic\nhistorical-baggage might imply to some small subset of people.\n\n<\/p><p>\n\nTo continue the special theme of this post of bringing up 10-year-old\nwritings to rebut current ignorance, here's the sidebar on page 104 of the\nfirst edition of my book (the poster having said that he has the first\nedition), the master source having been last updated on <span class='nobr'>November 4,<\/span> 1996:\n\n<\/p>\n\n<div style='margin: 10px 30px'>\nFrom <a href=\"http:\/\/regex.info\">Mastering Regular Expressions<\/a>, First Edition, page 104\n<br\/><small>(<span class='nobr'>A similar<\/span> sidebar appears on page 180 of later editions)<\/small>\n<div style='border-width: 1px; padding: 20px' class='bg-C'>\n<center><b><span class='ibm'>NFA<\/span>: Theory vs. Reality<\/b><\/center>\n\n<p>\nThe true mathematical and computational meaning of &#8220;<span class='ibm'>NFA<\/span>&#8221; is\ndifferent from what is commonly called an &#8220;<span class='ibm'>NFA<\/span> regex engine.&#8221; In\ntheory, <span class='ibm'>NFA<\/span> and <span class='ibm'>DFA<\/span> engines should match exactly the same text and have\nexactly the same features. In practice, the desire for richer, more\nexpressive regular expressions has caused their semantics to diverge. We'll\nsee several examples later in this chapter, but one right off the top is\nsupport for backreferences.<\/p><p>\n\nAs a programmer, if you have <span class='nobr'>a true<\/span> (mathematically speaking) <span class='ibm'>NFA<\/span> regex\nengine, it is <span class='nobr'>a relatively<\/span> small task to add support for backreferences. A\n<span class='ibm'>DFA<\/span>'s engine's design precludes the adding of this support, but an <span class='ibm'>NFA<\/span>'s\ncommon implementation makes it trivial. <span class='nobr'>In doing<\/span> so, you create <span class='nobr'>a more<\/span>\npowerful tool, but you also make it decidedly\n\n  <i>nonregular<\/i>\n\n(mathematically speaking). What does this mean? At most, that you should\nprobably stop calling it an <span class='ibm'>NFA<\/span>, and start using the phrase\n\n  &#8220;nonregular expressions,&#8221;\n\nsince that describes (mathematically speaking) the new situation. <span class='nobr'>No one has<\/span> actually done this, so the <i>name<\/i> &#8220;<span class='ibm'>NFA<\/span>&#8221; has lingered, even\nthough the implementation is no longer (mathematically speaking) an <span class='ibm'>NFA<\/span>.\n\n<\/p><p>\n\nWhat does all this mean to you, as <span class='nobr'>a user<\/span>? Absolutely nothing. <span class='nobr'>As a user<\/span>,\nyou don't care if it's regular,\n\n  nonregular, unregular, irregular,\n\nor incontinent. <span class='nobr'>So long as<\/span> you know what you can expect from it\n(something this chapter will show you), you know all you need to care about.<\/p><p>\n\nFor those wishing to learn more about the theory of regular expressions,\nthe classic computer-science text is chapter 3 of Aho, Sethi, and Ullman's\n\n<i>Compilers &mdash; Principles, Techniques, and Tools<\/i> (Addison-Wesley,\n1986), commonly called &#8220;The Dragon Book&#8221; due to the cover design.\nMore specifically, this is the &#8220;red dragon&#8221;. The &#8220;green dragon&#8221;\nis its predecessor, <span class='nobr'>Aho and Ullman<\/span>'s <i>Principles of Compiler Design<\/i>.<\/p>\n<\/div><\/div>\n\n<p>In particular, that third paragraph really sums it up.<\/p>\n\n<p>(Okay, all done, I'll go back to posts about photography and Anthony...)<\/p>\n \n\n","protected":false},"excerpt":{"rendered":"<p> In my previous post I dug up messages someone wrote almost 10 years ago, and in some places challenged what they wrote. In writing the post, I mostly wanted to document the thread that created the somewhat-famous \"now you have two problems\" quote, and the challenges and rebuttals I included are a bit unfair of me (being 10 years later, with little chance that the author would care to engage now in a debate about writings and technology 10 years old). <\/p><p> So, with that in mind I'd like to refute a comment made <b>yesterday<\/b> on the Slashdot review [...]","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[],"_links":{"self":[{"href":"https:\/\/regex.info\/blog\/wp-json\/wp\/v2\/posts\/248"}],"collection":[{"href":"https:\/\/regex.info\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/regex.info\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/regex.info\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/regex.info\/blog\/wp-json\/wp\/v2\/comments?post=248"}],"version-history":[{"count":0,"href":"https:\/\/regex.info\/blog\/wp-json\/wp\/v2\/posts\/248\/revisions"}],"wp:attachment":[{"href":"https:\/\/regex.info\/blog\/wp-json\/wp\/v2\/media?parent=248"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/regex.info\/blog\/wp-json\/wp\/v2\/categories?post=248"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/regex.info\/blog\/wp-json\/wp\/v2\/tags?post=248"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}