<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Back to the Future: On the Terms &#8220;NFA,&#8221; &#8220;DFA,&#8221; and &#8220;Regular Expression&#8221;</title>
	<atom:link href="http://regex.info/blog/2006-09-15/248/feed" rel="self" type="application/rss+xml" />
	<link>http://regex.info/blog/2006-09-15/248</link>
	<description>Not a photo blog. A personal blog with photos.</description>
	<lastBuildDate>Wed, 23 May 2012 20:38:13 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
	<item>
		<title>By: Bart</title>
		<link>http://regex.info/blog/2006-09-15/248#comment-42839</link>
		<dc:creator>Bart</dc:creator>
		<pubDate>Thu, 28 Apr 2011 03:40:52 +0000</pubDate>
		<guid isPermaLink="false">http://regex.info/blog/2006-09-15/248#comment-42839</guid>
		<description>The problem with all the confusion documented above is that the users of &quot;almost-regular expressions&quot; (AREs) do, or at least should, sometimes care about efficiency; good algorithms and good advice will give them a chance of meeting their efficiency needs. Computer scientists who have worked (and still work; this research did not stop in the 1980s) to figure out efficient algorithms for deciding whether a string matches an ARE have also given performance guarantees along with these algorithms, particularized by properties of the specific ARE instance.

If the user needs a hard guarantee of not-horrible performance, they need to know that they should use a matcher that can use a non-backtracking algorithm, and need to know what properties of an ARE will ensure a non-backtracking match. Fortunately, most AREs can be matched in a non-backtracking way (AREs containing backreferences being the most prominent exception).

If the user is in a situation where it is OK for the matcher to simply run out of time or memory and fail, they can use whatever matcher they desire; indeed, most backtracking matchers perform acceptably for most AREs and input strings. The problem is that an ARE that works perfectly fine with a backtracking matcher on most input strings may well suddenly blow up in time or space when presented with just the &quot;wrong&quot; one. The user who hasn&#039;t anticipated and planned for this possibility will find themselves in a bad situation.

CS researchers now know how to build matchers that perform at least as well as backtracking matchers on all inputs, that implement the same semantics as backtracking matchers, and that run in time proportional only to the length of the input string for a wide and predictable variety of AREs. These matchers are not terribly difficult to build, and source code is available. It would behoove those choosing ARE matchers to choose that kind: there is no downside. Users of ARE-based software who care about performance need to be aware of which matching algorithm is being used, and to control what sort of ARE and/or input string is fed to the matcher so that they get the performance guarantees they need.

&lt;span class=&#039;jfriedl&#039;&gt;Efficiency is a running theme through the whole book, because as I wrote (in the quote from the book, cited above),  &quot;&lt;i&gt;So long as you know what you can expect from it (something this chapter will show you), you know all you need to care about.&lt;/i&gt;&quot;. There are plenty of cases for which an NFA that uses backtracking is order(N) on the length of the string, and cases, too, where changing one character of the regex turns it into a never-ending computational bomb. Knowing what you can expect from a regex engine encompasses all that. &#8212;Jeffrey&lt;/span&gt;</description>
		<content:encoded><![CDATA[<p>The problem with all the confusion documented above is that the users of &#8220;almost-regular expressions&#8221; (AREs) do, or at least should, sometimes care about efficiency; good algorithms and good advice will give them a chance of meeting their efficiency needs. Computer scientists who have worked (and still work; this research did not stop in the 1980s) to figure out efficient algorithms for deciding whether a string matches an ARE have also given performance guarantees along with these algorithms, particularized by properties of the specific ARE instance.</p>
<p>If the user needs a hard guarantee of not-horrible performance, they need to know that they should use a matcher that can use a non-backtracking algorithm, and need to know what properties of an ARE will ensure a non-backtracking match. Fortunately, most AREs can be matched in a non-backtracking way (AREs containing backreferences being the most prominent exception).</p>
<p>If the user is in a situation where it is OK for the matcher to simply run out of time or memory and fail, they can use whatever matcher they desire; indeed, most backtracking matchers perform acceptably for most AREs and input strings. The problem is that an ARE that works perfectly fine with a backtracking matcher on most input strings may well suddenly blow up in time or space when presented with just the &#8220;wrong&#8221; one. The user who hasn&#8217;t anticipated and planned for this possibility will find themselves in a bad situation.</p>
<p>CS researchers now know how to build matchers that perform at least as well as backtracking matchers on all inputs, that implement the same semantics as backtracking matchers, and that run in time proportional only to the length of the input string for a wide and predictable variety of AREs. These matchers are not terribly difficult to build, and source code is available. It would behoove those choosing ARE matchers to choose that kind: there is no downside. Users of ARE-based software who care about performance need to be aware of which matching algorithm is being used, and to control what sort of ARE and/or input string is fed to the matcher so that they get the performance guarantees they need.</p>
<p><span class='jfriedl'>Efficiency is a running theme through the whole book, because as I wrote (in the quote from the book, cited above),  &#8220;<i>So long as you know what you can expect from it (something this chapter will show you), you know all you need to care about.</i>&#8221;. There are plenty of cases for which an NFA that uses backtracking is order(N) on the length of the string, and cases, too, where changing one character of the regex turns it into a never-ending computational bomb. Knowing what you can expect from a regex engine encompasses all that. &mdash;Jeffrey</span></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dan</title>
		<link>http://regex.info/blog/2006-09-15/248#comment-42268</link>
		<dc:creator>Dan</dc:creator>
		<pubDate>Sun, 06 Mar 2011 18:38:56 +0000</pubDate>
		<guid isPermaLink="false">http://regex.info/blog/2006-09-15/248#comment-42268</guid>
		<description>Pierre misses the point.

It&#039;s not that the regular expression that Russ Cox used in his article cannot be transformed (it can be, and trivially at that).  Rather, the point is that when using NDFAs and DFAs, the type of exponential blow up that happens in Perl&#039;s backtracking implementation doesn&#039;t happen.  Indeed, it cannot happen.

Jeffrey is right in that the end user shouldn&#039;t necessarily have to be aware of the underlying theory in order to use the implementation (though it couldn&#039;t hurt).  However, that doesn&#039;t mean that it&#039;s okay to abuse the theory by calling it something that it most definitely is not.  Words have meaning, terms have definitions, and just because you choose to ignore them doesn&#039;t make you correct.  Millions of people say &quot;ain&#039;t&quot; every day; that doesn&#039;t make it proper English.

The fact is that, as much as practitioners may abuse them, these terms have formal definitions.  Those definitions don&#039;t need to, nor will they, change just because Perl screwed it up by making regular expressions not regular anymore.  It&#039;s really not the computer scientists&#039; fault.  It&#039;s Larry Wall&#039;s (or maybe Henry Spencer&#039;s) fault for not understanding the theory in the first place.

If anything, this causes greater confusion because someone who&#039;s curious about the underlying theory may go read up on it, and then not understand why their code doesn&#039;t work the same way.  This contributes to the illusion that the theory is impenetrable and irrelevant.  It may also cause a programmer to charge off, confidently assuming that his or her usage of regular expressions is correct and guaranteed to run in time linear in the size of the string being parsed with bounded memory usage, only to find that the thing actually blows up.

A much more sensible approach would be to just call a spade a spade: Perl et al. don&#039;t use NFAs.  They really don&#039;t, despite what people keep insisting.  So, simply say what they do do: they use backtracking, which is subject to exponential run times and memory consumption.  There&#039;s nothing wrong with that, as long as people know that&#039;s what is happening and plan accordingly.  So just tell them, so that they can be better prepared to make use of the tools that the language provides because they understand the potential consequences.</description>
		<content:encoded><![CDATA[<p>Pierre misses the point.</p>
<p>It&#8217;s not that the regular expression that Russ Cox used in his article cannot be transformed (it can be, and trivially at that).  Rather, the point is that when using NDFAs and DFAs, the type of exponential blow up that happens in Perl&#8217;s backtracking implementation doesn&#8217;t happen.  Indeed, it cannot happen.</p>
<p>Jeffrey is right in that the end user shouldn&#8217;t necessarily have to be aware of the underlying theory in order to use the implementation (though it couldn&#8217;t hurt).  However, that doesn&#8217;t mean that it&#8217;s okay to abuse the theory by calling it something that it most definitely is not.  Words have meaning, terms have definitions, and just because you choose to ignore them doesn&#8217;t make you correct.  Millions of people say &#8220;ain&#8217;t&#8221; every day; that doesn&#8217;t make it proper English.</p>
<p>The fact is that, as much as practitioners may abuse them, these terms have formal definitions.  Those definitions don&#8217;t need to, nor will they, change just because Perl screwed it up by making regular expressions not regular anymore.  It&#8217;s really not the computer scientists&#8217; fault.  It&#8217;s Larry Wall&#8217;s (or maybe Henry Spencer&#8217;s) fault for not understanding the theory in the first place.</p>
<p>If anything, this causes greater confusion because someone who&#8217;s curious about the underlying theory may go read up on it, and then not understand why their code doesn&#8217;t work the same way.  This contributes to the illusion that the theory is impenetrable and irrelevant.  It may also cause a programmer to charge off, confidently assuming that his or her usage of regular expressions is correct and guaranteed to run in time linear in the size of the string being parsed with bounded memory usage, only to find that the thing actually blows up.</p>
<p>A much more sensible approach would be to just call a spade a spade: Perl et al. don&#8217;t use NFAs.  They really don&#8217;t, despite what people keep insisting.  So, simply say what they do do: they use backtracking, which is subject to exponential run times and memory consumption.  There&#8217;s nothing wrong with that, as long as people know that&#8217;s what is happening and plan accordingly.  So just tell them, so that they can be better prepared to make use of the tools that the language provides because they understand the potential consequences.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Pierre</title>
		<link>http://regex.info/blog/2006-09-15/248#comment-40692</link>
		<dc:creator>Pierre</dc:creator>
		<pubDate>Tue, 07 Sep 2010 21:24:03 +0000</pubDate>
		<guid isPermaLink="false">http://regex.info/blog/2006-09-15/248#comment-40692</guid>
		<description>NFA DFA pain. The name of the engine does not matter. Every engine implementation can have a different name, and compromises between strict NFA, strict DFA, strict hybrid FA, history-based FA, non-amnesia FA.... Quantum FA soon, in 70 years time.  The family of FAs is growing very fast.

http://swtch.com/~rsc/regexp/regexp1.html is stupidly based on a corner case that any formal regex analyser would solve in microseconds: (a?){20}a{20} is better written as a{20}(a?){20}   (when the nfa meets the dfa .... ). The backtracking explosion is gone. There is also a load of formal negated logic which can be applied to reduce similar linear and quadratic explosions (even without using assertions). Likewise, a lot of backscratching is prevented in engines which detect &#124; as a xor, i.e. exclusive alternations, or can scan backwards when necessary.
One can even implement a few fail transitions in an NFA and then rebrand it more-deterministic-less-stupid-nfa. That would be a good compromise.  Would any user care about it?

Jeffrey, you are right to oppose the theorists and Taliban of the FA classification as frozen in the 60&#039;s (half a century ago).</description>
		<content:encoded><![CDATA[<p>NFA DFA pain. The name of the engine does not matter. Every engine implementation can have a different name, and compromises between strict NFA, strict DFA, strict hybrid FA, history-based FA, non-amnesia FA&#8230;. Quantum FA soon, in 70 years time.  The family of FAs is growing very fast.  </p>
<p><a href="http://swtch.com/~rsc/regexp/regexp1.html" rel="nofollow">http://swtch.com/~rsc/regexp/regexp1.html</a> is stupidly based on a corner case that any formal regex analyser would solve in microseconds: (a?){20}a{20} is better written as a{20}(a?){20}   (when the nfa meets the dfa &#8230;. ). The backtracking explosion is gone. There is also a load of formal negated logic which can be applied to reduce similar linear and quadratic explosions (even without using assertions). Likewise, a lot of backscratching is prevented in engines which detect | as a xor, i.e. exclusive alternations, or can scan backwards when necessary.<br />
One can even implement a few fail transitions in an NFA and then rebrand it more-deterministic-less-stupid-nfa. That would be a good compromise.  Would any user care about it?</p>
<p>Jeffrey, you are right to oppose the theorists and Taliban of the FA classification as frozen in the 60&#8217;s (half a century ago).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Xeno</title>
		<link>http://regex.info/blog/2006-09-15/248#comment-36718</link>
		<dc:creator>Xeno</dc:creator>
		<pubDate>Thu, 24 Sep 2009 18:55:26 +0000</pubDate>
		<guid isPermaLink="false">http://regex.info/blog/2006-09-15/248#comment-36718</guid>
		<description>Hmmmm... 
http://swtch.com/~rsc/regexp/regexp1.html</description>
		<content:encoded><![CDATA[<p>Hmmmm&#8230;<br />
<a href="http://swtch.com/~rsc/regexp/regexp1.html" rel="nofollow">http://swtch.com/~rsc/regexp/regexp1.html</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: 匿名</title>
		<link>http://regex.info/blog/2006-09-15/248#comment-14505</link>
		<dc:creator>匿名</dc:creator>
		<pubDate>Fri, 15 Feb 2008 09:46:29 +0000</pubDate>
		<guid isPermaLink="false">http://regex.info/blog/2006-09-15/248#comment-14505</guid>
		<description>Some people, confronted with a problem of language abuse, say &quot;I know what, I&#039;ll teach them the facts.&quot;  Then they have two problems.</description>
		<content:encoded><![CDATA[<p>Some people, confronted with a problem of language abuse, say &#8220;I know what, I&#8217;ll teach them the facts.&#8221;  Then they have two problems.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Big mama</title>
		<link>http://regex.info/blog/2006-09-15/248#comment-2478</link>
		<dc:creator>Big mama</dc:creator>
		<pubDate>Tue, 19 Sep 2006 13:59:12 +0000</pubDate>
		<guid isPermaLink="false">http://regex.info/blog/2006-09-15/248#comment-2478</guid>
		<description>Man, was that a boring post. Remember that some of your readers have half of your IQ. Please go back to fun with Anthony. :)</description>
		<content:encoded><![CDATA[<p>Man, was that a boring post. Remember that some of your readers have half of your IQ. Please go back to fun with Anthony. <img src='http://regex.info/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Hiroshi Iwatani</title>
		<link>http://regex.info/blog/2006-09-15/248#comment-2470</link>
		<dc:creator>Hiroshi Iwatani</dc:creator>
		<pubDate>Sun, 17 Sep 2006 18:46:45 +0000</pubDate>
		<guid isPermaLink="false">http://regex.info/blog/2006-09-15/248#comment-2470</guid>
		<description>Interesting words from one of the Perl 6 documents:

[quote]
This document summarizes Apocalypse 5, which is about the new regex syntax. We now try to call them regex rather than &quot;regular expressions&quot; because they haven&#039;t been regular expressions for a long time, 
[/quote]

If you want to read more, see:
http://dev.perl.org/perl6/doc/design/syn/S05.html

Forum thread in which the documentation is mentioned:
http://forum.java.sun.com/thread.jspa?threadID=768745

I don&#039;t know whether it is good news or bad news.</description>
		<content:encoded><![CDATA[<p>Interesting words from one of the Perl 6 documents:</p>
<p>[quote]<br />
This document summarizes Apocalypse 5, which is about the new regex syntax. We now try to call them regex rather than &#8220;regular expressions&#8221; because they haven&#8217;t been regular expressions for a long time,<br />
[/quote]</p>
<p>If you want to read more, see:<br />
<a href="http://dev.perl.org/perl6/doc/design/syn/S05.html" rel="nofollow">http://dev.perl.org/perl6/doc/design/syn/S05.html</a></p>
<p>Forum thread in which the documentation is mentioned:<br />
<a href="http://forum.java.sun.com/thread.jspa?threadID=768745" rel="nofollow">http://forum.java.sun.com/thread.jspa?threadID=768745</a></p>
<p>I don&#8217;t know whether it is good news or bad news.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

