This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: [1.7] BUG - GREP slows to a crawl with large number of matches on a single file


Christopher Faylor wrote:
> On Thu, Nov 05, 2009 at 07:11:02PM -0800, Linda Walsh wrote:
>> aputerguy wrote:
>>> Running grep on a 20MB file with ~100,000 matches takes an incredible
>>> almost 8 minutes under Cygwin 1.7 while taking just 0.2 seconds under
>>> Cygwin 1.5 (on a 2nd machine).
>>
>> I've seen nasty behavior with grep that isn't Cygwin-specific.  Try
>> "pcregrep" and see if you have the same issue.
>>
>> I found it to be about 100 times faster in _some_ searches, though
>> 2-3x is more typical.  The GNU regex parser isn't very efficient under
>> some circumstances.
>>
>> If you find a big difference, you might also want to report it to the
>> bug-grep@gnu.org mailing list, but last time I did, they told me
>> "that's the way it is" due to some POSIX conformance thing...
>
> The fact that it behaves differently between Cygwin 1.5 and 1.7 would
> suggest that this isn't a grep problem.

This is likely triggered by the transition to UTF-8 as the default
charset. The same problem is observed on Linux, with grep as well as
with sed. That's why I have changed most of my shell scripts to use
something like
  LC_ALL=C grep   or   LC_ALL=C sed
where possible. Please try this.
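
The workaround above can be sketched as a small wrapper. This is only an illustration, not something from the original scripts: the function name count_matches_c_locale and the sample file are hypothetical. It forces the "C" locale for a single grep call, so grep compares bytes instead of decoding UTF-8 characters.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption, not from the thread):
# count matching lines with the locale forced to "C" for this one call.
count_matches_c_locale() {
    # $1 = pattern, $2 = file to search
    # The LC_ALL=C prefix applies only to this grep invocation.
    LC_ALL=C grep -c -- "$1" "$2"
}

# Small illustrative input; a real test would use the large file in question.
sample=$(mktemp)
printf 'foo bar\nbaz\nfoo again\n' > "$sample"

# Prints the number of lines containing "foo" (2 for this sample).
count_matches_c_locale foo "$sample"

rm -f "$sample"
```

To check whether the locale is really the cause, it may help to time the same search once with the default locale and once with LC_ALL=C. Note that under the C locale, non-ASCII characters in patterns and classes such as [[:alpha:]] are matched byte-wise, so this is only safe where the pattern is plain ASCII.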


The problem *is* with grep (and sed), however: there is no good reason
that UTF-8 should make most search operations 100 times slower. This is
just poor programming in grep and sed.

Thomas

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

