This is the mail archive of the cygwin-patches mailing list for the Cygwin project.
Re: PING: fix ARG_MAX
On Tue, Sep 20, 2005 at 06:43:20AM -0600, Eric Blake wrote:
>According to Christopher Faylor on 9/19/2005 8:31 AM:
>>If this is really true, then the findutils configury should be
>>attempting some kind of timing which finds that magic point where it
>>should be ignoring _SC_ARG_MAX. It shouldn't be vaguely assuming that
>>it is in its best interests to ignore it because someone thinks that
>>the cost of processing each argument outweighs the benefits of forking
>>fewer tests.
>
>POSIX allows xargs to have a default size (currently, xargs defaults to
>128k unless otherwise constrained by _SC_ARG_MAX), and that -s can
>change that size to anything within the range permitted by _SC_ARG_MAX.
AFAICT, we're not talking about defaults. We're talking about the
optimum setting.
Your change to xargs doesn't permit me to go beyond 32K. Personally,
I'd like to be able to override that.
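
For context, the limit in question can be inspected from the shell with `getconf ARG_MAX`. The sketch below illustrates the kind of clamping under discussion, where a requested -s value is capped at a limit. It is not xargs's actual code: `clamp_size` is a hypothetical helper, the 131072 request mirrors the 128k default mentioned above, and the 32768 fallback is an assumption used only when getconf gives no numeric answer.

```shell
#!/bin/sh
# Hypothetical helper: cap a requested xargs -s value at a given limit.
clamp_size() {
    requested=$1
    limit=$2
    if [ "$requested" -gt "$limit" ]; then
        echo "$limit"
    else
        echo "$requested"
    fi
}

# Ask the system for ARG_MAX; fall back to 32768 (an assumption made
# for this sketch) if getconf is missing or reports a non-numeric value.
limit=$(getconf ARG_MAX 2>/dev/null)
case "$limit" in
    ''|*[!0-9]*) limit=32768 ;;
esac

size=$(clamp_size 131072 "$limit")
echo "ARG_MAX=$limit, would pass -s $size to xargs"
```

An override of the sort requested here would amount to skipping the clamp when the user asks for it explicitly.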
>>Given that cost of forking is much more expensive on cygwin than on
>>other systems I really don't see how you can use this argument anyway
>>and, IMO, it doesn't make much sense on standard UNIX either. If you
>>create more processes via fork you are invoking the OS and incurring
>>context switches. You're still processing the same number of arguments
>>but you're just going to the OS to handle them more often. I don't see
>>how that's ever a win.
>
>In isolation, no. But it is what else you are doing with the arguments
>- the text processing of xargs to parse it into chunks, and the invoked
>utility's processing of its argv, that also consumes time. Also, lots
>of data tends to imply more page faults, which can be as expensive as
>context switches anyways.
Context switches also imply page faults.
>> I'm willing to be proven wrong by hard data but I think that you and the
>> findutils mailing list shouldn't be making assumptions without data to
>> back them up.
>
>Did you not read the thread on bug-findutils? Bob Proulx proposed a test
>that shows that there is NO MEASURABLE DIFFERENCE between a simple xargs
>beyond a certain -s:
>http://lists.gnu.org/archive/html/bug-findutils/2005-09/msg00038.html
No, I didn't read a thread in another mailing list. Thank you for
providing references.
>Then I repeated the test on cygwin, and found similar results:
>http://lists.gnu.org/archive/html/bug-findutils/2005-09/msg00039.html
>
>There comes a point, where even when all xargs is doing is invoking echo,
>that the cost of passing that much information through pipes does overtake
>the cost of forks.
I have a similar test which shows noticeable improvement when going from
32K to 64K and minuscule-but-still-there improvements after that:
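#!/bin/bash
# TIMEFORMAT and the `time` keyword are bash features, so run under bash.
export TIMEFORMAT='real %3lR user %3lU sys %3lS'
for i in 20480 32768 65536 131072 262144 524288 1048576 2097152 4194304; do
    # Label each run with the -s value being timed.
    printf 'timing %s: ' "$i"
    time /bin/bash -c "/bin/head -n150000 /tmp/files | /bin/xargs -s$i echo >/dev/null"
done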
timing   20480: real 0m12.448s  user 0m18.408s  sys 0m7.223s
timing   32768: real  0m8.448s  user 0m12.811s  sys 0m4.890s
timing   65536: real  0m5.191s  user  0m8.472s  sys 0m3.085s
timing  131072: real  0m4.318s  user  0m5.908s  sys 0m1.665s
timing  262144: real  0m3.833s  user  0m4.841s  sys 0m1.213s
timing  524288: real  0m3.566s  user  0m3.900s  sys 0m1.078s
timing 1048576: real  0m3.478s  user  0m3.564s  sys 0m0.665s
timing 2097152: real  0m3.417s  user  0m3.039s  sys 0m0.821s
timing 4194304: real  0m3.395s  user  0m3.370s  sys 0m0.823s
/tmp/files is the output of 'find /' on my system.
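
To put numbers on the diminishing returns, the wall-clock figures above can be reduced to the percentage saved at each step. The following is a minimal sketch and not part of the original test; the helper names `to_seconds` and `improvement` are made up for illustration, and the input is the "real" column copied from the results above.

```shell
#!/bin/sh
# Convert a bash-style duration such as "0m12.448s" into plain seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Print the percentage by which the second time improves on the first.
improvement() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'
}

# Walk the measurements in order and compare each one to the previous.
prev=""
while read -r size real; do
    cur=$(to_seconds "$real")
    if [ -n "$prev" ]; then
        echo "-s $size: $(improvement "$prev" "$cur")% less wall-clock time than the previous size"
    fi
    prev=$cur
done <<'EOF'
20480 0m12.448s
32768 0m8.448s
65536 0m5.191s
131072 0m4.318s
262144 0m3.833s
524288 0m3.566s
1048576 0m3.478s
2097152 0m3.417s
4194304 0m3.395s
EOF
```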
I prefer my test because it measures the clock time of the entire
operation rather than just the amount of time taken by xargs. YMMV.
What I think you can take away from this is that you can't make
assumptions about an optimal size that will work for every system.
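
Since the best size depends on the system, one way to choose it is from measurements taken on that system. The sketch below picks the smallest -s value whose time is within a given tolerance of the fastest run. It is only an illustration: `pick_size` and the 5% tolerance are assumptions introduced here, and the input is expected as "size seconds" pairs sorted by increasing size.

```shell
#!/bin/sh
# Hypothetical helper: read "size seconds" pairs on stdin (ascending by
# size) and print the smallest size whose time is within $1 percent of
# the fastest measurement.
pick_size() {
    awk -v tol="$1" '
        {
            size[NR] = $1
            t[NR] = $2
            if (NR == 1 || $2 < best) best = $2
        }
        END {
            for (i = 1; i <= NR; i++)
                if (t[i] <= best * (1 + tol / 100)) { print size[i]; exit }
        }'
}

# Wall-clock seconds from the run above, one pair per line.
pick_size 5 <<'EOF'
20480 12.448
32768 8.448
65536 5.191
131072 4.318
262144 3.833
524288 3.566
1048576 3.478
2097152 3.417
4194304 3.395
EOF
```

A tighter tolerance pushes the choice toward larger sizes, and a looser one toward smaller sizes; the right trade-off is a judgment call for whoever runs the measurement.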
>However, I am also keen on providing a more reasonable -s behavior in
>xargs. If cygwin were to have pathconf(filename, _PC_ARG_MAX), where a
>PATH search were done when filename does not contain '/', then pathconf
>could return 32k on Windows processes, and unlimited (or an actual known
>limit) for cygwin processes, so that xargs can then allow unlimited -s
>sizes for cygwin processes but cap windows processes at 32k and never
>encounter the E2BIG.
I am not really interested in providing a non-standard interface which
would ultimately end up being used just by xargs. That would mean that
we're adding an interface to cygwin so that a UNIX program could work
better with non-cygwin programs. I think I've been pretty consistent in
stating that I want to encumber cygwin as little as possible when it
comes to accommodating non-cygwin programs.
If you want to keep the 32K limit, that's ok with me. I'd just ask that
you make it possible to override it.
But, then, I suspect that this wasn't overrideable when I was providing
xargs either so you can feel free to ignore my request.
cgf