This is the mail archive of the cygwin-developers mailing list for the Cygwin project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Cygwin Filesystem Performance degradation 1.7.5 vs 1.7.7, and methods for improving performance


Hi,

> To be clear: we are definitely not going to create an interface (and I
> use the term loosely) to Cygwin which requires a program to internally
> use putenv() or setenv() to change behavior.  That is just a really bad
> way to implement this type of thing since it is not how environment
> variables are meant to be used.  Environment variables are meant to be
> used to externally control the behavior of a program.

"TZ" env is an example of an env that is used inside applications to control application internal stuff.

> Consider what an application writer would have to do to get a fast stat.
> They'd have to save the current version of the Cygwin environment variable
> in a buffer and add something like " faststat" to it.  This presupposes
> that the Cygwin environment variable contains nothing but space separated
> tokens so it means that we will not be extending its format any time soon.

I can do this in 2 lines:
    { static char e[1024]; snprintf(e, sizeof(e)-1,
        "%s no_ino no_nlink", getenv("CYGWIN")); setenv("CYGWIN", e, 1); }

Cheap price to pay to make the application run way faster on Cygwin (a.k.a cygwin friendly).

And you are forgetting the power of environment var: Until the above code would be added to various applications, power users of Cygwin, which know exactly what they are doing, may be able to get this performance improvement without having to wait for the above code to be added to applications. They will be able to do it as wrappers (bash aliases/bash functions/tiny bash wrapper scripts).

For example: add to your ~/bashrc: 'alias grep=CYGWIN="no_ino no_nlink" grep' will make your life much better :-)

Or adding in your root Makefile of your huge gnumake build system: 'export CYGWIN=no_ino no_nlink' will make you a happy man, shaving off lots of time on the make process.

And as a power user - you KNOW when its ok do to this.

All this would not be possible via xstat().

> We could introduce a way to manipulate the Cygwin environment variable
> but if we did that, we might as well just implement something like
> xstat().

These are two completely different things (which do not contradict each other).

xstat() requires major changes all along the program code with lots of ifdefs to support the application to work on OSes that dont support xstat.

The env allows to do it all in one call on program init.

>> And this is exactly what Yoni Londner is trying to do: He not only
>> complained about performance: but gave a practical patch to for using
>> setenv("CYGWIN") to solve the performance problems.
> I didn't realize that Yoni Londner was a git developer.  When I do a
> google search on "londner git" I don't see many hits so I'll have to
> take your word for it.

Yoni is not a GIT developer. Yoni is a developer at Hola.
What I said is that Yoni did exactly what you proposed git developers do: write a cygwin solution and submit a patch.
Thats what he did. And it got totally rejected.
So no wonder git developers just preferred solving their own specific problem without pushing a patch to cygwin.


> There were also several problems with the patch, the top two being: 1)
> it was large enough to require a license assignment

I have already signed last week on the license agreement for Yoni/Cygwin (since he is a Hola employee), and he already got it sent last week to the Cygwin team.

> 2) it was too
> large to be accepted as is and needs to be broken up into smaller
> pieces.  You should not be characterizing this as a stubborn refusal to
> incorporate the patch since those two general points have already been
> made.  And, Corinna has had more specific issues as well.

Yoni later passed on the code with 'ifdef' split up (every feature under a specific ifdef), so its trivial to just take out of it the parts you need.

Derry

On 9/29/2010 9:02 PM, Christopher Faylor wrote:
On Wed, Sep 29, 2010 at 07:48:32PM +0200, Derry Shribman wrote:
Hi,

Right.  Another way of looking at this is that the mount options offer
consistency.  The notion of setting an environment variable in Window A
to get one behavior and not settting it in Window B is, IMO, a support
nightmare and a recipe for end-user confusion.

1) for applications that internally will set this setting there is nothing confusing on non-consistant.

To be clear: we are definitely not going to create an interface (and I use the term loosely) to Cygwin which requires a program to internally use putenv() or setenv() to change behavior. That is just a really bad way to implement this type of thing since it is not how environment variables are meant to be used. Environment variables are meant to be used to externally control the behavior of a program.

Consider what an application writer would have to do to get a fast stat.
They'd have to save the current version of the Cygwin environment variable
in a buffer and add something like " faststat" to it.  This presupposes
that the Cygwin environment variable contains nothing but space separated
tokens so it means that we will not be extending its format any time soon.

We could introduce a way to manipulate the Cygwin environment variable
but if we did that, we might as well just implement something like
xstat().

2) for users that want to set this: setting PATH can cause the same
application/bash script to behave completely differently. same goes for
LD_LIBRARY_PATH, SHELL, COMSPEC, TMPDIR etc. They can cause the same application
in different shells to behave differently. This causes confusion only for
end-users who touch things they dont understand what they do. If you dont know
what LD_LIBRARY_PATH is: dont touch it! Unix system does not try to protect
itself from ignorent end-users who touch things that they are not supposed to
touch (unlike GUI applications which try to). Nothing will protect against an
end user setting an incorrect PATH. If an end user does not know what PATH is:
he should not touch it!

You probably should let me worry about support issues. I'm not worried about esoteric things like what Unix systems try to do. I'm concerned about mailing list traffic. If we introduce something that is confusing to end users they will send email and it will increase the support burden for everyone who volunteers to work on Cygwin. I'm not against using a philsophic reason as a justification for a change but I'm damn well not going to introduce a change which causes confusion if there are other, better ways to get the job done.

Or, another way of looking at this is, instead of implementing their
own potentially buggy, imprecise stat() they could have not thought of
Cygwin as a black box and either 1) offered improvements for the DLL or
2) engaged the Cygwin community with requirements.  If there is
ifdef'ed __CYGWIN__ code in git now that means that any performance
improvements that we (i.e., Corinna) has made will never be noticed and
that code will be maintained forever.

And this is exactly what Yoni Londner is trying to do: He not only complained about performance: but gave a practical patch to for using setenv("CYGWIN") to solve the performance problems.

I didn't realize that Yoni Londner was a git developer. When I do a google search on "londner git" I don't see many hits so I'll have to take your word for it.

I am sure git developers were not happy to have to write their own
version of stat() specially for __CYGWIN__.  But it seems here that the
simple to implement setenv("CYGWIN", "no_ino no_nlink") is being
rejected without any good reason.

You're apparently ignoring the reasons which have been put forth already. If we think we can get performance improvements without requiring an environment variable then we will pursue those.

There were also several problems with the patch, the top two being: 1)
it was large enough to require a license assignment and 2) it was too
large to be accepted as is and needs to be broken up into smaller
pieces.  You should not be characterizing this as a stubborn refusal to
incorporate the patch since those two general points have already been
made.  And, Corinna has had more specific issues as well.

Also, if the project leads and other important Cygwin contributors think
that an environment variable is a bad idea then continuing to insist
that it isn't does not really advance the discussion.

So, you're trading ifdef __CYGWIN__ in git with lots of if's in the
very parts of Cygwin code path where people complain about slowness.

The slowness of the cygwin filesystem calls do not come from if()'s in Cygwin's code.

A typical CPU today can perform around 1,000,000,000 if()'s per second
(around 1 nano second per if()).  While the 'cost' of WinNT system call
is a minimum of 20,000ns, while many filesystem calls are much much
longer.

Yes, point taken. I'm familiar with this argument. People make it to me all of the time. "What's one more function call/if/for-loop? CPUs are FAST!" I think I've heard that argument hundreds, if not thousands of times.

So adding an if() to save a system call (or even 10 if()s...) - is
always worth it.

I'm not talking about adding one if. I'm talking about increasing the code complexity and slowing down, however marginally, the code path for the common use case. If/when we add xstat() we'll have to be very careful about how we do it. Hopefully, we can implement it in such a way as to avoid a tangled tree of ifs.

cgf




Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]