This is the mail archive of the cygwin-developers mailing list for the Cygwin project.



Re: Cygwin multithreading performance


Corinna Vinschen wrote:
On Dec  8 22:10, Mark Geisert wrote:
I'll give the snapshot a try, thanks, but it's hard for me to see :) how
eliminating a few hundred locks and unlocks, 75% of which complete in
under 1 usec anyway, will help much.  I may have unduly influenced
anybody following this by only showing detailed stats for the last two
locks, which I picked at random.

What I'm more focused on are the locks being heavily used by git and heavily
contended by the four git threads in OP's testcase.  Here are the caller
stats for this set of locks...

lock latency buckets (usec):                     <=1   <=10  <=100 <=1000  >1000
mtx  4908/06002A00F0 lks 155066, ulks 155066,  66359   4395  78895   4742    675
   caller 0x0100454FAD, count 154792, L, /usr/src/git/builtin/pack-objects.c:1758
   caller 0x0100454FFC, count 154777, U, /usr/src/git/builtin/pack-objects.c:1769
   [...]

mtx  4908/0600FA6860 lks 154745, ulks 154745,  56092   3217  64883  25435   5118
   caller 0x0100455659, count   1011, L, /usr/src/git/builtin/pack-objects.c:1640
   caller 0x0100455687, count   1011, U, /usr/src/git/builtin/pack-objects.c:1642
   caller 0x01004556E0, count 153734, L, /usr/src/git/builtin/pack-objects.c:1628
   caller 0x010045570E, count 153734, U, /usr/src/git/builtin/pack-objects.c:1630

mtx  4908/0603498600 lks 316825, ulks 316825, 146254   4986 155345   9686    554
   caller 0x0100455269, count 196769, L, /usr/src/git/builtin/pack-objects.c:1695
   caller 0x01004552C4, count  15148, U, /usr/src/git/builtin/pack-objects.c:1705
   caller 0x0100455478, count 181621, U, /usr/src/git/builtin/pack-objects.c:1702
   caller 0x010045554C, count 120056, L, /usr/src/git/builtin/pack-objects.c:1834
   caller 0x010045556E, count 120056, U, /usr/src/git/builtin/pack-objects.c:1837

If you look at the ">1000" column, you see that these 3 locks account
for 675+5118+554 = 6347 waits of at least 1000 usec each, i.e. at least
6.347 sec spent waiting for these locks from this one bucket alone.
The next column to the left (a faster bucket covering latencies from
101 usec up to 1000 usec) has 4742+25435+9686 = 39863 waits; at a
best-case 101 usec each, that's another 4.026 sec spent waiting for
these locks.
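
To make that arithmetic concrete, here is a minimal standalone sketch
(not output from the stats tool) that recomputes those lower bounds
directly from the bucket counts above:

#include <stdio.h>

int main(void)
{
    /* ">1000" bucket counts for the three mutexes above */
    unsigned long over_1000us[] = { 675, 5118, 554 };
    /* "<=1000" bucket counts, i.e. waits of 101..1000 usec */
    unsigned long over_100us[]  = { 4742, 25435, 9686 };
    unsigned long n1 = 0, n2 = 0;

    for (int i = 0; i < 3; i++) {
        n1 += over_1000us[i];
        n2 += over_100us[i];
    }
    /* Every wait in the ">1000" bucket is at least 1000 usec; every
       wait in the next bucket down is at least 101 usec. */
    printf("lower bound: %.3f sec (>1000us) + %.3f sec (101..1000us)\n",
           n1 * 1000 / 1e6, n2 * 101 / 1e6);
    return 0;
}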

I probably need to modify my stats-gathering program to display the
total time spent waiting, to make it clearer how things add up for each
mutex across the whole testcase.  Again, these stats are not timing how
long locks are being held, but rather how long it takes just to acquire
the lock.
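
A possible shape for that change, as a sketch only (the struct and
function names here are made up for illustration, not the tool's
actual code):

/* Keep a running total of acquisition latency next to the existing
   bucket counts, so per-mutex totals don't have to be reconstructed
   from the bucket floors afterwards. */
struct mtx_stats {
    unsigned long buckets[5];     /* <=1, <=10, <=100, <=1000, >1000 usec */
    unsigned long long total_wait_us;
    unsigned long locks, unlocks;
};

static void record_lock_latency(struct mtx_stats *s, unsigned long usec)
{
    static const unsigned long limits[] = { 1, 10, 100, 1000 };
    int i;

    for (i = 0; i < 4 && usec > limits[i]; i++)
        ;
    s->buckets[i]++;              /* same bucketing as the report above */
    s->total_wait_us += usec;     /* new: total wait per mutex */
    s->locks++;
}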

But acquiring a lock includes the time it actually has to wait for
another thread holding the lock, so you're *also* measuring the time
locks are held, just by another thread.  Locking and unlocking a
pthread_mutex in Cygwin might be slower than using a critical section,
but it's not *that* slow.
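
(To see that effect in isolation, here is a minimal two-thread example,
not taken from git or the stats tool: the pthread_mutex_lock() call in
one thread blocks for however long the other thread holds the mutex, so
the measured "acquisition" latency is mostly hold time.)

#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

static void *holder(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mtx);
    usleep(50000);                /* hold the lock for ~50 ms */
    pthread_mutex_unlock(&mtx);
    return NULL;
}

int main(void)
{
    pthread_t t;
    struct timespec t0, t1;

    pthread_create(&t, NULL, holder, NULL);
    usleep(10000);                /* let the holder grab the mutex first */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    pthread_mutex_lock(&mtx);     /* blocks until the holder releases */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    pthread_mutex_unlock(&mtx);

    printf("acquisition took %.1f ms, nearly all of it the holder's"
           " remaining hold time\n",
           (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6);

    pthread_join(t, NULL);
    return 0;
}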

The underlying point is, mutexes are not used as an end in themselves.
They guard a shared resource.  What we ultimately need to know here is

1. What resource are the pthread_mutexes guarding?

2. Why does accessing the resource take so long that threads have to
   wait so long?

Silly me; I assumed they were guarding short runs of instructions that
would take the same amount of time on MinGW vs Cygwin, so any
differences must be in the locking implementations.  That does seem to
be the case for the 3rd lock listed above, a guard around cache
accounting that doesn't do any I/O.  But the 1st lock guards
display_progress() (how time-consuming could *that* be?) and the 2nd
lock guards read_sha1_file(), which sounds like it involves I/O, so I
need to dig deeper there.
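
One way to dig deeper would be to time how long each mutex is *held*
rather than how long acquisition takes; a rough sketch of such wrappers
(hypothetical names, not existing code in git or the stats tool):

#include <pthread.h>
#include <time.h>

/* Accumulate hold time per mutex.  A long hold (e.g. I/O performed
   under the lock) shows up here even when the locking primitive
   itself is cheap. */
struct held_mutex {
    pthread_mutex_t mtx;
    struct timespec acquired;
    long long total_held_ns;
    unsigned long holds;
};

static void held_lock(struct held_mutex *h)
{
    pthread_mutex_lock(&h->mtx);
    clock_gettime(CLOCK_MONOTONIC, &h->acquired);   /* hold starts here */
}

static void held_unlock(struct held_mutex *h)
{
    struct timespec now;

    clock_gettime(CLOCK_MONOTONIC, &now);
    h->total_held_ns += (long long)(now.tv_sec - h->acquired.tv_sec) * 1000000000LL
                        + (now.tv_nsec - h->acquired.tv_nsec);
    h->holds++;
    pthread_mutex_unlock(&h->mtx);
}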

..mark

