This is the mail archive of the cygwin-developers mailing list for the Cygwin project.
Re: Bad codegen in pthread_mutex causing 100% cpu spin loop related to inline asm ilockcmpexchg
- From: Dave Korn <dave dot korn dot cygwin at googlemail dot com>
- To: Dave Korn <dave dot korn dot cygwin at googlemail dot com>
- Cc: cygwin-developers at cygwin dot com
- Date: Thu, 28 May 2009 19:50:03 +0100
- Subject: Re: Bad codegen in pthread_mutex causing 100% cpu spin loop related to inline asm ilockcmpexchg
- References: <4A1EC4BC.9070104@gmail.com>
Dave Korn wrote:
> extern __inline__ long
> ilockcmpexch (volatile long *t, long v, long c)
> {
> register int __res;
> __asm__ __volatile__ ("\n\
> lock cmpxchgl %3,(%1)\n\
> ": "=a" (__res), "=q" (*t) : "1" (t), "q" (v), "0" (c): "memory", "cc");
> return __res;
> }
I don't really like the look of this:
> "=q" (*t) : "1" (t),
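it seems dodgy and contradictory: operand 1 is declared as an output holding
the contents of *t in a register, while the matching "1" constraint forces the
pointer t itself into that same register.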
This new version changes the approach: it uses an "m" constraint to refer
directly to the contents of *t, rather than hoping the compiler can infer the
relationship between the address t in operand 2 and the contents of *t in
operand 1. It requires rewriting the instruction template, and it can generate
a wider variety of addressing modes than the original, which could only produce
register-indirect addressing:
extern __inline__ long
ilockcmpexch (volatile long *t, long v, long c)
{
register int __res;
__asm__ __volatile__ ("\n\
lock cmpxchgl %2,%1\n\
": "=a" (__res), "+m" (*t) : "q" (v), "0" (c) : "memory", "cc");
return __res;
}
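For anyone wanting to try the "+m" form outside the Cygwin tree, here is a
minimal standalone sketch. It is not the winbase.h code: the name cas32_asm is
made up for illustration, it uses int32_t instead of long so that the
cmpxchgl operand size is also correct on x86-64, and on non-x86 targets it
falls back to GCC's __sync_val_compare_and_swap builtin.

```c
#include <stdint.h>

/* Hypothetical helper, not part of Cygwin.  If *t == c, store v into *t.
   Either way, return the value *t held before the instruction ran.  */
static inline int32_t
cas32_asm (volatile int32_t *t, int32_t v, int32_t c)
{
#if defined(__i386__) || defined(__x86_64__)
  int32_t res;
  /* "+m" (*t): *t is read and written in memory, so the compiler picks
     the addressing mode.  "0" (c) places the comparand in %eax, which is
     also where cmpxchg leaves the old value on failure.  */
  __asm__ __volatile__ ("lock cmpxchgl %2,%1"
                        : "=a" (res), "+m" (*t)
                        : "q" (v), "0" (c)
                        : "memory", "cc");
  return res;
#else
  /* Assumption: a GCC-compatible compiler providing the __sync builtins.  */
  return __sync_val_compare_and_swap (t, c, v);
#endif
}
```

With x == 5, cas32_asm (&x, 9, 5) should return 5 and leave x == 9; repeating
the same call should then return 9 and leave x unchanged.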
I haven't checked yet whether cmpxchgl will accept all the addressing modes
that an "m" constraint can allow, but at least in this context it seems to
produce better generated code:
.p2align 4,,7
L186:
LBB1355:
LBB1356:
LBB1357:
LBB1358:
.loc 3 127 0
movl __ZN13pthread_mutex7mutexesE+8, %eax # mutexes.head, D.28606
movl %eax, 36(%ebx) # D.28606, <variable>.next
LBB1359:
LBB1360:
LBB1361:
.loc 2 53 0
/APP
# 53 "/gnu/winsup/src/winsup/cygwin/winbase.h" 1
lock cmpxchgl %ebx,__ZN13pthread_mutex7mutexesE+8 # this,
# 0 "" 2
LVL137:
/NO_APP
LBE1361:
LBE1360:
LBE1359:
.loc 3 126 0
cmpl %eax, 36(%ebx) # __res, <variable>.next
jne L186 #,
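Read back into C, the loop above loads the list head, stores it into the
new object's next field, attempts the compare-and-exchange, and retries if the
head changed in the meantime. A rough sketch of that pattern follows. The
names struct node and list_push are hypothetical, and it uses GCC's
__sync_val_compare_and_swap builtin in place of ilockcmpexch so that it builds
on any target.

```c
#include <stddef.h>

/* Hypothetical stand-in for an object linked through a "next" field.  */
struct node
{
  struct node *next;
};

/* Push n onto the list at *head, retrying until the swap succeeds.  */
static void
list_push (struct node *volatile *head, struct node *n)
{
  do
    n->next = *head;            /* movl head, %eax ; movl %eax, next  */
  while (__sync_val_compare_and_swap (head, n->next, n) != n->next);
                                /* lock cmpxchgl ; cmpl ; jne         */
}
```

The loop only exits when the value returned by the compare-and-exchange equals
the next pointer just stored, which is the cmpl/jne pair at the end of the
listing.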
cheers,
DaveK