This is the mail archive of the cygwin mailing list for the Cygwin project.

Re: Bug? wcsxfrm causing memory corruption


On Sun, May 21, 2017 at 6:23 AM, Duncan Roe wrote:
> On Wed, May 10, 2017 at 11:30:46AM +0200, Erik Bray wrote:
>> Greetings--
>>
>> In the process of fixing the Python test suite on Cygwin I ran across
>> one test that was consistently causing segfaults later on, not
>> directly local to that test.  The test involves wcsxfrm so that's
>> where I focused my attention.
>>
>> The attached test demonstrates the bug.  Given an output buffer of N
>> wide characters, wcsxfrm will cause bytes beyond the destination size
>> to be byte-swapped. I believe it might actually be a bug in the underlying
>> LCMapStringW workhorse (this is on Windows 10; have not tested other
>> versions).
>>
>> According to its docs [1], the cchDest argument (size of the
>> destination buffer) is treated as a *byte* count when using
>> LCMAP_SORTKEY.  However, for the purposes of applying the
>> LCMAP_BYTEREV transformation it seems to be treating the output size
>> (in bytes) as a character count.  So in the example I give, where the
>> output sort key is 7 bytes (including the null terminator), it swaps
>> *14* bytes--the bytes including the sort key as well as the next 7
>> adjacent bytes.  This is obviously a problem if the destination buffer
>> is allocated out of some larger memory pool.
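A minimal, portable model of the swap described above, as a sketch only: it does not call LCMapStringW, and the names `swap_byte_pairs` and `model_observed_byterev` are hypothetical. It assumes, as the dumps in this thread suggest, that the sort key's byte count (7 here) is reused as a character count, so 2 * 7 = 14 bytes get swapped regardless of how large the destination really is.

```c
#include <stddef.h>
#include <stdint.h>

/* Swap adjacent byte pairs in buf[0..nbytes): the transformation
   LCMAP_BYTEREV applies to a sort key. A trailing odd byte is left alone. */
void swap_byte_pairs(uint8_t *buf, size_t nbytes)
{
    for (size_t i = 0; i + 1 < nbytes; i += 2) {
        uint8_t tmp = buf[i];
        buf[i] = buf[i + 1];
        buf[i + 1] = tmp;
    }
}

/* Hypothetical model of the reported behavior, fitted to the single case
   in this thread: the sort key length in BYTES (key_bytes, the value
   LCMapStringW returns) is reused as a CHARACTER count, so 2 * key_bytes
   bytes are swapped, even when that runs past the cchDest byte count. */
void model_observed_byterev(uint8_t *buf, size_t key_bytes)
{
    swap_byte_pairs(buf, 2 * key_bytes);
}
```

Applied to the buffer left by LCMAP_SORTKEY alone (0x0e 0x09 0x01 0x01 0x01 0x01 0x00 0x07 0x08 ... 0x0f) with key_bytes = 7, the model yields 0x09 0x0e 0x01 0x01 0x01 0x01 0x07 0x00 0x09 0x08 0x0b 0x0a 0x0d 0x0c 0x0e 0x0f, which matches the LCMAP_SORTKEY | LCMAP_BYTEREV dump later in this thread, including the three swapped pairs past the 8-byte destination.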
>>
>> This definitely has to be a bug, right?  Or at least very poorly
>> documented on MS's part.  A workaround would be either to not use
>> LCMAP_BYTEREV and swap the bytes manually, or to apply LCMAP_BYTEREV
>> in a second call to LCMapStringW with the correct character count...
>>
>> Thanks,
>> Erik
>>
>>
>> [1] https://msdn.microsoft.com/en-us/library/windows/desktop/dd318700(v=vs.85).aspx
>
>> #include <stdlib.h>
>> #include <stdio.h>
>> #include <stdint.h>
>> #include <locale.h>
>> #include <wchar.h>
>> #include <windows.h>
>>
>> #define SIZE 32  /* size of the whole allocation, in bytes */
>>
>>
>> /* Fill with the pattern 0x00, 0x01, ... so any byte that the call
>>    under test modifies stands out in the dump. */
>> void fill_bytes(uint8_t *a, int n) {
>>     int idx;
>>     for (idx=0; idx<n; idx++) {
>>         a[idx] = (uint8_t) idx;
>>     }
>> }
>>
>>
>> /* Hex-dump n bytes, eight per line. */
>> void print_bytes(const uint8_t *a, int n) {
>>     int idx;
>>     for (idx=0; idx<n; idx++) {
>>         printf("0x%02x ", a[idx]);
>>         if ((idx + 1) % 8 == 0) printf("\n");
>>     }
>> }
>>
>> int main(void) {
>>     wchar_t *a;
>>     const wchar_t *b = L"b";
>>     uint8_t *aa;
>>     size_t xret;               /* wcsxfrm result, in wchar_t units */
>>     int ret;                   /* LCMapStringW result, in bytes here */
>>     LCID collate_lcid = 1033;  /* en-US */
>>
>>     a = (wchar_t*) malloc(SIZE);
>>     if (a == NULL) {
>>         perror("malloc");
>>         return 1;
>>     }
>>     aa = (uint8_t*) a;
>>
>>     setlocale(LC_ALL, "en_US.UTF-8");
>>
>>     /* Destination is 4 wchar_t (8 bytes); bytes 8..31 must not change. */
>>     printf("using wcsxfrm:\n");
>>     fill_bytes(aa, SIZE);
>>     printf("before:\n");
>>     print_bytes(aa, SIZE);
>>     xret = wcsxfrm(a, b, 4);
>>     printf("after (%zu):\n", xret);
>>     print_bytes(aa, SIZE);
>>
>>     /* With LCMAP_SORTKEY, cchDest is a byte count: 8 bytes, the same
>>        destination size as the wcsxfrm call above. */
>>     printf("\nusing LCMapStringW directly:\n");
>>     fill_bytes(aa, SIZE);
>>     printf("before:\n");
>>     print_bytes(aa, SIZE);
>>
>>     ret = LCMapStringW(collate_lcid, LCMAP_SORTKEY | LCMAP_BYTEREV, b, -1, a, 8);
>>     printf("after (%d):\n", ret);
>>     print_bytes(aa, SIZE);
>>
>>     /* Same call without LCMAP_BYTEREV, for comparison. */
>>     printf("\nwithout LCMAP_BYTEREV:\n");
>>     fill_bytes(aa, SIZE);
>>     printf("before:\n");
>>     print_bytes(aa, SIZE);
>>
>>     ret = LCMapStringW(collate_lcid, LCMAP_SORTKEY, b, -1, a, 8);
>>     printf("after (%d):\n", ret);
>>     print_bytes(aa, SIZE);
>>     free(a);
>>
>>     return 0;
>> }
>
> Hi Erik,
>
> I get
>
> using wcsxfrm:
> before:
> 0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07
> 0x08 0x09 0x0a 0x0b 0x0c 0x0d 0x0e 0x0f
> 0x10 0x11 0x12 0x13 0x14 0x15 0x16 0x17
> 0x18 0x19 0x1a 0x1b 0x1c 0x1d 0x1e 0x1f
> after (3):
> 0x09 0x0e 0x01 0x01 0x01 0x01 0x00 0x00
> 0x09 0x08 0x0b 0x0a 0x0d 0x0c 0x0e 0x0f
> 0x10 0x11 0x12 0x13 0x14 0x15 0x16 0x17
> 0x18 0x19 0x1a 0x1b 0x1c 0x1d 0x1e 0x1f
>
> using LCMapStringW directly:
> before:
> 0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07
> 0x08 0x09 0x0a 0x0b 0x0c 0x0d 0x0e 0x0f
> 0x10 0x11 0x12 0x13 0x14 0x15 0x16 0x17
> 0x18 0x19 0x1a 0x1b 0x1c 0x1d 0x1e 0x1f
> after (7):
> 0x09 0x0e 0x01 0x01 0x01 0x01 0x07 0x00
> 0x09 0x08 0x0b 0x0a 0x0d 0x0c 0x0e 0x0f
> 0x10 0x11 0x12 0x13 0x14 0x15 0x16 0x17
> 0x18 0x19 0x1a 0x1b 0x1c 0x1d 0x1e 0x1f
>
> without LCMAP_BYTEREV:
> before:
> 0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07
> 0x08 0x09 0x0a 0x0b 0x0c 0x0d 0x0e 0x0f
> 0x10 0x11 0x12 0x13 0x14 0x15 0x16 0x17
> 0x18 0x19 0x1a 0x1b 0x1c 0x1d 0x1e 0x1f
> after (7):
> 0x0e 0x09 0x01 0x01 0x01 0x01 0x00 0x07
> 0x08 0x09 0x0a 0x0b 0x0c 0x0d 0x0e 0x0f
> 0x10 0x11 0x12 0x13 0x14 0x15 0x16 0x17
> 0x18 0x19 0x1a 0x1b 0x1c 0x1d 0x1e 0x1f

Yes, that's the same output I get.  Thanks for giving it a try; I should
have included example output in my original message.

You can see in the last case that without LCMAP_BYTEREV it writes the sequence

0x0e 0x09 0x01 0x01 0x01 0x01 0x00

with a terminating 0x00.  Bytes after that remain unchanged.  In the
other two examples *with* LCMAP_BYTEREV, the terminating 0x00 gets
swapped with the 0x07 after it, but this is documented and expected
behavior of LCMapStringW, and is already accounted for in Cygwin's
wcsxfrm.  What is undocumented, and unexpected, is that it also
byte-swaps 3 more byte pairs past the end of the 8-byte destination
(the bytes that held 0x08 through 0x0d), which can silently corrupt
adjacent memory.
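One way to turn this into a regression check, sketched under the assumption that the allocation was pre-filled with the same 0x00, 0x01, ... pattern that `fill_bytes` writes in the test program: scan everything at or past the destination size for bytes that no longer match. `first_clobbered_byte` is a hypothetical helper, not part of Cygwin.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guard-byte check. Assumes buf[i] == (uint8_t)i for every i
   before the call under test. Any mismatch at or beyond dest_bytes was
   written outside the destination. Returns the index of the first such
   byte, or total_bytes if the tail is intact. */
size_t first_clobbered_byte(const uint8_t *buf, size_t dest_bytes,
                            size_t total_bytes)
{
    for (size_t i = dest_bytes; i < total_bytes; i++) {
        if (buf[i] != (uint8_t)i)
            return i;
    }
    return total_bytes;
}
```

On the dumps above it returns 8 for both LCMAP_BYTEREV cases (the first byte past the 8-byte destination changed from 0x08 to 0x09) and reports an intact tail for the LCMAP_SORTKEY-only case.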

Thanks,
Erik
