This is the mail archive of the cygwin mailing list for the Cygwin project.



git diff --color-words doesn't work properly


When a .gitattributes file assigns a diff driver (e.g. diff=cpp) and the
locale is UTF-8, "git diff --color-words" fails with the message "fatal:
Invalid regular expression
[a-zA-Z_][a-zA-Z0-9_]*|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lLuU]*|[-+*/<>%&^|=!]=|--|\+\+|<<=?|>>=?|&&|\|\||::|->\*?|\.\*|[^[:space:]]|[<C0>-<FF>][<80>-<BF>]+".
(Here <C0>, <FF>, <80> and <BF> stand for the raw bytes 0xC0, 0xFF, 0x80
and 0xBF.)
This does not happen with Git for Windows.  To reproduce it, run the
following commands in an empty directory:

git init
echo "* diff=cpp" > .gitattributes
git add .gitattributes
# This works
LC_ALL=C git diff --staged --color-words
# This fails
LC_ALL=en_US.UTF-8 git diff --staged --color-words
# It also fails with any other UTF-8 locale (e.g. en_GB.UTF-8 or
# ja_JP.UTF-8).

The issue appears to be in the wgetnext function in regcomp.c. It calls
mbrtowc, which fails because the pattern contains bytes that are not
valid UTF-8.
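
The decoding failure can be illustrated outside of regcomp. What follows
is a minimal Python sketch, not the Cygwin code path: it assumes the final
bracket expressions of the pattern contain the raw bytes 0xC0, 0xFF, 0x80
and 0xBF, and it uses Python's strict UTF-8 decoder as a stand-in for
mbrtowc. The names PATTERN_TAIL and first_invalid_utf8_offset are made up
for this example.

```python
# Tail of the word regex, with the raw bytes assumed from the error message.
PATTERN_TAIL = b"[^[:space:]]|[\xc0-\xff][\x80-\xbf]+"


def first_invalid_utf8_offset(data: bytes):
    """Return the offset of the first byte that cannot be decoded as UTF-8,
    or None if the whole input is valid UTF-8."""
    try:
        data.decode("utf-8")  # strict decoding, standing in for mbrtowc
    except UnicodeDecodeError as err:
        return err.start
    return None


if __name__ == "__main__":
    offset = first_invalid_utf8_offset(PATTERN_TAIL)
    print("first undecodable byte at offset", offset)
    print("byte value:", hex(PATTERN_TAIL[offset]))
```

A pure-ASCII pattern such as b"[a-z]+" decodes cleanly and yields None,
which is consistent with the ASCII part of the regex not being the
problem.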

The easy fix is probably either to remove the non-ASCII bytes from that
regex (it's defined in git's userdiff.c) or to change them to a Unicode
code point range (i.e. U+0080-U+10FFFF), but I don't know whether that
would break anything else.
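
To make the second option concrete, here is a rough Python sketch of the
idea, matching non-ASCII text by code point range instead of by raw UTF-8
byte ranges. It is not a patch for userdiff.c: Python's re module is not
POSIX regcomp, WORD_RE is a hypothetical and heavily shortened stand-in
for the real word regex, and the sample string is invented.

```python
import re

# Hypothetical, shortened word regex: an ASCII identifier, a run of
# non-ASCII characters given as the code point range U+0080-U+10FFFF,
# or any single non-space character.
WORD_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*|[\u0080-\U0010FFFF]+|\S")


def split_words(text: str):
    """Split decoded text into the tokens matched by WORD_RE."""
    return WORD_RE.findall(text)


if __name__ == "__main__":
    # "na\u00efve = caf\u00e9" contains two non-ASCII characters.
    print(split_words("na\u00efve = caf\u00e9"))
```

Because the pattern itself is plain ASCII, it stays valid in any locale.
Whether a POSIX regcomp accepts an equivalent range, and whether the
resulting word splitting is acceptable for git, would still need to be
checked.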

The attached cygcheck.out has my email address redacted, but is
otherwise unmodified.

Attachment: cygcheck.out
Description: Binary data

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
