ICU collator treats "a" and "ą" as the same

I use ICU with the Lithuanian (lt_LT) language. The alphabet for this language is as follows: a ą b c č d e ę ė <...> v z ž

However, when sorting, the ICU collator assumes that, for example, a and ą (a with ogonek) are equivalent, so a list of Lithuanian words is sorted as follows:

a, ą, ab, aba, abadas, <...>, b, ba, <...>

whereas the expected result would be:

a, ab, aba, abadas, <...>, ą, <...>, b, ba, <...>

The same thing happens with other "accented" letters (e - ę - ė, z - ž, etc.)

A more specific test case: running source/samples/coll/coll -locale lt_LT -source ą -target aa decides that source is less than target when it is not (see coll.cpp if you need to).

Is this behavior expected? Is this a bug or a feature? If it is intended, how can I prevent the ICU collator from treating similar letters as equivalent?

1 answer

The letters are listed as a secondary difference in the CLDR charts, so they sort like this. If that is wrong, take it up with CLDR; it is not an ICU problem. Mimer agrees.
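To see why a secondary difference produces the order from the question, here is a rough toy model in Python. It is not ICU's algorithm and does not read CLDR data; the function name toy_sort_key is made up for illustration, and it simply assumes that every accented letter can be split into a base letter plus combining marks. Base letters are compared first, and accents are consulted only when the base letters tie.

```python
import unicodedata

A_OGONEK = "\u0105"  # the letter "a with ogonek"


def toy_sort_key(word):
    """Hypothetical two-level key: (base letters, accents per letter).

    A toy illustration of primary vs. secondary differences,
    not the real ICU/CLDR collation algorithm.
    """
    # NFD splits an accented letter into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", word)

    # Primary level: base letters only, accents ignored.
    primary = "".join(ch for ch in decomposed if not unicodedata.combining(ch))

    # Secondary level: the combining marks attached to each base letter.
    secondary = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            if secondary:
                secondary[-1] += ch
        else:
            secondary.append("")

    # Tuples compare element by element, so the secondary level
    # only matters when the primary level is identical.
    return (primary, tuple(secondary))


words = ["b", "ab", A_OGONEK, "aba", "a", "ba", "abadas"]
print(sorted(words, key=toy_sort_key))
```

In this model the accented letter and plain a share the same primary key, so the accented one lands right after a and before ab, and it also compares as less than aa, which matches what the coll sample reports.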
