When to use Unicode Normalization Forms NFC and NFD?

The Unicode normalization FAQ contains the following paragraph:

Programs should always compare canonical equivalent Unicode strings as equal ... The Unicode standard provides well-defined normalization forms that can be used for this: NFC and NFD.

and continues ...

The choice of which to use depends on the particular program or system. NFC is the best form for general text, since it is more compatible with strings converted from legacy encodings. ... NFD and NFKD are most useful for internal processing.

My questions:

What makes NFC best for general text? What defines "internal processing", and why is it best left to NFD? And finally, regardless of what is "best", are the two forms interchangeable as long as two strings are compared using the same normalization form?

+5
2 answers

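The FAQ is somewhat misleading, starting with its use of "should", followed by the inconsistent use of "requirement" for the same thing. The Unicode Standard itself (cited in the FAQ) is more accurate. Basically, you should not expect programs to treat canonically equivalent strings as different, but neither should you expect all programs to treat them as identical.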

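In practice, it depends on what your software needs to do. In most cases, you do not need to normalize at all, and normalization may destroy essential information in the data.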

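For example, U+0387 GREEK ANO TELEIA (·) is defined as canonically equivalent to U+00B7 MIDDLE DOT (·). This was arguably a mistake, since the two characters are really distinct and should be rendered and processed differently. But it is too late to change that, because this part of the standard is frozen. Consequently, if you convert data to NFC or otherwise discard differences between canonically equivalent strings, you risk ending up with the wrong character.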
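The relationship between U+0387 and U+00B7 can be checked with a short Python sketch using the standard `unicodedata` module. The variable names are illustrative only, and the result assumes the Unicode data shipped with your Python build (this mapping has been stable for a long time).

```python
import unicodedata

ano_teleia = "\u0387"   # GREEK ANO TELEIA
middle_dot = "\u00b7"   # MIDDLE DOT

# Before normalization the two code points compare as different.
distinct_before = ano_teleia != middle_dot

# U+0387 has a singleton canonical decomposition to U+00B7,
# so both NFC and NFD replace it with MIDDLE DOT.
nfc_result = unicodedata.normalize("NFC", ano_teleia)
nfd_result = unicodedata.normalize("NFD", ano_teleia)

print(distinct_before, nfc_result == middle_dot, nfd_result == middle_dot)
```

After either normalization there is no way to tell which of the two characters the original text contained.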

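On the other hand, there are risks in not normalizing. For example, the letter "ä" can appear as the single Unicode character U+00E4 LATIN SMALL LETTER A WITH DIAERESIS or as the two Unicode characters U+0061 LATIN SMALL LETTER A and U+0308 COMBINING DIAERESIS. It will mostly be the former, i.e. the precomposed form, but if it is the latter and your code tests for data containing "ä" using only the precomposed form, it will not find it. In many cases, though, you do nothing of the kind and simply store the data, concatenate strings, print them, etc. Then the risk is that the two representations render somewhat differently.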
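A minimal sketch of the two spellings of "ä" and of how a naive substring test can miss the decomposed one. It uses Python's standard `unicodedata` module; the sample word and variable names are made up for illustration.

```python
import unicodedata

precomposed = "\u00e4"   # LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "a\u0308"   # LATIN SMALL LETTER A + COMBINING DIAERESIS

# "Märchen" spelled with the decomposed sequence.
text = "M" + decomposed + "rchen"

# A plain substring test for the precomposed letter misses it.
naive_hit = precomposed in text

# Normalizing to NFC first composes a + U+0308 into U+00E4.
normalized_hit = precomposed in unicodedata.normalize("NFC", text)

print(naive_hit, normalized_hit, len(precomposed), len(decomposed))
```

Normalizing both the data and the search key to the same form (either NFC or NFD) would make the test succeed.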


+6
  • NFC is the general, common-sense form to use: ä is 1 code point there, which makes sense.

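  • NFD is good for certain internal processing: if you want accent-insensitive searching or sorting, having your strings in NFD makes it much easier and faster. Another use is building more robust slugs from titles. These are just the most obvious ones; there are surely many more.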

  • If two strings x and y are canonically equivalent, then
    toNFC(x) = toNFC(y)
    toNFD(x) = toNFD(y)

    Is that what you meant?
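A short Python sketch of the equality toNFC(x) = toNFC(y), toNFD(x) = toNFD(y) for canonically equivalent strings, together with one typical "internal processing" use of NFD, accent-insensitive matching. The helper names `canonically_equal` and `strip_accents` are hypothetical and not part of any library; only `unicodedata.normalize` and `unicodedata.combining` are standard-library calls.

```python
import unicodedata

def canonically_equal(x: str, y: str) -> bool:
    # Either form works, as long as both sides use the same one.
    return unicodedata.normalize("NFD", x) == unicodedata.normalize("NFD", y)

def strip_accents(s: str) -> str:
    # Decompose, then drop combining marks (nonzero combining class).
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

x = "caf\u00e9"    # precomposed é
y = "cafe\u0301"   # e + COMBINING ACUTE ACCENT

same_raw = x == y
same_nfc = unicodedata.normalize("NFC", x) == unicodedata.normalize("NFC", y)
same_nfd = canonically_equal(x, y)
accent_insensitive = strip_accents(x) == "cafe"

print(same_raw, same_nfc, same_nfd, accent_insensitive)
```

For pure equality tests the two forms are interchangeable. The choice matters only for what the stored or emitted string looks like, and for processing steps such as the accent stripping here, which rely on the marks being separate code points.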

+1
