PHP Regular Expressions: UTS18 Compliance

Possible duplicate:
PHP Warning: mb_ereg_match(): mbregex compile err: premature end of char-class

The Unicode Common Locale Data Repository (CLDR) provides a wealth of information about the relationship between languages and characters. For example, you can find out which characters a particular language uses by looking at the misc.exemplarCharacters chart. The source data for these charts is stored as XML files, and the exemplar characters are stored as regular expressions following the Unicode Regular Expressions standard, UTS18.

Here are some examples of what UTS18 regular expressions look like:

1. [a à b c ç d e é è f g h i í ï j k l ŀ m n o ó ò p q r s t u ú ü v w x y z]
2. [অ আ ই ঈ উ ঊ ঋ এ ঐ ও ঔ ং \u0981 ঃ ক খ গ ঘ ঙ চ ছ জ ঝ ঞ ট ঠ ড {ড\u09BC}ড় ঢ {ঢ\u09BC}ঢ় ণ ত থ দ ধ ন প ফ ব ভ ম য {য\u09BC} ৰ ল ৱ শ ষ স হ া ি ী \u09C1 \u09C2 \u09C3 ে ৈ ো ৌ \u09CD]
3. [a á b ɓ c d ɗ e é ɛ {ɛ\u0301} f g i í j k l m n {ny} ŋ o ó ɔ {ɔ\u0301} p r s t u ú ū w y]
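These sets are not ordinary regex character classes. Whitespace only separates items, {ny} is a single multi-character item, and \u0981 is an escape. Oniguruma, the engine behind mb_ereg_match(), reads the same text differently: inside [...] a space is a literal member and { is just a character. Below is a minimal sketch of a different technique: expand the set into a flat list of strings and test membership directly, without any regex engine. It handles only a subset of the syntax (whitespace, {…} strings, \uXXXX escapes and simple x-y ranges) and assumes mbstring is loaded. The function name expandExemplarSet is hypothetical.

```php
<?php
// Sketch only (hypothetical helper): expands a CLDR exemplar set such as
// "[a à b {ny} \u0981]" into a flat list of strings.
// Assumed syntax subset: whitespace separators, {multi-character strings},
// \uXXXX escapes and simple x-y ranges. Nested sets, \p{...} properties and
// set operations are NOT handled. Requires the mbstring extension.

function expandExemplarSet(string $set): array
{
    $set = trim($set);
    if ($set === '' || $set[0] !== '[' || substr($set, -1) !== ']') {
        throw new InvalidArgumentException("Not a bracketed set: $set");
    }

    // Replace \uXXXX escapes with the UTF-8 characters they denote.
    $body = preg_replace_callback(
        '/\\\\u([0-9A-Fa-f]{4})/u',
        fn(array $m): string => mb_chr(hexdec($m[1]), 'UTF-8'),
        substr($set, 1, -1)
    );

    // Split into code points so multibyte characters stay intact.
    $chars = preg_split('//u', $body, -1, PREG_SPLIT_NO_EMPTY);
    $n = count($chars);
    $items = [];

    for ($i = 0; $i < $n; $i++) {
        $c = $chars[$i];
        if (trim($c) === '') {
            continue; // whitespace only separates items
        }
        if ($c === '{') {
            // {...} is one multi-character item, e.g. {ny} or {ɛ\u0301}
            $str = '';
            for ($i++; $i < $n && $chars[$i] !== '}'; $i++) {
                $str .= $chars[$i];
            }
            $items[] = $str;
            continue;
        }
        $prev = end($items);
        if ($c === '-' && $prev !== false && mb_strlen($prev, 'UTF-8') === 1 && $i + 1 < $n) {
            // x-y range: add every code point after x, up to and including y
            $to = mb_ord($chars[++$i], 'UTF-8');
            for ($cp = mb_ord($prev, 'UTF-8') + 1; $cp <= $to; $cp++) {
                $items[] = mb_chr($cp, 'UTF-8');
            }
            continue;
        }
        $items[] = $c; // ordinary single character
    }
    return $items;
}
```

Once a set is expanded, checking a character is just in_array($char, expandExemplarSet($set), true). An empty set such as [] expands to an empty array instead of failing to compile. Canonically equivalent spellings (for example a base letter plus a combining mark versus a precomposed character) are compared byte for byte here, so the input may need normalizing first.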

I use PHP and SimpleXML to parse the XML data and extract these regular expression strings. Now I would like to match individual multibyte characters against these regular expressions. I am currently using the mb_ereg_match() function, which produces one or more of the following warnings (depending on the regular expression):
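The extraction step described above might look like the following sketch. It assumes the usual CLDR layout, in which the sets live in ldml/characters/exemplarCharacters and an optional type attribute (for example "auxiliary") distinguishes them. Using the key "standard" for the set that has no type attribute is my own convention, and readExemplarSets is a hypothetical name.

```php
<?php
// Sketch (hypothetical helper): collect the exemplar sets from a CLDR locale
// file with SimpleXML, keyed by their type attribute. The untyped main set is
// stored under the made-up key "standard". Later entries with the same type
// overwrite earlier ones.

function readExemplarSets(string $xml): array
{
    $ldml = simplexml_load_string($xml);
    if ($ldml === false) {
        throw new RuntimeException('Could not parse CLDR XML');
    }

    $sets = [];
    foreach ($ldml->xpath('/ldml/characters/exemplarCharacters') ?: [] as $node) {
        $type = (string) $node['type']; // '' when the attribute is absent
        $sets[$type === '' ? 'standard' : $type] = trim((string) $node);
    }
    return $sets;
}
```

For a real file, pass file_get_contents('kn.xml') or similar. The values come back as raw set strings such as "[a b c]" or "[]", which still need to be interpreted before any matching is done.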

mbregex compile err: premature end of char-class in ...
mbregex compile err: empty range in char class in ...
mbregex compile err: empty char-class in ...

Any ideas as to why this is not working?


As Sergey suggested, I added the following lines before calling mb_ereg_match():

mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');

This fixed the first two warnings listed above. I was left with the following warning:

mbregex compile err: empty char-class in ...

It turns out that this warning comes from empty exemplar sets in some of the CLDR XML files. For example, kn.xml contains:

<exemplarCharacters type="auxiliary">[]</exemplarCharacters>

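Passed to mb_ereg_match() as-is, [] is an empty character class, and that is what triggers this last warning, even though the empty set itself comes straight from the CLDR data.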
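One way to avoid the warning is to detect an empty set, such as the auxiliary set in kn.xml, before it reaches the regex engine. The sketch below takes the matcher as a callable, so the guard itself does not depend on mb_ereg_match() being available. Both helper names are hypothetical.

```php
<?php
// Sketch (hypothetical helpers): treat "[]" (or brackets containing only
// whitespace) as an empty set and never pass it to the matcher.

function isEmptyExemplarSet(string $set): bool
{
    return preg_match('/^\[\s*\]$/u', trim($set)) === 1;
}

function matchExemplar(string $char, string $set, callable $matcher): bool
{
    if (isEmptyExemplarSet($set)) {
        return false; // an empty set contains nothing, so nothing can match
    }
    return $matcher($char, $set);
}
```

With mbstring, the call might be matchExemplar($char, $set, fn($c, $s) => mb_ereg_match($s, $c)). This only removes the empty char-class case. Oniguruma still reads the whitespace and {…} items in non-empty sets differently from UTS18.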

