Match Thai script characters in Java

For the past two hours I have been fighting with Thai script strings that slipped into my database. They show up unexpectedly, get mangled on output, have no natural order, and are a disaster.

I want to just ignore any lines with Thai Script characters, but I have no idea how:

Pattern.compile("\\p{Thai}") fails at initialization. Would "[ก-๛]" ever work? What is the right way?

2 answers

Thai is a Unicode block, and Unicode blocks must be specified as \p{In...}:

Pattern.compile("\\p{InThai}") 
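To connect this back to the question's goal of skipping lines, here is a minimal sketch that drops every line containing a character from the Thai block. The class name ThaiLineFilter and its method names are made up for illustration; only Pattern.compile("\\p{InThai}") comes from the answer itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class ThaiLineFilter {
    // Matches any single character in the Unicode Thai block (U+0E00..U+0E7F).
    private static final Pattern THAI_BLOCK = Pattern.compile("\\p{InThai}");

    // True if the line contains at least one Thai-block character.
    static boolean containsThai(String line) {
        return THAI_BLOCK.matcher(line).find();
    }

    // Returns only the lines that contain no Thai-block characters.
    static List<String> dropThaiLines(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (!containsThai(line)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Unicode escapes are used so the source compiles under any file encoding.
        List<String> input = Arrays.asList(
                "hello",
                "\u0E2A\u0E27\u0E31\u0E2A\u0E14\u0E35",
                "mixed \u0E01 text");
        System.out.println(dropThaiLines(input)); // prints [hello]
    }
}
```

Using find() rather than matches() matters here: the pattern describes a single character, so find() reports a hit anywhere in the line.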

Unicode blocks and Unicode scripts are not the same thing. For example, ฿, U+0E3F THAI CURRENCY SYMBOL BAHT, is in \p{Block=Thai} ᴀᴋᴀ \p{InThai}, but it is not in \p{Script=Thai} ᴀᴋᴀ \p{IsThai}. It is in \p{Script=Common}.
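The difference between the two properties can be checked directly. The sketch below assumes Java 7 or later, where java.util.regex accepts \p{IsThai} as a script property; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing the block and script properties.
final class ThaiBlockVsScript {
    // Block property: every code point in U+0E00..U+0E7F.
    static final Pattern IN_THAI_BLOCK = Pattern.compile("\\p{InThai}");
    // Script property: only code points whose script is Thai (Java 7+).
    static final Pattern IS_THAI_SCRIPT = Pattern.compile("\\p{IsThai}");

    static boolean inBlock(String s) {
        return IN_THAI_BLOCK.matcher(s).find();
    }

    static boolean inScript(String s) {
        return IS_THAI_SCRIPT.matcher(s).find();
    }

    public static void main(String[] args) {
        String baht = "\u0E3F";  // THAI CURRENCY SYMBOL BAHT
        String koKai = "\u0E01"; // THAI CHARACTER KO KAI
        System.out.println(inBlock(baht));   // true
        System.out.println(inScript(baht));  // false
        System.out.println(inBlock(koKai));  // true
        System.out.println(inScript(koKai)); // true
    }
}
```

Which pattern to use depends on whether a lone baht sign should count as Thai text: the block pattern says yes, the script pattern says no.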

[...]

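As of Unicode 6.0, U+0E3F is the outlier in the Thai block. Java did not support Unicode script properties until Java 7. To use Unicode script properties before JDK 7, you need JNI access to ICU, as Google does for Java on Android. [...]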
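On Java 7 or later the script of a code point can also be queried without a regex, through Character.UnicodeScript. The following is only a sketch with a hypothetical class name; it walks the string by code point and reports whether any code point has the Thai script.

```java
// Hypothetical helper; requires Java 7+ for Character.UnicodeScript.
final class ThaiScriptScan {
    // True if any code point in the text has Script=Thai.
    static boolean hasThaiScript(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.THAI) {
                return true;
            }
            i += Character.charCount(cp); // advance past surrogate pairs correctly
        }
        return false;
    }

    public static void main(String[] args) {
        // U+0E3F sits in the Thai block but carries the Common script.
        System.out.println(Character.UnicodeBlock.of(0x0E3F));  // THAI
        System.out.println(Character.UnicodeScript.of(0x0E3F)); // COMMON
        System.out.println(hasThaiScript("price: 100\u0E3F"));  // false
        System.out.println(hasThaiScript("\u0E01"));            // true
    }
}
```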
