Java unicode regex does not match German characters

This question is based on this question.

I use \P{M}\p{M}* to match all letters (in both German and French).
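For reference, \P{M}\p{M}* means "one character that is not a combining mark, followed by zero or more combining marks", so it matches a precomposed letter and a base letter plus a separate accent alike. A minimal sketch (class and method names are made up for illustration; characters are written as \u escapes so the result does not depend on the source file's encoding):

```java
import java.util.regex.Pattern;

public class CombiningMarkDemo {
    // One non-mark code point followed by any number of combining marks
    private static final Pattern CLUSTER = Pattern.compile("\\P{M}\\p{M}*");

    static boolean isOneCluster(String s) {
        return CLUSTER.matcher(s).matches();
    }

    public static void main(String[] args) {
        String precomposed = "\u00e8";   // e-grave as a single code point
        String decomposed = "e\u0300";   // e + COMBINING GRAVE ACCENT
        String loneMark = "\u0300";      // a combining mark with no base character

        System.out.println(isOneCluster(precomposed)); // true
        System.out.println(isOneCluster(decomposed));  // true
        System.out.println(isOneCluster(loneMark));    // false
        System.out.println(isOneCluster(" "));         // true: a space is not a mark either
    }
}
```

Note the last line: because \P{M} is "anything except a mark", this construct also accepts spaces, digits and punctuation, not only letters.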

I chose this regex to avoid spelling out Unicode ranges by hand, as in:

    ^[a-zA-Z[\\u00c0-\\u01ff]]+[\\']?(([-]?[a-zA-Z[\\u00c0-\\u01ff]]*[\\s]?)|([\\s]?[a-zA-Z[\\u00c0-\\u01ff]]*[-]?)){1,2}[a-zA-Z[\\u00c0-\\u01ff]]+$
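A hand-written range like the one above is both too wide and too narrow. The sketch below uses only the bracket expression from that regex, written as the equivalent [a-zA-Z\u00c0-\u01ff] rather than the full pattern, and compares it with the Unicode letter category \p{L}. The class name is hypothetical.

```java
import java.util.regex.Pattern;

public class RangeVsCategory {
    // Character class taken from the question's regex (union flattened into one class)
    static final Pattern RANGE = Pattern.compile("[a-zA-Z\\u00c0-\\u01ff]+");
    // Unicode general category "Letter"
    static final Pattern LETTER = Pattern.compile("\\p{L}+");

    public static void main(String[] args) {
        String sharpS = "\u00df"; // LATIN SMALL LETTER SHARP S, inside the range
        String times = "\u00d7";  // MULTIPLICATION SIGN, inside the range but not a letter
        String alpha = "\u03b1";  // GREEK SMALL LETTER ALPHA, outside the range

        System.out.println(RANGE.matcher(sharpS).matches());  // true
        System.out.println(LETTER.matcher(sharpS).matches()); // true
        System.out.println(RANGE.matcher(times).matches());   // true, although it is a symbol
        System.out.println(LETTER.matcher(times).matches());  // false
        System.out.println(RANGE.matcher(alpha).matches());   // false
        System.out.println(LETTER.matcher(alpha).matches());  // true
    }
}
```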

However, even with the Unicode syntax from the previous question, characters such as ß or è do not match the regular expression.

I am using JDK 6.

What am I missing? Thanks!

2 answers

Use the Unicode category \p{L} for "any letter":

System.out.println("abcßè".matches("\\p{L}+")); // true
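To see how \p{L} differs from the question's construct, here is a small sketch (method names are made up for illustration). \p{L}+ accepts only letters, \P{M}\p{M}* accepts almost anything, and \p{L}\p{M}* additionally allows a letter followed by a separate combining accent, which \p{L}+ alone rejects.

```java
public class LetterClassDemo {
    // \p{L}: one code point from any Unicode letter category
    static boolean allLetters(String s) {
        return s.matches("\\p{L}+");
    }

    // Non-mark followed by optional marks: digits, spaces and punctuation pass too
    static boolean allClusters(String s) {
        return s.matches("(?:\\P{M}\\p{M}*)+");
    }

    // A letter optionally followed by combining marks (e.g. e + U+0300)
    static boolean lettersWithMarks(String s) {
        return s.matches("(?:\\p{L}\\p{M}*)+");
    }

    public static void main(String[] args) {
        String word = "abc\u00df\u00e8";   // abc + sharp s + e-grave
        String mixed = "abc 123";
        String decomposed = "e\u0300";     // e + COMBINING GRAVE ACCENT

        System.out.println(allLetters(word));             // true
        System.out.println(allLetters(mixed));            // false
        System.out.println(allClusters(mixed));           // true
        System.out.println(allLetters(decomposed));       // false
        System.out.println(lettersWithMarks(decomposed)); // true
    }
}
```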

Using Java 6, this code:

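    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Main {
        public static void main(String[] args) {
            String str = "hello ß you";
            // (?:...) is a non-capturing group; \P{M}\p{M}* = one non-mark char plus any combining marks
            Pattern p = Pattern.compile("(?:\\P{M}\\p{M}*)+");
            Matcher matcher = p.matcher(str);
            System.out.println("replaced: '" + matcher.replaceAll("") + "'");
        }
    }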

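prints:

    replaced: ''

so "ß" is matched too: every character of the input, including the spaces, is consumed by the pattern.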
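If the pattern itself is not the culprit, one thing worth ruling out (an assumption; nobody in the thread confirms it) is text being decoded with the wrong charset, for example a UTF-8 source file compiled without a matching javac -encoding option. The sketch below simulates that mix-up: it encodes "Straße" as UTF-8, decodes the bytes as ISO-8859-1, and tests both strings against \p{L}+. Class and method names are hypothetical, and it sticks to APIs available in Java 6.

```java
import java.nio.charset.Charset;

public class EncodingCheck {
    static final Charset UTF_8 = Charset.forName("UTF-8");
    static final Charset LATIN_1 = Charset.forName("ISO-8859-1");

    static boolean allLetters(String s) {
        return s.matches("\\p{L}+");
    }

    // Encode as UTF-8, then decode with the wrong charset (simulated mix-up)
    static String misdecode(String s) {
        return new String(s.getBytes(UTF_8), LATIN_1);
    }

    public static void main(String[] args) {
        String word = "Stra\u00dfe";            // Strasse spelled with a sharp s
        String broken = misdecode(word);

        System.out.println(allLetters(word));   // true
        System.out.println(broken.length());    // 7: the sharp s became two chars
        System.out.println(allLetters(broken)); // false: one of them is U+009F, a control char
    }
}
```

If the correctly decoded string matches but the text your program actually receives does not, the problem is the encoding, not the regex.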
