Java unicode regex does not match German characters

Question


This question builds on an earlier question.

I use \P{M}\p{M}* to match all letters (in both German and French).

I chose this regex to avoid listing every Unicode range explicitly, as in:

^[a-zA-Z[\\u00c0-\\u01ff]]+[\\']?(([-]?[a-zA-Z[\\u00c0-\\u01ff]]*[\\s]?)|([\\s]?[a-zA-Z[\\u00c0-\\u01ff]]*[-]?)){1,2}[a-zA-Z[\\u00c0-\\u01ff]]+$
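For comparison, here is a small sketch (the class and method names are hypothetical) that tests single characters against the bracketed class from that pattern and against \p{L}. The explicit range U+00C0..U+01FF already contains ß (U+00DF) and è (U+00E8); what it misses are letters outside that range, such as Greek Ω (U+03A9), which \p{L} covers.

```java
public class RangeCheck {
    // Character class taken from the explicit-range pattern:
    // ASCII letters plus the range U+00C0..U+01FF (class union syntax).
    static final String RANGE_CLASS = "[a-zA-Z[\\u00c0-\\u01ff]]";

    static boolean inRange(String ch) {
        return ch.matches(RANGE_CLASS);
    }

    // \p{L}: any code point in a Unicode "Letter" general category.
    static boolean isLetter(String ch) {
        return ch.matches("\\p{L}");
    }

    public static void main(String[] args) {
        // ß (U+00DF), è (U+00E8), Ω (U+03A9), written as escapes to avoid source-encoding issues
        String[] samples = {"\u00df", "\u00e8", "\u03a9"};
        for (String s : samples) {
            System.out.println(String.format("U+%04X", (int) s.charAt(0))
                    + " range=" + inRange(s) + " letter=" + isLetter(s));
        }
    }
}
```

Expected output: U+00DF and U+00E8 report range=true letter=true, while U+03A9 reports range=false letter=true.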

However, despite using the Unicode syntax from the earlier question, characters such as ß or è do not match the regular expression.

I am using JDK 6.

What am I missing? Thanks!
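A minimal sketch of what \P{M}\p{M}* actually matches (class and method names are hypothetical): exactly one code point that is not a combining mark, followed by zero or more combining marks. It is not limited to letters, since digits, spaces, and punctuation are also non-marks. With String.matches() or Matcher.matches() the whole input must be one such unit unless the pattern is repeated. If the pattern was used unquantified against whole words, every multi-character word would fail; the question does not show the calling code, so this is only one possible explanation.

```java
import java.util.regex.Pattern;

public class GraphemeCheck {
    // One non-mark code point followed by any number of combining marks.
    static final Pattern ONE_UNIT = Pattern.compile("\\P{M}\\p{M}*");
    // The same unit repeated one or more times (non-capturing group).
    static final Pattern MANY_UNITS = Pattern.compile("(?:\\P{M}\\p{M}*)+");

    // matches() requires the entire input to match the pattern.
    static boolean single(String s) {
        return ONE_UNIT.matcher(s).matches();
    }

    static boolean many(String s) {
        return MANY_UNITS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(single("\u00df"));  // "ß" alone: true
        System.out.println(single("\u00dfe")); // "ße": false, two units
        System.out.println(many("\u00dfe"));   // true once repeated
        System.out.println(many("42 !"));      // true: non-letters match as well
    }
}
```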


java regex

Ionut Feb 07 '14 at 12:50


2 answers

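Using Java 6, this code:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Main {
        public static void main(String[] args) {
            String str = "hello ß you";
            // Non-capturing group: one non-mark char plus any combining marks, repeated
            Pattern p = Pattern.compile("(?:\\P{M}\\p{M}*)+");
            Matcher matcher = p.matcher(str);
            System.out.println("replaced: '" + matcher.replaceAll("") + "'");
        }
    }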

prints:

    replaced: ''

so "ß" is matched.

Antoine Wils Feb 07 '14 at 13:15

Bohemian · Accepted Answer · 2014-02-07T13:00:58+0000

Use the Unicode character class \p{L} for "any letter":

System.out.println("abcßè".matches("\\p{L}+")); // true
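One caveat worth checking, sketched below with hypothetical class and method names: \p{L} matches only code points in a Letter category. If an accented character arrives in decomposed form (for example è written as e followed by the combining grave accent U+0300, which is a mark), \p{L}+ rejects it. Two ways around that are allowing marks after each letter, or composing the text with java.text.Normalizer (available since Java 6) before matching.

```java
import java.text.Normalizer;

public class LetterCheck {
    // One or more code points from any Unicode "Letter" category.
    static boolean lettersOnly(String s) {
        return s.matches("\\p{L}+");
    }

    // Letters, each optionally followed by combining marks such as U+0300.
    static boolean lettersWithMarks(String s) {
        return s.matches("(?:\\p{L}\\p{M}*)+");
    }

    // NFC composes e + U+0300 into the single code point U+00E8 before matching.
    static boolean lettersAfterNfc(String s) {
        return lettersOnly(Normalizer.normalize(s, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String precomposed = "abc\u00df\u00e8"; // "abcßè", è as one code point
        String decomposed = "abc\u00dfe\u0300"; // è as e + combining grave accent
        System.out.println(lettersOnly(precomposed));     // true
        System.out.println(lettersOnly(decomposed));      // false: U+0300 is a mark
        System.out.println(lettersWithMarks(decomposed)); // true
        System.out.println(lettersAfterNfc(decomposed));  // true
    }
}
```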
