How to handle SQL state [HY000]; error code [1366]; Incorrect string value?

I know that this error means that the MySQL column is not accepting a value, but this is strange because the value is a valid Java string encoded as UTF-8 and the MySQL column is utf8_general_ci. In addition, all other UTF-8 characters work properly; only these ones fail.

Use case: I import tweets. This tweet: https://twitter.com/bakervin/status/210054214951518212 - you see two "strange" characters (and two strange spaces between them). The question is how to handle this:

  • strip these characters (how? what are they, and how does Java's UTF-8 differ from MySQL's?)
  • make the column able to accept this value (is there anything more utf-y than utf8_general_ci?)
1 answer

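These seem to be Unicode surrogate pairs, which is how Java represents characters outside the Basic Multilingual Plane (code points above U+FFFF). Such characters take four bytes in UTF-8, while MySQL's utf8 character set stores at most three bytes per character, so a utf8_general_ci column rejects them (utf8mb4 accepts them). If losing these characters is acceptable, they can be stripped before the insert: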

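// Drops every UTF-16 surrogate code unit, which removes all characters above U+FFFF.
static String stripSurrogates(String text) {
    StringBuilder sb = new StringBuilder(text.length());
    for (int i = 0; i < text.length(); i++) {
        char ch = text.charAt(i);
        // Keep only code units that are not part of a surrogate pair
        if (!Character.isHighSurrogate(ch) && !Character.isLowSurrogate(ch)) {
            sb.append(ch);
        }
    }
    return sb.toString();
}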
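
To check whether a given string is affected before sending it to MySQL, it may help to look at whole code points rather than individual chars. The following is only a sketch and not part of the original answer: the class name SupplementaryCheck and its method names are hypothetical, and it assumes the problem characters are well-formed surrogate pairs (an unpaired surrogate would pass through this version unchanged, unlike the loop above).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class SupplementaryCheck {

    // True if the text contains at least one code point above U+FFFF,
    // i.e. a character that needs four bytes in UTF-8.
    static boolean hasSupplementary(String text) {
        return text.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    // Removes code points above U+FFFF and keeps everything else.
    // Unpaired surrogates are NOT removed by this variant.
    static String stripSupplementary(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        text.codePoints()
            .filter(cp -> !Character.isSupplementaryCodePoint(cp))
            .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // U+1F600 written as a surrogate pair; an assumed sample, not the tweet's text
        String sample = "ok \uD83D\uDE00 done";
        System.out.println(hasSupplementary(sample));
        System.out.println(utf8Length("\uD83D\uDE00"));
        System.out.println(stripSupplementary(sample));
    }
}
```

Stripping loses information, so this only fits cases where the characters are not needed. Keeping them would require the column and the connection to use utf8mb4 instead of utf8.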