Why does Ruby not detect an invalid encoding when MySQL does?

I am pulling some RSS feeds from YouTube that contain invalid UTF-8. I can create a similar Ruby string using

bad_utf8 = "\u{61B36}"
bad_utf8.encoding # => #<Encoding:UTF-8>
bad_utf8.valid_encoding? # => true

Ruby considers this to be valid UTF-8 encoding, and I'm sure it is not.

When I talk to MySQL, I get this error

require 'mysql2'
client = Mysql2::Client.new(:host => "localhost", :username => "root")
client.query("use test");

bad_utf8 = "\u{61B36}"
client.query("INSERT INTO utf8 VALUES ('#{bad_utf8}')")

# Incorrect string value: '\xF1\xA1\xAC\xB6' for column 'string' at row 1 (Mysql2::Error)

How can I detect or fix these invalid encoding types before sending them to MySQL?

2 answers

Perhaps because the code point does not lie in the Basic Multilingual Plane (BMP), which is the only range of characters that MySQL allows in its "utf8" character set.

Newer versions of MySQL provide another character set, "utf8mb4", which supports Unicode characters outside the BMP.
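To see why MySQL's "utf8" rejects the string from the question, it can help to look at the code point and its encoded bytes. The following is only a minimal sketch in plain Ruby; the helper name `outside_bmp?` is hypothetical, and it assumes the input is already well-formed UTF-8.

```ruby
# Hypothetical helper (not part of Ruby or mysql2): true if any character
# lies above U+FFFF, i.e. outside the Basic Multilingual Plane.
# Assumes str is well-formed UTF-8; String#ord raises on malformed characters.
def outside_bmp?(str)
  str.each_char.any? { |ch| ch.ord > 0xFFFF }
end

s = "\u{61B36}"
puts s.valid_encoding?                         # well-formed UTF-8, so Ruby accepts it
puts s.bytes.map { |b| "%02X" % b }.join(" ")  # F1 A1 AC B6, the bytes from the MySQL error
puts s.bytesize                                # 4; MySQL's "utf8" stores at most 3 bytes per character
puts outside_bmp?(s)                           # true
```

So the string is valid UTF-8 as far as Ruby is concerned; it is MySQL's three-byte "utf8" character set that cannot store it.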


Ruby's String#valid_encoding? only checks that the string is well-formed in its encoding, for example:

irb
1.9.3-p125 :001 > bad_utf8 = "\u{0}"
 => "\u0000" 
1.9.3-p125 :002 > bad_utf8.valid_encoding?
 => true 
1.9.3-p125 :003 > bad_utf8.encoding
 => #<Encoding:UTF-8>

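UTF-8 (see: https://en.wikipedia.org/wiki/Utf8) can encode any Unicode code point, including NULL (which some formats, for example HTML, do not accept).

So what you want to check for is not "valid UTF-8" but something narrower: bmp_only (0x1-0xffff). (See: https://en.wikipedia.org/wiki/Unicode_plane).

See: https://gist.github.com/2295531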
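As a rough illustration of such a check, here is a minimal sketch in plain Ruby. It is not taken from the linked gist; the helper names `bmp_only?` and `to_bmp` are hypothetical, the input is assumed to be tagged as UTF-8, and `String#scrub` requires Ruby 2.1 or later.

```ruby
# Hypothetical helpers for illustration only.

# True if the string is well-formed and every character is in U+0001..U+FFFF.
# valid_encoding? is checked first because String#ord raises on malformed characters.
def bmp_only?(str)
  str.valid_encoding? &&
    str.each_char.all? { |ch| (0x1..0xFFFF).cover?(ch.ord) }
end

# Returns a copy in which malformed bytes, NUL, and characters above U+FFFF
# are replaced with the replacement character (U+FFFD by default).
def to_bmp(str, replacement = "\u{FFFD}")
  str.scrub(replacement).each_char.map { |ch|
    (0x1..0xFFFF).cover?(ch.ord) ? ch : replacement
  }.join
end

bad_utf8 = "\u{61B36}"
puts bmp_only?(bad_utf8)          # false: valid UTF-8, but outside the BMP
puts bmp_only?(to_bmp(bad_utf8))  # true: safe for a MySQL "utf8" column
```

Replacing characters loses information, so this only makes sense when the column has to stay "utf8"; if the data must be preserved, the "utf8mb4" character set mentioned in the other answer is the alternative.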

