Why are 7-bit ASCII string literals encoded as UTF-8 in Ruby?

I am reading The Ruby Programming Language. Section 3.2.6.1, "Multibyte Characters in Ruby 1.9," describes an optimization for string literals:

If a string literal contains only 7-bit ASCII characters, then its encoding method will return ASCII, even if the source encoding is UTF-8.

I tried the following simple script on Ruby 1.9.1-p431, 1.9.2, and 1.9.3-p125. All three report UTF-8 as the encoding of a string that contains only 7-bit ASCII characters.

# coding: utf-8
s = 'hello'
p s.encoding
# result is #<Encoding:UTF-8>

This behavior may have changed during the development of Ruby 1.9. I looked through the Ruby 1.9 change log, and the 1.9.1 changelog confirms this behavior. I also cloned the Ruby git repository, but I could not find any mention of a change in this behavior.

Update:

Ruby, , Ruby 1.9.0, 2008 . ( Debian 6, .) " Ruby" - , 2008 . , .

Encoding.list. , .


, Pdf Ruby ()

, ,

, "dog" utf-8. , . ​​ , ,


, "" Ruby "" , . , UTF-8, , UTF-8, , , UTF-8 7- ASCII , .

Ruby , . , .

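To change only the encoding label of a string, use force_encoding. To actually convert the bytes to another encoding, use encode.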
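The difference between force_encoding and encode can be sketched as follows. This is only an illustration, not part of the original answer: the variable names and the choice of ISO-8859-1 as the target encoding are assumptions made for the example.

```ruby
# coding: utf-8
s = 'døg'

# force_encoding only relabels the string; the underlying bytes are unchanged.
# (dup is used so that the original string keeps its UTF-8 label.)
relabeled = s.dup.force_encoding('ISO-8859-1')
p relabeled.bytes.to_a   # => [100, 195, 184, 103]
p relabeled.length       # => 4, each byte now counts as one character

# encode transcodes: the bytes are rewritten for the target encoding.
converted = s.encode('ISO-8859-1')
p converted.bytes.to_a   # => [100, 248, 103]
p converted.length       # => 3
```

For a string that holds only 7-bit ASCII characters, both calls would leave the bytes as they are, since those characters have the same byte values in UTF-8 and ISO-8859-1.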

For example, with only 7-bit ASCII characters:

'dog'.encoding
# => #<Encoding:UTF-8> 
'dog'.bytes.to_a
# => [100, 111, 103] 
'dog'.chars.to_a
# => ["d", "o", "g"]

With a character outside 7-bit ASCII:

'døg'.encoding
# => #<Encoding:UTF-8> 
'døg'.bytes.to_a
# => [100, 195, 184, 103]
'døg'.chars.to_a
# => ["d", "ø", "g"]