PHP Character Encoding Problem with Curly Quotes

I know there are age-old problems with character encoding between different character sets, but I am stuck on one related to Windows “curly quotes”.

We have a client who likes to copy and paste data into a text field and then publish it in our application. That data often contains curly quotes. I used the following function to convert them to their plain counterparts:

// Replace UTF-8 encoded "smart" punctuation with plain ASCII equivalents.
function convert_smart_quotes($string)
{
    // UTF-8 byte sequences for: ‘ ’ “ ” – — …
    $badwordchars = array("\xe2\x80\x98", "\xe2\x80\x99", "\xe2\x80\x9c", "\xe2\x80\x9d", "\xe2\x80\x93", "\xe2\x80\x94", "\xe2\x80\xa6");
    $fixedwordchars = array("'", "'", '"', '"', '-', '--', '...');

    return str_replace($badwordchars, $fixedwordchars, $string);
}

This worked perfectly for several months. Then, after some changes (we switched servers, updated the system, upgraded PHP, etc.), we learned that it no longer works. So I took a look and found that the curly quotes are all being changed into different characters. In this case, they turn into the following:

“ = ¡È

” = ¡É

‘ = ¡Æ

’ = ¡Ç

The MySQL collation is latin1_swedish_ci, while the rest of the application is meant to be utf-8; as far as I understand, latin1_swedish_ci means ISO-8859-1.

Ideally everything would be in one encoding, either utf-8 or ISO-8859-1.

I can search for "¡È" and "¡É" and replace them myself. At the moment, I am doing this:

// Stopgap: replace the two-byte sequences the quotes currently arrive as.
$string = str_replace("\xa1\xc8", '"', $string);
$string = str_replace("\xa1\xc9", '"', $string);
$string = str_replace("\xa1\xc6", "'", $string);
$string = str_replace("\xa1\xc7", "'", $string);

Googling "¡É" does not turn up anything useful.

+3

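It looks like the data is being sent as UTF-8 but is being treated somewhere along the way as Latin1 (ISO-8859-1). (Note that latin1_swedish_ci is a collation, not a character set; the character set is Latin1.)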
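A quick way to see which encoding a string is really in is to look at its raw bytes. The sketch below is not part of the original answer: `hex_bytes` and `looks_like_utf8` are hypothetical helper names, and it uses only the core functions `bin2hex` and `preg_match`.

```php
<?php
// Hypothetical helper: show the raw bytes of a string as spaced hex pairs.
function hex_bytes($s)
{
    return implode(' ', str_split(bin2hex($s), 2));
}

// Hypothetical helper: true when $s is well-formed UTF-8.
// preg_match() with the /u modifier returns false on invalid UTF-8 input.
function looks_like_utf8($s)
{
    return preg_match('//u', $s) === 1;
}

$curly   = "\xe2\x80\x9c"; // left curly double quote encoded as UTF-8
$mystery = "\xa1\xc8";     // the byte pair from the question

echo hex_bytes($curly), ' utf8=', var_export(looks_like_utf8($curly), true), "\n";
echo hex_bytes($mystery), ' utf8=', var_export(looks_like_utf8($mystery), true), "\n";
```

The UTF-8 curly quote is three bytes (e2 80 9c), while the pair from the question (a1 c8) is not valid UTF-8 at all, so the UTF-8 sequences listed in convert_smart_quotes can never match it.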

If the input is UTF-8 and you need Latin1, you can convert from UTF-8 to Latin1 with iconv.

In short: the data comes in as UTF-8, but it is stored as Latin1.

With iconv, the conversion looks like this:

// convert from utf8 to latin1, approximating out of range characters
// by the closest latin1 alternative where possible (//TRANSLIT)
$latinString = iconv("UTF-8", "ISO-8859-1//TRANSLIT", $utf8String);

(For details on the available options, see the iconv documentation.)

Alternatively, PHP has the built-in utf8_decode (deprecated as of PHP 8.2):

$latinString = utf8_decode($utf8String);

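The PHP function is simpler, but it does not approximate: any character with no Latin1 equivalent is replaced with a question mark (?).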
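Curly quotes have no ISO-8859-1 code point, so a plain UTF-8 to Latin1 conversion cannot keep them. One option is to swap them for ASCII while the string is still UTF-8 and only then convert. This is a minimal sketch, assuming the input is valid UTF-8 and the mbstring extension is loaded; `to_latin1_with_plain_quotes` is a hypothetical name, not something from the question or the answers.

```php
<?php
// Hypothetical helper: normalize "smart" punctuation, then convert to Latin1.
// Assumes $utf8 is valid UTF-8 and that the mbstring extension is available.
function to_latin1_with_plain_quotes($utf8)
{
    $map = array(
        "\xe2\x80\x98" => "'",   // left single quote
        "\xe2\x80\x99" => "'",   // right single quote
        "\xe2\x80\x9c" => '"',   // left double quote
        "\xe2\x80\x9d" => '"',   // right double quote
        "\xe2\x80\x93" => '-',   // en dash
        "\xe2\x80\x94" => '--',  // em dash
        "\xe2\x80\xa6" => '...', // ellipsis
    );
    // strtr() with an array never re-scans text it has already replaced.
    $asciiPunctuation = strtr($utf8, $map);

    return mb_convert_encoding($asciiPunctuation, 'ISO-8859-1', 'UTF-8');
}

// Input is “Café” in UTF-8; output is "Café" in Latin1.
$latin1 = to_latin1_with_plain_quotes("\xe2\x80\x9cCaf\xc3\xa9\xe2\x80\x9d");
echo bin2hex($latin1), "\n"; // 22436166e922
```

The order matters: once the string has been converted to Latin1, the three-byte UTF-8 sequences no longer exist and there is nothing left for the replacement to match.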

In the long run, moving everything to Unicode is the better fix!

+5

Try one of these:

$str = mb_convert_encoding($str, 'HTML-ENTITIES', 'UTF-8');

$str = mb_convert_encoding($str, 'HTML-ENTITIES', 'auto');

This requires the mbstring extension for php.
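Note that the 'HTML-ENTITIES' target of mb_convert_encoding is deprecated as of PHP 8.2. A similar result can be had with mb_encode_numericentity, which emits numeric character references rather than named entities. This is a sketch under that assumption; `to_numeric_entities` is a hypothetical helper name, and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper: replace every non-ASCII character in a UTF-8 string
// with a decimal numeric character reference such as &#8220;.
function to_numeric_entities($utf8)
{
    // Map format: [first code point, last code point, offset, mask].
    // This range covers everything above ASCII and leaves ASCII untouched.
    $map = array(0x80, 0x10FFFF, 0, 0x1FFFFF);

    return mb_encode_numericentity($utf8, $map, 'UTF-8');
}

echo to_numeric_entities("\xe2\x80\x9cHi\xe2\x80\x9d"), "\n"; // &#8220;Hi&#8221;
```

Because the result is pure ASCII, it survives storage in a Latin1 column unchanged, at the cost of only being meaningful when rendered as HTML.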

+2
