What is the correct way to encode non-UTF-8 characters with encodeURIComponent and decode them accordingly?

I have a JavaScript bookmarklet that uses encodeURIComponent to pass the URL of the current page to the server, which then uses urldecode to get the characters back.

The problem is that the encoded characters are not in UTF-8 (in my case it is GB2312, but it could be something else), and when the server runs urldecode, the decoded characters come out as squares. This is obviously an encoding problem.

Since this is a bookmarklet, the input can be anything, so I can't just hard-code "encode as GB2312" in the JS or "decode as GB2312" in the PHP script.

So, is there a proper way to use encodeURIComponent that conveys the character encoding along with the content, so that the decoder can choose the right encoding?

2 answers

For how browsers encode URLs, especially with GB2312, first check the following documents (in Chinese)

In your case, %C8%B7%B6%A8 is actually generated from the GB2312 form of '\u786e\u5b9a'. This usually happens on (legacy?) versions of IE and FF when the user types Chinese characters directly into the location bar, or when an IRI is turned into a URI with raw bytes in it, such as '/tag/\xc8\xb7\xb6\xa8' (as seen on douban.com, where the URI would be expected to be UTF-8). The details vary by browser (Chrome versus FF and IE).

When that already-escaped URL is passed to encodeURIComponent, it is simply escaped again:

> encodeURIComponent('%C8%B7%B6%A8')
  "%25C8%25B7%25B6%25A8"

In other words, the input was already pure ASCII: the literal string '%C8%B7%B6%A8'.

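So encodeURIComponent cannot tell you which encoding produced a %XX escape where XX is greater than 0x7F. The escapes themselves carry no charset information; see RFC 2396.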
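The behavior described above can be checked with a minimal sketch like the following. The helper name `tryDecodeAsUtf8` is made up for illustration and is not part of the original answer. It relies on the fact that decodeURIComponent throws a URIError when the escaped bytes are not valid UTF-8. This is only a heuristic: some GB2312 byte sequences can happen to be valid UTF-8.

```javascript
// encodeURIComponent always emits the UTF-8 bytes of a JavaScript string.
console.log(encodeURIComponent('\u786e\u5b9a')); // "%E7%A1%AE%E5%AE%9A"

// A string that is already percent-escaped is plain ASCII, so only "%" is escaped.
console.log(encodeURIComponent('%C8%B7%B6%A8')); // "%25C8%25B7%25B6%25A8"

// Hypothetical helper: returns the decoded text if the escapes are valid
// UTF-8, or null if they are not (for example GB2312 bytes).
function tryDecodeAsUtf8(escaped) {
  try {
    return decodeURIComponent(escaped);
  } catch (err) {
    if (err instanceof URIError) {
      return null; // escapes are not UTF-8; the real charset is unknown
    }
    throw err;
  }
}

console.log(tryDecodeAsUtf8('%E7%A1%AE%E5%AE%9A') === '\u786e\u5b9a'); // true
console.log(tryDecodeAsUtf8('%C8%B7%B6%A8')); // null
```

A null result only says "not UTF-8"; it does not identify GB2312 or any other charset.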

Writing in English is so tiring, but when in Rome, do as the Romans do ~


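You could use escape() instead.

From the MDN documentation for escape():

The hexadecimal form for characters whose code unit value is 0xFF or less is a two-digit escape sequence: %xx. For characters with a greater code unit, the four-digit format %uxxxx is used.

So you can combine escape() with replace() to turn the %uxxxx sequences into hexadecimal numeric entities: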

escape(input_value).replace(/%u([0-9a-fA-F]{4})/g, '&#x$1;');

Or, to produce decimal entities instead:

escape(input_value).replace(/%u([0-9a-fA-F]{4})/g, function(m0, m1) {
    // m1 is the four hex digits; emit a decimal numeric entity
    return '&#' + parseInt(m1, 16) + ';';
});
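As a quick way to see what this produces, here is a small sketch that wraps the decimal variant in a function. The name `toNumericEntities` is hypothetical. It assumes an environment where the deprecated escape() global is still available, which is the case in current browsers and Node.js.

```javascript
// Hypothetical wrapper around the escape() + replace() approach above.
// Code units above 0xFF become decimal numeric entities; everything else
// is left in the %xx (or unescaped) form that escape() produces.
function toNumericEntities(input) {
  return escape(input).replace(/%u([0-9a-fA-F]{4})/g, function (match, hex) {
    return '&#' + parseInt(hex, 16) + ';';
  });
}

console.log(toNumericEntities('\u786e\u5b9a')); // "&#30830;&#23450;"
console.log(toNumericEntities('a b'));          // "a%20b"
```

Note that the result is a mix of numeric entities and %xx escapes, so the receiving side still has to handle both.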

A complete example with PHP

client.html (encoding: GB2312):

<html>
  <head>
    <meta charset="gb2312">
    <script>
    function processForm(form) {
        console.log('BEFORE:', form.test.value);
        form.test.value = escape(form.test.value).replace(/%u(\w{4})/g, function(m0, m1) {
            return '&#' + parseInt(m1, 16) + ';';
        });
        console.log('AFTER:', form.test.value);
        return true;
    }
    </script>
  </head>
  <body>
    <form method="post" action="server.php" onsubmit="return processForm(this);">
      <input type="text" name="test" value="确定">
      <input type="submit">
    </form>
  </body>
</html>

server.php:

<?php
echo '<script>console.log("', 
     $_REQUEST['test'], ' --> ', 
     mb_decode_numericentity($_REQUEST['test'], array(0x80, 0xffff, 0, 0xffff), 'UTF-8'),
     '");</script>';
?>
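For testing the round trip without a PHP server, the decoding step can be approximated in JavaScript. This is a sketch under assumptions, not a port of mb_decode_numericentity: the function name `decodeNumericEntities` is made up, it handles decimal entities only, and it mirrors the 0x80 to 0xffff range used in the conversion map above.

```javascript
// Hypothetical stand-in for the server-side decoding step.
// Decimal entities in the range 0x80..0xffff are turned back into
// characters; anything outside that range is left untouched.
function decodeNumericEntities(text) {
  return text.replace(/&#(\d+);/g, function (match, digits) {
    const code = parseInt(digits, 10);
    if (code >= 0x80 && code <= 0xffff) {
      return String.fromCharCode(code);
    }
    return match;
  });
}

console.log(decodeNumericEntities('&#30830;&#23450;') === '\u786e\u5b9a'); // true
console.log(decodeNumericEntities('&#65;')); // "&#65;" (below 0x80, left as-is)
```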