How to determine if a character set in JavaScript is UTF-8 or not?

Question


This is a variation on questions I've asked before, but I still can't find an answer, so I'm trying to boil it down to the essence of the problem in the hope that there is a solution.

I have a database in which, for historical reasons, some text entries are not UTF-8. Most of them are, including every record from the last 3 years, but some old entries are not.

It's important to find non-UTF-8 characters so that I can either escape them or convert them to UTF-8 for some XML I'm trying to create.

I'm using JavaScript on the server side, and it has a ByteBuffer type, so I can treat any text as separate bytes and check them as needed. I don't need to use the String type, which, as I understand it, is problematic in this situation.

Is there a test I can run on the text to determine whether it is UTF-8 in this case?

I searched for a couple of months (;_;) and still could not find an answer. However, there must be a way to do this, since XML validators (for example, those in major browsers) can report "encoding errors" when they encounter bytes that are not valid UTF-8.

I would just like to know an algorithm for doing this, so that I can try the same test in JavaScript. Once I know which characters are bad, I can convert them from ISO-8859-1 (for example) to UTF-8. I have methods for this.

I just don't know how to determine which characters are not UTF-8. Again, I understand that using a scripting language like JavaScript is problematic in this situation, but I have the alternative ByteBuffer type, which can handle characters byte by byte.

Thanks for any specific tests people can offer.

Arc


javascript utf-8 character-encoding

Doug Lerner Feb 17 '14 at 0:45


1 answer

Anders · Answer 1 · 2016-10-06T18:40:02+0000

I use this to check JSON files that can arrive in different encodings: UTF-8, ANSI (ASCII), UCS2_BE, UCS2_LE. [...] When a UTF-8 file with a byte order mark is read one byte per character, the JavaScript string starts with the 3 characters ï»¿.

[...]

Code:

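// strJSON is assumed to hold the file contents, one character per byte
var bolResult = isEncoded(strJSON);

/**
 * @description Check whether a string starts with the UTF-8 byte order mark.
 *              This detects the mark only; it does not validate the rest of the text.
 * @param {string} strJSON JSON text, one character per byte
 * @returns {boolean} true if the string starts with the byte order mark
 */
function isEncoded(strJSON) {
    /***************************
     * Valid string starts with:
     * ï»¿{
     * 239, 187, 191
     ***************************/
    var intCharCode0 = strJSON.charCodeAt(0);   // 239
    var intCharCode1 = strJSON.charCodeAt(1);   // 187
    var intCharCode2 = strJSON.charCodeAt(2);   // 191

    // charCodeAt returns NaN past the end of the string, so short input yields false
    return intCharCode0 === 239 && intCharCode1 === 187 && intCharCode2 === 191;
}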
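
The function above only looks for a byte order mark, and many UTF-8 texts have none. For the byte-level test the question asks about, here is a minimal sketch that walks the bytes and checks them against the well-formed UTF-8 byte ranges. The name findInvalidUtf8 is hypothetical, and it assumes the ByteBuffer contents can be read as an array-like of integers 0..255 (for example a Uint8Array), which the question does not confirm.

```javascript
// Sketch: return the index of the first byte sequence that is not well-formed
// UTF-8, or -1 if every byte belongs to a valid sequence.
// Assumption: `bytes` is an array-like of integers 0..255 (e.g. a Uint8Array).
function findInvalidUtf8(bytes) {
    var i = 0;
    while (i < bytes.length) {
        var b0 = bytes[i];
        var extra;       // number of continuation bytes the lead byte requires
        var min = 0x80;  // allowed range of the FIRST continuation byte
        var max = 0xBF;

        if (b0 <= 0x7F) {                        // plain ASCII
            i += 1;
            continue;
        } else if (b0 >= 0xC2 && b0 <= 0xDF) {   // 2-byte sequence
            extra = 1;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {   // 3-byte sequence
            extra = 2;
            if (b0 === 0xE0) min = 0xA0;         // rejects overlong forms
            if (b0 === 0xED) max = 0x9F;         // rejects UTF-16 surrogates
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {   // 4-byte sequence
            extra = 3;
            if (b0 === 0xF0) min = 0x90;         // rejects overlong forms
            if (b0 === 0xF4) max = 0x8F;         // rejects values above U+10FFFF
        } else {
            return i;  // 0x80..0xC1 and 0xF5..0xFF can never start a sequence
        }

        for (var k = 1; k <= extra; k++) {
            var b = bytes[i + k];                // undefined if input is truncated
            var lo = (k === 1) ? min : 0x80;
            var hi = (k === 1) ? max : 0xBF;
            if (b === undefined || b < lo || b > hi) {
                return i;
            }
        }
        i += extra + 1;
    }
    return -1;
}

// "café" stored as ISO-8859-1: the lone 0xE9 at index 3 is not valid UTF-8.
var badIndex = findInvalidUtf8([0x63, 0x61, 0x66, 0xE9]);
```

An entry for which this returns -1 can be left alone; anything else is a candidate for conversion from ISO-8859-1. One caveat: a few ISO-8859-1 byte pairs (for example 0xC3 0xA9, which reads as "Ã©") are also valid UTF-8, so a check like this cannot prove the original encoding, only that the bytes are well-formed. In runtimes that provide TextDecoder, constructing it with the fatal option set to true should give a similar yes/no answer by throwing on invalid input.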
