How to detect and resolve incorrectly encoded Varchar data?

Question

My company has a CRM product that is built on top of a third-party email system. We use their base database and have extended it with our own databases. In addition to using our product, customers can also access the email system directly.

The email databases use the SQL_Latin1_General_CP1_CI_AS collation, and contact names are stored in varchar columns, not nvarchar columns.

Both our product and the webmail product serve pages with Content-Type: text/html; charset=utf-8

If a customer creates a contact in webmail (the third-party system) with the first name "Céline", it ends up in the database as "CÃ©line". This is because webmail first converts the data from UTF-8 to Latin-1 before storing it in the database. The UTF-8 character 'é' is stored as two bytes, which in Latin-1 are interpreted as two characters: "Ã©".

However, when the data is retrieved and displayed in webmail, it displays correctly as "Céline".
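To make the two-character effect concrete, here is a minimal sketch. The class and method names are made up for illustration; it only assumes the standard System.Text encodings. It shows the UTF-8 bytes of a string and what they look like when each byte is read as one ISO-8859-1 character:

```csharp
using System;
using System.Text;

static class MojibakeDemo
{
    // Hex dump of the UTF-8 bytes of `text`, bytes separated by '-'.
    public static string Utf8Hex(string text) =>
        BitConverter.ToString(Encoding.UTF8.GetBytes(text));

    // Reads the UTF-8 bytes of `text` as if every byte were one
    // ISO-8859-1 character, which is how the stored value appears.
    public static string ReadUtf8BytesAsLatin1(string text) =>
        Encoding.GetEncoding("iso-8859-1").GetString(Encoding.UTF8.GetBytes(text));
}
```

MojibakeDemo.Utf8Hex("é") gives "C3-A9", and 0xC3 and 0xA9 are 'Ã' and '©' in ISO-8859-1, so ReadUtf8BytesAsLatin1("Céline") gives "CÃ©line".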

The problem is this: when contacts are read or written from our CRM system, a name set to "Céline" is saved as "Céline", instead of first being converted to the Latin-1 form "CÃ©line".

Conversely, if you create "Céline" in webmail, it displays in our CRM product as "CÃ©line", because the CRM does not convert from Latin-1 to UTF-8.

Our product has been internationalized for French and has been in production for several months, so the system already holds quite a lot of data in both encodings.

I can convert from latin-1 to utf-8 using:

var bytes = Encoding.GetEncoding("iso-8859-1").GetBytes(Convert.ToString(obj));
string fix2 = Encoding.UTF8.GetString(bytes).Trim(); // from iso-8859-1 (latin-1) to utf-8

But this only works if the data was converted to Latin-1 before it was saved. So I really need a way to determine whether the data in a record is a UTF-8 encoded string or a Latin-1 encoded one.

Or, going forward, I need a way to mimic what webmail does: have every write to the database first convert from UTF-8 to Latin-1, and every read convert from Latin-1 to UTF-8.
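A rough sketch of what such a read/write pair could look like follows. The class and method names are hypothetical, and it assumes ISO-8859-1 is a close enough stand-in for the column's code page. The SQL_Latin1_General_CP1 collations actually use code page 1252, which differs from ISO-8859-1 in the 0x80 to 0x9F range, so names containing characters whose UTF-8 bytes fall in that range may need Windows-1252 instead.

```csharp
using System;
using System.Text;

static class WebmailCompat
{
    private static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Write path: turn the real text into the form webmail stores
    // (UTF-8 bytes, each one represented as a Latin-1 character).
    public static string ToStored(string text) =>
        Latin1.GetString(Encoding.UTF8.GetBytes(text));

    // Read path: turn the stored form back into the real text.
    public static string FromStored(string stored) =>
        Encoding.UTF8.GetString(Latin1.GetBytes(stored));
}
```

ToStored would be applied just before a value is handed to the database layer and FromStored just after it is read, so both applications agree on the stored form. It does nothing for rows that were already written without the conversion.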

.net sql-server encoding utf-8

Michael 02 '12 20:08

erikxiv · Accepted Answer · 2012-05-02T20:31:43+0000

. ( ) ( , -). UTF-8 , () .

?

. , ISO-8859-1, . , Ã , .

, -

# UTF-8 to ISO-8859-1

Encoding.GetEncoding("iso-8859-1").GetString(Encoding.UTF8.GetBytes("Some text"));
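For the detection half of the question, one possible heuristic is to take the stored characters back to bytes and try a strict UTF-8 decode. Genuine Latin-1 text with accented letters is rarely a valid UTF-8 byte sequence, so a successful decode suggests the webmail-style form. This is a sketch under that assumption, not a guarantee: the helper name is made up, and a genuine Latin-1 value that happens to form valid UTF-8 would be misclassified.

```csharp
using System;
using System.Linq;
using System.Text;

static class EncodingRepair
{
    private static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Strict decoder: throws DecoderFallbackException on invalid UTF-8
    // instead of silently substituting U+FFFD.
    private static readonly Encoding StrictUtf8 = new UTF8Encoding(false, true);

    public static string FixIfMisencoded(string stored)
    {
        // Pure ASCII reads the same either way, and a character above
        // U+00FF cannot have come from a single Latin-1 byte.
        if (stored.All(c => c < 0x80) || stored.Any(c => c > 0xFF))
            return stored;

        try
        {
            // Valid UTF-8: assume the webmail-style form and decode it.
            return StrictUtf8.GetString(Latin1.GetBytes(stored));
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8: treat the value as genuine Latin-1 text.
            return stored;
        }
    }
}
```

With this, FixIfMisencoded("CÃ©line") returns "Céline", while a value saved by the CRM as "Céline" is returned unchanged, because the byte 0xE9 followed by 'l' is not valid UTF-8.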
