I know there are a lot of similar questions on SF, but I think mine is different enough to warrant a new one. I have a table with a utf8 column that uses the utf8_unicode_ci collation. There is also a unique key on this column together with another column holding a language code. The data in the column is in many different scripts (Latin with various accents, Chinese and Russian, among others).
The problem is that sometimes I want to insert two words with different meanings that differ only in diacritics (e.g. Spanish ano vs. año). Since utf8_unicode_ci is both case-insensitive and accent-insensitive, it treats these as the same value and won't let me insert the second one. This sucks. Ideally, I would just switch the entire column to a collation that is case-insensitive but accent-sensitive, but that does not seem to exist. This column is used a lot, so I don't want to change its default collation to utf8_bin for fear of breaking case-insensitive lookups.
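To illustrate, a minimal reproduction might look like the sketch below. The table, column, and key names (`words`, `word`, `lang`, `uq_word_lang`) are hypothetical placeholders, not my real schema, and it assumes the connection character set is utf8 so that the ñ arrives intact.

```sql
-- Hypothetical table for illustration only; names and sizes are assumptions.
CREATE TABLE words (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  word VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
  lang CHAR(2) NOT NULL,
  UNIQUE KEY uq_word_lang (word, lang)
) ENGINE=InnoDB;

-- The first insert succeeds.
INSERT INTO words (word, lang) VALUES ('ano', 'es');

-- The second is rejected as a duplicate, because utf8_unicode_ci
-- compares 'n' and 'ñ' as equal. This is the insert I want to allow.
INSERT INTO words (word, lang) VALUES ('año', 'es');

-- Existing lookups like this rely on case-insensitive matching
-- and must keep working unchanged.
SELECT id FROM words WHERE word = 'ANO' AND lang = 'es';
```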
So, all this to say that I need some kind of solution that does not affect the default case-insensitivity of the many existing queries that hit this column, but still lets me add words that differ only in diacritics. Ideas? I would switch just the unique key constraint to utf8_bin if that is possible, but I would prefer not to, since I never want two entries in the table that differ only by case.