How to do an accent- and case-insensitive search in the MediaWiki database?

Suppose I have these page names on my wiki (MediaWiki 1.19.4):

SOMETHIng
Sómethìng
SomêthÏng
SÒmetHínG

If the user searches for something, I want all 4 pages to be returned as results.

At the moment, I can only think of this query (MySQL Percona 5.5.30-30.2):

SELECT page_title
FROM page
WHERE page_title LIKE '%something%' COLLATE utf8_general_ci

This returns only something.

I must be on the right track, because if I search for sóméthíng or SÓMÉTHÍNG, I get the something result. How can I modify the query to get the other results as well? Performance is not critical here, as the page table contains only ~2K rows.

Here is the relevant part of the table definition:

CREATE TABLE page (
    (...)
    page_title VARCHAR(255) NOT NULL DEFAULT '' COLLATE latin1_bin,
    (...)
    UNIQUE INDEX name_title (page_namespace, page_title),
)

By default, MediaWiki AFAIK stores titles with a binary collation, so comparisons are byte-wise (i.e. not unicode-aware).

+5
3

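First, a caveat. The approach below is slow (it cannot use an index), but since the table has only about 2K rows, that should be fine.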

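The issue is that MediaWiki stores UTF8 data in latin1 columns. MediaWiki handles the encoding itself, so MySQL does not know what the bytes really are. In other words, the data is UTF8, but MySQL believes it is latin1 (see MediaWiki DefaultSettings.php, variable $wgDBmysql5).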

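To compare titles without regard to accents or case, you need a UTF8 collation (for example, utf8_general_ci). For that to work, however, MySQL has to treat the data as UTF8 (which, going by the column definition, it does not).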

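The problem: to get UTF8 out of the column, the obvious choice is CONVERT(col_name USING utf8). But MySQL tries to be helpful here: because col_name is declared as latin1, it converts the (already) UTF8 bytes to UTF8 again, so you end up with double-encoded UTF8, and the comparison fails.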

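So how do you stop MySQL from converting? Cast the column to BINARY first, then convert it to UTF8! That way MySQL has no source character set to convert from: it simply reads the bytes as UTF8. The expression is CONVERT(CAST(col_name AS BINARY) USING utf8).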
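The difference between the two conversions can be checked on a single character. This is only a sketch: the hex literal X'C3A9' is an assumed stand-in for the UTF8 bytes of é as they would sit in a latin1 column, not a value taken from the page table.

```sql
-- Assumption: X'C3A9' stands in for the UTF8 bytes of 'é' held in a latin1 column.

-- Plain CONVERT: MySQL reads C3 and A9 as two latin1 characters and
-- re-encodes each of them, which produces double-encoded UTF8.
SELECT HEX(CONVERT(_latin1 X'C3A9' USING utf8));                  -- C383C2A9

-- CAST to BINARY first: the bytes are kept unchanged and only labeled as utf8.
SELECT HEX(CONVERT(CAST(_latin1 X'C3A9' AS BINARY) USING utf8));  -- C3A9
```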

So the final query looks like this:

SELECT CONVERT(CAST(page_title AS BINARY) USING utf8)
FROM page
WHERE
    CONVERT(CAST(page_title AS BINARY) USING utf8)
        LIKE '%keyword_here%'
            COLLATE utf8_spanish_ci

Now, whether you search for something or sôMëthîNG, you get all the results!

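Note that I used utf8_spanish_ci, so ñ is treated as different from n, while á is treated as equal to a. (Use whichever collation matches your language's rules.)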
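A quick way to see how the collation changes matching is to compare single characters directly. The hex literals below are assumptions used for illustration (the UTF8 bytes of ñ and á), so the result does not depend on the client's connection character set.

```sql
-- Assumption: X'C3B1' and X'C3A1' are the UTF8 bytes of 'ñ' and 'á'.
SELECT _utf8 X'C3B1' = _utf8'n' COLLATE utf8_general_ci;  -- 1: ñ matches n
SELECT _utf8 X'C3B1' = _utf8'n' COLLATE utf8_spanish_ci;  -- 0: ñ is a separate letter
SELECT _utf8 X'C3A1' = _utf8'a' COLLATE utf8_spanish_ci;  -- 1: á matches a
```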

+3

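The MediaWiki TitleKey extension is designed for this kind of title matching, but it only folds case. However, if you have the PHP iconv extension installed, you can edit TitleKey_body.php and replace this method: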

static function normalize( $text ) {
    global $wgContLang;
    return $wgContLang->caseFold( $text );
}

with something like this:

static function normalize( $text ) {
    return strtoupper( iconv( 'UTF-8', 'US-ASCII//TRANSLIT', $text ) );
}

and then (re)run rebuildTitleKeys.php.

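The TitleKey extension stores the normalized titles in a separate table, titlekey. It is meant to be used through MediaWiki's search, but if you want, you can also query it directly. For example: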

SELECT page.* FROM page
  JOIN titlekey ON tk_page = page_id
WHERE tk_namespace = 0 AND tk_key = 'SOMETHING';
+3

Quick option: change the collation of the column (to one ending in _ci).

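Better option: add a new column, search_row, to the table. For each title (say, SomêthÏng), store a normalized form in search_row, in this case something (lowercase, without accents). Then run the search against that column.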

Last step: create a trigger that fills or updates the search_row field every time a title is inserted or updated in the page table.
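A minimal sketch of that setup in MySQL follows. It rests on several assumptions: the trigger names are hypothetical, the titles are UTF8 bytes in a latin1 column as in the question, and instead of stripping accents from the stored value it gives search_row a utf8_general_ci collation so that the comparison itself ignores accents and case. It also alters a MediaWiki core table, so try it on a copy first.

```sql
-- Assumption: search_row is a new column added for searching only.
ALTER TABLE page
    ADD COLUMN search_row VARCHAR(255)
        CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL DEFAULT '';

-- Fill the column for the rows that already exist.
UPDATE page
SET search_row = CONVERT(CAST(page_title AS BINARY) USING utf8);

-- Keep the column in sync on insert (hypothetical trigger name).
CREATE TRIGGER page_search_row_bi BEFORE INSERT ON page
FOR EACH ROW
    SET NEW.search_row = CONVERT(CAST(NEW.page_title AS BINARY) USING utf8);

-- Keep the column in sync on update (hypothetical trigger name).
CREATE TRIGGER page_search_row_bu BEFORE UPDATE ON page
FOR EACH ROW
    SET NEW.search_row = CONVERT(CAST(NEW.page_title AS BINARY) USING utf8);

-- The search then needs no conversion at query time.
SELECT page_title
FROM page
WHERE search_row LIKE '%something%';
```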

This solution should not have any noticeable negative impact on performance!

+1