PHP preg_split and UTF-8 characters

I have a problem with preg_split and UTF-8. This is the code:

$original['words'] = preg_split("/[\s]+/", $original['text']);
print_r($original);

This is the output:

Array
(

    [text] => Šios baterijos kaista
    [words] => Array
        (
            [0] =>  
            [1] => ios
            [2] => baterijos
            [3] => kaista
        )

)

This code runs as part of CakePHP. Note that [text] is displayed correctly, but the first word is corrupted by the split. By the way, I tried these settings:

mb_internal_encoding( 'UTF-8'); 
mb_regex_encoding( 'UTF-8');  
ini_set('default_charset','utf-8');

Nothing helped. Thanks.

+5
2 answers

You need to enable UTF-8 mode for preg_split by adding the u modifier to the regular expression:

preg_split("/[\s]+/u", $original['text']);

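The configuration settings you tried do not play any role here: mb_internal_encoding and mb_regex_encoding affect only the mbstring functions, and default_charset does not change how PCRE reads the pattern or the subject string.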
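To see why the first word breaks, it can help to look at the bytes. The following is a minimal sketch rather than part of the original answer; the variable names are illustrative and the script is assumed to be saved as UTF-8. In UTF-8, Š is encoded as the two bytes C5 A0. Without the u modifier PCRE works on single bytes, and depending on the PCRE build and locale, \s may match the byte A0, which would leave the lone byte C5 as the first element. With u, the subject is read as UTF-8 characters:

```php
<?php
$text = 'Šios baterijos kaista';

// 'Š' occupies two bytes in UTF-8.
$firstCharHex = bin2hex('Š');
var_dump($firstCharHex); // string(4) "c5a0"

// Byte mode: the result depends on the PCRE build and locale,
// so no particular output is assumed here.
$byteMode = preg_split('/\s+/', $text);

// UTF-8 mode: multibyte characters are kept intact.
$utf8Mode = preg_split('/\s+/u', $text);
print_r($utf8Mode);
```

Note that with the u modifier preg_split returns false if the subject is not valid UTF-8, so it is worth checking the return value when the input encoding is not guaranteed.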

+11
$original = mb_split("[\s]+", 'Šios baterijos kaista');
print_r($original);

Result:

Array
(
    [0] => Šios
    [1] => baterijos
    [2] => kaista
)

Remarks:

1) Remove the leading and trailing '/' delimiters from the regular expression pattern when using mb_split.

2) This works only if the mbstring extension is enabled.
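If the code has to run where mbstring might be missing, one option is to check for the extension and fall back to preg_split with the u modifier. This is only a sketch under that assumption; the helper name split_words is hypothetical and does not come from the answers above:

```php
<?php
// Hypothetical helper: splits UTF-8 text on runs of whitespace.
function split_words(string $text): array
{
    if (extension_loaded('mbstring')) {
        // mb_split takes the pattern without delimiters and uses the
        // encoding set by mb_regex_encoding().
        mb_regex_encoding('UTF-8');
        $words = mb_split('\s+', $text);
    } else {
        // Fallback: PCRE in UTF-8 mode via the u modifier.
        $words = preg_split('/\s+/u', $text);
    }

    // Both functions return false on failure; treat that as "no words".
    return $words === false ? [] : $words;
}

print_r(split_words('Šios baterijos kaista'));
```

Returning an empty array on failure is a design choice made for brevity here; code that needs to distinguish bad input from empty input could throw an exception instead.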

0