How do I define a libpcre regex for Arabic characters?

I need to define a PCRE regular expression for certain spam words written in the Arabic / Persian alphabet, to be used in the Spam module in Drupal. The problem is that a plain PCRE regular expression apparently cannot find patterns in the Arabic alphabet.

For example, /bad word/i flags every instance of "bad word", but

/کلمه بد/i

does not flag 'کلمه بد'.
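A minimal sketch of what the u modifier changes, assuming the PHP file is saved as UTF-8 (the variable names are illustrative only). Without u, PCRE works on raw bytes, and each of these Arabic-script letters is two bytes in UTF-8. A dot therefore matches half a letter, and a quantifier placed after a letter repeats only its last byte. With u, PCRE decodes UTF-8 and works on whole characters.

```php
<?php
// Illustrative demo: how /u changes what PCRE treats as "one character".
$phrase = 'کلمه بد';   // 6 Arabic-script letters + 1 space

// Without /u the dot matches single bytes: 6 letters x 2 bytes + 1 space byte.
$bytes = preg_match_all('/./', $phrase);
// With /u the dot matches whole UTF-8 characters.
$chars = preg_match_all('/./u', $phrase);

// Without /u, {2} repeats only the last byte of the two-byte letter ل.
$withoutU = preg_match('/^ل{2}$/', 'لل');
// With /u, {2} repeats the whole letter.
$withU = preg_match('/^ل{2}$/u', 'لل');

printf("bytes=%d chars=%d withoutU=%d withU=%d\n", $bytes, $chars, $withoutU, $withU);
```

This prints `bytes=13 chars=7 withoutU=0 withU=1`.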


I have no problem with this if I use the PCRE modifier u (Unicode):

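$string = 'کلمه بد';  // the subject string must be UTF-8 encoded

// \p{Arabic} matches any character of the Arabic script; it requires /u
if (preg_match('~\p{Arabic}~u', $string) > 0)
{
    var_dump('contains Arabic characters');

    // /u makes PCRE read pattern and subject as UTF-8 characters, not bytes
    if (preg_match('~کلمه بد~ui', $string) > 0)
    {
        var_dump('contains spam-ish Arabic characters');
    }
}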

string(26) "contains Arabic characters"
string(35) "contains spam-ish Arabic characters"

Demo on IDEOne.com. (Make sure both the script and the strings you test are encoded in UTF-8.)
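The same approach extends to a list of spam phrases. The following is a sketch, not Drupal's API: `matchesSpamWord` is a hypothetical helper, and the arrow function assumes PHP 7.4 or later. Each phrase is escaped with `preg_quote()` so that it is matched literally, and all phrases are combined into one pattern with the `u` and `i` modifiers.

```php
<?php
// Hypothetical helper (not part of Drupal or its Spam module): returns true if
// $text contains any of the given phrases, compared case-insensitively as UTF-8.
function matchesSpamWord(string $text, array $words): bool
{
    if ($words === []) {
        return false;  // an empty alternation would match every string
    }
    // Escape regex metacharacters (and the ~ delimiter) so phrases match literally.
    $quoted = array_map(
        static fn (string $w): string => preg_quote($w, '~'),
        $words
    );
    $pattern = '~(?:' . implode('|', $quoted) . ')~ui';

    // Under /u, preg_match() returns false (not 0) for invalid UTF-8 input,
    // so compare strictly against 1.
    return preg_match($pattern, $text) === 1;
}

var_dump(matchesSpamWord('این یک کلمه بد است', ['کلمه بد']));
var_dump(matchesSpamWord('A BAD WORD here', ['bad word']));
var_dump(matchesSpamWord('nothing to see', ['کلمه بد', 'spam']));
```

Building one alternation keeps this to a single `preg_match()` call per text. Checking the result strictly means that malformed input is reported as "no match" instead of being silently coerced.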


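In Perl, make sure you have use utf8; at the top of your script, so that literal Arabic characters in the source code are treated as characters rather than bytes.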

Alternatively, write the characters as code-point escapes, e.g. /\x{644}/ for ل (U+0644 ARABIC LETTER LAM).
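The same escape works in PHP's PCRE functions, as long as the `u` modifier is set. This is a sketch that assumes PHP's bundled PCRE; `$lamPattern` is an illustrative name. Because the pattern is pure ASCII, it works the same regardless of the encoding of the PHP source file. The subject strings must still be UTF-8.

```php
<?php
// \x{0644} is U+0644 ARABIC LETTER LAM (ل). The /u modifier is required:
// in non-UTF mode PCRE does not accept \x{...} values above 255.
$lamPattern = '~\x{0644}~u';

var_dump(preg_match($lamPattern, 'کلمه'));  // int(1): the word contains ل
var_dump(preg_match($lamPattern, 'بد'));    // int(0): no ل in this word
```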

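# If the bad word comes from a file, decode the file as UTF-8 too
open my $fh, '<:utf8', 'somefile.txt' or die "Cannot open somefile.txt: $!";
chomp(my $bad_thing = <$fh>);   # drop the trailing newline
/$bad_thing/;                   # matches against $_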

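Without the utf8 layer, the data you read stays a string of bytes, in the same way that a literal /ل/ in your source is treated as bytes unless you add use utf8. Does that help?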
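PHP has no equivalent of Perl's `:utf8` layer, because PHP strings are plain byte strings. A word list read from disk therefore has to be UTF-8 already. The following is a minimal sketch. `loadSpamWords` is a hypothetical helper, the one-phrase-per-line file format is an assumption, and the arrow function needs PHP 7.4 or later. The helper strips an optional UTF-8 byte-order mark and uses `preg_match('//u', ...)` to reject files that are not valid UTF-8.

```php
<?php
// Hypothetical helper: loads one spam phrase per line from a UTF-8 text file.
function loadSpamWords(string $path): array
{
    $contents = file_get_contents($path);
    if ($contents === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // Strip a leading UTF-8 byte-order mark (EF BB BF) if an editor added one.
    if (strncmp($contents, "\xEF\xBB\xBF", 3) === 0) {
        $contents = substr($contents, 3);
    }
    // An empty pattern with /u returns 1 only when the subject is valid UTF-8.
    if (preg_match('//u', $contents) !== 1) {
        throw new UnexpectedValueException("$path is not valid UTF-8");
    }
    // \R matches any line ending (\n, \r\n, \r); trim surrounding spaces, drop blanks.
    $lines = array_map('trim', preg_split('/\R/u', $contents));
    return array_values(array_filter($lines, static fn (string $l): bool => $l !== ''));
}

// Usage with a temporary file (BOM, Windows and Unix line endings, a blank line).
$path = tempnam(sys_get_temp_dir(), 'spam');
file_put_contents($path, "\xEF\xBB\xBFکلمه بد\r\nspam\n\n");
$words = loadSpamWords($path);
unlink($path);
print_r($words);
```

Validating once at load time means that any later `preg_match()` calls with the `u` modifier receive well-formed pattern text.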
