How to define libpcre regex for arabic characters?

Question

I need to define a PCRE regular expression for certain spam words in the Arabic/Persian alphabet, to be used in the spam module in Drupal. The problem is that an ordinary PCRE regular expression apparently cannot match patterns written in the Arabic alphabet.

For example, /bad word/ flags instances of "bad word", but

/کلمه بد/i

fails to flag 'کلمه بد'.

php regex utf-8 drupal-6 arabic

qliq May 10, '11 at 16:46

2 answers

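In Perl, you need use utf8; for literal non-ASCII characters in the source to be treated as UTF-8.

Alternatively, write the character as an escape such as /\x{644}/, or read the pattern from a file opened with the :utf8 layer:

open my $fh, '<:utf8', 'somefile.txt' or die "blah blah";
my $bad_thing = <$fh>;
chomp $bad_thing;   # drop the trailing newline so it is not part of the pattern
/$bad_thing/;

If the source is utf8 and you write a literal /ل/ without use utf8, it won't match. Could something similar be happening here?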
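
Since the question is tagged php, here is a rough sketch of the same escape idea with preg_match. It assumes PHP 7 or later (for the "\u{...}" string syntax), and the sample subject is only an illustration. Note that the u modifier is needed: without it, PCRE treats the pattern as bytes and rejects a code point above 0xFF in \x{...}.

```php
<?php
// Sketch: match the Arabic letter lam (U+0644) by code point, so the
// pattern does not depend on how this source file is encoded.
// The subject below is an assumed sample: keheh, lam, meem, heh.
$subject = "\u{06A9}\u{0644}\u{0645}\u{0647}";

// With the u modifier, \x{644} refers to the code point U+0644.
$found = preg_match('/\x{644}/u', $subject);

var_dump($found === 1);
```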

hobbs 10 '11 18:03

Alix Axel · Accepted Answer · 2011-05-12T17:48:18+0000

I have no problem with this if I use the PCRE modifier u (Unicode):

$string = 'کلمه بد';

if (preg_match('~\p{Arabic}~u', $string) > 0)
{
    var_dump('contains Arabic characters');

    if (preg_match('~کلمه بد~ui', $string) > 0)
    {
        var_dump('contains spam-ish Arabic characters');
    }
}

string(26) "contains Arabic characters"
string(35) "contains spam-ish Arabic characters"

Tested on IDEOne.com. Also make sure that the string (and the source file) is encoded as UTF-8.
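
If you have several spam words, something along these lines might work. It is only a sketch: contains_spam_word is a made-up helper name (not part of Drupal or PHP), and it assumes PHP 7 or later with the mbstring extension available. It checks the encoding first, because preg_match with the u modifier fails on invalid UTF-8, and it escapes each word with preg_quote so the words are matched literally.

```php
<?php
// Hypothetical helper (not part of Drupal or PHP): returns true when
// $text is valid UTF-8 and contains any of the given words.
function contains_spam_word(string $text, array $words): bool
{
    // An empty list would build a pattern that matches everything.
    if (count($words) === 0) {
        return false;
    }

    // preg_match with the u modifier fails on invalid UTF-8,
    // so check the encoding up front.
    if (!mb_check_encoding($text, 'UTF-8')) {
        return false;
    }

    // Escape each word so it is matched literally, then join with |.
    $escaped = array_map(function ($word) {
        return preg_quote($word, '~');
    }, $words);
    $pattern = '~(?:' . implode('|', $escaped) . ')~ui';

    return preg_match($pattern, $text) === 1;
}

$words = ['کلمه بد', 'bad word'];
var_dump(contains_spam_word('این یک کلمه بد است', $words)); // bool(true)
var_dump(contains_spam_word('hello world', $words));        // bool(false)
```

Text that is not valid UTF-8 is reported as not matching here; whether to convert it first or reject it is a choice for your module.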
