How to specify a valid character property using Oniguruma regular expressions?

Question

I use the oniguruma gem to get regular expressions with Unicode support in Ruby 1.8. According to the syntax documentation, I should use \p{M} or \p{Mark} to match code points with the Mark property.

However, when I do the following:

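require 'oniguruma'
include Oniguruma # brings ORegexp and the OPTION_*, SYNTAX_*, ENCODING_* constants into scope

ORegexp.new '\p{M}',
            :options => OPTION_MULTILINE | OPTION_SINGLELINE | OPTION_IGNORECASE | OPTION_EXTEND,
            :syntax => SYNTAX_JAVA, # so we can use character properties
            :encoding => ENCODING_UTF8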

I get ArgumentError: Oniguruma Error: invalid character property name {M}. I get the same error if I use {Mark}, or if I use one of the other syntaxes that support \p.

What am I doing wrong? How do I specify a valid character property using Oniguruma regular expressions?

UPDATE: If I use one of the UTF-16 encodings, the regular expression compiles, but since my strings are in UTF-8, this does not help. So my question is: how do I specify a valid character property using Oniguruma regular expressions with UTF-8?

ruby regex oniguruma

Simon Mar 21 '11 at 10:26

1 answer

zmanc · Answer 1 · 2012-10-18T19:15:38+0000

Try using

/\p{Mark}

I read on an old Ruby blog that using a slash "will try to find the value as an encoding in a string":

http://www.ruby-forum.com/topic/154384
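
For comparison, here is a minimal sketch that avoids the oniguruma gem altogether. It assumes Ruby 1.9 or later, which is an assumption on my part since the question targets 1.8; there the built-in Regexp engine accepts both \p{Mark} and \p{M} on UTF-8 strings. The sample string and the variable names are only illustrative, and this does not explain the gem's error under 1.8.

```ruby
# encoding: utf-8
# Assumption: Ruby 1.9 or later, using the built-in Regexp engine
# rather than the oniguruma gem from the question.

# "e" followed by U+0301 COMBINING ACUTE ACCENT, which has the Mark property.
text = "e\u0301"

# Long property name: character index of the first Mark code point, or nil.
mark_index = (text =~ /\p{Mark}/)

# Short alias: remove every Mark code point from the string.
stripped = text.gsub(/\p{M}/, "")

puts mark_index.inspect
puts stripped
```

If upgrading is not an option, this at least gives a reference result to compare the gem's behaviour against.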
