I use the oniguruma gem to get regular expressions with Unicode support in Ruby 1.8. According to the syntax documentation, I have to use \p{M} or \p{Mark} to match code points with the Mark property.
However, when I do the following:
    ORegexp.new '\p{M}',
        :options => OPTION_MULTILINE | OPTION_SINGLELINE | OPTION_IGNORECASE | OPTION_EXTEND,
        :syntax => SYNTAX_JAVA,
        :encoding => ENCODING_UTF8
I get ArgumentError: Oniguruma Error: invalid character property name {M}. I get the same error if I use {Mark}, or if I use one of the other syntaxes that support \p.
What am I doing wrong? How do I specify a valid character property using Oniguruma regular expressions?
UPDATE: If I use one of the UTF-16 encodings, the regular expression compiles; but since my strings are in UTF-8, this does not help. So my question is: how do I specify a valid character property using Oniguruma regular expressions with UTF-8?
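To make the intended behaviour concrete, here is a minimal sketch of what \p{M} is expected to match. It assumes Ruby 1.9 or later, where the built-in Regexp supports Unicode properties on UTF-8 strings; it does not use the oniguruma gem's ORegexp and so does not reproduce or fix the error above. The variable names are only illustrative.

```ruby
# encoding: utf-8
# Assumption: Ruby 1.9+ built-in Regexp, not the oniguruma gem's ORegexp.

# "e" followed by U+0301 COMBINING ACUTE ACCENT, a code point with the Mark property.
decomposed = "e\u0301"

mark = /\p{M}/

# True when the string contains at least one Mark code point.
mark_found = !(decomposed =~ mark).nil?

# A plain ASCII letter carries no Mark code point.
plain_has_mark = !("e" =~ mark).nil?

# Removing every Mark code point leaves only the base letter.
stripped = decomposed.gsub(mark, "")
```

Here mark_found is true, plain_has_mark is false, and stripped is the single letter "e". The same pattern compiled for UTF-8 input is what the ORegexp call in the question is meant to achieve.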
Simon