How to use Ruby regex to capture non-English words?

Question

I am trying to match "words" with Ruby 1.8.7.

My current regular expression for matching a word:

/[a-zA-Z]\'*\-*/

It only matches English words. Is there a way to match non-English UTF-8 characters as well?


ethicalhack3r Jun 05 '11 at 18:03

1 answer

Digitaloss · Accepted Answer · 2011-06-05T19:06:42+0000

Even the Ruby 1.8.x regex engine supports UTF-8. You just need the right expression, and that takes little more than using /\w/ with the /u flag:

s = "résumé and some other words"
puts s[/[a-z]+/u]
puts s[/\w+/u]

This prints:

r
résumé
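
For readers on Ruby 1.9 or later, where (as far as I know) \w matches only ASCII word characters regardless of the /u flag, a rough equivalent can be sketched with the Unicode letter property \p{L}. The helper name unicode_words is hypothetical, and allowing apostrophes and hyphens inside a word is an assumption based on the pattern in the question, not something the answer itself specifies.

```ruby
# encoding: utf-8

# Hypothetical helper: returns every "word" in the text.
# \p{L}+ matches a run of Unicode letters; the optional group allows
# apostrophes or hyphens between letter runs (an assumption taken from
# the question's original pattern).
def unicode_words(text)
  text.scan(/\p{L}+(?:['-]\p{L}+)*/)
end

s = "résumé and some other words"
puts unicode_words(s).first   # the first word, accented letters included
```

Calling unicode_words("naïve mother-in-law's café") should keep "mother-in-law's" together as a single word, since the hyphens and the apostrophe each sit between two letter runs.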