Regex to match all alphanumeric hashtags, no symbols

I am writing a hashtag scraper for Facebook, and every regular expression I come across for matching hashtags seems to include punctuation marks as well as alphanumeric characters. Here is an example of what I would like:

Hello #world! I am #m4king a #fac_book scraper and would like a nice regular #expression.

I would like to match world, m4king, fac and expression (note that I would like the match to be cut off once it reaches punctuation, including spaces). It would be nice if the match did not include the hash symbol, but this is not very important.

In case it matters, I will be using Ruby's String#scan method to capture possibly more than one tag.

Thanks heaps in advance!

+5
3 answers

A regular expression like this: #([A-Za-z0-9]+) should match what you need and put it in a capture group. You can then access this group later. Perhaps this will help shed light on regular expressions (from a Ruby context).

The regular expression above starts matching when it finds a #, and puts any letters or digits that follow into the capture group. When it finds something that is not a letter or a digit, it stops matching. As a result, you get a group containing exactly what you need.
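To illustrate where the match stops, here is a minimal sketch. The constant name PATTERN and the short test strings are my own, chosen only for illustration:

```ruby
# Same pattern as in the answer above; PATTERN is just an illustrative name.
PATTERN = /#([A-Za-z0-9]+)/

# String#[] with a regex and a group index returns that capture, or nil.
stops_at_underscore  = '#fac_book'[PATTERN, 1]  # "_" is not a letter or digit
stops_at_punctuation = '#world!'[PATTERN, 1]    # "!" ends the match
bare_hash            = '# m4king'[PATTERN, 1]   # "+" needs at least one character

p stops_at_underscore   # "fac"
p stops_at_punctuation  # "world"
p bare_hash             # nil
```

A lone # followed by a space does not match at all, because + requires at least one letter or digit right after the hash.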

+5
str = 'Hello #world! I am #m4king a #fac_book scraper and would like a nice regular #expression'
str.scan(/#([A-Za-z0-9]+)/).flatten #=> ["world", "m4king", "fac", "expression"]

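The #flatten call is necessary because, when the pattern contains a capture group, scan returns each match as its own array of captures.

Alternatively, you can use a lookbehind, which matches only what follows the "#":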
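A small sketch of what scan returns before and after flattening (the shortened sample string and the variable name nested are only for illustration):

```ruby
str = 'Hello #world! I am #m4king a #fac_book scraper'

# One inner array per match, one element per capture group.
nested = str.scan(/#([A-Za-z0-9]+)/)
p nested          # [["world"], ["m4king"], ["fac"]]
p nested.flatten  # ["world", "m4king", "fac"]

# With a single group, taking the first element of each match is equivalent.
firsts = nested.map(&:first)
p firsts
```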

str.scan /(?<=#)[[:alnum:]]+/ #=> ["world", "m4king", "fac", "expression"]
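One difference worth knowing, sketched below with a made-up sample string: because the lookbehind version has no capture group, scan returns a flat array directly, and in Ruby the POSIX bracket [[:alnum:]] also matches non-ASCII letters and digits, unlike [A-Za-z0-9].

```ruby
# "\u00e9" is a precomposed e-acute; the sample tags are hypothetical.
str = "Try #caf\u00e9 and #ruby2"

# No capture group, so no flatten is needed in either case.
ascii_only = str.scan(/(?<=#)[A-Za-z0-9]+/)
posix      = str.scan(/(?<=#)[[:alnum:]]+/)

p ascii_only  # the first tag is cut off before the accented letter
p posix       # the first tag is kept whole
```

Which of the two is preferable depends on whether tags in other scripts should count as alphanumeric for your scraper.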
+6

Here's a simpler regex: /#[[:alnum:]_]+/. Please note that it includes underscores, since Facebook currently treats underscores as part of hashtags (as does Twitter).

str = 'Hello #world! I am #m4king a #fac_book scraper and would like a nice regular #expression'
str.scan(/#[[:alnum:]_]+/) #=> ["#world", "#m4king", "#fac_book", "#expression"]

Here it is on Rubular: http://rubular.com/r/XPPqwtVGN9
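If you want the underscore-aware tags without the leading hash, one possible variation (my own combination of the ideas in this thread, not something Facebook documents) is to wrap the character class in a capture group:

```ruby
str = 'Hello #world! I am #m4king a #fac_book scraper and would like a nice regular #expression'

# The group captures everything after '#', underscores included.
tags = str.scan(/#([[:alnum:]_]+)/).flatten
p tags  # ["world", "m4king", "fac_book", "expression"]
```

Note that this returns fac_book rather than fac, so it only fits if underscores should be kept.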

+2