I use regex snippets to turn emoticons into images and am running into problems with semicolons. For example, a smiley like ;) is turned into a WINK icon by the associated regex
/;-?\)/g
and this works in most cases. But text like &quot;) also matches and becomes &quotWINK, because the quote is actually an HTML entity (&quot;) => &quotWINK).
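A minimal sketch that reproduces the problem, assuming the emoticon is replaced with the literal text WINK rather than an image (the helper name replaceWink is made up for illustration):

```javascript
// Hypothetical helper: replaces wink emoticons with the literal text "WINK".
function replaceWink(text) {
  return text.replace(/;-?\)/g, "WINK");
}

console.log(replaceWink("hello ;)")); // hello WINK
console.log(replaceWink("&quot;)"));  // &quotWINK  (the entity's semicolon was consumed)
```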
I tried prefixing the regex with a greedy non-capturing group to discard semicolons in entities:
(?:&quot;|&amp;|&lt;|&gt;|&apos;|&#39;)?
But the resulting pattern still matches the semicolon in &quot;, which should be skipped. Because the prefix is optional, the engine backtracks, lets the prefix match nothing, and then matches the emoticon starting at the entity's own semicolon. I also realize there will still be problems with other legitimate matches, such as EVIL: >:) => &gt;:).
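The backtracking can be seen in a short sketch. It assumes the full pattern is the optional prefix followed directly by the wink regex; the names withOptionalPrefix and findWinks are illustrative only:

```javascript
// Assumed full pattern: the optional entity prefix followed by the wink regex.
const withOptionalPrefix = /(?:&quot;|&amp;|&lt;|&gt;|&apos;|&#39;)?;-?\)/g;

// Returns every substring the pattern matches (empty array if none).
function findWinks(text) {
  return text.match(withOptionalPrefix) || [];
}

console.log(findWinks("&quot;)"));  // [ ';)' ]       the prefix matched nothing
console.log(findWinks("&quot;;)")); // [ '&quot;;)' ] the entity is swallowed into the match
```

In the second case the entity becomes part of the match, so a bulk replace would delete it along with the emoticon.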
So it seems I really need to rule out a preceding HTML entity, written without its semicolon:
(?!&quot|&amp|&lt|&gt|&apos|&
But it still matches, and I'm not sure why.
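A sketch of why the lookahead has no effect. The pattern above is cut off, so the completion used here (a final &#39 alternative followed by the wink regex) is an assumption, as is the name withLookahead:

```javascript
// Assumed completion of the truncated pattern: negative lookahead, then the wink regex.
const withLookahead = /(?!&quot|&amp|&lt|&gt|&apos|&#39);-?\)/g;

// The lookahead is tested at the same position where ";" must match.
// Text starting with ";" can never start with "&quot", "&amp", ...,
// so the negative lookahead always succeeds and filters nothing.
console.log("&quot;)".replace(withLookahead, "WINK")); // &quotWINK
```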
Ideally the regex would keep returning matches that can be replaced in bulk without extra checks, but I am open to suggestions. What won't work is decoding the HTML entities first, because sometimes they are needed and/or are part of a legitimate emoticon (as with EVIL).
EDIT (after some googling):
It turns out I need a negative lookbehind, (?<!regex), rather than a negative lookahead ((?!regex)).
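As regular-expressions.info explains, a lookahead only inspects the text after the current position, so a negative lookahead placed in front of the semicolon tests the semicolon itself and never sees the entity before it. A lookbehind inspects the text before the current position instead.
So the new regex is: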
/(?<!&quot|&amp|&lt|&gt|&apos|&
Matches: ;) => WINK, blah;) => blahWINK, &quot;;) => &quot;WINK. No match: &quot;)
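These cases can be checked with a short sketch, assuming an engine that supports lookbehind (for JavaScript, ES2018 or later). The regex above is cut off, so the completion here (a final &#39 alternative followed by the wink regex) is an assumption, and the names withLookbehind and wink are illustrative:

```javascript
// Assumed completion of the truncated pattern: negative lookbehind, then the wink regex.
// Requires lookbehind support (ES2018+ in JavaScript).
const withLookbehind = /(?<!&quot|&amp|&lt|&gt|&apos|&#39);-?\)/g;

// Hypothetical helper: replaces winks that do not directly follow an entity name.
const wink = (text) => text.replace(withLookbehind, "WINK");

console.log(wink(";)"));       // WINK
console.log(wink("blah;)"));   // blahWINK
console.log(wink("&quot;;)")); // &quot;WINK
console.log(wink("&quot;)"));  // &quot;)  (unchanged)
```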
Without the lookbehind, &quot;) => &quotWINK. The catch is that this needs lookbehind, which JavaScript did not support before ES2018.