Skip preceding HTML entities in JavaScript regex

I use regex snippets to turn emoticons into images and am running into problems with semicolons. For example, a smiley like ;) is turned into a WINK icon with

/;-?\)/g

and works in most cases. But text like &quot;) also matches and becomes &quotWINK, because the quote is actually an HTML entity ( " => &quot; ).

I tried a regex prefix with a greedy non-capturing match to discard semicolons in entities:

(?:&quot;|&amp;|&lt;|&gt;|&apos;|&#039;)?

But the resulting pattern still matches the semicolon in &quot;, which should have been skipped: because the prefix is optional, the engine falls back to matching it as empty and then matches at the entity's own semicolon. I also realize there would still be problems with other legitimate matches, such as EVIL: >:) => &gt;:).
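A quick check makes the failure visible. This is only a sketch: the name withPrefix and the sample string are chosen here for illustration and are not part of the original code.

```javascript
// Optional entity prefix followed by the wink pattern, as tried above.
const withPrefix = /(?:&quot;|&amp;|&lt;|&gt;|&apos;|&#039;)?;-?\)/g;

// Attempt at index 0: the group consumes "&quot;", but the next character
// is ")" rather than ";", so that attempt fails. The engine then retries at
// later start positions and succeeds at the entity's own ";", with the
// optional group matching nothing.
const matches = '&quot;)'.match(withPrefix);
console.log(matches); // [ ';)' ]

const replaced = '&quot;)'.replace(withPrefix, 'WINK');
console.log(replaced); // &quotWINK
```

An optional prefix can only widen a match; it can never veto one, which is why it cannot protect the entity.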

So it seems to me that what I really need is a negative match on the preceding HTML entities, written without their semicolons:

(?!&quot|&amp|&lt|&gt|&apos|&#039)

But it still matches, and I'm not sure why.
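The reason can be seen by running it. In this sketch (the name withLookahead and the sample string are illustrative assumptions), the lookahead is evaluated at the position of the semicolon and inspects the text that follows, not the text that precedes it.

```javascript
// Negative lookahead placed in front of the wink pattern, as tried above.
const withLookahead = /(?!&quot|&amp|&lt|&gt|&apos|&#039);-?\)/g;

// At the position of ";" the lookahead examines the text starting there
// (";)..."), which never begins with "&", so the assertion always passes.
const lookaheadResult = '&quot;) and ;)'.replace(withLookahead, 'WINK');
console.log(lookaheadResult); // &quotWINK and WINK
```

A lookahead in this position can never see the entity, so it rejects nothing.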

Ideally the pattern would keep returning matches that can be replaced in bulk without extra checks, but I am open to suggestions. What won't work is decoding the HTML entities first, because sometimes they are needed and/or are part of a legitimate emoticon (as with EVIL).


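EDIT (after some Googling):

What I need is a negative lookbehind, (?<!regex), rather than a negative lookahead, (?!regex).

regular-expressions.info covers both: a lookahead checks the text that follows the current position, while a lookbehind checks the text that precedes it, which is what is needed here.

So the pattern would be:

/(?<!&quot|&amp|&lt|&gt|&apos|&#039);-?\)/g

Expected results: ;) => WINK, blah;) => blahWINK, &quot;;) => &quot;WINK, while &quot;) is left alone. One remaining case is a double-encoded entity: &amp;quot;) => &amp;quotWINK.

The bad news is that JavaScript does not appear to support lookbehind.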
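Lookbehind assertions were later standardized in ES2018, so on an engine that implements them the lookbehind pattern above can be tried directly. The sketch below assumes such an engine (older ones reject the regex literal with a SyntaxError); the names and the sample list are illustrative, with the EVIL case from the question added.

```javascript
// Assumes ES2018 lookbehind support in the running engine.
const withLookbehind = /(?<!&quot|&amp|&lt|&gt|&apos|&#039);-?\)/g;

const samples = [';)', 'blah;)', '&quot;;)', '&quot;)', '&gt;:)', '&amp;quot;)'];

// Replace every wink whose ";" is not the tail of a listed entity.
const converted = samples.map(function (s) {
  return s.replace(withLookbehind, 'WINK');
});

console.log(converted);
// [ 'WINK', 'blahWINK', '&quot;WINK', '&quot;)', '&gt;:)', '&amp;quotWINK' ]
```

The last sample shows the double-encoded case still slipping through, since the text before its final semicolon is ";quot" rather than "&quot".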


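Since lookbehind is not available in JavaScript, one workaround in plain JS is to capture the optional entity in front of ;-) and decide inside a replacement callback: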

var text = "& &amp; &amp;-) ;-) test;-)";
var ENTITIES_REGEX = /(&quot|&amp|&lt|&gt|&apos|&#039)?;-\)/g;

var result = text.replace(ENTITIES_REGEX, function(fullMatch, backref1) {
  // If the capture group matched an entity, the ";" belongs to it, so
  // return the match unaltered; otherwise return WINK
  return (backref1 ? fullMatch : 'WINK');
});

// result equals "& &amp; &amp;-) WINK testWINK"
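The pattern above requires the hyphen, while the question's original pattern made it optional. The same callback idea can be sketched with -? so that both ;) and ;-) are handled; the names replaceWinks and ENTITY_THEN_WINK are hypothetical and only for illustration.

```javascript
// Capture an optional entity (without its semicolon) directly in front
// of the wink, with the hyphen optional as in the question's pattern.
const ENTITY_THEN_WINK = /(&quot|&amp|&lt|&gt|&apos|&#039)?;-?\)/g;

// Hypothetical helper wrapping the replacement callback.
function replaceWinks(text) {
  return text.replace(ENTITY_THEN_WINK, function (fullMatch, entity) {
    // A captured entity means the ";" belongs to it: keep the text as is.
    return entity ? fullMatch : 'WINK';
  });
}

const winkResult = replaceWinks('&quot;) ;) test;-) &quot;;)');
console.log(winkResult); // &quot;) WINK testWINK &quot;WINK
```

Because the entity is consumed as part of the match, the engine cannot restart at its semicolon, which is what defeats the plain optional prefix.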
