Regular expression to match content up to a multi-character string

I have malformed input that looks like this:

foo<p>bar</p>

And I want to normalize it by wrapping the leading text in a p tag:

<p>foo</p><p>bar</p>

This is easy: replace the regular expression /^([^<]+)/ with <p>$1</p>. The problem is that sometimes the leading piece contains tags other than p, for example:

foo <b>bold</b><p>bar</p>

The whole leading fragment should be wrapped in the new p:

<p>foo <b>bold</b></p><p>bar</p>

But since the simple regular expression only looks for <, it stops at <b> and produces:

<p>foo </p><b>bold</b><p>bar</p> <!-- oops -->
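For reference, here is the naive replacement from the question as a small JavaScript function (JavaScript being the language used in the answer below). `wrapNaive` is a hypothetical name used only for illustration; the function reproduces the broken output shown above:

```javascript
// Naive approach from the question: wrap everything before the first "<" in <p>...</p>.
// wrapNaive is a hypothetical helper name used only for illustration.
function wrapNaive(html) {
  return html.replace(/^([^<]+)/, '<p>$1</p>');
}

console.log(wrapNaive('foo<p>bar</p>'));
// <p>foo</p><p>bar</p>
console.log(wrapNaive('foo <b>bold</b><p>bar</p>'));
// <p>foo </p><b>bold</b><p>bar</p>  ([^<]+ stops at "<b")
```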

So how do I rewrite the regex to match everything up to <p? Apparently the answer involves a negative lookahead, but that is beyond me.

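(Before anyone says "don't parse HTML with regex!": this is not arbitrary HTML. The only tags that appear are <p>, <a>, <b> and <i>, and a/b/i are never nested.)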


Here is a first attempt using a lookahead:

/^([^<]+)(?=<p)/

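It matches one or more characters that are not <, but only if they are followed by <p. The (?=<p) part is a lookahead, so the <p itself is checked but not included in the match.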
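A quick way to see that the lookahead is zero-width is to inspect the match object. This is a sketch: the g flag is dropped here so that String.prototype.match returns the capture group and index instead of a list of all matches.

```javascript
// The lookahead regex without the g flag, so .match() returns capture details.
const re = /^([^<]+)(?=<p)/;

const m = 'foo<p>bar</p>'.match(re);
console.log(m[1]);                  // foo
console.log(m.index + m[0].length); // 3: the match ends just before "<p"

// The text before the first "<" is "foo ", which is followed by "<b", not "<p",
// and no shorter prefix is followed by "<p" either, so there is no match.
console.log('foo <b>bold</b><p>bar</p>'.match(re)); // null
```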

Testing it:

> var re = /^([^<]+)(?=<p)/g;

> 'foo<p>bar</p>'.replace(re, '<p>$1</p>');
  "<p>foo</p><p>bar</p>"

> 'foo <b>bold</b><p>bar</p>'.replace(re, '<p>$1</p>')
  "foo <b>bold</b><p>bar</p>"

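As you can see, this is not enough: in the second case "foo bold" should be wrapped in p, but the regex does not match at all, so the string comes back unchanged.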

And when there is no p at all (just foo), the result should be <p>foo</p>.

One way is to do 2 replacements: first /^(.+?(?=<p))/, then /^([^<]+)/.

> var re1 = /^(.+?(?=<p))/g,
      re2 = /^([^<]+)/g,
      s = '<p>$1</p>';

> 'foo<p>bar</p>'.replace(re1, s).replace(re2, s);
  "<p>foo</p><p>bar</p>"

> 'foo'.replace(re1, s).replace(re2, s);
  "<p>foo</p>"

> 'foo <b>bold</b><p>bar</p>'.replace(re1, s).replace(re2, s);
  "<p>foo <b>bold</b></p><p>bar</p>"

Or combine re1 and re2 into a single regex:
/^(.+?(?=<p)|[^<]+)/

> var re3 = /^(.+?(?=<p)|[^<]+)/g,
      s = '<p>$1</p>';

> 'foo<p>bar</p>'.replace(re3, s)
  "<p>foo</p><p>bar</p>"

> 'foo'.replace(re3, s)
  "<p>foo</p>"

> 'foo <b>bold</b><p>bar</p>'.replace(re3, s)
  "<p>foo <b>bold</b></p><p>bar</p>"
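Since the question mentions a negative lookahead, here is a sketch of that variant, assuming the input only ever contains the lowercase tags listed in the question (<p>, <a>, <b>, <i>). It uses a "tempered greedy token": each character is consumed only if the text at that position does not start with <p. `wrapLeadingText` is a hypothetical helper name. Compared with re3, it leaves input that already starts with <p> alone (re3 would turn '<p>a</p><p>b</p>' into '<p><p>a</p></p><p>b</p>'), wraps leading text that contains inline tags but has no following <p> (re3 turns 'foo <b>bold</b>' into '<p>foo </p><b>bold</b>'), and, because it uses [\s\S] instead of ., also accepts leading text that spans several lines.

```javascript
// Negative-lookahead variant ("tempered greedy token"), a sketch assuming the
// input only uses lowercase <p>, <a>, <b> and <i> tags.
// (?:(?!<p)[\s\S])+ consumes one character at a time, as long as the text at the
// current position does not start with "<p". [\s\S] also matches newlines, unlike ".".
// If the input already starts with "<p", the + cannot match even one character,
// so the string is returned unchanged.
const leading = /^((?:(?!<p)[\s\S])+)/;

// wrapLeadingText is a hypothetical helper name used only for illustration.
function wrapLeadingText(html) {
  return html.replace(leading, '<p>$1</p>');
}

console.log(wrapLeadingText('foo <b>bold</b><p>bar</p>'));
// <p>foo <b>bold</b></p><p>bar</p>
console.log(wrapLeadingText('<p>a</p><p>b</p>'));
// <p>a</p><p>b</p>  (already normalized, left alone)
console.log(wrapLeadingText('foo <b>bold</b>'));
// <p>foo <b>bold</b></p>
```

The trade-off is that (?!<p) is checked at every character of the leading fragment, which is not a concern for short fragments like these.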