Recursively replace matching tags with a regular expression

I have the following line:

<?foo?> <?bar?> <?baz?> hello world <?/?> <?/?> <?/?>

I need a regex to convert it to

<?foo?> <?bar?> <?baz?> hello world <?/baz?> <?/bar?> <?/foo?>

The following code works for non-recursive tags:

$x=preg_replace_callback('/.*?<\?\/\?>/',function($x){
    return preg_replace('/(.*<\?([^\/][\w]+)\?>)(.*?)(<\?\/?\?>)/s',
          '\1\3<?/\2?>',$x[0]);
},$str);
2 answers

You cannot do this with regular expressions. You need to write a parser!

So, create a stack: an array in which you add and remove elements only at the end, using array_push() and array_pop().

Iterate through the tags, pushing each opening tag onto the stack.

When you reach a closing tag, pop the stack; the value you get back is the tag that you need to close.
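The steps above could be sketched in PHP roughly as follows. This is only a minimal sketch under a few assumptions: tag names consist of word characters, the function name close_tags is made up for illustration, and a closing tag that has no open tag left to match is passed through unchanged.

```php
<?php
/* Rewrite each closing tag so that it names the opening tag it closes. */
function close_tags(string $str): string
{
    $stack = [];   // names of the tags that are currently open

    // One pattern matches both kinds of tag: group 1 is "/" for a closing
    // tag, group 2 is the tag name (empty for an anonymous closing tag).
    return preg_replace_callback(
        '/<\?(\/?)(\w*)\?>/',
        function (array $m) use (&$stack): string {
            if ($m[1] === '') {
                // Opening tag: remember its name and leave it untouched.
                if ($m[2] !== '') {
                    array_push($stack, $m[2]);
                }
                return $m[0];
            }
            // Closing tag: the most recently opened tag is the one to close.
            $name = array_pop($stack);   // null when nothing is open
            return $name === null ? $m[0] : '<?/' . $name . '?>';
        },
        $str
    );
}

echo close_tags('<?foo?> <?bar?> <?baz?> hello world <?/?> <?/?> <?/?>'), PHP_EOL;
/* prints: <?foo?> <?bar?> <?baz?> hello world <?/baz?> <?/bar?> <?/foo?> */
```

The regular expression here only finds the tags; the nesting is tracked by the stack, which is why a single left-to-right pass is enough.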


For a recursive structure, write a recursive function. In pseudocode:

tags = ['<?foo?>', '<?bar?>', '<?baz?>']

// append the consumed part of 'line' to 'output' and return the rest
function close_matching(line, output) {
  for (tag in tags) {
    if (line.startswith(tag)) {
      output.append(tag)
      // recurse first so that nested tags are closed before this one
      line = close_matching(line.substring(tag.length()), output)
      i = line.indexof('<')
      ... // check i for not found
      output.append(line.substring(0, i))
      j = line.indexof('>')
      ... // check j for error, and check that what lies between i and j is a valid close tag
      output.append(closetag_for_tag(tag))
      line = line.substring(j + 1)
    }
  }
  return line
}
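
The pseudocode above matches against a fixed tag list and does not skip the whitespace between tags. A runnable PHP sketch of the same recursive idea might look like the following. It is an illustration, not a definitive implementation: it assumes tag names consist of word characters, it recognises tags with a pattern instead of a tag list, and close_all is a hypothetical wrapper added here for convenience.

```php
<?php
/* Append the consumed part of $line to $output and return the unconsumed rest.
   Stops in front of a closing tag that belongs to the caller. */
function close_matching(string $line, string &$output): string
{
    // Group 1: text before the next tag. Group 2: "/" for a closing tag.
    // Group 3: the name of an opening tag.
    while (preg_match('/^(.*?)<\?(?:(\/)\w*|(\w+))\?>/s', $line, $m)) {
        if ($m[2] === '/') {
            // A closing tag ends this level; hand it back to the caller.
            $output .= $m[1];
            return substr($line, strlen($m[1]));
        }
        // Opening tag: copy it, then recurse so nested tags are closed first.
        $output .= $m[0];
        $line = close_matching(substr($line, strlen($m[0])), $output);

        // The rest must now start with the closing tag for this opening tag.
        if (!preg_match('/^<\?\/\w*\?>/', $line, $c)) {
            throw new RuntimeException('missing closing tag for ' . $m[3]);
        }
        $output .= '<?/' . $m[3] . '?>';
        $line = substr($line, strlen($c[0]));
    }
    $output .= $line;   // trailing text without any tags
    return '';
}

/* Hypothetical convenience wrapper around close_matching(). */
function close_all(string $line): string
{
    $output = '';
    if (close_matching($line, $output) !== '') {
        throw new RuntimeException('closing tag without an opening tag');
    }
    return $output;
}

echo close_all('<?foo?> <?bar?> <?baz?> hello world <?/?> <?/?> <?/?>'), PHP_EOL;
/* prints: <?foo?> <?bar?> <?baz?> hello world <?/baz?> <?/bar?> <?/foo?> */
```

Unlike the pseudocode, this version throws on unbalanced input rather than leaving the checks open, and the loop lets it handle sibling tags at the same level.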
