Condition inside regex pattern

Question

I would like to remove extra spaces from my code; I am parsing a docblock. The problem is that I do not want to remove spaces inside <code>code goes here</code>.

For example, I use this to remove extra spaces:

$string = preg_replace('/[ ]{2,}/', ' ', $string);

But I would like to keep the whitespace inside <code></code>.
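To make the problem concrete, here is a minimal sketch of what a single global replace does. The input is a shortened version of the example below, and the replacement is a single space so that words stay apart. The indentation inside the <code> block gets collapsed along with everything else.

```php
<?php
// Shortened input from the question: free text plus an indented code block.
$string = "This  is some  text\n<code>\nUser::setup(array(\n    'key1' => 'value1',\n));\n</code>";

// One global replace touches every run of spaces, including the code indentation.
$result = preg_replace('/[ ]{2,}/', ' ', $string);
echo $result, "\n";
```

The four-space indent before 'key1' comes out as a single space, which is why the code sections need to be skipped.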

This text:

This  is some  text
  This is also   some text

<code>
User::setup(array(
    'key1' => 'value1',
    'key2' => 'value1'
));
</code>

Must be converted to:

This is some text
This is also some text

<code>
User::setup(array(
    'key1' => 'value1',
    'key2' => 'value1'
));
</code>

How can I do this?

php regex

sandelius Mar 12 '11 at 15:02

4 answers

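Here is a solution that uses PHP's preg_replace_callback() together with recursive pattern syntax ((?R), (?1), (?2)...), so that nested CODE tags are handled as well. Tested script: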

<?php // test.php 20110312_2200

function clean_non_code(&$text) {
    $re = '%
    # Match and capture either CODE into $1 or non-CODE into $2.
      (                      # $1: CODE section (never empty).
        <code[^>]*>          # CODE opening tag
        (?R)+                # CODE contents w/nested CODE tags.
        </code\s*>           # CODE closing tag
      )                      # End $1: CODE section.
    |                        # Or...
      (                      # $2: Non-CODE section (may be empty).
        [^<]*+               # Zero or more non-< {normal*}
        (?:                  # Begin {(special normal*)*}
          (?!</?code\b)      # If not a code open or close tag,
          <                  # match non-code < {special}
          [^<]*+             # More {normal*}
        )*+                  # End {(special normal*)*}
      )                      # End $2: Non-CODE section
    %ix';

    $text = preg_replace_callback($re, '_my_callback', $text);
    if ($text === null) exit("PREG Error!\nTarget string too big.");
    return $text;
}

// The callback function is called once for each
// match found and is passed one parameter: $matches.
function _my_callback($matches)
{ // Either $1 or $2 matched, but never both.
    if ($matches[1]) {
        return $matches[1];
    }
    // Collapse multiple tabs and spaces into a single space.
    $matches[2] = preg_replace('/[ \t][ \t]++/S', ' ', $matches[2]);
    // Trim each line
    $matches[2] = preg_replace('/^ /m', '', $matches[2]);
    $matches[2] = preg_replace('/ $/m', '', $matches[2]);
    return $matches[2];
}

// Create some test data.
$data = "This  is some  text
  This is also   some text

<code>
User::setup(array(
    'key1'      => 'value1',
    'key2'      => 'value1',
    'key42'     => '<code>
        Pay no attention to this. It has been proven over and
        over again that it is <code>   unpossible   </code>
        to parse nested stuff with regex!           </code>'
));
</code>";

// Demonstrate that it works on one small test string.
echo("BEFORE:\n". $data ."\n\n");
echo("AFTER:\n". clean_non_code($data) ."\n\nTesting...");

// Build a large test string.
$bigdata = '';
for ($i = 0; $i < 30000; ++$i) $bigdata .= $data;
$size = strlen($bigdata);

// Measure how long it takes to process it.
$time = microtime(true);
$bigdata = clean_non_code($bigdata);
$time = microtime(true) - $time;

// Print benchmark results
printf("Done.\nTest string size: %d bytes. Time: %.3f sec. Speed: %.0f KB/s.\n",
    $size, $time, ($size / $time)/1024.);
?>

Benchmark output of the script (WinXP32, PHP 5.2.14 cli):

'Test string size: 10410000 bytes. Time: 1.219 sec. Speed: 8337 KB/s.'

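Note that the pattern also matches CODE opening tags that carry attributes (the <code[^>]*> part) and skips nested CODE sections through the (?R) recursion.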
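The script above uses a named callback and the x flag for readability. As a sketch, assuming PHP 5.3 or later, the same recursive pattern can be condensed onto one line (no x flag, so it contains no literal whitespace) and paired with a closure. The function name clean_non_code_compact is hypothetical, and unlike the original it throws a RuntimeException instead of calling exit() when PCRE fails.

```php
<?php
// Hypothetical compact variant of clean_non_code(): same recursive pattern,
// closure instead of a named callback, exception instead of exit().
function clean_non_code_compact($text)
{
    // $1: a CODE section, which may contain nested CODE sections via (?R).
    // $2: a non-CODE section (may be empty).
    $re = '%(<code[^>]*>(?R)+</code\s*>)|([^<]*+(?:(?!</?code\b)<[^<]*+)*+)%i';

    $result = preg_replace_callback($re, function ($m) {
        if ($m[1] !== '') {
            return $m[1];                                 // CODE: keep as-is
        }
        $s = preg_replace('/[ \t][ \t]++/', ' ', $m[2]);  // collapse runs of blanks
        return preg_replace('/^ | $/m', '', $s);          // trim each line's edges
    }, $text);

    if ($result === null) {
        throw new RuntimeException('PREG error: ' . preg_last_error());
    }
    return $result;
}

$nested = "a  b\n<code>\n  x  <code>  y  </code>\n</code>\n  c   d";
echo clean_non_code_compact($nested), "\n";
```

The nested <code> block, including its inner spacing, is returned unchanged. Only the text before and after it is cleaned.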

ridgerunner Mar 13 '11 at 6:50

Normally, regular expressions are not the right tool for parsing HTML.

If the input is HTML, you could load it with DOMDocument and leave the text inside code elements alone.
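A minimal sketch of the DOMDocument route, assuming the docblock is HTML-like and that PHP 5.4+ with ext-dom is available. The helper name collapse_spaces_dom, the <div> wrapper and the XPath query are illustrative choices, not part of the answer. Because the markup is re-serialized with saveHTML(), characters such as > inside code may come back as entities (for example =&gt;), so the round trip is not byte-identical for code like 'key1' => 'value1'.

```php
<?php
// Hypothetical helper: collapse spaces in every text node that is not
// inside a <code> element, using DOMDocument and DOMXPath.
function collapse_spaces_dom($html)
{
    $doc = new DOMDocument();
    // Wrap the fragment in a <div> so it has one root element; the flags stop
    // libxml from adding <html>, <body> and a doctype. @ silences markup warnings.
    @$doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//text()[not(ancestor::code)]') as $node) {
        // Collapse runs of spaces/tabs, then strip blanks around line breaks.
        $text = preg_replace('/[ \t]{2,}/', ' ', $node->nodeValue);
        $node->nodeValue = preg_replace('/[ \t]*\n[ \t]*/', "\n", $text);
    }

    // Serialize the wrapper's children, dropping the wrapper itself.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo collapse_spaces_dom("This  is some  text\n  This is also   some text\n<code>\n    indented   line\n</code>"), "\n";
```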

Alternatively, read the input line by line with fopen() and only clean the lines that are outside code.

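Keep a flag that says whether you are inside a code element: set it when you see <code>, and leave lines alone while it is set. Reset the flag when you reach </code>. This breaks down if code elements can be nested (can they?).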
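The flag idea above can be sketched as follows. This assumes, as in the question's example, that <code> and </code> sit on their own lines. Text that shares a line with a tag is left untouched, and nested code tags are not handled. The function name collapse_spaces_by_line is hypothetical.

```php
<?php
// Hypothetical helper: line-by-line cleanup with an "inside code" flag.
function collapse_spaces_by_line($path)
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        return false;
    }
    $out = array();
    $inCode = false;
    while (($line = fgets($fh)) !== false) {
        $line = rtrim($line, "\r\n");
        if (stripos($line, '<code') !== false) {
            $inCode = true;                  // entering a code block
        }
        if ($inCode) {
            $out[] = $line;                  // keep code lines as-is
        } else {
            // Collapse runs of spaces, then trim the line's edges.
            $out[] = trim(preg_replace('/[ ]{2,}/', ' ', $line));
        }
        if (stripos($line, '</code>') !== false) {
            $inCode = false;                 // leaving the code block
        }
    }
    fclose($fh);
    return implode("\n", $out);
}
```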

Mario .

alex Mar 12 '11 at 16:35

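Parsing HTML with regular expressions is a bad idea.

See the well-known answer to "RegEx match open tags except XHTML self-contained tags".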

Use a proper HTML parser instead, for example Zend_DOM.

Vladislav Rastrusny Mar 12 '11 at 15:18

Kobi · Accepted Answer · 2011-03-13T07:55:15+0000

You are not really looking for a condition: you need to skip parts of the string so that they are not replaced. This is easy to do with preg_replace. Add capturing groups for the parts you want to keep and replace each group with itself. In your case, one call is enough:

$str = preg_replace('~(<code>.*?</code>)|^ +| +$|( ) +~smi', '$1$2', $str);

How does it work?

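(<code>.*?</code>) - matches a whole <code> block and captures it into $1, so it is written back unchanged.
^ + - matches spaces at the start of a line; they are removed.
[ ]+$ - matches spaces at the end of a line; they are removed.
( ) + - matches two or more spaces and captures a single space into $2.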

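Replacing with $1$2 writes back the <code> block when it matched, a single space when a run of spaces matched, and nothing when leading or trailing spaces matched.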
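A runnable sketch of the one-liner applied to the question's input. The wrapper name collapse_spaces_outside_code is hypothetical. The pattern and replacement are the ones from the answer, written in single quotes so PHP never tries to interpolate $1 or $2.

```php
<?php
// Hypothetical wrapper around the answer's one-liner.
function collapse_spaces_outside_code($str)
{
    // (<code>.*?</code>) -> $1, written back unchanged
    // ^ +  and  ' +$'    -> leading/trailing spaces, replaced by nothing
    // ( ) +              -> a run of 2+ spaces; $2 holds one space
    return preg_replace('~(<code>.*?</code>)|^ +| +$|( ) +~smi', '$1$2', $str);
}

$input = "This  is some  text\n  This is also   some text\n\n"
       . "<code>\nUser::setup(array(\n    'key1' => 'value1',\n    'key2' => 'value1'\n));\n</code>";
echo collapse_spaces_outside_code($input), "\n";
```

The lazy .*? stops at the first </code>, so unlike the recursive solution in the other answer, this does not handle nested code tags.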

Notes:

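$1 and $2 are never set at the same time; the group that did not participate is replaced with an empty string.
(a|b|c) - alternatives are tried from left to right, so ^ +| +$ must come before ( ) +. Otherwise, runs of leading or trailing spaces would be collapsed to one space instead of removed.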

Working example: http://ideone.com/HxbaV
