Apply wordwrap to html content excluding html attributes

Question

I'm not used to regular expressions, so this may be easy for others even though it is difficult for me.

Basically, I use wordwrap on content that contains ordinary HTML tags:

  $text = wordwrap($text, $cutLength, " ", $wordCut);
  $text = nl2br(bbcode_parser($text));
  return $text;

As you can see, my problem is quite simple: I want to apply wordwrap() to my content while leaving untouched anything inside HTML attributes such as href and src.

Can anyone help me out? Many thanks!

php regex html-parsing word-wrap

pixelboy Jan 13 '11 at 17:04

2 answers

Use DOM instead of a regex: fetch the text nodes and apply wordwrap to those.

Attribute values such as URLs are not text nodes, so wordwrap never touches them.
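The DOM route mentioned above could look roughly like the following. This is only a minimal sketch: wrap_html_text is a hypothetical helper name, the wrapper div is just a convenient way to get the fragment back out, and character-encoding handling is left out (loadHTML does not assume UTF-8 by default, so non-ASCII input would need extra care).

```php
<?php
// Hypothetical helper: wrap only the text nodes of an HTML fragment,
// so attribute values (href, src, ...) are left exactly as they were.
function wrap_html_text(string $html, int $width, string $break = "\n", bool $cut = true): string
{
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc = new DOMDocument();
    // A wrapper <div> gives the fragment a single root we can find again.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Select text nodes only, skipping script and style contents.
    $query = '//text()[not(ancestor::script) and not(ancestor::style)]';
    foreach ($xpath->query($query) as $node) {
        $node->nodeValue = wordwrap($node->nodeValue, $width, $break, $cut);
    }

    // Serialize the wrapper's children to get the fragment back.
    $root = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

With the variables from the question, the call would be something like wrap_html_text($text, $cutLength, " ", $wordCut), applied to markup that is already HTML.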

Gordon Jan 13 '11 at 17:12

sln · Accepted Answer · 2011-01-13T21:36:59+0000

You should not use a regex to parse HTML, of course, but this should separate out the content you want. I have limited PHP knowledge, so this just illustrates the procedure.

$tags = 
'  <
   (?:
       /?\w+\s*/?
     | \w+\s+ (?:".*?"|\'.*?\'|[^>]*?)+\s*/?
     | !(?:DOCTYPE.*?|--.*?--)
   )>
';

$scripts =
'   <
   (?:
       (?:script|style) \s*
     | (?:script|style) \s+ (?:".*?"|\'.*?\'|[^>]*?)+\s*
   )>
   .*?
   </(?:script|style)\s*>
';

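// PHP needs the pattern as a string and has no /g modifier; "~" is the
// delimiter here because the sub-patterns contain "/".
$regex = "~ ($scripts | $tags) | ((?:(?!$tags).)+) ~xs";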

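Then do a global replace: if Group 1 matched (a tag or a script/style block), pass it through unchanged; otherwise Group 2 holds a run of text, so wrap it. In other words: replacement = \1 . textwrap(\2)
where textwrap is whatever wrapping function you use (wordwrap in PHP).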
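In PHP, the replacement step described above maps naturally onto preg_replace_callback. The following is a sketch under a few assumptions: wrap_outside_tags is a hypothetical helper name, "~" is chosen as the delimiter only because the patterns contain "/", and the fallback to the untouched input on a PCRE error is my own addition rather than part of the answer.

```php
<?php
// Tag and script/style patterns copied from the answer (free-spacing mode).
$tags = '  <
   (?:
       /?\w+\s*/?
     | \w+\s+ (?:".*?"|\'.*?\'|[^>]*?)+\s*/?
     | !(?:DOCTYPE.*?|--.*?--)
   )>
';

$scripts = '   <
   (?:
       (?:script|style) \s*
     | (?:script|style) \s+ (?:".*?"|\'.*?\'|[^>]*?)+\s*
   )>
   .*?
   </(?:script|style)\s*>
';

// Group 1: markup to pass through. Group 2: a run of text to wrap.
$regex = "~ ($scripts | $tags) | ((?:(?!$tags).)+) ~xs";

// Hypothetical helper: wordwrap everything that is not a tag or script/style block.
function wrap_outside_tags(string $html, string $regex, int $width, string $break = "\n", bool $cut = true): string
{
    $result = preg_replace_callback(
        $regex,
        function (array $m) use ($width, $break, $cut): string {
            // $m[2] exists only when the text alternative matched.
            return isset($m[2]) && $m[2] !== ''
                ? wordwrap($m[2], $width, $break, $cut)
                : $m[0];
        },
        $html
    );
    // preg_replace_callback returns null on a PCRE error (e.g. backtrack limit).
    return $result ?? $html;
}

echo wrap_outside_tags('<a href="http://example.com/a-very-long-path">abcdefghijklmnop</a>', $regex, 5, ' ', true), "\n";
// prints: <a href="http://example.com/a-very-long-path">abcde fghij klmno p</a>
```

The callback replaces the \1 . textwrap(\2) expression: only one of the two groups is non-empty for any given match, so returning either the untouched match or the wrapped text is equivalent.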

A test of the regex in Perl (it prints each non-blank run of text):

use strict;
use warnings;

my $tags = 
'  <
   (?:
       /?\w+\s*/?
     | \w+\s+ (?:".*?"|\'.*?\'|[^>]*?)+\s*/?
     | !(?:DOCTYPE.*?|--.*?--)
   )>
';

my $scripts =
'   <
   (?:
       (?:script|style) \s*
     | (?:script|style) \s+ (?:".*?"|\'.*?\'|[^>]*?)+\s*
   )>
   .*?
   </(?:script|style)\s*>
';

my $html = join '', <DATA>;

while ( $html =~ / ($scripts | $tags) | ((?:(?!$tags).)+) /xsg ) {
    if (defined $2 && $2 !~ /^\s+$/) {
        print $2,"\n";
    }
}
