Apply wordwrap to html content excluding html attributes

I'm not used to regular expressions, so this may be easy for others, but it is difficult for me.

Basically, I use wordwrap on content that contains ordinary HTML tags:

  $text = wordwrap($text, $cutLength, " ", $wordCut);
  $text = nl2br(bbcode_parser($text));
  return $text;

As you can see, my problem is simple: I want to apply wordwrap() to my content while leaving anything inside HTML attributes (href, src, ...) untouched.
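To make the problem concrete, here is a small illustrative sketch. It assumes $wordCut is true (hard cuts), as in the wordwrap($text, $cutLength, " ", $wordCut) call above; the sample link and the width of 20 are made up.

```php
<?php
// Illustrative input (hypothetical): a link whose href is longer than the wrap width.
$html = '<a href="http://example.com/averyveryverylongpath">link</a>';

// wordwrap() only sees characters, not markup, so the long "word" that
// contains the href value is cut and a space ends up inside the URL.
$wrapped = wordwrap($html, 20, " ", true);
echo $wrapped, "\n";
```

Because the cut is blind to markup, the resulting href no longer points at the original URL.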

Can anyone help me out? Many thanks!


You shouldn't use regex to parse HTML, of course, but this should separate out the content you want. My PHP knowledge is limited, so this only illustrates the approach.

$tags = 
'  <
   (?:
       /?\w+\s*/?
     | \w+\s+ (?:".*?"|\'.*?\'|[^>]*?)+\s*/?
     | !(?:DOCTYPE.*?|--.*?--)
   )>
';

$scripts =
'   <
   (?:
       (?:script|style) \s*
     | (?:script|style) \s+ (?:".*?"|\'.*?\'|[^>]*?)+\s*
   )>
   .*?
   </(?:script|style)\s*>
';

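// PHP has no "g" modifier (preg_match_all()/preg_replace_callback() already find every match),
// and "~" is used as the delimiter because the patterns themselves contain "/".
$regex = "~ ($scripts | $tags) | ((?:(?!$tags).)+) ~xs";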

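Group 1 captures tags and script/style blocks (pass these through unchanged); group 2 captures the text between them. So the replacement is: \1 . textwrap(\2), where textwrap stands for your wrapping function.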
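In PHP, that replacement could be done with preg_replace_callback(). The following is a minimal sketch, assuming PHP 7 or later; it reuses the $tags and $scripts patterns above, while wrapTextOnly() and the sample string are hypothetical names chosen for illustration. It is still a regex heuristic, not an HTML parser.

```php
<?php
// Hypothetical helper: wordwrap() only the text outside tags and script/style blocks.
function wrapTextOnly(string $html, int $width, string $break = " ", bool $cut = true): string
{
    // Any tag, doctype or comment (non-capturing groups only, so numbering stays 1/2).
    $tags = '<
        (?:
            /?\w+\s*/?
          | \w+\s+ (?:".*?"|\'.*?\'|[^>]*?)+\s*/?
          | !(?:DOCTYPE.*?|--.*?--)
        )>';

    // A whole <script> or <style> element, including its contents.
    $scripts = '<
        (?:
            (?:script|style) \s*
          | (?:script|style) \s+ (?:".*?"|\'.*?\'|[^>]*?)+\s*
        )>
        .*?
        </(?:script|style)\s*>';

    // "~" delimiter because the patterns contain "/"; x = ignore whitespace, s = dot matches newline.
    $regex = "~ ($scripts | $tags) | ((?:(?!$tags).)+) ~xs";

    $result = preg_replace_callback($regex, function (array $m) use ($width, $break, $cut): string {
        // Group 2 is only present in $m when plain text matched.
        if (isset($m[2])) {
            return wordwrap($m[2], $width, $break, $cut);
        }
        // Otherwise group 1 matched: keep the tag or script/style block as-is.
        return $m[1];
    }, $html);

    // preg_replace_callback() returns null on a PCRE error (e.g. backtrack limit);
    // this sketch falls back to the unmodified input in that case.
    return $result ?? $html;
}

// Illustrative usage.
$html = '<p>Hello <a href="http://example.com/averyveryverylongpath">averyveryverylongword</a></p>';
$out = wrapTextOnly($html, 10);
echo $out, "\n";
```

The link text gets wrapped while the href value is copied through untouched, because it only ever appears inside a group 1 match.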

Here is the same thing as a Perl test, btw:

use strict;
use warnings;

my $tags = 
'  <
   (?:
       /?\w+\s*/?
     | \w+\s+ (?:".*?"|\'.*?\'|[^>]*?)+\s*/?
     | !(?:DOCTYPE.*?|--.*?--)
   )>
';

my $scripts =
'   <
   (?:
       (?:script|style) \s*
     | (?:script|style) \s+ (?:".*?"|\'.*?\'|[^>]*?)+\s*
   )>
   .*?
   </(?:script|style)\s*>
';

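my $html = join '', <DATA>;

# Group 1 = tags and script/style blocks, group 2 = the text between them.
# Print only the text segments that are not pure whitespace.
while ( $html =~ / ($scripts | $tags) | ((?:(?!$tags).)+) /xsg ) {
    if (defined $2 && $2 !~ /^\s+$/) {
        print $2,"\n";
    }
}

# Illustrative input so the script runs as-is.
__DATA__
<p>Some text with a <a href="http://example.com/some/long/path">link</a>.</p>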

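Use the DOM instead: parse the HTML, walk over its text nodes, and apply wordwrap only to those.

Attribute values are not text nodes, so href, src and the like are never passed to wordwrap.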
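A minimal PHP sketch of the DOM approach, assuming the standard dom extension and ASCII/Latin-1 input (loadHTML() treats input as ISO-8859-1 unless told otherwise, so UTF-8 content needs extra handling not shown here). wrapTextNodes() and the sample input are hypothetical. The fragment is wrapped in a <div> because LIBXML_HTML_NOIMPLIED expects a single root element, and text inside script/style is skipped.

```php
<?php
// Hypothetical helper: wordwrap() every text node, leaving tags and attributes alone.
function wrapTextNodes(string $html, int $width, string $break = " ", bool $cut = true): string
{
    $doc = new DOMDocument();

    // Silence libxml warnings about imperfect HTML, restoring the previous setting afterwards.
    $prev = libxml_use_internal_errors(true);
    // NOIMPLIED/NODEFDTD stop libxml from adding <html>, <body> and a doctype.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    // Select text nodes only; attribute values such as href or src are never selected.
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//text()[not(ancestor::script) and not(ancestor::style)]') as $node) {
        $node->nodeValue = wordwrap($node->nodeValue, $width, $break, $cut);
    }

    // Serialize the children of the wrapper <div> back into a string.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Illustrative usage.
$html = '<p>Hello <a href="http://example.com/averyveryverylongpath">averyveryverylongword</a></p>';
$out = wrapTextNodes($html, 10);
echo $out, "\n";
```

Compared with the regex version, the parser decides what is markup, so malformed or unusual tags are less likely to be cut.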

: " () HTML "
