Regular expressions: where angels fear to tread

I have just started learning regular expressions in PHP, but I am having a terrible time following the tutorials on the web, and I can't find anything that addresses my current needs. Perhaps I am trying to learn too much at once. This aspect of PHP is completely new to me.

What I'm trying to create is a regular expression to replace all HTML code between the nth occurrence of <TAG> and </TAG> with any code that I select.

My ultimate goal is to create an Internet filter in PHP through which I can view a web page with certain content removed (or replaced by sanitized content) between any specified set of <TAG> ... </TAG> tags inside the page, where <TAG> ... </TAG> represents any valid paired HTML tags, such as <B> ... </B>, <SPAN> ... </SPAN>, <DIV> ... </DIV>, etc.

For example, if the page has a porn ad contained in the 5th <DIV> ... </DIV> block inside the page, which regular expression could be used to target this code and replace it with something else, for example xxxxxxx, but only in the 5th <DIV> block inside the page and nothing more?

The entire web page is contained in a single text string, and the filtered result should also be a single text string.

I'm not sure, but I think the code for this may have a format similar to:

$FilteredPage = preg_replace("REG EXPRESSION", "xxxxxxxx", $OriginalPage);

The “REG EXPRESSION” for the call is what I need to know, and “xxxxxxxx” represents the text that replaces the code between the tags targeted by the “REG EXPRESSION”.

Regular expressions are obviously Satan's work!

Any general suggestions, or perhaps a few working examples that I could study and experiment with, would be welcome.

Thanks, Jay

0
3 answers

, , , HTML. , ... , , . HTML XML

xpath , html, ... phpQuery QueryPath

, HTML :

Html , html. HTML , , .

: @ , , , upvotes!

RegEx, XHTML,

+3

-, ? Regex - , - , HTML .

-, - , . , :

() "".

, PHP, 5 5- . , PHP ?

5- HTML ( , < > )

, , .

+3

ok, .

  • Don't open a question like that; a long preamble before the actual question will keep people away
  • Regular expressions are awesome!
  • If you want to consider alternatives, look at reading the HTML as an XML document and parsing it with XPath
  • @tobyodavies is pretty much right; I will include an answer in case you want to do it with a regex anyway
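
For the XPath route mentioned above, here is a minimal sketch using PHP's bundled DOM extension. The function name replaceNthElement and its parameters are made up for illustration. It assumes that "the nth block" means the nth element in document order, nested ones included, and that the replacement is plain text rather than markup.

```php
<?php
// Hypothetical helper (not from the thread): replace the contents of the nth
// <tag> element (1-based, document order) with plain text.
function replaceNthElement(string $html, string $tag, int $n, string $replacement): string
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // "(//div)[5]" selects the 5th div in the whole document.
    // The HTML parser lowercases element names, so the tag is lowercased too.
    $nodes = $xpath->query('(//' . strtolower($tag) . ')[' . $n . ']');
    if ($nodes !== false && $nodes->length > 0) {
        $node = $nodes->item(0);
        // Drop every child of the element, then insert the replacement text.
        while ($node->firstChild) {
            $node->removeChild($node->firstChild);
        }
        $node->appendChild($doc->createTextNode($replacement));
    }
    // If fewer than $n elements exist, the page comes back unchanged (apart
    // from the normalisation the parser applies).
    return $doc->saveHTML();
}
```

Note that saveHTML() returns a complete document, so a fragment passed in comes back wrapped in a doctype and html/body elements. Unlike a regex, this copes with attributes and nested tags.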

Now, on to your problem. Using this:

$regex = "#<div>(.+?)</div>#si";

You should be fine using this expression and counting the occurrences, something like this:

preg_match_all($regex, $htmlcontent, $matches, PREG_SET_ORDER );

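Suppose you only need the fifth one. $matches[$i][0] is the entire matched string for match number $i, counting from zero, so the fifth match is at index 4:

if (count($matches) >= 5)
{
   // Index 4 is the fifth match, because PHP arrays are zero-based.
   $myMatch = $matches[4][0];     // the whole <div>...</div> block
   $matchedText = $matches[4][1]; // only the text captured between the tags
}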
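
To go from finding the nth match to actually replacing it, one option is preg_replace_callback with a counter. This is only a sketch: the function name replaceNthMatch and its parameters are invented for illustration, and the pattern is a slightly widened variant of the one above so that opening tags with attributes also match. Like any lazy-match regex, it will pair tags wrongly when the same tag is nested inside itself.

```php
<?php
// Hypothetical helper (not from the thread): replace the text between the nth
// (1-based) <tag> ... </tag> pair, keeping the tags and leaving all other
// matches untouched.
function replaceNthMatch(string $html, string $tag, int $n, string $replacement): string
{
    $name = preg_quote($tag, '#');
    // Group 1: opening tag (attributes allowed), group 2: content, group 3: closing tag.
    // "s" lets "." span newlines, "i" makes the match case-insensitive.
    $regex = '#(<' . $name . '\b[^>]*>)(.*?)(</' . $name . '\s*>)#si';
    $count = 0;
    $result = preg_replace_callback(
        $regex,
        function (array $m) use (&$count, $n, $replacement) {
            $count++;
            // Rewrite only the nth match; return every other match as it was.
            return $count === $n ? $m[1] . $replacement . $m[3] : $m[0];
        },
        $html
    );
    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $html;
}

$page = '<div>a</div><DIV class="x">b</DIV><div>c</div>';
echo replaceNthMatch($page, 'div', 2, 'xxxxxxx');
// prints: <div>a</div><DIV class="x">xxxxxxx</DIV><div>c</div>
```

The counter is captured by reference so it persists across callback invocations, which is what lets a single pass over the page pick out exactly one occurrence.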

Good luck with your efforts ...

0