Regular expression for a specific tag

Question

I am working on a regular expression in a .NET project to get a specific tag. I would like to match the entire DIV tag and its contents:

<html>
   <head><title>Test</title></head>
   <body>
     <p>The first paragraph.</p>
     <div id='super_special'>
        <p>The Store paragraph</p>
     </div>
   </body>
</html>

The code:

    Regex re = new Regex("(<div id='super_special'>.*?</div>)", RegexOptions.Multiline);


    if (re.IsMatch(test))
        Console.WriteLine("it matches");
    else
        Console.WriteLine("no match");

I want to match this:

<div id="super_special">
   <p>Anything could go in here...doesn't matter.  Let get it all</p>
</div>

I thought the . should have matched all the characters, but it seems to have trouble with the carriage returns. What is my regex missing?

Thanks.

+2

regex .net

Bullines Sep 17 '08 at 1:35

11 answers

, , : HTML HTML. . , .

HTML - . , , , , Regexp, , .

, , Regexp , . , /m.

: HTML. , - Regexp HTML, ...

+6

Jörg W Mittag Sep 17 '08 at 1:43

You need the dot to match newlines as well. In Perl, for example, that is the s regex modifier:

m{<div id="super_special">.*?</div>}s

+1

mopoke Sep 17 '08 at 1:37

? .NET , , .

+1

Mitchel Sellers Sep 17 '08 at 1:37

In Python, the flag that lets . match newlines is re.S (for example):

re.compile('<div id="super_special">.*?</div>', re.S).sub('', your_html)

, "Single Line" "Multi Line" - .

DON'T USE REGEXPS TO PARSE HTML. Use an HTML parser instead, such as Beautiful Soup.

+1

Vinko Vrsalovic Sep 17 '08 at 1:38

By default, . does not match newline characters. In .NET you can set RegexOptions.Singleline, or use the inline form:

(?s)(<div id="super_special">.*?</div>)
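
For comparison, here is a minimal sketch of the same flag distinction in Python rather than .NET (chosen only so the snippet is easy to run). It assumes re.MULTILINE stands in for RegexOptions.Multiline and re.DOTALL for RegexOptions.Singleline. The variable names are illustrative, and the HTML is a shortened copy of the sample from the question.

```python
import re

html = """<body>
  <p>The first paragraph.</p>
  <div id='super_special'>
     <p>The Store paragraph</p>
  </div>
</body>"""

pattern = r"<div id='super_special'>.*?</div>"

# MULTILINE only changes what ^ and $ match; '.' still stops at newlines,
# so the pattern cannot reach the closing </div> on a later line.
multiline_match = re.search(pattern, html, re.MULTILINE)

# DOTALL (alias re.S) lets '.' match newline characters too.
dotall_match = re.search(pattern, html, re.DOTALL)

# The inline (?s) flag at the start of the pattern has the same effect.
inline_match = re.search("(?s)" + pattern, html)

print(multiline_match)
print(dotall_match.group(0))
```

The point is that "multi-line" mode only affects the ^ and $ anchors, while "single-line" (dot-all) mode is the one that lets . cross line breaks.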

+1

Bennor McCarthy Sep 17 '08 at 1:43

- . :

Java: Pattern.compile( "pattern", Pattern.MULTILINE);
Perl and Ruby: /pattern/m
VB: Regex.IsMatch(s, "pattern", RegexOptions.Multiline)

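Be careful when applying a regexp to XML/HTML, because XML/HTML elements can be nested. For example, given:

  <div id="super_special">
     <div>Nothing</div>
     <p>Anything could go in here...doesn't matter.  Let get it all</p>
  </div>

... the lazy pattern would match only: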

  <div id="super_special">
     <div>Nothing</div>

, , HTML- , (, ).
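
The truncation shown above can be reproduced with a short Python sketch (Python is used here only for convenience; the input is a slightly shortened copy of the nested example, and the names are made up for illustration):

```python
import re

nested = """<div id="super_special">
   <div>Nothing</div>
   <p>Anything could go in here</p>
</div>"""

# A lazy .*? stops at the FIRST </div> it can reach, which belongs to the
# inner <div>, not to the outer one we actually wanted.
lazy = re.search(r'<div id="super_special">.*?</div>', nested, re.DOTALL)
truncated = lazy.group(0)

print(truncated)
```

The match ends right after the inner element, so the following <p> and the real closing tag are lost.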

+1

Mike Tunnicliffe Sep 17 '08 at 1:48

. () , \r \n. ., x ()

0

Nescio Sep 17 '08 at 1:38

:. [\ r\n]. [\ r\n]

0

dimarzionist Sep 17 '08 at 1:38

. , , </div> </div> , div, , .

, , HTML, , Microsoft , .NET., . .

0

Mike Kantor Sep 17 '08 at 2:41

Regular expressions alone are simply not powerful enough to solve your problem. You need something more powerful, such as context-free grammars. See the Chomsky hierarchy on Wikipedia.

In other words (as mentioned earlier), do not use regex for parsing HTML.
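
As one possible illustration of the parser route, here is a rough sketch using Python's standard html.parser module (not .NET, and not necessarily what any answerer had in mind). DivGrabber and extract_div are hypothetical names invented for this example. The sketch assumes the id is unique, and it reassembles the element from parser events, so comments are dropped and character references come back decoded rather than byte-for-byte.

```python
from html.parser import HTMLParser


class DivGrabber(HTMLParser):
    """Hypothetical helper: collects the <div> whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0    # how many <div>s are open inside the target
        self.parts = []   # reassembled fragments of the target element

    def handle_starttag(self, tag, attrs):
        if self.depth == 0:
            # Not inside the target yet: only react to the wanted <div>.
            if tag == "div" and dict(attrs).get("id") == self.target_id:
                self.depth = 1
                self.parts.append(self.get_starttag_text())
        else:
            # Inside the target: count nested <div>s so we know where it ends.
            if tag == "div":
                self.depth += 1
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        if self.depth > 0:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth > 0:
            self.parts.append(f"</{tag}>")
            if tag == "div":
                self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)


def extract_div(html, target_id):
    """Return the reassembled target <div>, or '' if it was not found."""
    grabber = DivGrabber(target_id)
    grabber.feed(html)
    grabber.close()
    return "".join(grabber.parts)


sample = """<div id="super_special">
   <div>Nothing</div>
   <p>Anything could go in here</p>
</div>
<div id="not_special">Unwanted</div>"""

result = extract_div(sample, "super_special")
print(result)
```

Because the parser tracks nesting depth, the inner <div> no longer ends the match early, which is exactly what a plain regular expression cannot do.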

0

Martijn Sep 21 '08 at 10:51

André Chalella · Accepted Answer · 2008-09-17T01:50:27+0000

Out of the box, without special modifiers, most regular expression implementations do not go beyond the end of the line to match the text. You should probably look in the documentation for the regex engine that you use for such a modifier.

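I have one more tip: beware of greed! Traditionally, regex quantifiers are greedy, meaning they match as much as they can, so your regex could end up matching all of this: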

<div id="super_special">
  I'm the wanted div!
</div>
<div id="not_special">
  I'm not wanted, but I've been caught too :(
</div>

Look for an "ungreedy" (lazy) modifier, so that the match stops at the first </div>, not the last.
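
A small Python sketch of the greedy versus lazy difference, using the two-div example above (Python instead of .NET for convenience; the variable names are illustrative):

```python
import re

two_divs = """<div id="super_special">
  I'm the wanted div!
</div>
<div id="not_special">
  I'm not wanted, but I've been caught too :(
</div>"""

# Greedy .* runs to the LAST </div> in the input.
greedy = re.search(r'<div id="super_special">.*</div>',
                   two_divs, re.DOTALL).group(0)

# Lazy .*? stops at the FIRST </div> after the opening tag.
lazy = re.search(r'<div id="super_special">.*?</div>',
                 two_divs, re.DOTALL).group(0)

print("not_special" in greedy)
print("not_special" in lazy)
```

The pattern in the question already uses the lazy form .*?, so this pitfall only applies if the ? is dropped.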

, , HTML . .

: - , , <div> ! HTML.
