Using regex to retrieve text between multiple HTML tags

Using a regex, I want to be able to get text between multiple DIV tags. For example, the following:

<div>first html tag</div>
<div>another tag</div>

Output:

first html tag
another tag

My regex pattern matches only the last div tag and skips the first. The code:

    static void Main(string[] args)
    {
        string input = "<div>This is a test</div><div class=\"something\">This is ANOTHER test</div>";
        string pattern = "(<div.*>)(.*)(<\\/div>)";

        MatchCollection matches = Regex.Matches(input, pattern);
        Console.WriteLine("Matches found: {0}", matches.Count);

        if (matches.Count > 0)
            foreach (Match m in matches)
                Console.WriteLine("Inner DIV: {0}", m.Groups[2]);

        Console.ReadLine();
    }

Output:

Matches found: 1
Inner DIV: This is ANOTHER test

+5
6 answers

Replace your pattern with a non-greedy (lazy) match:

static void Main(string[] args)
{
    string input = "<div>This is a test</div><div class=\"something\">This is ANOTHER test</div>";
    string pattern = "<div.*?>(.*?)<\\/div>";

    MatchCollection matches = Regex.Matches(input, pattern);
    Console.WriteLine("Matches found: {0}", matches.Count);

    if (matches.Count > 0)
        foreach (Match m in matches)
            Console.WriteLine("Inner DIV: {0}", m.Groups[1]);

    Console.ReadLine();
}
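To see why the lazy quantifiers matter, here is a minimal sketch that runs both styles of pattern over the question's input. The class and method names (GreedyVsLazy, Capture) are made up for this illustration, and the patterns use a single capture group for the inner text.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
static class GreedyVsLazy
{
    // Returns the text captured by group 1 for every match of the pattern.
    public static List<string> Capture(string input, string pattern)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
            results.Add(m.Groups[1].Value);
        return results;
    }

    static void Main()
    {
        string input = "<div>This is a test</div><div class=\"something\">This is ANOTHER test</div>";

        // Greedy: ".*" grabs as much as it can, so the opening-tag part
        // swallows the whole first div and only one match is found.
        foreach (string s in Capture(input, "<div.*>(.*)</div>"))
            Console.WriteLine("Greedy: {0}", s);

        // Lazy: ".*?" stops at the first place where the rest of the
        // pattern can match, so each div is matched on its own.
        foreach (string s in Capture(input, "<div.*?>(.*?)</div>"))
            Console.WriteLine("Lazy:   {0}", s);
    }
}
```

The greedy pattern reports only "This is ANOTHER test", while the lazy one reports both texts.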
+9

Since the other answers didn't cover HTML tags with attributes, here is my solution for that case:

// <TAG(.*?)>(.*?)</TAG>
// Example
var regex = new System.Text.RegularExpressions.Regex("<h1(.*?)>(.*?)</h1>");
var m = regex.Match("Hello <h1 style='color: red;'>World</h1> !!");
Console.Write(m.Groups[2].Value); // will print -> World
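One caveat with this shape of pattern: for a short tag name it also matches longer tags that begin with the same letters, so "<b(.*?)>" would happily match "<br>". A possible refinement, sketched below under that assumption, is to put a word boundary after the tag name and use [^>]* for the attributes. TagText and InnerText are hypothetical names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
static class TagText
{
    // Returns the inner text of every <tag ...>...</tag> pair.
    // "\b" keeps "b" from matching "br"; "[^>]*" skips the attributes
    // (it assumes no attribute value contains a '>' character).
    public static List<string> InnerText(string html, string tag)
    {
        string name = Regex.Escape(tag);
        string pattern = "<" + name + @"\b[^>]*>(.*?)</" + name + ">";

        var results = new List<string>();
        foreach (Match m in Regex.Matches(html, pattern))
            results.Add(m.Groups[1].Value);
        return results;
    }

    static void Main()
    {
        foreach (string s in InnerText("Hello <h1 style='color: red;'>World</h1> !!", "h1"))
            Console.WriteLine(s); // World

        foreach (string s in InnerText("line<br>break and <b class='x'>bold</b>", "b"))
            Console.WriteLine(s); // bold
    }
}
```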
+7

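First, remember that real HTML contains newline characters ("\n"), which the test string does not include.

Second, taking your regex: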

((<div.*>)(.*)(<\\/div>))+ //This Regex will look for any amount of div tags, but it must see at least one div tag.

((<div.*>)(.*)(<\\/div>))* //This regex will look for any amount of div tags, and it will not complain if there are no results at all.
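On the newline point: by default "." does not match "\n" in .NET, so a div whose content spans several lines is skipped unless RegexOptions.Singleline is set. The sketch below shows the difference. MultilineDivs and Extract are names invented for this example, and the input string is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
static class MultilineDivs
{
    // Extracts the trimmed text of each div with the given regex options.
    public static List<string> Extract(string html, RegexOptions options)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(html, @"<div\b[^>]*>(.*?)</div>", options))
            results.Add(m.Groups[1].Value.Trim());
        return results;
    }

    static void Main()
    {
        string html = "<div>\nfirst html tag\n</div>\n<div>another tag</div>";

        // Without Singleline, "." stops at "\n", so the first div is missed.
        Console.WriteLine(Extract(html, RegexOptions.None).Count);       // 1

        // With Singleline, "." also matches "\n" and both divs are found.
        Console.WriteLine(Extract(html, RegexOptions.Singleline).Count); // 2
    }
}
```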

A good place to look for this kind of information:

http://www.regular-expressions.info/reference.html

http://www.regular-expressions.info/refadv.html

Mayman

+1

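Have you looked at the Html Agility Pack (see fooobar.com/questions/337090/...)?

Or CsQuery (which uses CSS selectors). See fooobar.com/questions/90838/....

CsQuery is roughly "jQuery for C#".

If I am not mistaken, usage mirrors jQuery, so something like $("div").each(function(idx){ alert(idx + ": " + $(this).text()); }); (in C# you would write to the console instead of calling alert, but you get the idea).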

+1

I think this code should work:

string htmlSource = "<div>first html tag</div><div>another tag</div>";
string pattern = @"<div[^>]*?>(.*?)</div>";
MatchCollection matches = Regex.Matches(htmlSource, pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
ArrayList l = new ArrayList();
foreach (Match match in matches)
{
    l.Add(match.Groups[1].Value);
}
+1

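In short, you cannot do this correctly in every situation. There will always be valid HTML for which a regular expression fails to extract the information you want.

The reason is that HTML is a context-free grammar, which is a more complex class than a regular language.

For example, what if you have nested divs?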

<div><div>stuff</div><div>stuff2</div></div>

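Depending on how it is written, a regex may grab any of the following:

<div><div>stuff</div>
<div>stuff</div>
<div>stuff</div><div>stuff2</div>
<div>stuff</div><div>stuff2</div></div>
<div>stuff2</div>
<div>stuff2</div></div>

That is what happens when regular expressions try to parse HTML.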
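As a concrete check, this small sketch runs the lazy pattern from the other answers over the nested example. NestedDivs and Capture are hypothetical names used only here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
static class NestedDivs
{
    // Applies the lazy div pattern and returns what group 1 captured.
    public static List<string> Capture(string html)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(html, @"<div[^>]*>(.*?)</div>"))
            results.Add(m.Groups[1].Value);
        return results;
    }

    static void Main()
    {
        // The outer <div> is paired with the first </div> it meets,
        // so the first capture is the fragment "<div>stuff".
        foreach (string s in Capture("<div><div>stuff</div><div>stuff2</div></div>"))
            Console.WriteLine("[{0}]", s);
    }
}
```

The pattern finds two matches, capturing "<div>stuff" and "stuff2", and the final closing tag is left unmatched. Neither capture corresponds to the real structure of the markup.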

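You cannot write a regular expression that handles all of these cases, because regular expressions are not capable of it. If you are dealing with a very specific, constrained set of HTML it may be possible, but keep this limitation in mind.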
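When the markup can nest, a parser is the safer tool. The sketch below uses XElement from System.Xml.Linq purely as a stand-in, and it rests on a loud assumption: the input must be well-formed XML with a single root element. Real-world HTML often is not, and then a dedicated HTML parser is needed instead. ParsedDivs and LeafDivText are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration only.
static class ParsedDivs
{
    // ASSUMPTION: xhtml is well-formed XML with one root element.
    // Returns the text of every div that contains no further div.
    public static List<string> LeafDivText(string xhtml)
    {
        XElement root = XElement.Parse(xhtml);
        return root.DescendantsAndSelf("div")
                   .Where(d => !d.Descendants("div").Any())
                   .Select(d => d.Value)
                   .ToList();
    }

    static void Main()
    {
        foreach (string s in LeafDivText("<div><div>stuff</div><div>stuff2</div></div>"))
            Console.WriteLine(s);
    }
}
```

For the nested example this yields "stuff" and "stuff2", because the parser tracks which closing tag belongs to which opening tag.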

Additional information: fooobar.com/questions/424/...

0