How to get the InnerText of multiple <a> tags?
This is my sample page. I want to get the inner text of all the <a> tags on one line:
<body>
<div id="infor">
<div id="genres">
<a href="#" >Animation</a>
<a href="#" >Short</a>
<a href="#" >Action</a>
</div>
</div>
</body>
Here is the code I wrote for this, but it does not work correctly:
using HtmlAgilityPack;

class Values
{
    private HtmlAgilityPack.HtmlDocument _markup;

    public Values()
    {
        HtmlWeb web = new HtmlWeb(); // creating object of HtmlWeb
        _markup = web.Load("mypage.html"); // load page
    }

    public string Genres
    {
        get
        {
            HtmlNodeCollection headers = _markup.DocumentNode.SelectNodes("//div[contains(@id, 'infor')]/a"); // I filter all of the <a> tags in <div id="infor">
            if (headers != null)
            {
                string genres = "";
                foreach (HtmlNode header in headers) // I'm not sure what happens here.
                {
                    HtmlNode genre = header.ParentNode.SelectSingleNode(".//a[contains(@href, '#')]"); // I think the error is here...
                    if (genre != null)
                    {
                        genres += genre.InnerText + ", ";
                    }
                }
                return genres;
            }
            return String.Empty;
        }
    }
}

// In form1, show the result in the text box:
text1.Text = new Values().Genres;
The value shown in text1:
Animation, Animation, Animation,
But I need the output as follows:
Animation, Short, Action,
2 answers
A little LINQ with Descendants makes this easier, I think:
// needs: using System.Linq; at the top of the file
var genreNode = _markup.DocumentNode.Descendants("div").Where(n => n.Id.Equals("genres")).FirstOrDefault();
if (genreNode != null)
{
    // this pulls all <a> nodes under the genres div and puts their inner text into an array,
    // then joins that array using ", " as the separator.
    return string.Join(", ", genreNode.Descendants("a")
        .Where(n => n.GetAttributeValue("href", string.Empty).Equals("#"))
        .Select(n => n.InnerText).ToArray());
}
return string.Empty;
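To try this approach outside the original class, here is a minimal console sketch. It assumes the HtmlAgilityPack NuGet package is referenced, and it loads the question's sample markup from a string with LoadHtml instead of fetching mypage.html. The names GenreReader and JoinGenres are hypothetical. It compares the id attribute through GetAttributeValue, the same call the answer already uses for href.

```csharp
// Assumption: the HtmlAgilityPack NuGet package is referenced by the project.
using System;
using System.Linq;
using HtmlAgilityPack;

public static class GenreReader
{
    // Hypothetical helper: joins the text of every <a href="#"> inside <div id="genres">.
    public static string JoinGenres(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Find the genres container by its id attribute (null if there is none).
        HtmlNode genreNode = doc.DocumentNode
            .Descendants("div")
            .FirstOrDefault(n => n.GetAttributeValue("id", string.Empty) == "genres");
        if (genreNode == null)
        {
            return string.Empty;
        }

        // Keep only <a> tags whose href is exactly "#", then join their texts.
        return string.Join(", ", genreNode.Descendants("a")
            .Where(n => n.GetAttributeValue("href", string.Empty) == "#")
            .Select(n => n.InnerText.Trim()));
    }
}

public static class Program
{
    public static void Main()
    {
        // The sample page from the question, embedded as a verbatim string.
        const string html = @"<body>
<div id=""infor"">
<div id=""genres"">
<a href=""#"" >Animation</a>
<a href=""#"" >Short</a>
<a href=""#"" >Action</a>
</div>
</div>
</body>";
        Console.WriteLine(GenreReader.JoinGenres(html)); // Animation, Short, Action
    }
}
```

Because string.Join only puts the separator between items, this returns "Animation, Short, Action" without the trailing comma that the += loop produces.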
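The problem is this line: header.ParentNode.SelectSingleNode(".//a[contains(@href, '#')]"). For every <a> node, it goes up to the parent div and searches it again for an a, and SelectSingleNode always returns the first match, so you get Animation every time. The header variable is already the a node you want, so there is no need to search again. Move the href filter into the XPath and read InnerText directly, like this: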
// '//a' also matches <a> tags nested in an inner div such as <div id="genres">
HtmlNodeCollection headers = _markup.DocumentNode.SelectNodes("//div[contains(@id, 'infor')]//a[contains(@href, '#')]");
if (headers != null)
{
    string genres = "";
    foreach (HtmlNode header in headers) // each header is already one of the <a> nodes
    {
        genres += header.InnerText + ", ";
    }
    return genres;
}
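To see why the question's loop repeats Animation, this sketch runs the re-query pattern and the direct-read fix side by side on the sample markup. It assumes the HtmlAgilityPack package is referenced; LoopComparison, RequeryParent and UseEachNode are hypothetical names. Both methods select with //a rather than /a, because in the sample the links sit inside <div id="genres">, so /a directly under <div id="infor"> would select nothing.

```csharp
// Assumption: the HtmlAgilityPack NuGet package is referenced by the project.
using System;
using System.Text;
using HtmlAgilityPack;

public static class LoopComparison
{
    public const string SampleHtml = @"<body><div id=""infor""><div id=""genres"">
<a href=""#"" >Animation</a>
<a href=""#"" >Short</a>
<a href=""#"" >Action</a>
</div></div></body>";

    // Same pattern as the question: start from each <a>, climb to its parent <div>,
    // then search that div again. SelectSingleNode returns the FIRST match,
    // so every iteration lands on the same "Animation" link.
    public static string RequeryParent(HtmlDocument doc)
    {
        var sb = new StringBuilder();
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//div[contains(@id, 'infor')]//a");
        if (links == null) return string.Empty;
        foreach (HtmlNode link in links)
        {
            HtmlNode first = link.ParentNode.SelectSingleNode(".//a[contains(@href, '#')]");
            if (first != null) sb.Append(first.InnerText).Append(", ");
        }
        return sb.ToString();
    }

    // The fix: each node in the collection already is one <a>, so read its text directly.
    public static string UseEachNode(HtmlDocument doc)
    {
        var sb = new StringBuilder();
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//div[contains(@id, 'infor')]//a[contains(@href, '#')]");
        if (links == null) return string.Empty;
        foreach (HtmlNode link in links)
        {
            sb.Append(link.InnerText).Append(", ");
        }
        return sb.ToString();
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(SampleHtml);
        Console.WriteLine(RequeryParent(doc)); // Animation, Animation, Animation,
        Console.WriteLine(UseEachNode(doc));   // Animation, Short, Action,
    }
}
```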