Using LINQ to extract all hidden inputs from HTML using C#

Hello, I have the following HTML, retrieved by web client code:

<body onLoad="window.focus()">
<form name="form1" method="post" action="/www/www.do">
<input type="hidden" name="value1" value="aaaa">
<input type="hidden" name="value2" value="bbbb">
<input type="hidden" name="value3" value="cccc">
<input type="hidden" name="value4" value="dddd">
<input type="hidden" name="value5" value="eeee">

more html.....

</body>

How can I extract the names and values of all the hidden inputs using C# LINQ or string functions?

1 answer

Using HtmlAgilityPack, you can do the following:

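// Requires the HtmlAgilityPack NuGet package (using HtmlAgilityPack;).
var doc = new HtmlWeb().Load("http://www.mywebsite.com");
var nodes = doc.DocumentNode.SelectNodes("//input[@type='hidden' and @name and @value]");
// SelectNodes returns null, not an empty collection, when nothing matches.
if (nodes != null) {
    foreach (var node in nodes) {
        var inputName = node.Attributes["name"].Value;
        var inputValue = node.Attributes["value"].Value;
        Console.WriteLine("Name: {0}, Value: {1}", inputName, inputValue);
    }
}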

If you want to load a document from a text file, and not from a URL, you can do:

var doc = new HtmlDocument();
doc.Load(@"C:\file.html");

If you still want to use LINQ for this purpose, you can: SelectNodes returns an HtmlNodeCollection, which implements IEnumerable<HtmlNode>, and DescendantNodes likewise returns an IEnumerable<HtmlNode>, so you can do:

// Keep only <input type="hidden"> elements that carry both a name and a value.
var query = from f in doc.DocumentNode.DescendantNodes()
            where f.Name == "input" && f.GetAttributeValue("type", "") == "hidden"
                    && f.Attributes.Contains("name") && f.Attributes.Contains("value")
            select new
                        {
                            Name = f.Attributes["name"].Value,
                            Value = f.Attributes["value"].Value
                        };

foreach (var q in query) {
    Console.WriteLine("Name: {0}, Value: {1}", q.Name, q.Value);
}
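
Since the question says the HTML comes from web client code, the markup is probably already in a string rather than at a URL or in a file. In that case HtmlDocument.LoadHtml can parse it directly. The following is only a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; the HiddenInputs class, its Extract helper, and the shortened sample string are hypothetical names made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class HiddenInputs
{
    // Hypothetical helper: returns a (name, value) pair for every hidden input
    // in the given markup that has both attributes.
    public static List<KeyValuePair<string, string>> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse markup that is already in memory

        return doc.DocumentNode
            .Descendants("input")
            .Where(n => string.Equals(n.GetAttributeValue("type", ""), "hidden",
                                      StringComparison.OrdinalIgnoreCase))
            .Where(n => n.Attributes.Contains("name") && n.Attributes.Contains("value"))
            .Select(n => new KeyValuePair<string, string>(
                n.GetAttributeValue("name", ""),
                n.GetAttributeValue("value", "")))
            .ToList();
    }

    public static void Main()
    {
        // Shortened version of the markup from the question.
        const string html =
            "<form name=\"form1\" method=\"post\" action=\"/www/www.do\">"
            + "<input type=\"hidden\" name=\"value1\" value=\"aaaa\">"
            + "<input type=\"hidden\" name=\"value2\" value=\"bbbb\">"
            + "</form>";

        foreach (var pair in Extract(html))
        {
            Console.WriteLine("Name: {0}, Value: {1}", pair.Key, pair.Value);
        }
    }
}
```

Returning a list of pairs rather than a dictionary keeps duplicate input names, which HTML forms allow.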