We have an absolutely massive help document created in Word, which was used to generate an even more massive and unwieldy HTM document. Using C# and this library, I want to grab and display only one section of this file at a time in my application. Sections are delimited as follows:
<div>
<h1><span style='mso-spacerun:yes'></span><a name="_Toc325456104">Section A</a></h1>
</div>
<div> Lots of unnecessary markup for simple formatting... </div>
.....
<div>
<h1><span style='mso-spacerun:yes'></span><a name="_Toc325456104">Section B</a></h1>
</div>
Logically speaking, there is an H1 with the section name inside an a tag. I want to select everything starting from the outer containing div until I run into another H1, excluding that H1's div.
- Each section name is in an <a> tag under an H1, which has several children (about 6)
- The logical section is marked with comments.
My attempt:
// Find the <a> inside an <h1> whose text contains the section name.
var startNode = helpDocument.DocumentNode.SelectSingleNode("//h1/a[contains(., '" + sectionName + "')]");
// Step up from the <a> to its <h1>.
startNode = startNode.ParentNode;
int startNodeIndex = startNode.ParentNode.ChildNodes.IndexOf(startNode);
// This is the part I cannot work out: how do I find the end node?
var endNode = ?;
int endNodeIndex = endNode.ParentNode.ChildNodes.IndexOf(endNode);
var nodes = startNode.ParentNode.ChildNodes.Where((n, index) => index >= startNodeIndex && index <= endNodeIndex);
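One possible way to avoid needing an explicit end node is to walk the siblings of the wrapping div and stop at the next one that holds an H1. The following is only a sketch, assuming the library is HtmlAgilityPack (as the `DocumentNode.SelectSingleNode` call suggests), that every section heading sits in its own div as in the sample above, and that all those divs are siblings under the same parent. The class name `HelpSections` and the method name `GetSection` are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class HelpSections
{
    // Returns the div wrapping the matching h1, plus every following sibling
    // up to (but not including) the next sibling that is or contains an h1.
    public static List<HtmlNode> GetSection(HtmlDocument helpDocument, string sectionName)
    {
        // Assumption: sectionName contains no single quote, because it is
        // spliced directly into the XPath string literal.
        HtmlNode anchor = helpDocument.DocumentNode.SelectSingleNode(
            "//h1/a[contains(., '" + sectionName + "')]");
        if (anchor == null)
        {
            return new List<HtmlNode>();
        }

        // a -> h1 -> wrapping div (assumes the structure shown in the sample).
        HtmlNode startDiv = anchor.ParentNode.ParentNode;

        var section = new List<HtmlNode> { startDiv };
        for (HtmlNode n = startDiv.NextSibling; n != null; n = n.NextSibling)
        {
            // The next section starts at the first sibling that holds an h1.
            if (n.DescendantsAndSelf("h1").Any())
            {
                break;
            }
            section.Add(n);
        }
        return section;
    }
}
```

The markup for display could then be built with something like `string.Concat(HelpSections.GetSection(helpDocument, sectionName).Select(n => n.OuterHtml))`. Walking `NextSibling` sidesteps the index arithmetic entirely, at the cost of assuming the section divs are not nested at different depths.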