I use HtmlAgilityPack to parse approximately 200,000 HTML documents.
I cannot predict the contents of these documents; however, one of them makes my application crash with a StackOverflowException. It contains this HTML:
<ol>
<li><li><li><li><li><li>...
</ol>
It has approximately 10,000 <li> elements. Because of the way HtmlAgilityPack parses HTML, this input causes a StackOverflowException.
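To make the failure mode concrete: assume the parser nests each unclosed `<li>` inside the previous one. That is one plausible reading of why this document overflows, not a confirmed description of HtmlAgilityPack internals. Under that assumption, the snippet above becomes a tree about 10,000 levels deep, and any recursive walk over it needs roughly one stack frame per level. The sketch below estimates that depth with a regex-based tag scan. It uses Java as a stand-in language, and `NestingDepth` and `maxDepth` are hypothetical names, not HtmlAgilityPack API. It ignores comments, scripts and most real-world HTML quirks, so treat it as a rough pre-check, not a parser.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NestingDepth {
    // Void elements never get a closing tag, so they never add a nesting level.
    private static final Set<String> VOID = Set.of(
        "area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr");

    // Matches an opening or closing tag and captures "/" (if closing) and the tag name.
    // Comments, doctypes and tags inside <script> are not handled specially.
    private static final Pattern TAG =
        Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>");

    /**
     * Rough maximum nesting depth, assuming a naive tree builder that never closes
     * an element implicitly (so "<li><li>" puts the second li inside the first).
     */
    public static int maxDepth(String html) {
        Deque<String> open = new ArrayDeque<>();
        int max = 0;
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            String name = m.group(2).toLowerCase();
            boolean closing = !m.group(1).isEmpty();
            if (VOID.contains(name) || m.group().endsWith("/>")) {
                continue; // self-closing or void: no new level
            }
            if (!closing) {
                open.push(name);
                max = Math.max(max, open.size());
            } else if (open.contains(name)) {
                // Pop everything up to and including the matching open element.
                String top;
                do {
                    top = open.pop();
                } while (!top.equals(name));
            }
            // A stray closing tag with no matching open element is ignored.
        }
        return max;
    }

    public static void main(String[] args) {
        String doc = "<ol>" + "<li>".repeat(10_000) + "</ol>";
        System.out.println(maxDepth(doc)); // prints 10001
    }
}
```

On the pathological document the estimate is 10,001 (the `<ol>` plus 10,000 `<li>` levels), while a well-formed list with closed `<li>` elements stays at depth 2.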
Unfortunately, a StackOverflowException cannot be caught in .NET 2.0 and later; the runtime terminates the process instead.
I considered setting a larger stack size for the thread, but that is a hack: it would make my program use a lot more memory (my program runs about 50 threads for HTML processing, so every one of them would get the larger stack), and it would need manual tuning again whenever a similar document turns up.
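For reference, this is roughly what the larger-stack approach looks like when the size is set on a specific thread. .NET's `Thread` class has a constructor overload that takes a `maxStackSize`. The sketch below uses Java's analogous `Thread(ThreadGroup, Runnable, String, long stackSize)` constructor as a stand-in, and `BigStackRunner`, `runWithStack` and `recurse` are hypothetical names. Every thread created this way reserves its own large stack, so with 50 parser threads the cost multiplies, and the size remains a hand-tuned guess. Two differences from .NET matter here. Java lets the worker catch `StackOverflowError`, which .NET 2.0 and later do not allow for `StackOverflowException`. The JVM also documents that the stack-size argument may have no effect on some platforms.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

public class BigStackRunner {
    /**
     * Runs a task on a dedicated thread created with the requested stack size
     * and blocks until it finishes. Only this one thread gets the larger stack.
     */
    public static <T> T runWithStack(long stackBytes, Supplier<T> task)
            throws InterruptedException {
        AtomicReference<T> result = new AtomicReference<>();
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread worker = new Thread(null, () -> {
            try {
                result.set(task.get());
            } catch (Throwable t) {
                // In Java this also catches StackOverflowError; .NET cannot do this.
                failure.set(t);
            }
        }, "big-stack-parser", stackBytes);
        worker.start();
        worker.join();
        if (failure.get() != null) {
            throw new RuntimeException(failure.get());
        }
        return result.get();
    }

    // Hypothetical stand-in for a recursive parser: one stack frame per nesting level.
    public static int recurse(int depth) {
        return depth == 0 ? 0 : 1 + recurse(depth - 1);
    }

    public static void main(String[] args) throws InterruptedException {
        int depth = runWithStack(64L * 1024 * 1024, () -> recurse(100_000));
        System.out.println(depth); // prints 100000 when the stack-size hint is honored
    }
}
```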
Are there any other workarounds?