StackOverflowException

Topics: Developer Forum, Project Management Forum, User Forum
Jul 22, 2010 at 12:05 PM
Edited Jul 22, 2010 at 12:08 PM

Hi, I have an application that uses the HtmlAgilityPack to parse and then save various HTML pages. I've stumbled across the following URL, which causes a StackOverflowException. The page renders fine in a browser. Any suggestions as to why this occurs and how I can handle the situation, other than not loading this site? This type of exception cannot be caught with a try/catch.

Pasting the following into a Console app will recreate the error...

-----------------------------------------------------------------------------------------------------------------------------------------------------------

// Requires: using System.IO; using System.Net; using HtmlAgilityPack;
String url = "http://rewarding.me/active-tel-domains/index.php/index.php?rescan=amour.tel&w=A&url=&by=us&limits=0";
WebRequest request = WebRequest.Create(url);
HtmlDocument htmlDocument = new HtmlDocument();
// Download the page and parse it.
htmlDocument.Load(request.GetResponse().GetResponseStream());
Stream memoryStream = new MemoryStream();
// Write the parsed document back out to an in-memory stream.
htmlDocument.Save(memoryStream);


-----------------------------------------------------------------------------------------------------------------------------------------------------------

Regards,

AArnie

Jul 22, 2010 at 12:42 PM
This is a known problem in the Save method. It needs to be rewritten so that the C# compiler emits an MSIL "tail" call to avoid this. It happens because the Save method recursively steps through the entire node tree and, on larger pages, eventually hits the maximum call stack depth in the CLR.
Jul 22, 2010 at 12:48 PM

Thanks for the quick response.

Are you aware of a reliable way of knowing whether the call stack limit is being approached, or of an HtmlAgilityPack object property that would be suitable to monitor, to avoid hitting the limit?

Regards,

AArnie

Jul 22, 2010 at 12:51 PM
Right now I don't know of a way within .NET to check the current stack without significantly affecting performance. The limit varies with the CLR and the working environment. I believe that 64-bit has a higher limit.
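
One option that may be worth trying, assuming .NET 4.0 or later, is `RuntimeHelpers.EnsureSufficientExecutionStack()`. It does not report how much stack is left, but it throws a catchable `InsufficientExecutionStackException` when the remaining space is low. The sketch below uses a hypothetical `Node` type and a hypothetical `StackGuardDemo` helper rather than HtmlAgilityPack's own classes, and it only guards recursion in code you control. It cannot protect a recursive call inside the library itself.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for a parsed HTML node; not an HtmlAgilityPack type.
public sealed class Node
{
    public Node Child;   // a single child is enough to model a very deep chain
}

public static class StackGuardDemo
{
    // Recursively measures depth, probing the stack before each nested call.
    public static int Depth(Node node)
    {
        if (node == null) return 0;
        // Throws InsufficientExecutionStackException (which can be caught)
        // when the remaining stack space is considered too small.
        RuntimeHelpers.EnsureSufficientExecutionStack();
        return 1 + Depth(node.Child);
    }

    // Returns false instead of crashing the process when the chain is too deep.
    public static bool TryDepth(Node root, out int depth)
    {
        try
        {
            depth = Depth(root);
            return true;
        }
        catch (InsufficientExecutionStackException)
        {
            depth = -1;
            return false;
        }
    }
}
```

A caller could use `TryDepth` as a rough pre-check and skip or defer documents for which it returns false. Whether that depth correlates well with what `Save` needs is an assumption that would have to be tested.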
Aug 9, 2010 at 3:38 PM

So this is not an infinite loop? Can I just increase the max stack size as a workaround? Thanks.

Sep 20, 2010 at 9:20 AM

As a work-around, how would one increase the stack size? I know you can do it on a specific thread, but I think it's still limited to the default max anyway.
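
For reference, the per-thread approach mentioned above can be sketched with the `Thread(ThreadStart, int maxStackSize)` constructor. `BigStack.Run` is a hypothetical helper name, and the 16 MB figure in the usage note is an arbitrary assumption. As far as I know, a size larger than the default is honored for fully trusted code and ignored for partially trusted code, so treat this as something to verify in your own environment.

```csharp
using System;
using System.Threading;

public static class BigStack
{
    // Runs 'work' on a dedicated thread created with the requested stack size
    // (in bytes) and returns its result. Ordinary exceptions thrown by 'work'
    // are captured and rethrown on the calling thread.
    public static T Run<T>(Func<T> work, int stackSizeBytes)
    {
        T result = default(T);
        Exception error = null;

        var thread = new Thread(() =>
        {
            try { result = work(); }
            catch (Exception ex) { error = ex; }
        }, stackSizeBytes);

        thread.Start();
        thread.Join();   // block until the worker has finished

        if (error != null)
            throw new InvalidOperationException("Worker thread failed.", error);
        return result;
    }
}
```

With the variables from the first post, usage might look like `BigStack.Run(() => { htmlDocument.Save(memoryStream); return true; }, 16 * 1024 * 1024);`. This only raises the ceiling: a deep enough document would still overflow, and a StackOverflowException on the worker thread still terminates the process.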

@Darth: Do we have a test that proves this problem so that we can solve it?

Sep 25, 2010 at 12:02 PM

Hi @Darth,

The code example I posted at the top of this thread will cause the issue to occur. I have since found another example, but I have filed it somewhere that's not to hand at the moment. I'll hunt it out and post it asap.

Regards,

AArnie

Oct 5, 2010 at 11:31 PM
Edited Oct 5, 2010 at 11:33 PM

Hello,

I did a quick workaround for this:

http://devva.net/blog/post/Workaround-for-the-HtmlAgilityPack-StackOverflowException-bug.aspx

Jan 28, 2011 at 3:10 PM

Why can't HtmlAgilityPack use a Stack (as in an instance of the Stack class) of its own to store the elements to be written out? That's how you would normally get rid of recursion and it seems the right way to go, otherwise you will always be limited by the stack size (1MB by default).
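
To illustrate the idea, here is a minimal sketch of writing a tree with an explicit `Stack<T>` instead of recursion. `SimpleNode` and `IterativeWriter` are hypothetical types made up for this example, not HtmlAgilityPack's actual classes or its real Save logic, and attributes, escaping and void elements are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical minimal node type, standing in for a parsed HTML element.
public sealed class SimpleNode
{
    public string Name;   // element name; null marks a text node
    public string Text;   // used only when Name is null
    public List<SimpleNode> Children = new List<SimpleNode>();
}

public static class IterativeWriter
{
    // One unit of pending work: open a node, or emit its closing tag.
    private struct Frame
    {
        public SimpleNode Node;
        public bool Close;
    }

    public static string Write(SimpleNode root)
    {
        var output = new StringBuilder();
        var pending = new Stack<Frame>();   // lives on the heap, not the call stack
        pending.Push(new Frame { Node = root });

        while (pending.Count > 0)
        {
            Frame frame = pending.Pop();
            SimpleNode node = frame.Node;

            if (node.Name == null) { output.Append(node.Text); continue; }
            if (frame.Close) { output.Append("</").Append(node.Name).Append('>'); continue; }

            output.Append('<').Append(node.Name).Append('>');
            // Push the closing tag first so it pops after all the children.
            pending.Push(new Frame { Node = node, Close = true });
            // Push children in reverse so they pop in document order.
            for (int i = node.Children.Count - 1; i >= 0; i--)
                pending.Push(new Frame { Node = node.Children[i] });
        }
        return output.ToString();
    }
}
```

Because the pending work sits in a heap-allocated collection, nesting depth is bounded by available memory rather than by the thread's stack size.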

Feb 2, 2011 at 3:09 PM

This is making HAP unusable for me. I can't get it to save even the simplest XHTML file after modifying an element's attributes. The workaround doesn't help because it just makes the exception trappable but doesn't allow me to save.

Nov 18, 2013 at 12:56 PM
It seems this is still not solved as of 1.46.