
Stackoverflow with HtmlNode.OuterHtml

description

There seems to be a problem with HtmlNode.OuterHtml: reading it appears to re-enter itself recursively until a StackOverflowException occurs. The repro is simple:
 
using HtmlAgilityPack;

public class Test
{
    public void Run(string html)
    {
        HtmlDocument document = new HtmlDocument();
        document.LoadHtml(html);

        RemoveAttributes(document.DocumentNode);

        // Saving serializes every node, which is where the overflow surfaces.
        document.Save(@"out.html");
    }

    // Recursively strips attributes from every node except <span>.
    // Note that ChildNodes also contains text nodes, not only elements.
    private void RemoveAttributes(HtmlNode parentNode)
    {
        for (int i = 0; i < parentNode.ChildNodes.Count; i++)
        {
            HtmlNode child = parentNode.ChildNodes[i];

            if (child.Name.ToLowerInvariant() != "span")
            {
                child.Attributes.RemoveAll();
            }

            if (child.ChildNodes.Count > 0)
            {
                RemoveAttributes(child);
            }
        }
    }
}
 
The fix is a small change to the OuterHtml property getter: set _outerChanged to false before the HTML is regenerated rather than after it. I've uploaded a patch for the fix.
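To illustrate why the order of that assignment matters, here is a minimal, self-contained sketch. It is not the HtmlAgilityPack source: CachedTextNode, MarkChanged, WriteCalls, and the clearFlagFirst switch are hypothetical names, and the sketch assumes the overflow comes from the getter re-entering itself while the "changed" flag is still set. A call counter stands in for the real stack overflow so the failure can be caught.

```csharp
using System;

// Hypothetical stand-in for a node with a lazily regenerated OuterHtml cache.
// This is an illustrative model, NOT the real HtmlAgilityPack implementation.
public class CachedTextNode
{
    private string _text;                  // explicit text; null until assigned
    private string _outerHtml = "hello";   // cached serialized form
    private bool _outerChanged;            // true when the cache is stale
    private readonly bool _clearFlagFirst; // true = patched order, false = original order

    public int WriteCalls { get; private set; }

    public CachedTextNode(bool clearFlagFirst)
    {
        _clearFlagFirst = clearFlagFirst;
    }

    // Simulates an edit (for example, removing attributes) that invalidates the cache.
    public void MarkChanged()
    {
        _outerChanged = true;
    }

    // Falls back to OuterHtml when no explicit text has been assigned.
    public string Text
    {
        get { return _text ?? OuterHtml; }
        set { _text = value; }
    }

    public string OuterHtml
    {
        get
        {
            if (_outerChanged)
            {
                if (_clearFlagFirst)
                {
                    _outerChanged = false;   // clear first: a re-entrant read sees a clean flag
                    _outerHtml = WriteTo();
                }
                else
                {
                    _outerHtml = WriteTo();  // flag still set: WriteTo re-enters this getter
                    _outerChanged = false;
                }
            }
            return _outerHtml;
        }
    }

    // Serialization reads Text, which can read OuterHtml again.
    private string WriteTo()
    {
        WriteCalls++;
        if (WriteCalls > 100)
        {
            // Guard used only in this sketch; the real symptom is a stack overflow.
            throw new InvalidOperationException("Runaway recursion in OuterHtml.");
        }
        return Text;
    }
}
```

With `new CachedTextNode(clearFlagFirst: true)`, calling MarkChanged() and then reading OuterHtml regenerates the cache with a single WriteTo call. With `clearFlagFirst: false`, the same sequence recurses until the guard throws.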

comments

elendil326 wrote Aug 14, 2014 at 5:42 PM

A workaround is to avoid triggering a change to the outer HTML of any HtmlNode whose NodeType is Text, as mentioned in https://htmlagilitypack.codeplex.com/workitem/35461
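One way to apply that workaround to the repro, sketched here under the assumption that removing attributes is what marks a text node as changed, is to touch only element nodes. AttributeStripper is a hypothetical helper name; HtmlNodeType.Element and the other members used are part of the HtmlAgilityPack API.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper: same traversal as the repro, but text and comment
// nodes are skipped so their outer HTML is never flagged as changed.
public static class AttributeStripper
{
    public static void RemoveAttributes(HtmlNode parentNode)
    {
        foreach (HtmlNode child in parentNode.ChildNodes)
        {
            // Only elements carry attributes worth removing.
            if (child.NodeType != HtmlNodeType.Element)
            {
                continue;
            }

            if (!string.Equals(child.Name, "span", StringComparison.OrdinalIgnoreCase))
            {
                child.Attributes.RemoveAll();
            }

            // Recurse into the element's own children.
            RemoveAttributes(child);
        }
    }
}
```

Calling `AttributeStripper.RemoveAttributes(document.DocumentNode)` in place of the original method keeps the behaviour for elements while leaving text nodes alone.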