Ordering problem with RemoveChild


When I remove an item from a parent node, the ordering of the children changes:
The following example replaces the less-than and greater-than signs with square brackets. If you comment out the RemoveChild() line the ordering works ok:
        String testhtml = "<levelone>\r\n"
            + "   <leveltwo>\r\n"
            + "      <h1><levelthreepartone></levelthreepartone></h1>\r\n"
            + "      <p><levelthreeparttwo></levelthreeparttwo></p>\r\n"
            + "   </leveltwo>\r\n"
            + "</levelone>";
        HtmlDocument doc = new HtmlDocument();
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//*");
        foreach (HtmlNode node in nodes)
            HtmlNode parentnode = node.ParentNode;
            HtmlTextNode textnode1 = node.OwnerDocument.CreateTextNode("["+node.Name+"]");
            parentnode.InsertBefore(textnode1, node);
            HtmlTextNode textnode2 = node.OwnerDocument.CreateTextNode("[/" + node.Name + "]");
            parentnode.InsertAfter(textnode2, node);
            parentnode.RemoveChild(node, true);  // comment this out!
        StringBuilder sb = new StringBuilder();
        StringWriter sw = new StringWriter(sb);
        HtmlTextWriter writer = new HtmlTextWriter(sw);
Note that the lines with "levelthreepartone" and "levelthreeparttwo" get reversed.


mikebridge wrote Mar 23, 2007 at 5:55 PM

I think the problem is in HtmlNode.RemoveChild(HtmlNode oldChild, bool keepGrandChildren). It looks like the previous sibling is assigned only once, then everything gets inserted at that point. This causes the order of the nodes to be reversed.

The current code is this:
        if ((oldChild._childnodes != null) && keepGrandChildren)            {               // get prev sibling             HtmlNode prev = oldChild.PreviousSibling;

            // reroute grand children to ourselves              foreach(HtmlNode grandchild in oldChild._childnodes)                {                   InsertAfter(grandchild, prev);              }           }           RemoveChild(oldChild);
I think it should be this:
  // get prev sibling      HtmlNode prev = oldChild.PreviousSibling;      if ((oldChild._childnodes != null) && keepGrandChildren)      {          // reroute grand children to ourselves          foreach (HtmlNode grandchild in oldChild.ChildNodes)          {            prev = InsertAfter(grandchild, prev);          }


adamface wrote Jul 20, 2009 at 8:34 PM

Mike - You are a lifesaver! I ran into the exact same problem (grandchildren nodes being reversed). I can verify that your code fixes the issue.

Thank you!


0tto wrote Aug 18, 2010 at 10:32 AM

Thanks mike, your fix worked for me as well.

I've put up fixed binaries here if somebody needs them:

tbone47 wrote Nov 18, 2010 at 7:42 PM

sysmo wrote Jan 17, 2011 at 12:30 AM

The "prev" reference is unnecessary. Looping backwards through the children and inserting after the parent works. Here is the updated RemoveChild method.
    public HtmlNode RemoveChild(HtmlNode oldChild, bool keepGrandChildren)
        if (oldChild == null)
            throw new ArgumentNullException("oldChild");

        if ((oldChild._childnodes != null) && keepGrandChildren)
            // get prev sibling
            //HtmlNode prev = oldChild.PreviousSibling;
            int count = oldChild._childnodes.Count;

            // reroute grand children to ourselves
            for (var i = count - 1; i >= 0; i--)
                //looping through children backwards allows us to insert after the parent 
                //and end up with the same node order
                HtmlNode grandchild = oldChild._childnodes[i];
                //InsertAfter(grandchild, prev);
                InsertAfter(grandchild, oldChild);
        _outerchanged = true;
        _innerchanged = true;
        return oldChild;

tednyberg wrote May 3, 2011 at 6:40 AM

Here's an example I used in a project to remove span tags without compiling my own modified version of HtmlAgilityPack:
    public static void RemoveSpanTags(this HtmlDocument html)
        var spanTags = html.DocumentNode.SelectNodes("//span");

        if (spanTags!=null)
            foreach (var span in spanTags)
                if (!span.HasChildNodes)


                int count = span.ChildNodes.Count;

                for (var i = count - 1; i >= 0; i--)
                    var child = span.ChildNodes[i];

                    span.ParentNode.InsertAfter(child, span);


jongalloway wrote Sep 10, 2011 at 10:38 PM

Any reason not to apply the patch for 28756?

JeffBlankenbiller wrote Jan 14 at 2:17 PM

Sysmo posted code to fix this problem 4 years ago.

Why has this not been corrected yet?

Why is this also a Low impact? Reordering the child nodes incorrect is bad. When used to strip unwanted HTML from paragraphs and sentences, it reorders the sentences or words which totally messes up the original content.