18
Vote

Ordering problem with RemoveChild

description

When I remove an item from a parent node, the ordering of the children changes:
 
The following example replaces the less-than and greater-than signs with square brackets. If you comment out the RemoveChild() line the ordering works ok:
 
        String testhtml = "<levelone>\r\n"
            + "   <leveltwo>\r\n"
            + "      <h1><levelthreepartone></levelthreepartone></h1>\r\n"
            + "      <p><levelthreeparttwo></levelthreeparttwo></p>\r\n"
            + "   </leveltwo>\r\n"
            + "</levelone>";
 
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(testhtml);
 
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//*");
 
        foreach (HtmlNode node in nodes)
        {
            HtmlNode parentnode = node.ParentNode;
 
            HtmlTextNode textnode1 = node.OwnerDocument.CreateTextNode("["+node.Name+"]");
            parentnode.InsertBefore(textnode1, node);
 
            HtmlTextNode textnode2 = node.OwnerDocument.CreateTextNode("[/" + node.Name + "]");
            parentnode.InsertAfter(textnode2, node);
 
            parentnode.RemoveChild(node, true);  // comment this out!
 
        }
        StringBuilder sb = new StringBuilder();
        StringWriter sw = new StringWriter(sb);
        HtmlTextWriter writer = new HtmlTextWriter(sw);
        doc.Save(writer);
        System.Console.Write(sb.ToString());
 
Note that the lines with "levelthreepartone" and "levelthreeparttwo" get reversed.

comments

mikebridge wrote Mar 23, 2007 at 6:55 PM

I think the problem is in HtmlNode.RemoveChild(HtmlNode oldChild, bool keepGrandChildren). It looks like the previous sibling is assigned only once, then everything gets inserted at that point. This causes the order of the nodes to be reversed.

The current code is this:
        if ((oldChild._childnodes != null) && keepGrandChildren)            {               // get prev sibling             HtmlNode prev = oldChild.PreviousSibling;

            // reroute grand children to ourselves              foreach(HtmlNode grandchild in oldChild._childnodes)                {                   InsertAfter(grandchild, prev);              }           }           RemoveChild(oldChild);
I think it should be this:
  // get prev sibling      HtmlNode prev = oldChild.PreviousSibling;      if ((oldChild._childnodes != null) && keepGrandChildren)      {          // reroute grand children to ourselves          foreach (HtmlNode grandchild in oldChild.ChildNodes)          {            prev = InsertAfter(grandchild, prev);          }

  }

adamface wrote Jul 20, 2009 at 9:34 PM

Mike - You are a lifesaver! I ran into the exact same problem (grandchildren nodes being reversed). I can verify that your code fixes the issue.

Thank you!

Adam

0tto wrote Aug 18, 2010 at 11:32 AM

Thanks mike, your fix worked for me as well.

I've put up fixed binaries here if somebody needs them:
http://www.mediafire.com/?t5eadu70nc79z9w

tbone47 wrote Nov 18, 2010 at 8:42 PM

sysmo wrote Jan 17, 2011 at 1:30 AM

The "prev" reference is unnecessary. Looping backwards through the children and inserting after the parent works. Here is the updated RemoveChild method.
    public HtmlNode RemoveChild(HtmlNode oldChild, bool keepGrandChildren)
    {
        if (oldChild == null)
        {
            throw new ArgumentNullException("oldChild");
        }

        if ((oldChild._childnodes != null) && keepGrandChildren)
        {
            // get prev sibling
            //HtmlNode prev = oldChild.PreviousSibling;
            int count = oldChild._childnodes.Count;

            // reroute grand children to ourselves
            for (var i = count - 1; i >= 0; i--)
            {
                //looping through children backwards allows us to insert after the parent 
                //and end up with the same node order
                HtmlNode grandchild = oldChild._childnodes[i];
                //InsertAfter(grandchild, prev);
                InsertAfter(grandchild, oldChild);
            }
        }
        RemoveChild(oldChild);
        _outerchanged = true;
        _innerchanged = true;
        return oldChild;
    }

tednyberg wrote May 3, 2011 at 7:40 AM

Here's an example I used in a project to remove span tags without compiling my own modified version of HtmlAgilityPack:
    public static void RemoveSpanTags(this HtmlDocument html)
    {
        var spanTags = html.DocumentNode.SelectNodes("//span");

        if (spanTags!=null)
        {
            foreach (var span in spanTags)
            {
                if (!span.HasChildNodes)
                {
                    span.ParentNode.RemoveChild(span);

                    continue;
                }

                int count = span.ChildNodes.Count;

                for (var i = count - 1; i >= 0; i--)
                {
                    var child = span.ChildNodes[i];

                    span.ParentNode.InsertAfter(child, span);
                }

                span.ParentNode.RemoveChild(span);
            }
        }
    }

jongalloway wrote Sep 10, 2011 at 11:38 PM

Any reason not to apply the patch for 28756?