This project has moved and is read-only. For the latest updates, please go here.
14
Vote

RemoveChild(node, true) reverses the order of the grandchildren it keeps

description

Here's a test that exercises the behavior:
 
    [TestMethod]
    public void TestMethod1()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<div><p>Zero</p><code><p>One</p><p>Two</p></code><p>Three</p></div>");
 
        var div = doc.DocumentNode.SelectSingleNode("//div");
        var code = doc.DocumentNode.SelectSingleNode("//code");
 
        div.RemoveChild(code, true);
 
        Assert.AreEqual("<div><p>Zero</p><p>One</p><p>Two</p><p>Three</p></div>", Html(doc));
    }
 
Expected: "<div><p>Zero</p><p>One</p><p>Two</p><p>Three</p></div>"
Actual: "<div><p>Zero</p><p>Two</p><p>One</p><p>Three</p></div>"

file attachments

comments

Aaron_Maenpaa wrote Sep 1, 2010 at 11:57 PM

I attached a patch to fix the behavior.

wrote Nov 8, 2010 at 8:33 PM

xmen wrote Nov 11, 2010 at 1:05 PM

You question is wrong itself
doc.LoadHtml("<div><p>Zero</p><code​><p>One</p><p>Two</p></code><p>Thre​e</p></div>");
var div = doc.DocumentNode.SelectSingleNode("​//div");
var code = doc.DocumentNode.SelectSingleNode("​//code");

div.RemoveChild(code, true);

Expected: "<div><p>Zero</p><p>One</p><p>Two</​p><p>Three</p></div>"
Actual: "<div><p>Zero</p><p>Two</p><p>One</​p><p>Three</p></div>"

Expected should be
<div><p>Zero</p><p>Thre​e</p></div>

Aaron_Maenpaa wrote Nov 11, 2010 at 3:52 PM

@xmen

The true in div.RemoveChild(code, true) is to instruct it to keep the removed node's children.

So, unless I'm mistaken the expected result is that <p>One</p> and <p>Two</p> will be preserved and become the children of the <div>.

xmen wrote Nov 11, 2010 at 5:04 PM

Ah...never used that method :D just read its name

tbone47 wrote Nov 18, 2010 at 8:41 PM

wrote Nov 18, 2010 at 8:42 PM

sysmo wrote Jan 17, 2011 at 1:30 AM

The "prev" reference is unnecessary. Looping backwards through the children and inserting after the parent works. Here is the updated RemoveChild method.
    public HtmlNode RemoveChild(HtmlNode oldChild, bool keepGrandChildren)
    {
        if (oldChild == null)
        {
            throw new ArgumentNullException("oldChild");
        }

        if ((oldChild._childnodes != null) && keepGrandChildren)
        {
            // get prev sibling
            //HtmlNode prev = oldChild.PreviousSibling;
            int count = oldChild._childnodes.Count;

            // reroute grand children to ourselves
            for (var i = count - 1; i >= 0; i--)
            {
                //looping through children backwards allows us to insert after the parent 
                //and end up with the same node order
                HtmlNode grandchild = oldChild._childnodes[i];
                //InsertAfter(grandchild, prev);
                InsertAfter(grandchild, oldChild);
            }
        }
        RemoveChild(oldChild);
        _outerchanged = true;
        _innerchanged = true;
        return oldChild;
    }

wrote Feb 4, 2011 at 2:21 AM

wrote Oct 20, 2012 at 1:05 PM

wrote Dec 14, 2012 at 12:34 PM

wrote Feb 22, 2013 at 2:47 AM

wrote Aug 10, 2013 at 11:44 AM

wrote Oct 29, 2013 at 10:54 PM

wrote Mar 19, 2014 at 7:13 PM

wrote Dec 4, 2014 at 10:24 AM

wrote Jan 14, 2015 at 3:12 PM

JeffBlankenbiller wrote Jan 14, 2015 at 3:16 PM

Sysmo posted code to fix this problem 4 years ago.

Why has this not been corrected yet?

Why is this also a Low impact? Reordering the child nodes incorrect is bad. When used to strip unwanted HTML from paragraphs and sentences, it reorders the sentences or words which totally messes up the original content.

wrote Jul 7, 2015 at 10:09 AM

re4pz wrote Jan 7, 2016 at 3:01 PM

Please fix this.

wrote Jan 7, 2016 at 3:04 PM

wrote Sep 6, 2016 at 11:41 AM

johnnyoshika wrote Jan 10 at 10:25 AM

It would be really nice if the patch can be merged. We're similarly impacted by this bug as everyone else on this thread.