C# - XML - Removing the nodes without using recursion

Topics: Developer Forum
May 22, 2011 at 12:15 AM

I have the following recursive method which takes the an XHTML document and marks nodes based on certain conditions and It is called like below for a number of HTML contents:-

    XmlDocument document = new XmlDocument();
PrepNodesForDeletion(document.DocumentElement, document.DocumentElement);

The method definition is below

/// <summary>
/// Recursive function to identify and mark all unnecessary nodes so that they can be removed from the document.
/// </summary>
/// <param name="nodeToCompareAgainst">The node that we are recursively comparing all of its descendant nodes against</param>
/// <param name="nodeInQuestion">The node whose children we are comparing against the "nodeToCompareAgainst" node</param>
static void PrepNodesForDeletion(XmlNode nodeToCompareAgainst, XmlNode nodeInQuestion)
if (infinityIndex++ > 100000)
foreach (XmlNode childNode in nodeInQuestion.ChildNodes)
// make sure we compare all of the childNodes descendants to the nodeToCompareAgainst
PrepNodesForDeletion(nodeToCompareAgainst, childNode);

if (AreNamesSame(nodeToCompareAgainst, childNode) && AllAttributesPresent(nodeToCompareAgainst, childNode))
// the function AnyAttributesWithDifferingValues assumes that all attributes are present between the two nodes
if (AnyAttributesWithDifferingValues(nodeToCompareAgainst, childNode) && InnerTextIsSame(nodeToCompareAgainst, childNode))
else if (!AnyAttributesWithDifferingValues(nodeToCompareAgainst, childNode))

// make sure we compare all of the childNodes descendants to the childNode
PrepNodesForDeletion(childNode, childNode);

And then the following method which would delete the marked node:-

 static void RemoveMarkedNodes(XmlDocument document)
// in order for us to make sure we remove everything we meant to remove, we need to do this in a while loop
// for instance, if the original xml is = <a><a><b><a/></b></a><a/></a>
// this should result in the xml being passed into this function as:
// <a><b><a DeleteNode="TRUE" /></b><a DeleteNode="TRUE"><b><a DeleteNode="TRUE" /></b></a><a DeleteNode="TRUE" /></a>
// then this function (without the while) will not delete the last <a/>, even though it is marked for deletion
// if we incorporate a while loop, then we can insure all nodes marked for deletion are removed
// TODO: understand the reason for this -- see http://groups.google.com/group/microsoft.public.dotnet.xml/browse_thread/thread/25df058a4efb5698/7dd0a8b71739216c?lnk=st&q=xmlnode+removechild+recursive&rnum=2&hl=en#7dd0a8b71739216c
XmlNodeList nodesToDelete = document.SelectNodes("//*[@DeleteNode='TRUE']");

while (nodesToDelete.Count > 0)
foreach (XmlNode nodeToDelete in nodesToDelete)

= document.SelectNodes("//*[@DeleteNode='TRUE']");

When I use the "PrepNodesForDeletion" method without the infinityIndex counter, I get OutOfMemoryException for few HTML contents. However, If I use infinityIndex counter, It may not be deleting nodes for some HTML contents.

Could you suggest anyway to remove recursion. Also I am not familiar with the HtmlAgility pack. So, If this can be done using that, could somebody provide some code sample.

Jun 10, 2011 at 11:20 AM

You can use System.Collections.Generic.Stack<T> to rewrite the method so it does not use recursion. You will still need to check for cycles in your xml tree. The easiest way to do this would be to check if current node exists in, before adding it to, stack.