Removing specific html tags but not the inner text.

Nov 16, 2009 at 4:21 PM
Edited Nov 16, 2009 at 5:08 PM


I know how to remove specific HTML nodes from a given HTML fragment. E.g. for <p><span>Hello</span></p> I would write the following code:

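            HtmlNodeCollection nc = doc.DocumentNode.SelectNodes("//span");
            // SelectNodes returns null when nothing matches
            if (nc != null)
                foreach (HtmlNode node in nc)
                    node.ParentNode.RemoveChild(node); // drops the node and everything inside it
            return doc.DocumentNode.WriteTo();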

This removes the entire node from the doc, including the text "Hello". I want to keep the text so that my HTML ends up looking like <p>Hello</p>.

Using a more complex example: if I have the following HTML, <p><span>Hello <strong> WORLD </strong></span> </p>

I would like to end up with <p>Hello<strong>WORLD</strong> </p> .

Running the above code leaves me with <p></p>

Can this be achieved using the Html Agility Pack API, or do I need to use regular expressions?



Nov 29, 2009 at 12:53 PM


Are you trying to remove the empty span tag, or trying to remove a specific set of tags from properly formatted html? If it's to remove tags then some regular expressions might do the trick.
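
If the goal is to drop the tags and keep what is inside them, it may also be possible to stay within the Html Agility Pack API. The sketch below relies on the RemoveChild overload that takes a keepGrandChildren flag, which moves a node's children up to its parent before removing the node. TagUnwrapper and UnwrapTags are hypothetical names made up for this example, not part of the library, and the exact whitespace and child ordering of the result should be checked against the version you have installed.

```csharp
using HtmlAgilityPack;

public static class TagUnwrapper
{
    // Hypothetical helper (not part of Html Agility Pack): removes every
    // <tagName> element but leaves its child nodes in the document.
    public static string UnwrapTags(string html, string tagName)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//" + tagName);
        if (nodes != null)
        {
            foreach (HtmlNode node in nodes)
            {
                // keepGrandChildren: true re-parents the node's children
                // onto its parent before the node itself is removed.
                node.ParentNode.RemoveChild(node, true);
            }
        }

        return doc.DocumentNode.WriteTo();
    }
}
```

Calling TagUnwrapper.UnwrapTags(html, "span") on the second example from the question should leave the <p> and <strong> elements and the text in place while the <span> wrapper is gone. Unlike a regular expression, this works on the parsed tree, so it does not depend on how the tag's attributes or spacing are written.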