InnerText returns HTML entities rather than the characters they represent


I use the HtmlAgilityPack 1.4.6 NuGet package, and I noticed that when text containing HTML entities is parsed into an HtmlDocument, or set through HtmlNode.InnerHtml, and later retrieved via HtmlNode.InnerText, the entities are returned verbatim. I would expect them to be resolved to the characters they represent.

To reproduce, use the attached test case: create an empty C# class library project, install NUnit and HtmlAgilityPack from NuGet, then paste in the attached code. Two of the three tests fail because of the behavior described above.
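
For readers without the attachment, here is a minimal sketch of the kind of check described above. It is not the attached test case; the class name InnerTextRepro, the helper ReadInnerText, and the sample markup are made up for illustration. It only uses HtmlDocument.LoadHtml and HtmlNode.InnerText.

```csharp
using System;
using HtmlAgilityPack;

public static class InnerTextRepro
{
    // Hypothetical helper: parses the markup and returns the
    // document's InnerText exactly as the library reports it.
    public static string ReadInnerText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode.InnerText;
    }

    public static void Main()
    {
        // The report expects "Fish & chips" here; according to the
        // description above, the entity comes back unresolved instead.
        Console.WriteLine(ReadInnerText("<p>Fish &amp; chips</p>"));
    }
}
```

Comparing the printed value against the decoded string should show whether the installed version is affected.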

dandreica wrote Feb 5, 2013 at 5:40 PM

The reverse is also broken, i.e. reading InnerHtml from an HtmlTextNode also returns an incorrect result. See the updated test case.

a_h wrote Mar 13, 2014 at 3:15 PM

A quick workaround for some cases is to use the System.Web.HttpUtility class:

e.g.: HttpUtility.HtmlDecode(node.InnerText)
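
If referencing System.Web is not convenient (for example in a class library), the same decoding step can probably be done with System.Net.WebUtility.HtmlDecode, which is available from .NET 4.0 on. The sketch below assumes the text has already been read from node.InnerText; the class name EntityDecoding and the helper DecodeEntities are made up for illustration. HtmlAgilityPack also ships HtmlEntity.DeEntitize, which may serve the same purpose without leaving the library.

```csharp
using System;
using System.Net;

public static class EntityDecoding
{
    // Hypothetical helper: resolves HTML entities in text that was
    // obtained from HtmlNode.InnerText.
    public static string DecodeEntities(string innerText)
    {
        return WebUtility.HtmlDecode(innerText);
    }

    public static void Main()
    {
        // Stand-in for node.InnerText as returned by the library.
        string raw = "Fish &amp; chips";
        Console.WriteLine(DecodeEntities(raw)); // prints "Fish & chips"
    }
}
```

Note that this decodes every entity in the string, so text that was meant to contain a literal sequence such as "&amp;" after decoding would need separate handling.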

