HTML entities get HTML-encoded?

Apr 26, 2011 at 3:20 PM

Hello there,

Why does HAP "double"-encode HTML entities from HTML input, and how do I prevent it? Basically, what I am seeing is this:



var htmlDocument = new HtmlDocument { OptionOutputAsXml = true };
htmlDocument.LoadHtml(Text); // 'Text' is the input HTML string
return htmlDocument.DocumentNode.InnerHtml;


Using this with 'Text' being:


I get this from htmlDocument.DocumentNode.InnerHtml back:



The same goes for all other entities. So basically, whenever I pass in perfectly valid HTML, I get modified, wrong HTML back. Is there any way to prevent this?



May 2, 2011 at 11:17 PM

I haven't looked too deeply into this, but it looks like something to do with the 'OptionOutputAsXml' you are setting - '&nbsp;' is not valid in XML. A possible workaround is to do this:


Passing in already decoded html results in the following output:

<HTML><HEAD></HEAD><body> </BODY></HTML>

May 11, 2011 at 8:17 PM

This is by design. XML doesn't allow a bare '&' character, so it gets "escaped" as &amp;, which destroys your HTML entities. You should not use output as XML.
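
To make the escaping rule concrete, here is a minimal sketch that uses only the .NET base class library. It is not HAP's own code; it just shows what any XML-style escaping does to an HTML named entity. The class name EscapeDemo, the helper XmlEscape and the sample string are made up for illustration.

```csharp
using System;
using System.Security;

public static class EscapeDemo
{
    // Hypothetical helper: escapes the characters that XML reserves
    // ('&', '<', '>', '"', '\'') using the BCL's SecurityElement.Escape.
    public static string XmlEscape(string text) => SecurityElement.Escape(text);

    public static void Main()
    {
        // "&nbsp;" is an HTML named entity. XML only predefines
        // &amp; &lt; &gt; &quot; and &apos;, so the leading '&' is
        // treated as a literal ampersand and escaped.
        Console.WriteLine(XmlEscape("a&nbsp;b")); // prints a&amp;nbsp;b
    }
}
```

The result is the "double" encoding described in the first post: an entity that was fine as HTML ends up with its ampersand escaped once more.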

Jan 10, 2015 at 1:30 AM

string noNbsp = Regex.Replace(inputHTML, @"&nbsp;", "").Trim();
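
Note that the one-liner above deletes every &nbsp; outright, which can glue neighbouring words together. If the spacing matters, a variant along the following lines might be closer to what you want. This is only a sketch: the class name NbspDemo, both helper names and the sample input are hypothetical, and it uses only the .NET base class library, not HAP.

```csharp
using System;
using System.Net;

public static class NbspDemo
{
    // Replaces the &nbsp; entity with an ordinary space instead of removing it.
    public static string NbspToSpace(string html) =>
        html.Replace("&nbsp;", " ").Trim();

    // Decodes all HTML entities in a piece of text;
    // &nbsp; becomes the real U+00A0 (no-break space) character.
    public static string DecodeEntities(string text) =>
        WebUtility.HtmlDecode(text);

    public static void Main()
    {
        string inputHTML = "foo&nbsp;bar"; // hypothetical sample input

        Console.WriteLine(NbspToSpace(inputHTML));                       // prints foo bar
        Console.WriteLine(DecodeEntities(inputHTML) == "foo\u00A0bar"); // prints True
    }
}
```

DecodeEntities is best kept for plain text fragments: running it over a whole document would also turn &lt; and &gt; into real angle brackets and change the markup.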