Sep 19, 2012 at 3:59 PM

Hi, I am using HtmlAgilityPack to process my HTML input, but I am getting unwanted closing tags when the "LoadHtml" method is called.

Suppose the text input is an explanation of C# nullable types and contains a line like the one below:

Nullable<int> c = null;

When this line is loaded into an HtmlDocument using LoadHtml, it becomes something like this:

Nullable<int> c = null; </int>

How can I avoid this? I have tried OptionAutoCloseOnEnd=false, OptionFixNestedTags=false, OptionWriteEmptyNodes=false and OptionCheckSyntax=false, but none of them worked.

Any help is greatly appreciated.

Nov 22, 2012 at 9:46 AM
Edited Nov 22, 2012 at 9:52 AM

HtmlAgilityPack is relatively new to me, but logically 'OptionCheckSyntax=false' should work. The problem you seem to be running into is the literal '<' and '>' characters in your text.

These characters are interpreted as markup for the document itself, so the parser reads <int> as an opening tag and closes it for you. Try replacing them with the character references &lt; and &gt;.

e.g. Nullable&lt;int&gt; c = null;

I haven't had the opportunity to try this myself, but by all indications it should work.

Reference: http://www.w3.org/TR/REC-html40/charset.html#h-5.3.2
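A minimal C# sketch of the escaping step, assuming the snippet is plain text rather than markup you want parsed. It uses System.Net.WebUtility.HtmlEncode, which replaces '<' and '>' (and also '&' and quotes) with character references, so the parser sees text instead of a tag. The SnippetEscaper class and EscapeSnippet method are hypothetical names for illustration.

```csharp
using System;
using System.Net;

public static class SnippetEscaper
{
    // Hypothetical helper: turns '<', '>', '&' and quotes into character
    // references so the HTML parser treats the snippet as text, not tags.
    public static string EscapeSnippet(string snippet) => WebUtility.HtmlEncode(snippet);

    public static void Main()
    {
        string snippet = "Nullable<int> c = null;";
        string escaped = EscapeSnippet(snippet);

        // Prints: Nullable&lt;int&gt; c = null;
        Console.WriteLine(escaped);

        // Decoding restores the original text.
        Console.WriteLine(WebUtility.HtmlDecode(escaped));
    }
}
```

The escaped string could then be passed to LoadHtml. Only the code snippets should be encoded this way; encoding the whole input would also turn any real HTML tags into plain text.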