I noticed you released a new version of this. I’d like to be able to use it, but I’ve customized the original version I downloaded too heavily. Some of my changes were a bit of a kludge for my own fairly specific needs (like having each node remember which attribute was accessed last), but one change I think you should seriously consider is adding a System.Xml.Xsl.XsltContext parameter to the SelectNodes and SelectSingleNode functions that can be passed through to the underlying HtmlNodeNavigator.Select() function. This makes it possible to add custom XPath functions and so on. I also made sure that attribute value matches are reliably case-insensitive, so that something like
doc.DocumentNode.SelectSingleNode("//input[@type='text']") works even if the HTML is all uppercase.
Oh, and I added a public accessor for HtmlDocument._text.
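To give an idea of what an XsltContext parameter would allow, here is a minimal sketch of a custom XPath function that makes an attribute value match case-insensitive. It is not the actual change described above: the ext:lower function, the urn:example:ext namespace and the class names are all made up for illustration, and it runs against a plain System.Xml.XmlDocument navigator, on the assumption that HtmlNodeNavigator would accept a context the same way any other XPathNavigator does.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

// Hypothetical custom XPath function: ext:lower(value) returns value lower-cased.
class LowerFunction : IXsltContextFunction
{
    public int Minargs { get { return 1; } }
    public int Maxargs { get { return 1; } }
    public XPathResultType ReturnType { get { return XPathResultType.String; } }
    public XPathResultType[] ArgTypes { get { return new[] { XPathResultType.String }; } }

    public object Invoke(XsltContext xsltContext, object[] args, XPathNavigator docContext)
    {
        // The argument may arrive as a string or as a node-set; handle both.
        var nodes = args[0] as XPathNodeIterator;
        string value = nodes != null
            ? (nodes.MoveNext() ? nodes.Current.Value : "")
            : Convert.ToString(args[0]);
        return value.ToLowerInvariant();
    }
}

// Hypothetical context that knows about exactly one extension function.
class SketchContext : XsltContext
{
    public const string ExtNamespace = "urn:example:ext"; // made-up namespace URI

    public SketchContext() : base(new NameTable())
    {
        AddNamespace("ext", ExtNamespace);
    }

    public override IXsltContextFunction ResolveFunction(string prefix, string name, XPathResultType[] argTypes)
    {
        if (LookupNamespace(prefix) == ExtNamespace && name == "lower")
            return new LowerFunction();
        return null; // unknown function
    }

    // No variables are defined in this sketch.
    public override IXsltContextVariable ResolveVariable(string prefix, string name) { return null; }
    public override int CompareDocument(string baseUri, string nextBaseUri) { return 0; }
    public override bool PreserveWhitespace(XPathNavigator node) { return true; }
    public override bool Whitespace { get { return true; } }
}

class Program
{
    static void Main()
    {
        // Stand-in for an all-uppercase page; XmlDocument keeps the sketch dependency-free.
        var doc = new XmlDocument();
        doc.LoadXml("<FORM><INPUT TYPE=\"TEXT\" NAME=\"q\" /></FORM>");
        XPathNavigator nav = doc.CreateNavigator();

        // Compile first, then attach the context that resolves ext:lower().
        XPathExpression expr = nav.Compile("//INPUT[ext:lower(@TYPE) = 'text']");
        expr.SetContext(new SketchContext());

        XPathNavigator match = nav.SelectSingleNode(expr);
        Console.WriteLine(match != null ? match.GetAttribute("NAME", "") : "no match");
    }
}
```

With a context parameter on SelectNodes and SelectSingleNode, the caller could supply something like SketchContext directly instead of building the XPathExpression by hand.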
I also JUST managed to fix a long-standing bug. I’d started describing it in this message as a problem I’d only been able to work around by setting OptionAutoCloseOnEnd for the documents that needed it, but while trying to distill
a better example for you I finally worked out how to fix it properly.
Even with the latest version the following code causes an exception to be thrown:
HtmlDocument doc = new HtmlDocument();
Console.WriteLine(doc.DocumentNode.SelectSingleNode("//tr").InnerHtml); // <- crashes with “Length cannot be less than 0”
This is because, when it’s trying to automatically close unclosed child nodes in Node.CloseNode(), it creates a “fake” closer node with an _outerstartindex of -1. That makes the child node’s _innerlength invalid (negative, in fact), because it’s calculated by subtracting the current
_innerstartindex value from the fake node’s _outerstartindex of -1. The _outerlength property is similarly negative, and this can cause both the InnerHtml and OuterHtml accessors to crash, unless of
course _innerchanged or _outerchanged is true, which is exactly what my fix to Node.CloseNode() relies on. Before calculating the new _innerlength and _outerlength, it now checks whether the endnode’s _outerstartindex is -1, and if so simply
sets _innerchanged and _outerchanged to true:
if (endnode._outerstartindex < 0)
    _innerchanged = _outerchanged = true;
// create an inner section
This seems to be the right solution, because if there are child nodes that are unclosed when the parent is being closed, the HTML should be regenerated anyway.
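The arithmetic behind the crash can be shown with a toy model. This is only a sketch under assumptions, not the library's code: SketchNode, its Close method, the applyFix switch and the sample markup are all hypothetical, and only the field names mirror the ones mentioned above.

```csharp
using System;

// Toy stand-in for a node that remembers where its inner HTML lives in the source text.
class SketchNode
{
    public string Text;           // full source text
    public int InnerStartIndex;   // mirrors _innerstartindex
    public int InnerLength;       // mirrors _innerlength
    public bool InnerChanged;     // mirrors _innerchanged

    // endOuterStartIndex mirrors endnode._outerstartindex; -1 means a "fake" closer.
    public void Close(int endOuterStartIndex, bool applyFix)
    {
        if (applyFix && endOuterStartIndex < 0)
        {
            // No real closing tag: flag the node so its HTML is regenerated instead of sliced.
            InnerChanged = true;
            return;
        }
        InnerLength = endOuterStartIndex - InnerStartIndex;
    }

    public string InnerHtml
    {
        get
        {
            if (InnerChanged)
                return "(regenerated from child nodes)"; // placeholder for the rebuild path
            return Text.Substring(InnerStartIndex, InnerLength);
        }
    }
}

class Program
{
    static void Main()
    {
        string text = "<table><tr><td>x"; // hypothetical markup with unclosed tags

        var broken = new SketchNode { Text = text, InnerStartIndex = 11 };
        broken.Close(-1, false);
        Console.WriteLine(broken.InnerLength); // -1 - 11 = -12
        try
        {
            Console.WriteLine(broken.InnerHtml);
        }
        catch (ArgumentOutOfRangeException e)
        {
            // Substring rejects the negative length.
            Console.WriteLine("crash: " + e.GetType().Name);
        }

        var patched = new SketchNode { Text = text, InnerStartIndex = 11 };
        patched.Close(-1, true);
        Console.WriteLine(patched.InnerHtml); // takes the regeneration path, no Substring call
    }
}
```

The point of the sketch is only that, once the changed flag is set, the negative length is never used.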