I am new to this library and chose it for its LINQ support. Thanks for making it open source!
I did some initial tests, and what puzzles me is that some nodes, like <TITLE> and <STYLE>, return their inner text as child nodes.
using( StreamWriter sw = File.CreateText("c:\\out.txt") )
{
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.Load( @"C:\Temp\index.html" );
    // take every descendant of the document root
    var results = from node in doc.DocumentNode.Descendants()
                  // where node.HasChildNodes == false
                  select node;
    foreach( HtmlNode node in results )
    {
        // if( !node.HasChildNodes )
        sw.WriteLine( node.OuterHtml );
        sw.WriteLine( "++++++++++++++++++++++++++++++" );
    }
}
As you can see, I simply take all descendant nodes of the document and write each one to a text file.
However, a node like
<TITLE>M$$</TITLE>
actually yields two nodes: the original line above and M$$ as a child node.
If I uncomment "if( !node.HasChildNodes )", I only see M$$. The same goes for the <STYLE> node in the sample mshome.htm.
This seems wrong to me, or am I missing something?
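In case it helps to reproduce this, here is a minimal sketch that prints each node's type and name instead of its OuterHtml. It assumes the inner text is being reported as a separate text node (my guess, not something I have confirmed). The inline HTML string is a made-up stand-in for my actual file, and the filter on HtmlNodeType.Element is just one way I could imagine skipping such nodes.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class NodeTypeDump
{
    static void Main()
    {
        var doc = new HtmlDocument();
        // Hypothetical stand-in for C:\Temp\index.html
        doc.LoadHtml("<html><head><title>M$$</title></head><body></body></html>");

        // Print the type and name of every descendant node
        foreach (HtmlNode node in doc.DocumentNode.Descendants())
        {
            Console.WriteLine("{0} {1}", node.NodeType, node.Name);
        }

        Console.WriteLine("---- elements only ----");

        // Assumption: keeping only element nodes drops the inner-text nodes
        var elements = from node in doc.DocumentNode.Descendants()
                       where node.NodeType == HtmlNodeType.Element
                       select node;
        foreach (HtmlNode node in elements)
        {
            Console.WriteLine(node.Name);
        }
    }
}
```

If the first loop shows a text-type entry under the title element, that would at least explain why HasChildNodes is true for <TITLE>.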