Using the New Linq Syntax

Jun 19, 2010 at 5:13 PM

I am trying to use the new Linq to Objects syntax with HAP, but I don't seem to be getting any items in my HtmlNodeCollection. I am trying to parse and load some HTML into an HtmlNodeCollection (I hate XPath and am very happy not to have to use it!).

In the code below, I can see that the HtmlDocument is successfully parsed and contains tons of nodes. Yet the HtmlNodeCollection has zero items:

HtmlDocument doc = new HtmlDocument();
HtmlNodeCollection hnc = new HtmlNodeCollection(doc.DocumentNode);


Later on I intend to work with this node collection by doing things like this:

IEnumerable<HtmlNode> paragraphs = hnc.Where(p => p.Name.ToLower() == "p");


IEnumerable<HtmlAttribute> hrefs = hnc.Where(p => p.Attributes.Contains("href")).Select(p => p.Attributes.Single(q => q.Name == "href"));


But if the HtmlNodeCollection doesn't actually contain all the nodes... well, then, my plans are for naught.

Jun 19, 2010 at 6:17 PM

You'll need to use the new functions, like DescendantNodes(), on the DocumentNode. Descendants and DescendantNodes both get you the entire collection of all nodes below the document node. I forgot to remove DescendantNodes and just leave Descendants. They both do the same thing, so I will probably deprecate one of them in the next release.

HtmlDocument doc = new HtmlDocument();
// Load the HTML into doc here (for example with doc.LoadHtml) before querying.
IEnumerable<HtmlNode> paragraphs = doc.DocumentNode.DescendantNodes().Where(p => p.Name.ToLower() == "p");
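
For anyone landing here later, here is a rough end-to-end sketch of both queries from the original question, rewritten against Descendants(). It assumes the markup is available as a string; the sample HTML, the Program class, and the variable names are made up for illustration and are not from the posts above. As far as I can tell, constructing an HtmlNodeCollection from a node only records that node as the parent and does not copy any children into it, which would explain the empty collection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical sample markup, used only for illustration.
        string html = "<html><body><p>One</p><P>Two</P>"
                    + "<a href=\"http://example.com\">link</a></body></html>";

        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html); // parse the markup before querying

        // All <p> elements anywhere below the document node.
        IEnumerable<HtmlNode> paragraphs = doc.DocumentNode
            .Descendants()
            .Where(n => n.Name.ToLower() == "p");

        // The href attribute of every node that has one.
        IEnumerable<HtmlAttribute> hrefs = doc.DocumentNode
            .Descendants()
            .Where(n => n.Attributes.Contains("href"))
            .Select(n => n.Attributes["href"]);

        foreach (HtmlNode p in paragraphs)
            Console.WriteLine(p.InnerText);

        foreach (HtmlAttribute a in hrefs)
            Console.WriteLine(a.Value);
    }
}
```

Both queries are lazy, so nothing is walked until the foreach loops run; add ToList() if you need to enumerate the results more than once.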


Jun 19, 2010 at 10:22 PM

Perfect. Thanks.