Nov 30, 2009 at 10:36 AM
Edited Nov 30, 2009 at 10:37 AM
I'm parsing some pages that contain a <UL> list, and I'm interested in one particular <LI> item. On some pages that <LI> has three <SPAN> tags; on others it has two <SPAN> tags and an <A> in place of the third <SPAN>.
So I figured I'd just call HtmlNode.Descendants().ToList() without any string parameter and take the third item from the list. The problem is that this returns 10 items! The extra items are actually the \n and \t characters that are in the raw HTML:
<span>Nov 3, 2009</span> </span>
Picture that, but formatted a bit more messily to human eyes. So my question is: is this by design, or is it a bug? And how can I work around it?