<a> tag is being ignored by HtmlAgilityPack

description

Hi there,

I have an HTML document that I'm trying to parse using HTML Agility Pack. Once the document is loaded into the DOM, I use the XPath navigator to reach the branches I want to access. Everything else seems fine, and it even skips the JavaScript in the HTML, but HtmlAgilityPack seems to completely ignore <a> tags. I ran the debugger in Visual Studio to display the XPath property of an element in the page, and sure enough the <a> tag is missing from the XPath property.

Please advise. This is quite urgent, as I've been stuck on this issue for two days now.

Regards,
Jo

comments

bergie wrote Feb 20, 2015 at 5:35 PM

Please post the code you are using to retrieve the <a> tags. I get <a> tags from HTML documents using Agility Pack all the time.

You can use:

HtmlNodeCollection htmlNodes = htmlDocument.DocumentNode.SelectNodes("//a");

If you only want the <a> tags that have an href attribute (so you can read the href values), you can use:

HtmlNodeCollection htmlNodes = htmlDocument.DocumentNode.SelectNodes("//a[@href]");
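
For reference, here is a minimal, self-contained sketch of how the two queries above might be used end to end. The inline HTML string and the Program class are made up for illustration and are not from the original poster's code. It assumes the HtmlAgilityPack package is referenced, and it guards against SelectNodes returning null when nothing matches.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical sample markup: one link with an href, one anchor without.
        string html = "<html><body><p><a href=\"/home\">Home</a> <a name=\"top\">Top</a></p></body></html>";

        var htmlDocument = new HtmlDocument();
        htmlDocument.LoadHtml(html);

        // Select only the <a> elements that carry an href attribute.
        HtmlNodeCollection htmlNodes = htmlDocument.DocumentNode.SelectNodes("//a[@href]");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (htmlNodes == null)
        {
            Console.WriteLine("No matching <a> tags found.");
            return;
        }

        foreach (HtmlNode node in htmlNodes)
        {
            // Read the attribute value, falling back to an empty string if it is absent.
            string href = node.GetAttributeValue("href", string.Empty);
            Console.WriteLine(node.XPath + " -> " + href);
        }
    }
}
```

Printing node.XPath for each match is a quick way to check whether the <a> element shows up in the path, which is the symptom described in the question.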