Retrieving Attribute Nodes

Topics: Developer Forum
Oct 5, 2006 at 9:20 PM

First of all: The HTML Agility Pack is great. Finally I'm able to parse information out of usual HTML Files... :)

But there might be a bug in SelectSingleNode/SelectNodes:

For instance, an XPath expression that selects HREF attributes should return all HREF attributes in the document (if I'm not mistaken).

But what you get are a lot of element nodes CONTAINING HREF attributes (LINK and A), not the attribute nodes themselves.


Nov 9, 2006 at 11:27 AM
This is a design issue (some call it a bug :-)

The pack does not support attribute selection.

If you take a look at the internal implementation, attributes and nodes are handled differently (they don't derive from a common node), and the HtmlNavigator class is not capable of returning a common object for node or attribute selection.

This can be fixed, but requires some coding...
Feb 19, 2007 at 3:30 PM

I've encountered the same "bug"/feature.

Being able to select attribute nodes would be very nice indeed.

Feb 19, 2007 at 4:06 PM
Since when is an Attribute a node?
A node HAS attributes.
You create a node (TEXT/TAG) and you create an attribute. You "associate" an attribute with a node (attach/add). As far as I can see, an attribute is an object, a name/value pair, that can be added to a node. It can be copied from node to node or moved to a new node.

Yes, it might be useful to get a collection of attributes to work on in some way, but the query seems to be designed to return a collection of matching node objects which contain that attribute. If you think of it in this way, then it is completely manageable.
Feb 20, 2007 at 8:54 AM
Well, the problem is that it's not XPATH compliant.

If I want to query an HTML document using some XPATH expression, I don't want to have to parse the expression myself to see whether I should look into the attributes of the resulting nodes.

If I have a piece of code that takes an XPATH expression as an argument to extract some value from an HTML document, I have to examine that argument to see if it ultimately selects attribute values or not.
The input might be "//foo/bar", or it might be "//foo/@bar", or it might even be "//foo[@baz=42]/bar/@foz" or something even more complicated.
The problem is, I don't know at runtime what the query is or what type of nodes I'm querying. In XPATH it's all good, but the lack of support for selecting values directly from attributes using XPATH expressions in HTMLAgilityPack forces me to do some ad hoc parsing of the XPATH query to determine which attributes to extract after retrieving nodes with a query.
So the issue is not really whether an attribute is a node or not. It's that queries don't do what you expect them to when you select attributes using XPATH expressions.
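
The "ad hoc parsing" described above could look roughly like the sketch below. It is written in Python only to keep it short and self-contained; it is not part of the HTML Agility Pack, and the function name `split_attribute_step` is made up for this illustration. It assumes the attribute step, if any, is the very last step of the expression, and it does not handle unions (`|`) or the `@*` wildcard.

```python
import re

# Matches a final attribute step such as "/@href" at the very end of the expression.
_TRAILING_ATTRIBUTE = re.compile(r"^(?P<path>.*)/@(?P<name>[\w.:-]+)\s*$", re.DOTALL)


def split_attribute_step(xpath):
    """Split a trailing attribute step off an XPath expression.

    Returns (node_xpath, attribute_name). attribute_name is None when the
    expression does not end in an attribute step.
    Hypothetical helper: the name and behaviour are assumptions for this sketch.
    """
    expression = xpath.strip()
    match = _TRAILING_ATTRIBUTE.match(expression)
    if match is None:
        # No trailing "/@name": the query already selects nodes.
        return expression, None

    path = match.group("path")
    if path.endswith("/"):
        # "...//@name" abbreviates ".../descendant-or-self::node()/@name";
        # only elements carry attributes, so select those instead.
        path += "descendant-or-self::*"
    elif not path:
        # "/@name" addresses the document root, which has no attributes.
        path = "/"
    return path, match.group("name")


if __name__ == "__main__":
    for query in ("//foo/bar", "//foo/@bar", "//foo[@baz=42]/bar/@foz", "//@href"):
        print(query, "->", split_attribute_step(query))
```

The caller would then run the node part through SelectNodes and, when an attribute name came back, read that attribute from each returned node (for instance through the node's attribute collection) instead of using the node itself. This is exactly the extra step that attribute selection in the pack would make unnecessary.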