XPath to select attribute

Mar 9, 2010 at 4:24 PM

Some time ago there was a similar discussion:

For example, I want to get all href links from a page:

The expression is valid XPath, but HtmlAgilityPack doesn't handle it that way: it returns the results of //*[@href] (the elements that carry the attribute) rather than the attributes themselves.
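For comparison, here is a small sketch of what a standard XPath 1.0 engine does with //@href, using System.Xml.XPath from the .NET class library rather than HtmlAgilityPack. The sample markup and the SelectHrefs helper are invented for illustration, and XPathDocument needs well-formed XML, which real-world HTML often isn't.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

// Invented sample markup; XPathDocument only accepts well-formed XML.
const string sample =
    "<html><body><a href=\"/one\">1</a><p><a href=\"/two\">2</a></p>" +
    "<link href=\"/style.css\"/></body></html>";

// Hypothetical helper: collects the value of every href attribute via //@href.
List<string> SelectHrefs(string xml)
{
    XPathNavigator navigator = new XPathDocument(new StringReader(xml)).CreateNavigator();
    var result = new List<string>();
    // A conforming XPath engine yields the attribute nodes themselves, not their owner elements.
    foreach (XPathNavigator node in navigator.Select("//@href"))
    {
        if (node.NodeType == XPathNodeType.Attribute)
            result.Add(node.Value);
    }
    return result;
}

List<string> hrefs = SelectHrefs(sample);
Console.WriteLine(string.Join(", ", hrefs)); // /one, /two, /style.css
```

This is the behavior people expect when they write //@href: each result is an attribute, and its Value is the URL.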

Is there any progress in this direction?

Maybe it would be possible to create a more generic method
HtmlElementCollection SelectElements(string xpath);
where HtmlElement is the base class for HtmlNode, HtmlText, HtmlAttribute and HtmlComment.
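A common base type matters because a single XPath expression can return several kinds of node at once. A minimal sketch with System.Xml.XPath (not HtmlAgilityPack), using invented markup and a hypothetical SelectNodeTypes helper, shows one union expression yielding a comment, an attribute and a text node:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

// Invented, well-formed sample markup.
const string sample = "<html><!-- nav --><body><a href=\"/x\">link</a></body></html>";

// Hypothetical helper: reports the node type of every result of an XPath query.
List<XPathNodeType> SelectNodeTypes(string xml, string xpath)
{
    XPathNavigator navigator = new XPathDocument(new StringReader(xml)).CreateNavigator();
    var found = new List<XPathNodeType>();
    foreach (XPathNavigator node in navigator.Select(xpath))
        found.Add(node.NodeType);
    return found;
}

// One union expression, three different kinds of result node.
List<XPathNodeType> types = SelectNodeTypes(sample, "//comment() | //@href | //a/text()");
Console.WriteLine(string.Join(", ", types));
```

A return type like the proposed HtmlElementCollection would have to represent all of these, which is what XPathNavigator.NodeType does in the standard library.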

Jun 6, 2010 at 7:36 PM

I've run into this problem too. It's kind of annoying.

I don't even know what the xpath is going to be until runtime, so I essentially have to parse it myself and get the attribute instead of the node.
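A sketch of that kind of runtime workaround, assuming the only attribute selections you need to support end in a simple @name step. TrySplitAttributeStep is a hypothetical helper, not part of HtmlAgilityPack: it splits the expression into an element query plus an attribute name, and gives up on anything it doesn't recognize (including unions).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: "//div/a/@href" -> elementPath "//div/a", attributeName "href".
// Returns false (expression left as-is) when there is no trailing @name step.
bool TrySplitAttributeStep(string xpath, out string elementPath, out string attributeName)
{
    elementPath = xpath;
    attributeName = "";
    string expr = xpath.Trim();
    // Unions such as "a/@x | b/@y" are not handled by this sketch.
    if (expr.Contains("|")) return false;

    Match m = Regex.Match(expr, @"^(?:(?<prefix>.*)/)?@(?<name>[\w.:-]+)$");
    if (!m.Success) return false;

    Group prefix = m.Groups["prefix"];
    if (!prefix.Success)
        elementPath = ".";        // "@href": attribute of the context node
    else if (prefix.Value.Length == 0)
        elementPath = "/";        // "/@href": attribute of the root node
    else if (prefix.Value.EndsWith("/"))
        // "//@href" abbreviates "/descendant-or-self::node()/@href", so keep that step.
        elementPath = prefix.Value + "descendant-or-self::node()";
    else
        elementPath = prefix.Value;

    attributeName = m.Groups["name"].Value;
    return true;
}

foreach (string xpath in new[] { "//div[@id='nav']/a/@href", "//@src", "//a[@href]" })
{
    if (TrySplitAttributeStep(xpath, out string elements, out string attribute))
        Console.WriteLine($"{xpath} -> elements: {elements}, attribute: {attribute}");
    else
        Console.WriteLine($"{xpath} -> no trailing attribute step, use it as-is");
}
```

With HtmlAgilityPack, the element part could then be passed to SelectNodes (which returns null rather than an empty collection when nothing matches), reading node.Attributes[attributeName] on each result and skipping nodes where it is null.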

Jun 6, 2010 at 9:41 PM

This has been a long-standing issue (or "feature"). I did some testing and found that we do indeed deviate from the way System.Xml.XPath does it, so I'm looking into how we can match that behavior. I have some code that works with the example of //@href, but I'm not sure how it will handle more complex attribute selection queries. I will need to build a fairly comprehensive set of XPath tests before I feel confident that it is the way to go. For now I will probably move this test code into a branch so that development of HAPLight, HAPCompact and HAPDynamic (HAP for .NET 4.0) can continue.

I would really appreciate any examples and tests you may have.

Jun 7, 2010 at 1:50 AM

Well, I have some good news. I just uploaded an experimental binary that supports returning just HtmlAttributes when attributes are selected. You can download it at http://htmlagilitypack.codeplex.com/releases/view/46681

The source code is in the branches folder, labeled 2.0.Experimental. Please give this version a try and let me know how it works for you. I'd like a lot of feedback on this feature, particularly from people who are currently using it.

Jun 8, 2010 at 6:06 AM

Works. Pretty nice. Of course, there is still a lot of code cleanup to do. For example, an HtmlAttribute will never have children, attributes, etc. I will test this version more.


Thank you for your work. HtmlAgilityPack is my favorite HTML parsing component.

Jun 8, 2010 at 6:09 AM

That's why, for any of those interface members that don't apply to an attribute, I return the appropriate response: false for HasChildren, and so forth. I'm sure there are some edge cases I missed in this first attempt. If you have any examples I can put into the unit tests, I'd love to have them.

Dec 1, 2010 at 2:16 PM
Edited Dec 1, 2010 at 2:46 PM

I haven't tested it yet (pulling it down now), but will it support XPath 2.0 functions like matches(), replace() or tokenize()? I think that's built into System.Xml.XPath, but I'm not sure...

Will I be able to select a node or attribute based on a regex?
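System.Xml.XPath implements XPath 1.0, which has no regular-expression functions; the closest built-ins are string functions such as contains() and starts-with(). One common workaround is to select candidates with XPath and filter them in C# with System.Text.RegularExpressions. A minimal sketch, with invented markup and a hypothetical HrefsMatching helper:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml.XPath;

// Invented sample; XPathDocument needs well-formed markup.
const string sample =
    "<html><body><a href=\"/docs/a.pdf\">A</a><a href=\"/index.html\">I</a>" +
    "<a href=\"/docs/b.PDF\">B</a></body></html>";

// XPath 1.0 has no matches(), so select the candidates with XPath, then filter with Regex.
List<string> HrefsMatching(string xml, Regex pattern)
{
    XPathNavigator navigator = new XPathDocument(new StringReader(xml)).CreateNavigator();
    var result = new List<string>();
    foreach (XPathNavigator attr in navigator.Select("//a/@href"))
    {
        if (pattern.IsMatch(attr.Value))
            result.Add(attr.Value);
    }
    return result;
}

List<string> pdfs = HrefsMatching(sample, new Regex(@"\.pdf$", RegexOptions.IgnoreCase));
Console.WriteLine(string.Join(", ", pdfs)); // /docs/a.pdf, /docs/b.PDF
```

The same select-then-filter approach should carry over to HtmlAgilityPack: select nodes with an ordinary XPath query, then test their attribute values with Regex.IsMatch.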

Edit: found this Stack Overflow question: http://stackoverflow.com/questions/1525299/xpath-and-xslt-2-0-for-net, which says XPath 2.0 isn't supported, but the reference page here, http://msdn.microsoft.com/en-us/library/system.xml.xpath.aspx, says it supports the XQuery 1.0 and XPath 2.0 Data Model. I guess that excludes the XPath 2.0 functions and operators.


Oct 8, 2012 at 9:53 AM

So what's the status of HtmlNodeType.Attribute? In the latest versions of HAP it's still not implemented, and when I select a node's attribute, the whole element node is selected. :(

Oct 31, 2012 at 5:00 PM


Same question as RaTT: is there a plan to integrate this into the NuGet package?

I have some generic code that takes an arbitrary XPath expression (from a .config file), finds the node and removes it. Right now I have to use some convoluted regex and control flow to implement this...


Otherwise, quite a good lib!