XPath to select attribute

Topics: User Forum
Mar 9, 2010 at 5:24 PM

Some time ago there was a similar discussion:
http://htmlagilitypack.codeplex.com/Thread/View.aspx?ThreadId=1720

For example, I want to get all href links from a page:
//@href

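This is valid XPath, but it doesn't work as expected in HtmlAgilityPack: it returns the same results as the XPath //*[@href], that is, the elements that have the attribute rather than the attributes themselves.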

Is there something new in this direction?

Maybe it would be possible to add a more generic method:
HtmlElementCollection SelectElements(string xpath);
where HtmlElement would be a base class for HtmlNode, HtmlText, HtmlAttribute and HtmlComment.
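Until attribute selection works directly, the usual workaround for the //@href case is to select the elements that carry the attribute and then read the attribute from each one. A minimal sketch, assuming HtmlAgilityPack is referenced; the class and method names (HrefExtractor, GetHrefs) are made up for illustration:

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

static class HrefExtractor
{
    // Returns the value of every href attribute found in the given HTML.
    public static List<string> GetHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var hrefs = new List<string>();

        // Select the elements that have the attribute, not the attribute itself.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//*[@href]");
        if (nodes == null)
            return hrefs; // SelectNodes may return null when nothing matches

        foreach (HtmlNode node in nodes)
            hrefs.Add(node.GetAttributeValue("href", string.Empty));

        return hrefs;
    }
}
```

This only covers the case where the attribute name is known at compile time.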

Jun 6, 2010 at 8:36 PM

I've run into this problem too. It's kind of annoying.

I don't even know what the xpath is going to be until runtime, so I essentially have to parse it myself and get the attribute instead of the node.
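For reference, the kind of "parse it myself" step described here could look roughly like the sketch below. It only recognizes a single trailing /@name step and rewrites it into an element query plus an attribute name; unions, the attribute:: axis and anything more complex are not handled. The names AttributeXPath and TrySplit are hypothetical, not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

static class AttributeXPath
{
    // Matches "<path>/@name" where the attribute step is the last step of the expression.
    private static readonly Regex TrailingAttribute =
        new Regex(@"^(?<path>.*/)@(?<attr>[A-Za-z_][\w.\-:]*)$");

    // Splits e.g. "//a/@href" into elementXPath "//a[@href]" and attributeName "href".
    // Returns false when the expression does not end in a simple attribute step.
    public static bool TrySplit(string xpath, out string elementXPath, out string attributeName)
    {
        elementXPath = null;
        attributeName = null;

        Match m = TrailingAttribute.Match(xpath.Trim());
        if (!m.Success)
            return false;

        string path = m.Groups["path"].Value; // includes the trailing slash
        string attr = m.Groups["attr"].Value;

        if (path.EndsWith("//", StringComparison.Ordinal))
            elementXPath = path + "*[@" + attr + "]";           // "//@href" -> "//*[@href]"
        else if (path.Length > 1)
            elementXPath = path.Substring(0, path.Length - 1)   // drop the trailing "/"
                           + "[@" + attr + "]";                 // "//a/@href" -> "//a[@href]"
        else
            return false; // "/@name": the document root has no attributes

        attributeName = attr;
        return true;
    }
}
```

The element query can then be passed to SelectNodes, and the attribute read from each returned node by name.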

Jun 6, 2010 at 10:41 PM

This has been a long-standing issue (or "feature"). I did some testing and found that we do indeed deviate from how System.Xml.XPath handles this, so I'm looking into how we can match its behavior. I have some code that works for the //@href example, but I'm not sure how it will hold up with more complex attribute selection queries. I will need to build a fairly comprehensive set of XPath tests before I feel confident that this is the way to go. For now I will probably move this test code into a branch so that development of HAPLight, HAPCompact and HAPDynamic (HAP for .NET 4.0) can continue.

I would really appreciate examples and tests you may have. 
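As a starting point for such tests, the System.Xml.XPath behavior being compared against can be observed with a small helper like the one below. It is only a sketch that runs against well-formed XML rather than HTML, and the class and method names are invented for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

static class ReferenceBehavior
{
    // Evaluates xpath with System.Xml.XPath and returns the values of the
    // attribute nodes it selects (other node types are ignored).
    public static List<string> SelectAttributeValues(string xml, string xpath)
    {
        var doc = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = doc.CreateNavigator();

        var values = new List<string>();
        XPathNodeIterator it = nav.Select(xpath);
        while (it.MoveNext())
        {
            // With "//@href" each result is an attribute node, not its owner element.
            if (it.Current.NodeType == XPathNodeType.Attribute)
                values.Add(it.Current.Value);
        }
        return values;
    }
}
```

Calling SelectAttributeValues with "//@href" on a small XML document returns the attribute values themselves, which is the behavior the experimental branch is trying to match.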

Jun 7, 2010 at 2:50 AM

Well I have some good news. I just uploaded an experimental binary with support for returning just HtmlAttributes when they are selected. You can download it at http://htmlagilitypack.codeplex.com/releases/view/46681

The source code is under the branches folder labeled as 2.0.Experimental. Please give this version a try and let me know how it works for you. I'd like a lot of feedback on this feature, particularly from people that are using it currently.

Jun 8, 2010 at 7:06 AM

Works. Pretty nice. Of course, there is still a lot of work needed to clean up the code. For example, an HtmlAttribute will never have any children, attributes, and so on. I will test this version some more.

Thank you for your work. HtmlAgilityPack is my favorite HTML parsing component.

Jun 8, 2010 at 7:09 AM

That is why, for any interface members that don't apply to an attribute, I return the appropriate response: false for HasChildren, and so forth. I'm sure there are some edge cases I missed in my first attempt. If you have any examples I can put into the unit tests, I'd love to have them.

Dec 1, 2010 at 3:16 PM
Edited Dec 1, 2010 at 3:46 PM

I haven't tested it yet (pulling it down now), but will it support XPath 2.0 functions like matches(), replace() or tokenize()? I think that's built into System.Xml.XPath, but I'm not sure...

Will I be able to select a node or attribute based on regex?

Edit: I found this Stack Overflow question: http://stackoverflow.com/questions/1525299/xpath-and-xslt-2-0-for-net which says XPath 2.0 isn't supported. The reference page at http://msdn.microsoft.com/en-us/library/system.xml.xpath.aspx says it supports the XQuery 1.0 and XPath 2.0 Data Model, but I guess that excludes the XPath 2.0 functions and operators.
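Since matches() is not part of XPath 1.0, one way to get regex-based selection is to pick candidates with plain XPath and filter them in code. This is only a sketch, assuming HtmlAgilityPack is referenced; RegexFilter and SelectByAttributeRegex are hypothetical names, and attributeName is assumed to be a plain attribute name.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

static class RegexFilter
{
    // Returns the elements whose attributeName value matches the regex pattern.
    public static List<HtmlNode> SelectByAttributeRegex(
        HtmlDocument doc, string attributeName, string pattern)
    {
        var regex = new Regex(pattern);

        // XPath 1.0 step: every element that has the attribute at all.
        HtmlNodeCollection candidates =
            doc.DocumentNode.SelectNodes("//*[@" + attributeName + "]");
        if (candidates == null)
            return new List<HtmlNode>(); // SelectNodes may return null when nothing matches

        // The regex step happens in C#, not in XPath.
        return candidates
            .Where(n => regex.IsMatch(n.GetAttributeValue(attributeName, string.Empty)))
            .ToList();
    }
}
```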

 

Oct 8, 2012 at 10:53 AM

So what's up now with HtmlNodeType.Attribute? In the latest versions of HAP it's still not implemented, and when I select a node's attribute the whole element node is selected :(

Oct 31, 2012 at 6:00 PM

Hi,

Same question as RaTT: is there a plan to integrate this into the NuGet package?

I have some generic code that takes an arbitrary XPath expression (from a .config file), finds the node and removes it. Right now I have to use some convoluted regex and control flow to implement this...

 

Otherwise, quite a good lib!
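For the removal scenario above, once the expression has been reduced to an element query plus an attribute name, the attribute case can be handled roughly as follows. A sketch only, assuming HtmlAgilityPack is referenced; AttributeRemover and RemoveAttribute are made-up names, and how the XPath is split into its two parts is left to the caller.

```csharp
using HtmlAgilityPack;

static class AttributeRemover
{
    // Removes attributeName from every element matched by elementXPath.
    // Returns the number of elements that were changed.
    public static int RemoveAttribute(HtmlDocument doc, string elementXPath, string attributeName)
    {
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(elementXPath);
        if (nodes == null)
            return 0; // SelectNodes may return null when nothing matches

        int changed = 0;
        foreach (HtmlNode node in nodes)
        {
            if (node.Attributes.Contains(attributeName))
            {
                node.Attributes.Remove(attributeName);
                changed++;
            }
        }
        return changed;
    }
}
```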