Incorrect xpath parse

description

Hi,

First, I would like to say that the tool is great! However, there is a webpage where HtmlAgilityPack does not find the correct element when I give it an XPath, and does not generate the correct XPath when I select an element.

This is the webpage:

http://scholar.google.com/citations?user=d2f0VUQAAAAJ&hl=en

(the problem should happen with any google scholar citations example)

If I search for this XPath with HtmlAgilityPack:

/html[1]/body[1]/div[2]/div[1]/div[1]/div[2]/form[1]/div[2]/div[1]/table[1]/tbody[1]/tr[2]/td[1]/a[1]

It returns null.

However, every browser I tried returns the element I am looking for. For instance, this is the result of document.evaluate in Chrome:

document.evaluate("/html[1]/body[1]/div[2]/div[1]/div[1]/div[2]/form[1]/div[2]/div[1]/table[1]/tbody[1]/tr[2]/td[1]/a[1]", document, null, 9, null).singleNodeValue

<a href="/citations?view_op=view_citation&hl=en&user=d2f0VUQAAAAJ&citation_for_view=d2f0VUQAAAAJ:u5HHmVD_uO8C" class="cit-dark-large-link">The S LAM project: debugging system software via static analysis</a>

The problem also occurs in the other direction. When I get an XPath from HtmlAgilityPack, such as this one:

/html[1]/body[1]/div[2]/div[1]/div[1]/div[2]/div[7]/div[1]/table[1]/tbody[1]/tr[1]/td[1]/a[1]

It does not match anything when I pass it to the browser's document.evaluate. I think the problem is related to the form tag: HtmlAgilityPack does not treat the elements inside a form as its children and instead reports them as siblings of the form. I think it is the same problem mentioned here:

https://htmlagilitypack.codeplex.com/workitem/29782
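
To illustrate why the two paths above cannot agree, here is a minimal sketch in plain JavaScript. It does not use HtmlAgilityPack or a real DOM. The el and resolve helpers and the tiny div/form/a tree are hypothetical, made up only for this illustration, and the "flattened" tree is an assumption about how a parser that treats form as an empty element would lay out the same markup.

```javascript
// Minimal element model: a tag name plus an ordered list of child elements.
// (Hypothetical helper, not part of any library.)
function el(tag, ...children) {
  return { tag, children };
}

// Resolve a path made only of steps like "div[2]", where the index is the
// 1-based position among same-named siblings. Returns the node or null.
function resolve(root, path) {
  const steps = path.split("/").filter(Boolean);
  // Wrap the root so the first step can select it by name and index.
  let node = { tag: "#document", children: [root] };
  for (const step of steps) {
    const m = /^([a-z0-9]+)\[(\d+)\]$/i.exec(step);
    if (!m) throw new Error("unsupported step: " + step);
    const [, tag, index] = m;
    const sameTag = node.children.filter((c) => c.tag === tag);
    node = sameTag[Number(index) - 1] || null;
    if (node === null) return null;
  }
  return node;
}

// Simplified markup for the example: <div><form><a></a></form></div>
const link = el("a");

// Browser-style tree: the link is nested inside the form.
const nested = el("div", el("form", link));

// Assumed tree when form is parsed as empty: the link becomes its sibling.
const flattened = el("div", el("form"), link);

const browserPath = "/div[1]/form[1]/a[1]";
const flatPath = "/div[1]/a[1]";

console.log(resolve(nested, browserPath) === link);  // true
console.log(resolve(flattened, browserPath));        // null
console.log(resolve(flattened, flatPath) === link);  // true
console.log(resolve(nested, flatPath));              // null
```

Each path only resolves against the tree shape it was generated from, which matches the symptoms described above. On the HtmlAgilityPack side, per-tag parsing behaviour is controlled by the static HtmlNode.ElementsFlags dictionary, and removing its "form" entry before loading the document is a commonly suggested workaround in versions where that entry is present by default. That has not been verified against this page.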

Thank you,

Gustavo
