HtmlNode.XPath Property

Aug 7, 2007 at 9:19 PM
Hello,
I created an XPath property for the HtmlNode class. It returns the absolute XPath of the current node. I wrote it as a replacement for my previous post http://www.codeplex.com/htmlagilitypack/Thread/View.aspx?ThreadId=13171, since an XPath is more accurate.

Add the following to the HtmlNode class:

internal string _xpath; //2007.08.07.1202.AFG

/// <summary>
/// Gets the XPath of the current HTML node.
/// </summary>
/// <revisions>
/// 2007.08.07.1233.AFG - Added XPath property to node
/// </revisions>
public string XPath
{
    get
    {
        // Computed on first access and cached; the cached value is not
        // cleared if the node is later moved within the document.
        if (_xpath == null)
            _xpath = BuildXPath(this);
        return _xpath;
    }
}

/// <summary>
/// Builds the XPath for the node.
/// </summary>
/// <param name="node">HtmlNode: The node to find the XPath.</param>
/// <returns>String: The current node's XPath.</returns>
/// <revisions>
/// 2007.08.07.1229.AFG - created
/// </revisions>
internal String BuildXPath(HtmlNode node)
{
    string xPath = "";
    HtmlNode parent = node;
    do
    {
        // Skip non-element nodes (#document, #text, #comment), whose names start with '#'.
        if (parent.Name.IndexOf("#") < 0)
            xPath = string.Concat("/", parent.Name, XPathNodePosition(parent), xPath);
        parent = parent.ParentNode;
    }
    while (parent != null);
    return xPath;
}

/// <summary>
/// Builds the XPath string for the node's position.
/// </summary>
/// <param name="node">HtmlNode: The node to find the position.</param>
/// <returns>String: The node's position among its siblings in XPath notation.</returns>
/// <revisions>
/// 2007.08.07.1225.AFG - created
/// </revisions>
internal string XPathNodePosition(HtmlNode node)
{
    // html, head and body are expected to occur only once, so no index is appended.
    if (node.Name == "html" || node.Name == "head" || node.Name == "body")
        return string.Empty;
    else
        return string.Concat("[", FindNodePosition(node).ToString(), "]");
}

/// <summary>
/// Gets the node's position among its siblings.
/// </summary>
/// <param name="node">HtmlNode: The node to find the position.</param>
/// <returns>Int: The node's position among its same-named siblings. The first node is counted as position 1.</returns>
/// <revisions>
/// 2007.08.07.1223.AFG - created
/// </revisions>
internal int FindNodePosition(HtmlNode node)
{
    HtmlNode sibling = node;
    int pos = 0;
    do
    {
        // Count this node and every preceding sibling with the same tag name.
        if (sibling.Name == node.Name)
            pos += 1;
        sibling = sibling.PreviousSibling;
    }
    while (sibling != null);
    return pos;
}
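To see the three methods working together without the full library, here is a minimal, self-contained sketch of the same walk. `Node` is a hypothetical stand-in for HtmlNode that models only `Name`, `ParentNode` and `PreviousSibling` (plus a hypothetical `AddChild` helper that wires those links the way a parser would); `XPathSketch` and `SampleXPath` are likewise illustrative names, not part of the Agility Pack.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, minimal stand-in for HtmlNode: just enough structure to walk the tree.
class Node
{
    public string Name;
    public Node ParentNode;
    public Node PreviousSibling;
    private readonly List<Node> children = new List<Node>();

    public Node(string name) { Name = name; }

    // Appends a child and sets its ParentNode / PreviousSibling links.
    public Node AddChild(string name)
    {
        var child = new Node(name) { ParentNode = this };
        if (children.Count > 0) child.PreviousSibling = children[children.Count - 1];
        children.Add(child);
        return child;
    }
}

static class XPathSketch
{
    // 1-based position among this node and its preceding same-named siblings.
    public static int FindNodePosition(Node node)
    {
        int pos = 0;
        for (Node s = node; s != null; s = s.PreviousSibling)
            if (s.Name == node.Name) pos++;
        return pos;
    }

    // html, head and body get no index, as in the post.
    public static string XPathNodePosition(Node node)
    {
        if (node.Name == "html" || node.Name == "head" || node.Name == "body")
            return string.Empty;
        return "[" + FindNodePosition(node) + "]";
    }

    // Walks from the node up to the root, prepending one step per element.
    public static string BuildXPath(Node node)
    {
        string xPath = "";
        for (Node p = node; p != null; p = p.ParentNode)
            if (p.Name.IndexOf('#') < 0)   // skips #document, #text, #comment
                xPath = "/" + p.Name + XPathNodePosition(p) + xPath;
        return xPath;
    }

    // Builds <html><body><div/>(text)<div><p/></div></body></html> and returns the p's XPath.
    public static string SampleXPath()
    {
        var doc = new Node("#document");
        var body = doc.AddChild("html").AddChild("body");
        body.AddChild("div");
        body.AddChild("#text");          // whitespace between the two divs
        var p = body.AddChild("div").AddChild("p");
        return BuildXPath(p);
    }

    static void Main()
    {
        Console.WriteLine(SampleXPath()); // prints /html/body/div[2]/p[1]
    }
}
```

The text node between the two divs does not disturb the index, because FindNodePosition only counts siblings whose name matches the node's own.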
Aug 22, 2007 at 6:21 PM
I ran into some problems with the BuildXPath procedure, so I altered the code slightly to make it more accurate:

/// <summary>
/// Builds the XPath for the node.
/// </summary>
/// <param name="node">HtmlNode: The node to find the XPath.</param>
/// <returns>String: The current node's XPath.</returns>
/// <revisions>
/// 2007.08.07.1229.AFG - created
/// 2007.08.22.1411.AFG - Changed the inner loop condition for building the XPath.
///     Each ancestor is now checked to be an element node, so that newline
///     and tab characters read as text nodes are skipped.
/// </revisions>
internal String BuildXPath(HtmlNode node)
{
    string xPath = "";
    HtmlNode parent = node;
    do
    {
        if (parent.NodeType == HtmlNodeType.Element) //2007.08.22.1411.AFG - changed from: if (parent.Name.IndexOf("#") < 0)
            xPath = string.Concat("/", parent.Name, XPathNodePosition(parent), xPath);
        parent = parent.ParentNode;
    }
    while (parent != null);
    return xPath;
}
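The revised check keys on the node type instead of the `#` naming convention. As a rough, self-contained illustration: `NodeKind` below is a hypothetical enum mirroring HtmlNodeType, and `Node`, `RevisedXPathSketch` and `TextNodeXPath` are illustrative names only. The sketch asks for the XPath of a whitespace text node; because only element nodes contribute a step, the result is the path of the enclosing element.

```csharp
using System;

// Hypothetical stand-in for HtmlNodeType.
enum NodeKind { Document, Element, Comment, Text }

// Hypothetical minimal node: type, name and links to parent / previous sibling.
class Node
{
    public NodeKind NodeType;
    public string Name;
    public Node ParentNode;
    public Node PreviousSibling;

    public Node(NodeKind type, string name, Node parent, Node previous = null)
    {
        NodeType = type; Name = name; ParentNode = parent; PreviousSibling = previous;
    }
}

static class RevisedXPathSketch
{
    static int FindNodePosition(Node node)
    {
        int pos = 0;
        for (Node s = node; s != null; s = s.PreviousSibling)
            if (s.Name == node.Name) pos++;
        return pos;
    }

    static string XPathNodePosition(Node node) =>
        node.Name == "html" || node.Name == "head" || node.Name == "body"
            ? string.Empty
            : "[" + FindNodePosition(node) + "]";

    // Revised loop: only element nodes add a step, whatever their name looks like.
    public static string BuildXPath(Node node)
    {
        string xPath = "";
        for (Node p = node; p != null; p = p.ParentNode)
            if (p.NodeType == NodeKind.Element)
                xPath = "/" + p.Name + XPathNodePosition(p) + xPath;
        return xPath;
    }

    // XPath of a whitespace text node inside the first <td> of a table row.
    public static string TextNodeXPath()
    {
        var doc = new Node(NodeKind.Document, "#document", null);
        var html = new Node(NodeKind.Element, "html", doc);
        var body = new Node(NodeKind.Element, "body", html);
        var table = new Node(NodeKind.Element, "table", body);
        var tr = new Node(NodeKind.Element, "tr", table);
        var td = new Node(NodeKind.Element, "td", tr);
        var ws = new Node(NodeKind.Text, "#text", td);   // "\n\t" read as a node
        return BuildXPath(ws);
    }

    static void Main() => Console.WriteLine(TextNodeXPath()); // prints /html/body/table[1]/tr[1]/td[1]
}
```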
Nov 10, 2007 at 7:31 PM
Edited Nov 11, 2007 at 10:47 PM
Hmm... interesting. Very good work, Eclipse! I've incorporated this into my copy of the Agility Pack, since this is exactly what I needed to do (search for a node by inner text).

Question on use: would I simply iterate over the nodes and look at their XPath property to match my text? I'm a little new to XPath queries, so I'm not entirely clear on how they can be used to search for a node's inner HTML.

Thanks!
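On the question of searching by inner text: an XPath predicate can do the text matching itself, so comparing XPath strings node by node should not be necessary. Html Agility Pack's `HtmlNode.SelectNodes` accepts XPath expressions. To keep this sketch runnable without the library, it runs the same kind of expression through .NET's `System.Xml.XmlDocument`, which assumes well-formed markup (real-world HTML usually needs the Agility Pack parser instead). `FindByText` is a hypothetical helper, and it assumes the search text contains no single quote, since the text is spliced into the expression.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class InnerTextSearch
{
    // Returns the names of elements that have a direct text child containing `text`.
    public static List<string> FindByText(string markup, string text)
    {
        var doc = new XmlDocument();
        doc.LoadXml(markup);
        // text()[contains(., ...)] tests every direct text child, not only the first.
        // Matching is case-sensitive. Assumes `text` has no single quote.
        string xpath = "//*[text()[contains(., '" + text + "')]]";
        var names = new List<string>();
        foreach (XmlNode node in doc.SelectNodes(xpath))
            names.Add(node.Name);
        return names;
    }

    static void Main()
    {
        string page = "<html><body><div><span>Subtotal</span><p>Total: 42</p></div></body></html>";
        foreach (string name in FindByText(page, "Total"))
            Console.WriteLine(name); // prints p
    }
}
```

Only the `<p>` matches: "Subtotal" contains "total" but not "Total", and the `<div>` has no text of its own. Once a node is found this way, the XPath property above gives its location in the document.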