Problem selecting within a node

Topics: Developer Forum, Project Management Forum, User Forum
Sep 29, 2010 at 4:27 PM


I am parsing some HTML files to extract data for a SQL database, and I am having a problem running an XPath statement against nodes that I iterate through in a loop.

The code below shows how I walk through the nodes in the document. I test each node for a text string, in this case "Lines of Business". If I find that string, I take the node's next sibling and then want to parse the nodes inside that sibling that have the class attribute 'stdtext'.

It all looks fine, but when I run it, the XPath statement does not return only the 'stdtext' nodes within the current node. Instead it returns ALL the 'stdtext' nodes from the entire document.

Can anybody help?

// Iterate over all title cells in the first table
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//td[@class='td_title']"))
{
    switch (link.InnerHtml)
    {
        case "Lines of Business":
            nodNodeContent = link.NextSibling;
            // Problem: this returns every 'stdtext' cell in the document, not just those under nodNodeContent
            foreach (HtmlNode slobs in nodNodeContent.SelectNodes("//td[@class='stdtext']"))
            {
                this.htMultiSelect["Slobs"] += slobs.InnerHtml.Trim().Replace(", ", "|") + "|";
            }
            break;
    }
}

Oct 7, 2010 at 10:56 AM


I have the same problem. Have a look at this more "readable" example (no offense, Steve :)).

I want to retrieve all the input elements of an HTML form, given the form's name:


public partial class Form1 : Form
{
    public Form1()
    {
    }

    public void GetInputTextByFormName(string formName)
    {
        HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
        HtmlAgilityPack.HtmlDocument doc = web.Load(@"");
        HtmlAgilityPack.HtmlNode formNode = doc.DocumentNode.SelectSingleNode(String.Format(@"//form[@name='{0}']", formName));
        HtmlAgilityPack.HtmlNodeCollection inputNodes = formNode.SelectNodes("//input");

        foreach (HtmlAgilityPack.HtmlNode node in inputNodes)
        {
            // These inputs belong to a different form, so they should not be in the result
            if (node.Attributes["name"].Value == "user_username" || node.Attributes["name"].Value == "user_userpassword")
            {
                MessageBox.Show("Error or bug ?! Node is in the form 'connexion'");
            }
        }
    }
}


Instead of getting only the inputs from "form_account", I get ALL the inputs from the entire document. Am I missing something?


Oct 11, 2010 at 6:18 AM

In the HTML specification, form tags can overlap, so HtmlAgilityPack handles this node a little differently. So what you can do:


Mar 29, 2012 at 12:50 PM


I am having the same issue, and that answer doesn't have enough information.

Apr 6, 2012 at 1:21 PM
Edited Apr 6, 2012 at 1:22 PM

I believe I have a similar problem, as demonstrated thus:


<HTML><HEAD><TITLE>HTML Agility Bug Demo</TITLE></HEAD>
<BODY>
	<table>
		<tr><td>first row</td></tr>
		<tr><td>second row</td></tr>
		<tr><td>third row</td></tr>
	</table>
</BODY></HTML>

HtmlAgilityPack.HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html); // assumption: 'html' is a string holding the markup above; the original load call is not shown
HtmlNodeCollection rowNodes = doc.DocumentNode.SelectNodes("//table/tr");
foreach (HtmlNode row in rowNodes)
{
	string test1 = row.InnerText;
	string test2 = row.SelectSingleNode("//td").InnerText; // This ALWAYS returns "first row" !!!
}



test1 works: it is "first row", then "second row", then "third row".

test2 is always "first row". The scope of SelectSingleNode() on the row object should be limited to the row itself and its descendants, but it seems to select from the document root instead.

Apr 13, 2012 at 10:21 AM

In the last person's example, just change the line to read:

string test2 = row.SelectSingleNode(".//td").InnerText;

The leading . makes the search start from the current node. // on its own starts from the root of the document, no matter which node you call it on.
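
For anyone who wants to compare the two forms side by side, here is a minimal sketch. It assumes the three-row table from the earlier post; the class name RelativeXPathDemo, the Html constant and the FirstCellPerRow helper are made up for illustration and are not part of HtmlAgilityPack.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class RelativeXPathDemo
{
    // Hypothetical sample markup, modelled on the three-row table posted earlier in the thread.
    public const string Html =
        "<html><body><table>" +
        "<tr><td>first row</td></tr>" +
        "<tr><td>second row</td></tr>" +
        "<tr><td>third row</td></tr>" +
        "</table></body></html>";

    // Evaluates cellXPath once per table row, with that row as the node the
    // query is called on, and collects the text of the first match.
    public static List<string> FirstCellPerRow(string cellXPath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(Html);

        var results = new List<string>();
        foreach (HtmlNode row in doc.DocumentNode.SelectNodes("//table/tr"))
        {
            // "//td" restarts at the document root; ".//td" starts at 'row'.
            results.Add(row.SelectSingleNode(cellXPath).InnerText);
        }
        return results;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", FirstCellPerRow("//td")));
        Console.WriteLine(string.Join(", ", FirstCellPerRow(".//td")));
    }
}
```

With this markup, the "//td" call should give "first row" three times, matching what was reported above, while ".//td" should give each row's own text. The same change applies to the earlier posts: ".//td[@class='stdtext']" and ".//input" keep the search inside the node the query is called on.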