HtmlNodeNavigator / SelectNodes not respecting context (solved)

Topics: Developer Forum, User Forum
Jun 28, 2007 at 10:15 AM
using System.IO;
using HtmlAgilityPack;

class MainClass {
    public static void Main(string[] args) {
        HtmlDocument doc = new HtmlDocument();
        StringReader st = new StringReader(
            "<html>" +
            "<select id=\"s1\"><option value=\"1\">1</option><option value=\"2\">2</option></select>" +
            "<select id=\"s2\"><option value=\"3\">3</option><option value=\"4\">4</option></select>" +
            "</html>");
        doc.Load( st );

        HtmlNodeCollection options = doc.DocumentNode.SelectNodes("//option[@value]");
        System.Console.WriteLine( "Listing options for document" );
        foreach (HtmlNode option in options) {
            System.Console.WriteLine( "option value {0}", option.Attributes["value"].Value );
        }

        HtmlNode node = doc.DocumentNode.SelectSingleNode("//select[@id=\"s1\"]");
        options = node.SelectNodes("//option[@value]");
        System.Console.WriteLine( "Listing options for {0}", node.Attributes["id"].Value );
        foreach (HtmlNode option in options) {
            System.Console.WriteLine( "option value {0}", option.Attributes["value"].Value );
        }

        node = doc.DocumentNode.SelectSingleNode("//select[@id=\"s2\"]");
        options = node.SelectNodes("//option[@value]");
        System.Console.WriteLine( "Listing options for {0}", node.Attributes["id"].Value );
        foreach (HtmlNode option in options) {
            System.Console.WriteLine( "option value {0}", option.Attributes["value"].Value );
        }
    }
}

Running the preceding code prints:

Listing options for document
option value 1
option value 2
option value 3
option value 4
Listing options for s1
option value 1
option value 2
option value 3
option value 4
Listing options for s2
option value 1
option value 2
option value 3
option value 4

Instead, I expected:

Listing options for document
option value 1
option value 2
option value 3
option value 4
Listing options for s1
option value 1
option value 2
Listing options for s2
option value 3
option value 4

The cause seems to be that, when HtmlNodeNavigator(HtmlDocument doc, HtmlNode currentNode) is called, something (I think in the base class XPathNavigator) calls MoveToRoot.
MoveToRoot moves to the root of the document, so we lose the "currentNode".

I modified HtmlNodeNavigator.cs as follows:

@@ -16,6 +16,7 @@
     {
         private HtmlDocument _doc = new HtmlDocument();
         private HtmlNode _currentnode;
+        private HtmlNode _originalposition = null;
         private int _attindex;
         private HtmlNameTable _nametable = new HtmlNameTable();

@@ -30,6 +31,9 @@
         {
             InternalTrace(null);
             _currentnode = _doc.DocumentNode;
+            if( _originalposition != null ) {
+                _currentnode = _originalposition;
+            }
             _attindex = -1;
         }

@@ -88,6 +92,7 @@

         internal HtmlNodeNavigator(HtmlDocument doc, HtmlNode currentNode)
         {
+            _originalposition = currentNode;
             if (currentNode == null)
             {
                 throw new ArgumentNullException("currentNode");
@@ -115,6 +120,7 @@
             _currentnode = nav._currentnode;
             _attindex = nav._attindex;
             _nametable = nav._nametable; // REVIEW: should we do this?
+            _originalposition = nav._originalposition;
         }

         /// <summary>
@@ -542,7 +548,11 @@
         /// </summary>
         public override void MoveToRoot()
         {
+            if( _originalposition == null ) {
             _currentnode = _doc.DocumentNode;
+            } else {
+                _currentnode = _originalposition;
+            }
             InternalTrace(null);
         }


This does fix the problem. However, I have not tested it for side effects.

Could someone else take a look at this patch?

Thanks,

Ludovic.
Jul 17, 2007 at 10:16 PM
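I'd say that there is no bug here, only a flawed XPath query. Any XPath expression beginning with "/" is absolute: it is evaluated from the root of the document that contains the context node, not from the context node itself. "//" is the abbreviated form of that, so "//option[@value]" matches every such descendant of the document node, regardless of the node you call it on. See http://www.w3.org/TR/xpath#path-abbrev.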

To get the output you want in your example, replace
node.SelectNodes("//option[@value]")
with one of the following alternatives, both of which already work:
node.SelectNodes("option[@value]")
- or -
node.SelectNodes(".//option[@value]")
The first matches only option elements that are direct children of node; the second matches option elements at any depth below node.
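
To see the three queries side by side, here is a minimal sketch that runs each of them against the s1 select from the first post. It uses only the HtmlAgilityPack calls that already appear in this thread, plus LoadHtml; the class name XPathContextDemo and the helper Values are made up for this illustration and are not part of the library.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

class XPathContextDemo {
    // Hypothetical helper (not part of HtmlAgilityPack): evaluates xpath with
    // `context` as the context node and collects the "value" attribute of each match.
    public static List<string> Values(HtmlNode context, string xpath) {
        List<string> values = new List<string>();
        HtmlNodeCollection matches = context.SelectNodes(xpath);
        if (matches == null) {
            // SelectNodes may return null rather than an empty collection when nothing matches.
            return values;
        }
        foreach (HtmlNode match in matches) {
            values.Add(match.Attributes["value"].Value);
        }
        return values;
    }

    public static void Main(string[] args) {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(
            "<html>" +
            "<select id=\"s1\"><option value=\"1\">1</option><option value=\"2\">2</option></select>" +
            "<select id=\"s2\"><option value=\"3\">3</option><option value=\"4\">4</option></select>" +
            "</html>");

        HtmlNode s1 = doc.DocumentNode.SelectSingleNode("//select[@id='s1']");

        string[] queries = {
            "//option[@value]",   // absolute: starts at the document root, ignores s1
            ".//option[@value]",  // relative: any option below s1
            "option[@value]"      // relative: only direct children of s1
        };
        foreach (string xpath in queries) {
            Console.WriteLine("{0} -> {1}", xpath, string.Join(",", Values(s1, xpath).ToArray()));
        }
    }
}
```

Going by the output reported in the first post, the absolute query should list all four values even though it is called on s1, while the two relative queries should list only 1 and 2.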
Jul 18, 2007 at 9:13 AM
Xenolinguist,

You're absolutely right: my query was wrong, so the patch was not the right way to go.

Thank you for your help!

Ludovic