HtmlNode.SelectSingleNode behaves weirdly

Topics: Developer Forum
Mar 10, 2011 at 4:46 AM

Hello everyone. When I use foreach to iterate over a set of HTML nodes and call SelectSingleNode on each HtmlNode, it doesn't seem to work: the query still searches the whole document instead of only the current node.

This is my previous code, which has this weird problem.

foreach (HtmlNode item in doc.DocumentNode.SelectNodes("//div[@class=\"bookMain seeMoreItem\"]"))
            {
                KindleNote note = new KindleNote();
                note.Content = item.SelectSingleNode("//div[@class=\"singleHighlight\"]/span").InnerText;
                note.Tag = item.SelectSingleNode("//div[@class=\"note\"]").InnerText;
                string date = item.SelectSingleNode("//div[@class=\"sharedOn\"]").InnerText;

                string[] str = Regex.Replace(date.Trim('\n'), @"( |\t|\r?\n)\1+", "$1").Trim(' ').Split(' ');
                DateTime dt = ConvertToDateTime(str[2], str[3], str[4]);
                note.SharedTime = dt;
                rawKindleList.Add(note);
            }

So I changed the code to this, and it works fine.

foreach (HtmlNode item in doc.DocumentNode.SelectNodes("//div[@class=\"bookMain seeMoreItem\"]"))
            {

                KindleNote note = new KindleNote();
                HtmlDocument divDoc = new HtmlDocument();
                divDoc.LoadHtml(item.InnerHtml);

                note.Content = divDoc.DocumentNode.SelectSingleNode("//div[@class=\"singleHighlight\"]/span").InnerText;

                note.Tag = divDoc.DocumentNode.SelectSingleNode("//div[@class=\"note\"]").InnerText;

                string date = divDoc.DocumentNode.SelectSingleNode("//div[@class=\"sharedOn\"]").InnerText;

                string[] str = Regex.Replace(date.Trim('\n'), @"( |\t|\r?\n)\1+", "$1").Trim(' ').Split(' ');

                DateTime dt = ConvertToDateTime(str[2], str[3], str[4]);
                note.SharedTime = dt;
                rawKindleList.Add(note);
            }

I would just like to know why my first approach doesn't work, and whether this can be improved.

Mar 10, 2011 at 8:38 AM

Maybe you can change the XPath in

note.Content = item.SelectSingleNode("//div[@class=\"singleHighlight\"]/span").InnerText;

to "./div[@class=\"singleHighlight\"]/span" and then try again.

Jul 15, 2011 at 7:48 AM

I have the same problem as billmoling.

When I change "//div" to "./div", it returns null.

Is there some way to solve the problem?

Jul 15, 2011 at 11:04 AM

This does look like an XPath issue: however, you shouldn't change //div to ./div but rather to .//div.  An XPath starting with a / means "search from document root", so the behavior is correct.  // means search all descendants; so what you want is .// which searches all descendants of the current node.  Equivalently, you could use "descendant::div" to make this explicit.
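To make the difference concrete, here is a minimal sketch that runs the four XPath forms against a tiny made-up document. The HTML, the class names 'item' and 'wrap', and the helper Show are assumptions for illustration only; the note div is deliberately nested two levels down, so "./div" (direct children only) should find nothing while ".//div" and "descendant::div" should find the current item's own note.

```csharp
using System;
using HtmlAgilityPack;

class XPathContextDemo
{
    static void Main()
    {
        // Made-up sample: each 'item' holds its 'note' two levels down.
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<div class='item'><div class='wrap'><div class='note'>first</div></div></div>" +
            "<div class='item'><div class='wrap'><div class='note'>second</div></div></div>");

        foreach (HtmlNode item in doc.DocumentNode.SelectNodes("//div[@class='item']"))
        {
            // Absolute path: evaluated from the document root, so every
            // iteration should report the first note in the whole document.
            Show(item, "//div[@class='note']");

            // Direct children of the current node only; the note is a
            // grandchild here, so this should report (null).
            Show(item, "./div[@class='note']");

            // All descendants of the current node: should report this item's note.
            Show(item, ".//div[@class='note']");

            // Explicit axis form of the same query.
            Show(item, "descendant::div[@class='note']");

            Console.WriteLine();
        }
    }

    // Hypothetical helper: prints the XPath and the text it matched, if any.
    static void Show(HtmlNode context, string xpath)
    {
        HtmlNode hit = context.SelectSingleNode(xpath);
        Console.WriteLine("{0,-32} -> {1}", xpath, hit == null ? "(null)" : hit.InnerText);
    }
}
```

This would also explain the null reported above for "./div": if the target div is not an immediate child of the node being iterated, the child axis has nothing to match.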

Jul 18, 2011 at 2:52 AM
emn13 wrote:

This does look like an XPath issue: however, you shouldn't change //div to ./div but rather to .//div.  An XPath starting with a / means "search from document root", so the behavior is correct.  // means search all descendants; so what you want is .// which searches all descendants of the current node.  Equivalently, you could use "descendant::div" to make this explicit.

It works, thanks!
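For reference, the loop from the first post might then look roughly like the sketch below, without loading a second HtmlDocument per item. This is only an illustration under assumptions: the KindleNote class here is a stand-in (the real one is not shown in the thread), the date parsing via ConvertToDateTime is left out and the raw text is kept in a made-up SharedOnText property, and the TextOf helper and the sample HTML are hypothetical. The null checks are there because SelectNodes and SelectSingleNode return null by default when nothing matches.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Stand-in for the KindleNote type from the thread (assumed shape).
class KindleNote
{
    public string Content { get; set; }
    public string Tag { get; set; }
    public string SharedOnText { get; set; } // raw text; date parsing omitted
}

static class KindleNoteParser
{
    public static List<KindleNote> Parse(HtmlDocument doc)
    {
        var notes = new List<KindleNote>();
        var items = doc.DocumentNode.SelectNodes("//div[@class='bookMain seeMoreItem']");
        if (items == null) return notes; // no matching items

        foreach (HtmlNode item in items)
        {
            // Every inner query starts with ".//" so it stays inside 'item'.
            notes.Add(new KindleNote
            {
                Content = TextOf(item, ".//div[@class='singleHighlight']/span"),
                Tag = TextOf(item, ".//div[@class='note']"),
                SharedOnText = TextOf(item, ".//div[@class='sharedOn']")
            });
        }
        return notes;
    }

    // Hypothetical helper: trimmed inner text of the first match, or null.
    static string TextOf(HtmlNode context, string relativeXPath)
    {
        HtmlNode node = context.SelectSingleNode(relativeXPath);
        return node == null ? null : node.InnerText.Trim();
    }
}

class Program
{
    static void Main()
    {
        // Made-up markup that mimics the class names used in the thread.
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<div class='bookMain seeMoreItem'>" +
            "<div class='singleHighlight'><span>Highlight A</span></div>" +
            "<div class='note'>Tag A</div><div class='sharedOn'>Shared on A</div></div>" +
            "<div class='bookMain seeMoreItem'>" +
            "<div class='singleHighlight'><span>Highlight B</span></div>" +
            "<div class='note'>Tag B</div><div class='sharedOn'>Shared on B</div></div>");

        foreach (KindleNote note in KindleNoteParser.Parse(doc))
            Console.WriteLine("{0} | {1} | {2}", note.Content, note.Tag, note.SharedOnText);
    }
}
```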