Children nodes

Topics: Developer Forum
Jan 30, 2013 at 9:59 PM
Edited Jan 30, 2013 at 10:00 PM
I am analyzing the timetable of a bus company. I get the necessary data from its website. A node of a bus stop looks like this:
<tr data-stopcode="F01093">
    <td>5</td>
    <td ><span>Fifth street</span></td>
</tr>
I have this node in "hmln" and would like to get the name of the stop. I tried two methods, but only the first one works. Can you tell me what is wrong with the second one?
Working:
hmln.Elements("td").Last().FirstChild.InnerText
Not working (throws "Object reference not set to an instance of an object."):
hmln.LastChild.FirstChild.InnerText
Jan 30, 2013 at 10:26 PM
Hi Labu,

Here is a solution:
AP.HtmlDocument hp = new AP.HtmlDocument();
hp.LoadHtml("<tr data-stopcode=\"F01093\">    <td>5</td>    <td ><span>Fifth street</span></td></tr>");

var nodes = hp.DocumentNode.SelectNodes("//tr[@data-stopcode]//span");
foreach (AP.HtmlNode node in nodes)
{
    string stopname = node.InnerText;
}
So I'm selecting every tr node that has a data-stopcode attribute, then the span nested inside it, and reading the text from that span.
Hope this helps and you can see why it works.

As for your second sample: you are taking the last child of the document, which is now the tr, and then its first child, which is the whitespace you have before the first td, so the text you're getting is " ".

string text = hp.DocumentNode.LastChild.FirstChild.InnerText; // whitespace only
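
To see which child nodes the parser actually creates, a small sketch along these lines may help. It assumes the HtmlAgilityPack package is referenced; the class name ChildNodeDump and the method DescribeChildren are made up for illustration and are not part of the library.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack package is referenced

public static class ChildNodeDump
{
    // Hypothetical helper: returns one entry per direct child of the first <tr>,
    // formatted as "NodeType:trimmed text", so whitespace-only text nodes stand out.
    public static List<string> DescribeChildren(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumes the markup starts with a <tr>; Element returns null otherwise.
        HtmlNode tr = doc.DocumentNode.Element("tr");

        var result = new List<string>();
        foreach (HtmlNode child in tr.ChildNodes)
        {
            result.Add(child.NodeType + ":" + child.InnerText.Trim());
        }
        return result;
    }

    public static void Main()
    {
        // Same row as in the question, with its line breaks and indentation kept.
        string html =
            "<tr data-stopcode=\"F01093\">\n" +
            "    <td>5</td>\n" +
            "    <td ><span>Fifth street</span></td>\n" +
            "</tr>";

        foreach (string line in DescribeChildren(html))
        {
            Console.WriteLine(line);
        }
    }
}
```

If the list starts or ends with a Text entry, then FirstChild or LastChild of the row is a text node rather than a td. A text node has no children, so calling FirstChild on it gives null, and reading InnerText from that would fail with a null reference.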

Lee.
Jan 30, 2013 at 11:43 PM
Thanks for your answer.
I might not have made myself clear. Regarding this part: "for your sample the second one you are looking for the last child of the document which is now tr then the first child which is the spaces you have before the first td and your getting the text which is " ""
The node hmln is the <tr data-stopcode="F01093"> element itself, not its parent. Your solution surely works, but I would rather keep the whole <tr data-stopcode="F01093"> node than only one of its children, because I also need the first child, which is the travel time.
Jan 30, 2013 at 11:52 PM
OK.
What about this?
AP.HtmlDocument hp = new AP.HtmlDocument();
hp.LoadHtml("<tr data-stopcode=\"F01093\"><td>5</td><td ><span>Fifth street</span></td></tr>");

var nodes = hp.DocumentNode.SelectNodes("//tr[@data-stopcode]");
foreach (AP.HtmlNode node in nodes)
{
    string stopcode = node.Attributes["data-stopcode"].Value;
    string stopname = node.SelectSingleNode(".//span").InnerText;
    string stoptime = node.SelectSingleNode(".//td[position()=1]").InnerText;
}
Any better?
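
One thing to watch with the XPath approach: SelectNodes returns null rather than an empty collection when nothing matches, and SelectSingleNode returns null as well, so a page without such rows would throw in the loop. A defensive sketch, assuming the HtmlAgilityPack package is referenced (the StopParser class, the DescribeFirstStop method, and the "code|time|name" format are invented for this example):

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack package is referenced

public static class StopParser
{
    // Hypothetical helper: returns "code|time|name" for the first matching row,
    // or null when the markup has no <tr data-stopcode="..."> at all.
    public static string DescribeFirstStop(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//tr[@data-stopcode]");
        if (rows == null)
        {
            return null; // nothing matched
        }

        HtmlNode row = rows[0];
        string code = row.GetAttributeValue("data-stopcode", "");

        // Both lookups can come back null, so fall back to an empty string.
        HtmlNode timeCell = row.SelectSingleNode("./td[1]");
        HtmlNode nameSpan = row.SelectSingleNode(".//span");
        string time = timeCell == null ? "" : timeCell.InnerText.Trim();
        string name = nameSpan == null ? "" : nameSpan.InnerText.Trim();

        return code + "|" + time + "|" + name;
    }

    public static void Main()
    {
        string html = "<tr data-stopcode=\"F01093\"><td>5</td><td ><span>Fifth street</span></td></tr>";
        Console.WriteLine(DescribeFirstStop(html));
    }
}
```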
Jan 31, 2013 at 10:08 AM
Thanks, it's elegant and works fine. But I still don't know why my second attempt doesn't work. Could you have a look at that?
Jan 31, 2013 at 11:47 AM
Edited Jan 31, 2013 at 11:50 AM
Hi, actually both seem to be working for me.
What version are you using? 1.4.6?
Which .NET version?

I tried version 1.4.0 and it works as well.

The test file is all on one line, with no spaces:
test.xml = <tr data-stopcode="F01093"><td>5</td><td ><span>Fifth street</span></td></tr>
HtmlDocument hp = new HtmlDocument();
hp.Load(@"C:\Development\test.xml");
var tr = hp.DocumentNode.FirstChild;
// works
var txt = tr.Elements("td").Last().FirstChild.InnerText;
// works
var txtII = tr.LastChild.FirstChild.InnerText;
What's your exact code? I'll take a peek.
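
Because the one-line test file has no whitespace between tags, it may behave differently from markup that keeps its line breaks and indentation. A sketch of a lookup that should give the same result for both layouts, by skipping non-element children, is given here. It assumes the HtmlAgilityPack package is referenced; StopName and FromRow are illustrative names only.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack package is referenced

public static class StopName
{
    // Hypothetical helper: text of the last element child of the first <tr>,
    // ignoring any whitespace text nodes between the tags.
    public static string FromRow(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumes the markup starts with a <tr>; Element returns null otherwise.
        HtmlNode tr = doc.DocumentNode.Element("tr");

        // Keep only element children, so surrounding whitespace does not matter.
        HtmlNode lastCell = tr.ChildNodes
            .LastOrDefault(n => n.NodeType == HtmlNodeType.Element);
        if (lastCell == null)
        {
            return null; // the row has no cells
        }

        // Prefer the first element inside the cell (the span), else the cell itself.
        HtmlNode inner = lastCell.ChildNodes
            .FirstOrDefault(n => n.NodeType == HtmlNodeType.Element);
        return (inner ?? lastCell).InnerText.Trim();
    }

    public static void Main()
    {
        string compact = "<tr data-stopcode=\"F01093\"><td>5</td><td ><span>Fifth street</span></td></tr>";
        string indented = "<tr data-stopcode=\"F01093\">\n    <td>5</td>\n    <td ><span>Fifth street</span></td>\n</tr>";

        Console.WriteLine(FromRow(compact));
        Console.WriteLine(FromRow(indented));
    }
}
```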
Jan 31, 2013 at 12:55 PM
I am using HAP 1.4.6 with .NET 4.5.