Scraping Link - can this be done better?

Jan 10, 2008 at 6:26 PM
Do I have to use two foreach statements here or would this be possible with one foreach statement, too? What I want to do is to extract the plain URLs:

HtmlDocument hd = csFN.Scrape.GetHtmlAsDoc("http://www.google.de/search?q=html&ie=utf-8&oe=utf-8&aq=t");
HtmlNodeCollection nct = hd.DocumentNode.SelectNodes("//h2[@class=\"r\"]/a/@href");

foreach (HtmlNode hn in nct)
{
    foreach (HtmlAttribute atr in hn.Attributes)
    {
        if (atr.Name == "href")
        {
            Response.Write(atr.Value + "<br />");
        }
    }
}

Thanks for any input!
Jan 26, 2008 at 12:06 AM
Break out of your nested loop when you find a match, so it doesn't have to check the rest of the attribute list.
Feb 7, 2008 at 11:48 AM
Yes.

...

foreach (HtmlNode hn in nct)
{
    HtmlAttribute atr = hn.Attributes["href"];
    Response.Write(atr.Value + "<br />");
}
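
A slightly more defensive variant of the single loop, as a rough sketch only. It assumes the Html Agility Pack is referenced, uses an inline HTML string and Console.WriteLine as stand-ins for the real page and Response.Write, and relies on SelectNodes returning null when nothing matches:

```csharp
using System;
using HtmlAgilityPack;

class LinkDemo
{
    static void Main()
    {
        // Stand-in markup (an assumption for illustration); a real page would
        // be loaded from the network or from disk instead.
        string html = "<html><body><h2 class=\"r\"><a href=\"http://example.com/\">x</a></h2></body></html>";

        HtmlDocument hd = new HtmlDocument();
        hd.LoadHtml(html);

        // Select only anchors that actually carry an href attribute.
        HtmlNodeCollection nct = hd.DocumentNode.SelectNodes("//h2[@class='r']/a[@href]");

        // SelectNodes returns null, not an empty collection, when nothing matches.
        if (nct == null) return;

        foreach (HtmlNode hn in nct)
        {
            // GetAttributeValue falls back to the given default if the attribute is absent.
            Console.WriteLine(hn.GetAttributeValue("href", string.Empty));
        }
    }
}
```

Filtering on a[@href] in the XPath and reading the value with GetAttributeValue avoids a null reference if an anchor has no href.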
Mar 4, 2009 at 8:51 AM
Edited Mar 4, 2009 at 9:19 AM
Dear All,

I tried the same code, but it gives me an error:

"//h2@class=\"r\"/a/@href" has invalid token.

FULL CODE:

        HtmlDocument hd = new HtmlDocument();
        hd.LoadHtml(Server.MapPath("something.htm"));
        HtmlNodeCollection nct = hd.DocumentNode.SelectNodes("//h3@class=\"r\"/a/@href");
      
something.htm has

<html>
<body><h2 class="r"><a href="something.com/somethingelse.htm"></h2></body>
</html>

Can someone help please?

Prasad.