
Scraping links - can this be done better?

Jan 10, 2008 at 5:26 PM
Do I have to use two foreach statements here, or is this possible with a single foreach statement? What I want is to extract the plain URLs:

HtmlDocument hd = csFN.Scrape.GetHtmlAsDoc("");
HtmlNodeCollection nct = hd.DocumentNode.SelectNodes("//h2[@class=\"r\"]/a/@href");

foreach (HtmlNode hn in nct)
    foreach (HtmlAttribute atr in hn.Attributes)
        if (atr.Name == "href")
            Response.Write(atr.Value + "<br />");

Thanks for any input!
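A single loop is enough if each node is asked for its href directly instead of walking its attribute list. The sketch below is a minimal console version, assuming the HtmlAgilityPack package is referenced; the `LinkScraper` class, the `ExtractHrefs` helper and the inline HTML are hypothetical stand-ins for `csFN.Scrape.GetHtmlAsDoc` and `Response.Write`. It uses `GetAttributeValue` with a default so a missing attribute yields an empty string rather than a null reference.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

public static class LinkScraper
{
    // Hypothetical helper: returns the href of every <a> directly under <h2 class="r">.
    public static List<string> ExtractHrefs(HtmlDocument hd)
    {
        var urls = new List<string>();

        // The predicate goes inside square brackets; [@href] keeps only anchors that have an href.
        HtmlNodeCollection nct = hd.DocumentNode.SelectNodes("//h2[@class='r']/a[@href]");

        // With default settings SelectNodes returns null (not an empty collection) when nothing matches.
        if (nct == null)
            return urls;

        foreach (HtmlNode hn in nct)
            urls.Add(hn.GetAttributeValue("href", "")); // one lookup per node, no inner loop

        return urls;
    }

    public static void Main()
    {
        var hd = new HtmlDocument();
        // Made-up markup standing in for the scraped page.
        hd.LoadHtml("<body><h2 class=\"r\"><a href=\"http://example.com/a\">A</a></h2>" +
                    "<h2 class=\"r\"><a href=\"http://example.com/b\">B</a></h2></body>");

        foreach (string url in ExtractHrefs(hd))
            Console.WriteLine(url);
    }
}
```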
Jan 25, 2008 at 11:06 PM
Break out of your nested loop when you find a match, so it doesn't have to check the rest of the attribute list.
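Keeping the original nested loops, the early exit can look like the sketch below. In C#, `break` leaves only the innermost `foreach`, so the outer loop still moves on to the next node. It assumes the HtmlAgilityPack package is referenced; `NestedLoopScraper` and `HrefsWithBreak` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

public static class NestedLoopScraper
{
    // Hypothetical helper: same nested loops as the question, with an early exit.
    public static List<string> HrefsWithBreak(HtmlNodeCollection nct)
    {
        var urls = new List<string>();
        foreach (HtmlNode hn in nct)
        {
            foreach (HtmlAttribute atr in hn.Attributes)
            {
                if (atr.Name == "href")
                {
                    urls.Add(atr.Value);
                    break; // skip this node's remaining attributes; the outer loop continues
                }
            }
        }
        return urls;
    }

    public static void Main()
    {
        var hd = new HtmlDocument();
        // Made-up markup; href is deliberately not the first attribute.
        hd.LoadHtml("<h2 class=\"r\"><a class=\"l\" href=\"http://example.com/\">x</a></h2>");

        HtmlNodeCollection nct = hd.DocumentNode.SelectNodes("//h2[@class='r']/a");
        if (nct != null)
            foreach (string url in HrefsWithBreak(nct))
                Console.WriteLine(url);
    }
}
```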
Feb 7, 2008 at 10:48 AM


foreach (HtmlNode hn in nct)
{
    HtmlAttribute atr = hn.Attributes["href"];
    Response.Write(atr.Value + "<br />");
}
Mar 4, 2009 at 7:51 AM
Edited Mar 4, 2009 at 8:19 AM
Dear All,

I tried the same code, but it gives me an error:

"//h2@class=\"r\"/a/@href" has invalid token.


        HtmlDocument hd = new HtmlDocument();
        HtmlNodeCollection nct = hd.DocumentNode.SelectNodes("//h3@class=\"r\"/a/@href");
something.htm contains:

<body><h2 class="r"><a href=""></h2></body>

Can someone help please?
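Judging by the code as posted, three things would stop it from working. The square brackets around the predicate are missing (the forum appears to have stripped them from the earlier posts), and without them the XPath parser reports exactly this "has an invalid token" error. `hd` is created with `new HtmlDocument()`, but `something.htm` is never loaded into it. And the query asks for `h3` while the sample file uses `h2`. The sketch below fixes all three, assuming the HtmlAgilityPack package is referenced and `something.htm` sits in the working directory; `LoadFileExample` and `LoadHrefs` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

public static class LoadFileExample
{
    // Hypothetical helper: loads an HTML file and returns the hrefs under <h2 class="r">.
    public static List<string> LoadHrefs(string path)
    {
        var hd = new HtmlDocument();
        hd.Load(path); // a freshly constructed HtmlDocument is empty until something is loaded

        var urls = new List<string>();

        // h2 to match the sample file, with the predicate inside square brackets.
        HtmlNodeCollection nct = hd.DocumentNode.SelectNodes("//h2[@class='r']/a[@href]");
        if (nct == null) // no matches: SelectNodes returns null by default
            return urls;

        foreach (HtmlNode hn in nct)
            urls.Add(hn.GetAttributeValue("href", ""));

        return urls;
    }

    public static void Main()
    {
        foreach (string url in LoadHrefs("something.htm"))
            Console.WriteLine(url);
    }
}
```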