Help parse this page...

Topics: User Forum
Oct 22, 2010 at 7:33 PM

Ok guys, I've got a problem...

Here's my code:
[code]
string html = wc.DownloadString("http://www.bungie.net/Stats/Reach/Commendations.aspx?player=" + PlayerName + "#Campaign");

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);

foreach (HtmlNode cpNode in doc.DocumentNode.SelectNodes("//img[@id[contains(@id,'cpCommendations')]]"))
{
    cp.Add(cpNode.Attributes["src"].Value);
    cpName.Add(cpNode.Attributes["title"].Value);
}
[/code]

This goes to the bungie.net website, finds all of the commendation images whose ID contains cpCommendations, and adds them to a list. The list is passed to a new form later, which populates itself with the list contents (image URLs).

However, sometimes this code does not work! On occasion (actually quite often) it seems that cpNode is null. I cannot for the life of me figure this one out!

Can anyone here give me a hand?

Oct 26, 2010 at 11:16 AM

I would modify your code to:

[code]
string html = wc.DownloadString("http://www.bungie.net/Stats/Reach/Commendations.aspx?player=" + PlayerName + "#Campaign");
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
var nodes = doc.DocumentNode.SelectNodes("//img[@id[contains(@id,'cpCommendations')]]");
if (nodes == null)
{
    // SelectNodes returns null when nothing matches, so handle that case here
    throw new ArgumentException();
}

foreach (HtmlNode cpNode in nodes)
{
    cp.Add(cpNode.Attributes["src"].Value);
    cpName.Add(cpNode.Attributes["title"].Value);
}
[/code]

Back to the problem. I think the problem is in the XPath expression. Maybe in some situations the web page adds white space to the id, changes a letter to upper/lower case, or leaves the id out entirely. I would suggest you set a breakpoint near the throw statement, analyse the doc source, and play with the XPath in the Watch window of the debugger; I'm sure you will find the problem pretty quickly. Good luck.
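
For what it's worth, the expression itself also looks suspect to me: @id[contains(@id,'cpCommendations')] asks for an id attribute of the id attribute, and as far as I can tell that inner predicate can never match. If the ids really do contain cpCommendations, the more conventional form would be something like this (untested):

[code]
// apply contains() directly to the img element's id attribute
var nodes = doc.DocumentNode.SelectNodes("//img[contains(@id,'cpCommendations')]");
[/code]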

Oct 31, 2010 at 6:49 PM

Thanks for the reply, sorry it took so long.

I managed to get it working by re-downloading the web page in a while loop and checking for the text I was looking for before I tried parsing it with HAP.
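
In case it helps anyone else, this is roughly what I ended up with (a simplified sketch; the retry limit and the cpCommendations marker text are just what I happened to check for, adjust to taste):

[code]
// Re-download until the expected marker text shows up (or we give up),
// so HAP only ever parses a fully loaded stats page.
string html;
int attempts = 0;
do
{
    html = wc.DownloadString("http://www.bungie.net/Stats/Reach/Commendations.aspx?player=" + PlayerName + "#Campaign");
    attempts++;
} while (!html.Contains("cpCommendations") && attempts < 5);

if (html.Contains("cpCommendations"))
{
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);
    // ...parse as before...
}
[/code]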