Beginner needs help

Topics: Developer Forum, User Forum
Feb 27, 2013 at 12:11 AM
Hello all ... I am trying to use HAP to pull player stats from this URL:

I would like to get all the available stats for all the players into a table of some sort that I can then query. Can anyone give me a hand with scraping this data? Any help would be greatly appreciated. Thanks.
Feb 27, 2013 at 9:18 AM
Hi nbrege,

Here is something to get you started..
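// Requires: using System.IO; using System.Net; using HtmlAgilityPack;
public string GetHtml(string url)
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.Method = "GET";
    request.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17";
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (Stream stream = response.GetResponseStream())
    // StreamReader stands in here for HTMLReader, a custom class not shown in this thread.
    using (StreamReader reader = new StreamReader(stream))
    {
        return reader.ReadToEnd();
    }
}

public void Start()
{
    string url = string.Format(""); // URL left blank in the original post

    HtmlDocument hd = new HtmlDocument();
    hd.LoadHtml(GetHtml(url)); // parse the downloaded markup before querying it
    HtmlNodeCollection nodes = hd.DocumentNode.SelectNodes("//tr[contains(@class,\"ysprow\")]");
    if (nodes == null) return; // SelectNodes returns null when nothing matches
    foreach (HtmlNode node in nodes)
    {
        // Cell positions are 1-based and relative to the current row.
        string name = node.SelectSingleNode("./td[position() = 1]/a").InnerText;
        string team = node.SelectSingleNode("./td[position() = 2]/a").InnerText;
        string gp = node.SelectSingleNode("./td[position() = 3]").InnerText;
    }
}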

this should work for you.

Feb 27, 2013 at 10:33 PM
Edited Feb 27, 2013 at 10:35 PM
Thanks for the reply. I'm getting this error now:

An unhandled exception of type 'System.Xml.XPath.XPathException' occurred in System.Xml.dll
Additional information: '//tr[contains(@class,\"ysprow\")]' has an invalid token.

Any ideas?
Feb 27, 2013 at 10:50 PM
No problem. I assume you fixed the errors in the code, as I didn't check it before posting. HTMLReader is a class I created :)

Can you check the version that you are using? I tested the code to make sure it's working, and I'm not getting this issue.

Could you post the following line of code exactly as you have it?

HtmlNodeCollection nodes = hd.DocumentNode.SelectNodes("//tr[contains(@class,\"ysprow\")]");

You need to make sure you're on version as versions have caused issues before.

Feb 27, 2013 at 11:19 PM
The version I'm using is If I change the Xpath to "//tr[contains(@class,""ysprow"")]" then it works. (removed the 2 "\" characters)
I will update to the latest version & try it again. Thanks for the help so far...
Feb 27, 2013 at 11:22 PM
Edited Feb 28, 2013 at 12:44 AM
The 2 "\" characters are used as escape chars for the quotes inside the string. You might have the @ sign at the beginning of the string, which turns off backslash escaping. The version shouldn't matter in that case.
Feb 28, 2013 at 12:40 AM
The new version made no difference, still get the error when I include the 2 "\" characters previously noted.

How would I get the stats from these 2 rows (the 16 & the 30)?

<td class="yspscores"> </td><td class="yspscores">16</td>
<td class="yspscores"> </td><td class="ysptblclbg6"><span class="yspscores">30</span></td>
Feb 28, 2013 at 12:46 AM
string a = node.SelectSingleNode("./td[position() = 5]").InnerText;
string pts = node.SelectSingleNode("./td[position() = 6]/span").InnerText;

You are looking for the 6th td in the row ("td[position() = 6]"), then the span inside that td ("/span"), and then the text inside the span.
Feb 28, 2013 at 1:18 AM
Edited Feb 28, 2013 at 1:37 AM
For some reason those 2 lines didn't paste in right. This is how they look in the page source code:

<td class="yspscores">& nbsp;</td><td class="yspscores">16</td>
<td class="yspscores">& nbsp;</td><td class="ysptblclbg6"><span class="yspscores">30</span></td>

There is a "& nbsp;" in there. (I have to type a space between the & and the nbsp for it to display on here).
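
For anyone reading along, here is a minimal C# sketch of how positional XPath behaves on cells like the ones quoted above. It assumes the four cells sit in a single row and that the row's class is "ysprow1"; both are guesses made for illustration, and CellLookup and CellText are hypothetical names, not part of HAP.

```csharp
using System;
using HtmlAgilityPack;

public static class CellLookup
{
    // Returns the text of the cell matched by a row-relative XPath, or "" if nothing matches.
    public static string CellText(HtmlNode row, string xpath)
    {
        HtmlNode cell = row.SelectSingleNode(xpath);
        // SelectSingleNode returns null on no match; DeEntitize decodes entities such as &nbsp;.
        return cell == null ? "" : HtmlEntity.DeEntitize(cell.InnerText).Trim();
    }

    // Builds a one-row table from the quoted cells (row layout and class name are assumed).
    public static HtmlNode SampleRow()
    {
        string html =
            "<table><tr class=\"ysprow1\">" +
            "<td class=\"yspscores\">&nbsp;</td><td class=\"yspscores\">16</td>" +
            "<td class=\"yspscores\">&nbsp;</td>" +
            "<td class=\"ysptblclbg6\"><span class=\"yspscores\">30</span></td>" +
            "</tr></table>";
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode.SelectSingleNode("//tr[contains(@class,'ysprow')]");
    }

    public static void Main()
    {
        HtmlNode row = SampleRow();
        Console.WriteLine(CellText(row, "./td[position() = 2]"));      // 16
        Console.WriteLine(CellText(row, "./td[position() = 4]/span")); // 30
    }
}
```

The spacer cells holding the non-breaking space still count toward position(), which is why the value cells land on every other index.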
Feb 28, 2013 at 1:30 AM
This is the final code that seems to work:
        Dim htmlWeb As New HtmlWeb()
        Dim document As HtmlAgilityPack.HtmlDocument = htmlWeb.Load(urlString)

        document.OptionFixNestedTags = True
        document.OptionOutputAsXml = True

        Dim nodes As HtmlNodeCollection = document.DocumentNode.SelectNodes("//tr[contains(@class,""ysprow"")]")

        For Each node As HtmlNode In nodes

            Dim name As String = node.SelectSingleNode("./td[position() = 1]/a").InnerText
            Dim team As String = node.SelectSingleNode("./td[position() = 2]/a").InnerText
            Dim gp As String = node.SelectSingleNode("./td[position() = 3]").InnerText
            Dim goals As String = node.SelectSingleNode("./td[position() = 5]").InnerText
            Dim assists As String = node.SelectSingleNode("./td[position() = 7]").InnerText
            Dim points As String = node.SelectSingleNode("./td[position() = 9]").InnerText
            Dim plusminus As String = node.SelectSingleNode("./td[position() = 11]").InnerText
            Dim pims As String = node.SelectSingleNode("./td[position() = 13]").InnerText
            Dim hits As String = node.SelectSingleNode("./td[position() = 15]").InnerText

            Me.DataGridView1.Rows.Add(name, team, gp, goals, assists, points, plusminus, pims, hits)
        Next

Thanks again for your help...
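
Since the original goal was a table that can be queried, here is a rough C# sketch of collecting the values into a System.Data.DataTable rather than a grid. The column names, the StatsTable and ToInt helpers, and the placeholder rows are all made up for illustration; real values would come from the InnerText calls above.

```csharp
using System;
using System.Data;

public static class StatsTable
{
    // Creates an empty table with typed columns so numeric filters work.
    public static DataTable Create()
    {
        var table = new DataTable("Players");
        table.Columns.Add("Name", typeof(string));
        table.Columns.Add("Team", typeof(string));
        table.Columns.Add("Goals", typeof(int));
        return table;
    }

    // Scraped cell text is a string; fall back to 0 when it is not a number.
    public static int ToInt(string text)
    {
        int value;
        return int.TryParse(text.Trim(), out value) ? value : 0;
    }

    public static void Main()
    {
        DataTable table = Create();
        // Placeholder rows standing in for scraped values.
        table.Rows.Add("Player A", "AAA", ToInt("16"));
        table.Rows.Add("Player B", "BBB", ToInt(" 3 "));
        table.Rows.Add("Player C", "CCC", ToInt("n/a"));

        // Filter and sort with a DataTable expression.
        DataRow[] rows = table.Select("Goals >= 10", "Goals DESC");
        foreach (DataRow row in rows)
            Console.WriteLine("{0} {1}", row["Name"], row["Goals"]);
    }
}
```

A DataTable can also be assigned to a DataGridView's DataSource, so the same data can be displayed and queried.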
Feb 28, 2013 at 1:35 AM
PS ... why doesn't the bold & italic formatting work on here?
Feb 28, 2013 at 1:37 AM
No problem.. I see why the 2 "\" chars didn't work.. My sample was in C# and you're using VB.Net :)

best of luck with it..
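
To summarize the quoting issue for later readers, here is a small C# sketch of three ways to write the same XPath. The class name XPathQuotes is just for illustration. The single-quoted form is standard XPath and should work unchanged in both C# and VB.NET, since neither language needs to escape a single quote inside a string.

```csharp
using System;

public static class XPathQuotes
{
    // Regular C# string: \" embeds a double quote.
    public const string Escaped = "//tr[contains(@class,\"ysprow\")]";

    // Verbatim C# string (@ prefix): backslashes are literal, "" embeds a double quote.
    // VB.NET string literals follow the same doubled-quote rule.
    public const string Verbatim = @"//tr[contains(@class,""ysprow"")]";

    // XPath also accepts single-quoted literals, which need no escaping here.
    public const string SingleQuoted = "//tr[contains(@class,'ysprow')]";

    public static void Main()
    {
        Console.WriteLine(Escaped);
        Console.WriteLine(Verbatim);
        Console.WriteLine(Escaped == Verbatim); // True
        Console.WriteLine(SingleQuoted);
    }
}
```

Pasting the backslash form into VB.NET leaves the backslashes in the string as literal characters, which would explain the "invalid token" error reported earlier.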