I cannot get a value from a website with Html Agility Pack

Topics: Developer Forum, Project Management Forum, User Forum
Apr 14, 2010 at 11:05 AM

The website is the Taiwan Stock Exchange (http://bsr.twse.com.tw/bshtm/bshtm_report_Messages.aspx?strDate=20100413&StartNumber=2475&FocusIndex=1).

There are no problems with my usage of Html Agility Pack on other websites.

However, I ran into difficulties with this one.

I can't get a value from it.

The error is a NullReferenceException.

Here is my code:

Public Sub Main()

        Dim client As New WebClient()
        Dim ms As New MemoryStream(client.DownloadData("http://bsr.twse.com.tw/bshtm/bshtm_report_Messages.aspx?strDate=20100413&StartNumber=2475&FocusIndex=1"))

        Dim doc As New HtmlDocument()
        doc.Load(ms, Encoding.UTF8)
        Dim docStockContext As New HtmlDocument()

        Dim values As String() = docStockContext.DocumentNode.SelectSingleNode("./tbody/tr[2]/td[2]").InnerText.Trim().Split(ControlChars.Lf)

        My.Response.Write(values(0).Trim() & "<br/>")
        doc = Nothing
        docStockContext = Nothing
        client = Nothing

End Sub
Please help me solve this problem. Thanks!
May 3, 2010 at 12:25 PM


Hello. I'll try to give you some advice on how to solve that based on my own experience crawling websites.

At first I tried to do it just like you and had a lot of trouble with null references. I don't have Visual Studio at hand right now, so the syntax may be slightly off, but I think it will be enough for you to get started.

You didn't mention which statement throws the exception, but I assume it's the one that assigns values. Try the following:

HtmlNodeCollection hnc = docStockContext.DocumentNode.SelectNodes("//tbody/tr/td");

Then, in the debugger, inspect the structure of hnc and use it to find the values you want. And, of course, check for null before using it:

if (hnc != null) // C#, sorry... long time no vb :-)
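
In VB, the same idea might look roughly like the sketch below. GetCellTexts and the module name are hypothetical, and the HTML is passed in as a string through LoadHtml so the example does not depend on the network. It runs the query against a document that has actually been loaded (in the posted code, docStockContext is created but never loaded), and it treats a Nothing result from SelectNodes as "no match" instead of dereferencing it.

```vb
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module CellLookup
    ' Hypothetical helper: returns the trimmed text of every node that
    ' matches xpath, or an empty list when nothing matches.
    Function GetCellTexts(html As String, xpath As String) As List(Of String)
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        Dim result As New List(Of String)()

        ' SelectNodes returns Nothing (not an empty collection) on no match.
        Dim hnc As HtmlNodeCollection = doc.DocumentNode.SelectNodes(xpath)
        If hnc Is Nothing Then
            Return result
        End If

        For Each node As HtmlNode In hnc
            result.Add(node.InnerText.Trim())
        Next
        Return result
    End Function
End Module
```

If "//tbody/tr/td" comes back empty, "//tr/td" may be worth trying, since tbody is often added by the browser and may not be present in the raw HTML.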

Regards, hope this helps, let me know if anything is not clear.


May 4, 2010 at 12:29 PM
Edited May 4, 2010 at 12:53 PM

I'm having a similar problem: SelectNodes doesn't seem to work with MemoryStreams. It does seem to work with files, though. Instead of DownloadData(), use DownloadFile() and rewrite your code accordingly.
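
A minimal sketch of the file-based variant, in case it helps. LoadFromDownloadedFile and the localPath parameter are hypothetical names, and Encoding.UTF8 is simply carried over from the original post; the page's real encoding may differ.

```vb
Imports System.Net
Imports System.Text
Imports HtmlAgilityPack

Module FileBasedLoad
    ' Hypothetical helper: saves the page to localPath, then parses it from disk.
    Function LoadFromDownloadedFile(url As String, localPath As String) As HtmlDocument
        Using client As New WebClient()
            client.DownloadFile(url, localPath)
        End Using

        Dim doc As New HtmlDocument()
        ' Encoding taken from the original post; adjust if the page uses another one.
        doc.Load(localPath, Encoding.UTF8)
        Return doc
    End Function
End Module
```

The returned document can then be queried with SelectNodes as before, still checking the result for Nothing.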