Link extraction problem

Topics: Developer Forum, Project Management Forum, User Forum
Aug 22, 2013 at 7:26 AM
I have written a code in VB.net.

The expected output of my program would be a list of extracted links that are inside the
<a href tag and has a word in common.

In my program i want to display all links that contains the word "test".

For example:
www.drivetest.ca/
www.drivetest.ca/EN/bookatest/Pages/Road-Test-Booking.aspx
www.drivetest.ca/EN/drivereducation/Pages/Driver-Testing.aspx
www.cic.gc.ca/english/citizenship/cit-test.asp
But my program is not displaying anything at all. Where did i go wrong?

Here is my code:
 Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click

        Dim webClient As New System.Net.WebClient
        Dim WebSource As String = webClient.DownloadString("http://www.google.com.ph/search?hl=en&as_q=test&as_epq=&as_oq=&as_eq=&as_nlo=&as_nhi=&lr=&cr=countryCA&as_qdr=all&as_sitesearch=&as_occt=any&safe=images&tbs=ctr%3AcountryCA&as_filetype=&as_rights=#as_qdr=all&cr=countryCA&fp=1e63a873f2e9c884&hl=en&lr=&q=test&start=20&tbs=ctr:countryCA")
        RichTextBox1.Text = WebSource

        Dim links As New List(Of String)()
        Dim htmlDoc As New HtmlAgilityPack.HtmlDocument()
        htmlDoc.LoadHtml(WebSource)

        For Each link As HtmlNode In htmlDoc.DocumentNode.SelectNodes("//a[@href]")

            If link.InnerText.Contains("test") Then
                ListBox1.Items.Add(link.InnerText)
            End If

        Next


    End Sub
I am currently new to this HtmlAgilityPack, I am still learning please bear with me.
Sep 16, 2013 at 11:56 AM
Edited Sep 16, 2013 at 12:00 PM
I think this code could help you
However, we have to change the code a bit, and if you're a professional programmer, you can use this code.
Public Function GetLinkAddress(ByVal HtmlSource As String) As String
        Dim LinkExtract As String = Nothing
        Dim htmldoc As New HtmlAgilityPack.HtmlDocument()
        htmldoc.LoadHtml(HtmlSource)
        Dim qq = From t In htmldoc.DocumentNode.Descendants("a") Where t.Attributes.Contains("href") _
                                              Select t
        For i As Integer = 0 To qq.Count - 1
            For m As Integer = 0 To qq(i).Attributes.Count - 1
                If qq(i).Attributes(m).Name = "href" Then
                    LinkExtract = qq(i).Attributes(m).Value
                    Exit For
                End If
            Next
        Next
        Return LinkExtract
End Function