Link extraction problem

Topics: Developer Forum, Project Management Forum, User Forum
Aug 22, 2013 at 8:26 AM
I have written some code, shown below.

The expected output of my program is a list of extracted links that are inside <a href> tags and have a word in common.

I want to display all links that contain the word "test".

For example:
But my program is not displaying anything at all. Where did I go wrong?

Here is my code:
 Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click

        Dim webClient As New System.Net.WebClient
        Dim WebSource As String = webClient.DownloadString("")
        RichTextBox1.Text = WebSource

        Dim links As New List(Of String)()
        Dim htmlDoc As New HtmlAgilityPack.HtmlDocument()

        For Each link As HtmlNode In htmlDoc.DocumentNode.SelectNodes("//a[@href]")

            If link.InnerText.Contains("test") Then
            End If

        Next

    End Sub
I am new to HtmlAgilityPack and still learning, so please bear with me.
Sep 16, 2013 at 12:56 PM
Edited Sep 16, 2013 at 1:00 PM
I think this code could help you.
It needs to be changed a bit to fit your case, but if you're an experienced programmer you should be able to adapt it.
Public Function GetLinkAddress(ByVal HtmlSource As String) As String
        Dim LinkExtract As String = Nothing
        Dim htmldoc As New HtmlAgilityPack.HtmlDocument()
        ' Parse the HTML text; without this call the document has no nodes.
        htmldoc.LoadHtml(HtmlSource)
        ' Select every <a> element that carries an href attribute.
        Dim qq = From t In htmldoc.DocumentNode.Descendants("a") Where t.Attributes.Contains("href") _
                                              Select t
        For i As Integer = 0 To qq.Count - 1
            For m As Integer = 0 To qq(i).Attributes.Count - 1
                If qq(i).Attributes(m).Name = "href" Then
                    LinkExtract = qq(i).Attributes(m).Value
                    Exit For
                End If
            Next
        Next
        ' Note: only the href of the last matching <a> element is returned.
        Return LinkExtract
End Function
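
Coming back to the original question: the handler probably shows nothing because the HtmlDocument is created but never loaded with the downloaded HTML, and the If block does not add anything to the list. Below is a minimal sketch of a helper that returns all matching links instead of a single one. The module name LinkFilter and the function name ExtractLinksContaining are made up for this example, and it assumes the word "test" should be matched against the href value rather than the visible link text (InnerText).

```vbnet
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module LinkFilter
    ' Hypothetical helper (not from the posts above): returns every href
    ' value that contains the given keyword, ignoring case.
    Public Function ExtractLinksContaining(html As String, keyword As String) As List(Of String)
        Dim links As New List(Of String)()
        Dim doc As New HtmlDocument()
        ' Parse the HTML text; a freshly created HtmlDocument is empty.
        doc.LoadHtml(html)

        ' SelectNodes returns Nothing when no node matches the XPath.
        Dim anchors = doc.DocumentNode.SelectNodes("//a[@href]")
        If anchors Is Nothing Then Return links

        For Each anchor As HtmlNode In anchors
            ' Read the href attribute, falling back to an empty string.
            Dim href As String = anchor.GetAttributeValue("href", "")
            If href.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0 Then
                links.Add(href)
            End If
        Next
        Return links
    End Function
End Module
```

In the button handler this could be used as RichTextBox1.Text = String.Join(Environment.NewLine, ExtractLinksContaining(WebSource, "test")). To match on the visible link text instead, compare anchor.InnerText rather than href.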