Noobie Question

Topics: Developer Forum, User Forum
Dec 17, 2010 at 8:59 PM
Edited Dec 17, 2010 at 8:59 PM

Sorry guys for being such a noob, but I have never done web scraping before.

My question is: how can I scrape all href attributes whose parent div container has a class attribute of 'persona-name'?

Here is the site I'm trying to scrape:

http://www.battlefieldheroes.com/en/player/2305114994

All I want to gather are the href links inside the persona-name divs:

<ul class="user-personas">
	<li class="user-persona clearfix">
		<img src="http://cdn.battlefieldheroes.com/static/20101215090912/bulk-images/hero-headshot-icons-32/1-6-2-0-130.png" alt="avatar" class="persona-avatar" />
		<div class="persona-name"><a href="/en/heroes/276720132">[gg]ROAST~BEEF</a></div>
		<div class="persona-faction faction-National" title="National">&nbsp;</div>
		<div class="persona-class class-gunner" title="Gunner">&nbsp;</div>
		<div class="persona-level level-16" title="Level 16">&nbsp;</div>
	</li>
	<li class="user-persona clearfix">
		<img src="http://cdn.battlefieldheroes.com/static/20101215090912/bulk-images/hero-headshot-icons-32/2-4-2-85-107.png" alt="avatar" class="persona-avatar" />
		<div class="persona-name"><a href="/en/heroes/235328126">[gg]SLOPPY~JOE</a></div>
		<div class="persona-faction faction-Royal" title="Royal">&nbsp;</div>
		<div class="persona-class class-commando" title="Commando">&nbsp;</div>
		<div class="persona-level level-12" title="Level 12">&nbsp;</div>
	</li>
	<li class="user-persona clearfix">
		<img src="http://cdn.battlefieldheroes.com/static/20101215090912/bulk-images/hero-headshot-icons-32/2-4-4-0-107.png" alt="avatar" class="persona-avatar" />
		<div class="persona-name"><a href="/en/heroes/233563772">[gg]HOOF~ARTED</a></div>
		<div class="persona-faction faction-Royal" title="Royal">&nbsp;</div>
		<div class="persona-class class-soldier" title="Soldier">&nbsp;</div>
		<div class="persona-level level-30" title="Level 30">&nbsp;</div>
	</li>
	<li class="user-persona clearfix">
		<img src="http://cdn.battlefieldheroes.com/static/20101215090912/bulk-images/hero-headshot-icons-32/2-6-2-85-107.png" alt="avatar" class="persona-avatar" />
		<div class="persona-name"><a href="/en/heroes/220683351">[gg]PORK~CHOP</a></div>
		<div class="persona-faction faction-Royal" title="Royal">&nbsp;</div>
		<div class="persona-class class-gunner" title="Gunner">&nbsp;</div>
		<div class="persona-level level-27" title="Level 27">&nbsp;</div>
	</li>
</ul>
I have tried the following code and cannot figure out why it is not working:
        Dim content As String = ""
        Dim web As New HtmlAgilityPack.HtmlWeb
        Dim doc As HtmlAgilityPack.HtmlDocument = web.Load("http://www.battlefieldheroes.com/en/player/2305114994")
        Dim hnc As HtmlAgilityPack.HtmlNodeCollection = doc.DocumentNode.SelectNodes("//div[@class='persona-name']//a")
        For Each link As HtmlAgilityPack.HtmlNode In hnc
            Dim replaceUnwanted As String = ""

            replaceUnwanted = link.GetAttributeValue("href", String.Empty)
            replaceUnwanted = replaceUnwanted.Replace("&#39;", "'")

            content &= replaceUnwanted & vbNewLine
        Next

        HTMLText.Text = content
I get the following error:

Object reference not set to an instance of an object.

...

Line 8:          Dim doc As HtmlAgilityPack.HtmlDocument = web.Load("http://www.battlefieldheroes.com/en/player/2305114994")
Line 9:          Dim hnc As HtmlAgilityPack.HtmlNodeCollection = doc.DocumentNode.SelectNodes("//div[@class='persona-name']")
Line 10:         For Each link As HtmlAgilityPack.HtmlNode In hnc
Line 11:             Dim replaceUnwanted As String = ""
Line 12: 

 

Thanks for any suggestions!

Dec 20, 2010 at 10:01 AM

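I don't know why, but I see different HTML when I load that page, and there is no div[@class='persona-name'] in it. With nothing to match, SelectNodes returns Nothing, which would explain the null reference error in your For Each.

"//div[@class='heroitem']/h3/a" selects the hero name link in the HTML I get: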

<div class="heroitem">
	<h3><a href="/en/heroes/276720132">[gg]ROAST~BEEF</a></h3>
	<div class="heroinfo">
		<img class="heroavatar" alt="avatar" src="http://cdn.battlefieldheroes.com/static/20101215090912/bulk-images/hero-headshot-icons/1-6-2-0-130.png">
		<dl class="factclasslevel">
			<dt class="national">&nbsp;</dt>
			<dd class="faction">National</dd>
			<dt class="gunner">&nbsp;</dt>
			<dd class="heroclass">Gunner</dd>
			<dt class="level-16">&nbsp;</dt>
			<dd class="level">Level</dd>
		</dl>
	</div>
</div>
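
For anyone hitting the same error, here is a minimal sketch of how the loop from the question could be guarded, assuming the page really is served with the heroitem markup shown above. The module name HeroLinkScraper and the helper ExtractHeroLinks are hypothetical names made up for this example. It parses a trimmed copy of the sample markup with LoadHtml instead of fetching the live page, and it uses HtmlEntity.DeEntitize in place of the manual Replace call. It needs a project reference to HtmlAgilityPack.

```vbnet
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module HeroLinkScraper

    ' Hypothetical helper (not from the thread): returns the href of every
    ' node matched by the XPath, or an empty list when nothing matches.
    Function ExtractHeroLinks(html As String, xpath As String) As List(Of String)
        Dim links As New List(Of String)()

        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' SelectNodes returns Nothing (not an empty collection) when the
        ' XPath matches no nodes, so check before iterating.
        Dim nodes As HtmlNodeCollection = doc.DocumentNode.SelectNodes(xpath)
        If nodes Is Nothing Then
            Return links
        End If

        For Each link As HtmlNode In nodes
            Dim href As String = link.GetAttributeValue("href", String.Empty)
            ' Decode HTML entities such as &#39; in the attribute value.
            links.Add(HtmlEntity.DeEntitize(href))
        Next

        Return links
    End Function

    Sub Main()
        ' Trimmed copy of the markup quoted in the reply above.
        Dim sample As String = "<div class=""heroitem"">" & _
            "<h3><a href=""/en/heroes/276720132"">[gg]ROAST~BEEF</a></h3>" & _
            "</div>"

        ' XPath from the reply: matches the markup that was actually served.
        For Each href As String In ExtractHeroLinks(sample, "//div[@class='heroitem']/h3/a")
            Console.WriteLine(href)
        Next

        ' XPath from the question: matches nothing in this markup, but the
        ' guard means an empty list comes back instead of an exception.
        Console.WriteLine(ExtractHeroLinks(sample, "//div[@class='persona-name']//a").Count)
    End Sub

End Module
```

The same check should apply to a document returned by web.Load(...). If the site serves different markup to different clients, it may also help to print doc.DocumentNode.OuterHtml once to see what HtmlAgilityPack actually received before settling on an XPath.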