Html Field Value Retrieval

Topics: User Forum
Aug 25, 2006 at 12:13 AM
Simon,

I hope that sending you this question is not too much of an imposition; however, I’m stumped and I don’t know where else to turn for help.


Unfortunately, all that I know about coding in C# and your HtmlAgilityPack is what I have been able to teach myself. As part of my business, I am attempting to automate the process of obtaining property values for specific properties of interest from the website “Zillow.com”. To complete this task, I am attempting to obtain a hidden field variable “<TR class= id=37254303>” from the entry screen on Zillow. This id directs the browser to the specific property on the subsequent screen, providing access to its values and data. Unfortunately, when I access the web site with the HtmlAgilityPack code below, the returned document does not include the appropriate “id” information.



May I impose upon you to steer me in the right direction? Thank you very much.



using System;
using HtmlAgilityPack;

class ParseZillo
{
    [STAThread]
    static void Main(string[] args)
    {
        // Fetch the search results page and keep a local copy.
        HtmlWeb hw = new HtmlWeb();
        string url = @"http://www.zillow.com/search/Search.htm?addrstrthood=5foxtrail+court&citystatezip=20878&mode=search";
        HtmlDocument doc = hw.Load(url);
        doc.Save("homevalue.htm");

        // Reload the saved copy and select every text node.
        HtmlDocument doc2 = new HtmlDocument();
        doc2.Load(@"homevalue.htm");
        HtmlNodeCollection nodes = doc2.DocumentNode.SelectNodes("//text()");
    }
}





Sincerely,

Bill Hunter

Whunter31@comcast.net


Coordinator
Aug 25, 2006 at 7:58 AM
Hi Bill,

I am not sure of what you are trying to achieve here :-)

In general, if the result you get from the Html Agility Pack is different from what you get from a web browser, it can mean one of three things:

1) the server relies on the Referer HTTP header (sent by a browser, and not by the Html Agility Pack by default)

2) the server relies on some context sent by a browser, and not by the Html Agility Pack by default (like cookies, security stuff, ...)

3) the HTML was really badly formed
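
For points 1 and 2, here is a rough sketch of making an HtmlWeb request look more like a browser's. It assumes the Html Agility Pack version in use exposes HtmlWeb.UseCookies and the HtmlWeb.PreRequest hook. The class name, the Referer value and the User-Agent string are placeholders for illustration, not values Zillow is known to require.

```csharp
using System;
using System.Net;
using HtmlAgilityPack;

class BrowserLikeFetch
{
    // Builds an HtmlWeb that keeps cookies and sends a Referer header.
    // The referer argument is a placeholder supplied by the caller.
    public static HtmlWeb CreateWeb(string referer)
    {
        HtmlWeb web = new HtmlWeb();
        web.UseCookies = true; // keep cookies between requests

        // PreRequest runs just before each request is sent.
        web.PreRequest = delegate(HttpWebRequest request)
        {
            request.Referer = referer;
            request.UserAgent = "Mozilla/5.0 (compatible; example)"; // placeholder
            return true; // returning true lets the request proceed
        };
        return web;
    }

    static void Main()
    {
        HtmlWeb web = CreateWeb("http://www.zillow.com/");
        string url = @"http://www.zillow.com/search/Search.htm?addrstrthood=5foxtrail+court&citystatezip=20878&mode=search";
        HtmlDocument doc = web.Load(url);
        doc.Save("homevalue.htm");
    }
}
```

Comparing the saved homevalue.htm with and without these settings is one way to check whether the server's response depends on them.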

Anyway, I don't see a difference between what your code gets and what a browser/view source gets. So you may have to analyze exactly what's going on between the client and the server. Sometimes, web scraping is hard.

Hope this helps
Simon.
Aug 25, 2006 at 11:53 AM
Simon,

Thank you for responding so quickly.

There is a hidden value "zpid" ('<input type = "hidden"') on Zillow that is necessary to identify the specific property when obtaining property information on the subsequent screen. I was hoping that there was a method in the HtmlAgilityPack to see that "hidden value", as I would then be able to automate obtaining values for specific properties. This value is of course not visible through "view source", nor by saving the page from the web browser and viewing its source; however, it is visible with numerous tools (DOM Explorer, XMLSpy, etc.).

Is there a way to view this value through HtmlAgilityPack?

Thanks again,
Bill
Coordinator
Aug 26, 2006 at 12:12 PM
You could use XPath for this. Well, you have to learn a bit of XPath, but once you get used to it, it's easy.

For example, to get the value out of this:
<input name="whatever" value="blabla"/>

you would do something like:

HtmlNode node = htmlDocument.DocumentNode.SelectSingleNode("//input[@name='whatever']");
Console.WriteLine(node.GetAttributeValue("value", ""));
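
As a complete program, the same idea might look like the sketch below. The HTML string, the class and method names, and the value 12345 are made up for illustration; only the field name "zpid" comes from this thread.

```csharp
using System;
using HtmlAgilityPack;

class HiddenFieldDemo
{
    // Returns the value attribute of the first <input> with the given name,
    // or null when the document contains no such element.
    public static string GetInputValue(string html, string name)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "//input[@name='...']" matches an input element anywhere in the document.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//input[@name='" + name + "']");
        return node == null ? null : node.GetAttributeValue("value", "");
    }

    static void Main()
    {
        // Hypothetical markup standing in for the real page.
        string html = "<form><input type=\"hidden\" name=\"zpid\" value=\"12345\"/></form>";
        Console.WriteLine(GetInputValue(html, "zpid"));
    }
}
```

This only finds fields that are present in the HTML the server returns. The Html Agility Pack does not run scripts, so a field that the browser creates through JavaScript would not be in the parsed document.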
Aug 26, 2006 at 1:29 PM
Simon,

Thank you. I appreciate your help very much.

Bill