Beginner Help

Topics: Developer Forum, User Forum
Jan 16, 2013 at 4:54 AM

I'm trying to do something very simple with HtmlAgilityPack, but I don't know anything about XML or nodes, and I'm having a lot of trouble pulling simple information from an HTML page. Currently I'm using a VERY slow approach based on string manipulation of the raw HTML. I could be parsing with XPath or something similar, but I don't have a clue how to get started. Can someone please help me with this?

Jan 25, 2013 at 5:50 AM

If you post a sample, we might be able to help solve it and show you how.

Lee

Jan 28, 2013 at 1:00 AM

Here is an example URL. I'm trying to pull information such as the price and shipping price for each item in the list. Currently I do this with very SLOW string manipulation and search loops.

http://www.amazon.com/gp/offer-listing/4871877094/ref=dp_olp_used?ie=UTF8&condition=used

Jan 28, 2013 at 9:22 AM
Edited Jan 28, 2013 at 12:15 PM

Here you go. This is an example of how I would do it:

            // Requires: using System.IO; using HtmlAgilityPack;
            // Read the saved page into a string; "using" closes the file afterwards.
            string lines;
            using (StreamReader sr = new StreamReader(@"C:\Development\sample.htm"))
            {
                lines = sr.ReadToEnd();
            }

            HtmlDocument hp = new HtmlDocument();
            hp.LoadHtml(lines);

            // Each tbody with class 'result' holds one offer in the list.
            var nodes = hp.DocumentNode.SelectNodes("//tbody[@class='result']");
            foreach (var item in nodes)
            {
                // The leading "." keeps each search inside the current item.
                var price = item.SelectSingleNode(".//span[@class='price']").InnerText;
                var shipping_block = item.SelectSingleNode(".//div[@class='shipping_block']");
                var price_shipping = shipping_block.SelectSingleNode(".//span[@class='price_shipping']").InnerText;
                var word_shipping = shipping_block.SelectSingleNode(".//span[@class='word_shipping']").InnerText;
            }

Hope this points you in the right direction.

What I looked for was the table that holds the results, so I selected every tbody whose class attribute is result. Within each of those, I looked for any descendant span with the class attribute price and took the text from it, then did the same for the shipping details.

Lee
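
To make the XPath steps described above easier to follow, here is a minimal, self-contained sketch that runs against a small inline HTML string instead of a saved file. The markup in SampleHtml and the names XPathDemo and ExtractPrices are assumptions made up for illustration; only the class names result and price come from the code in this thread, and the real page may be structured differently.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class XPathDemo
{
    // Hypothetical markup: only the class names come from the thread.
    public const string SampleHtml = @"
<table>
  <tbody class='result'>
    <tr><td><span class='price'>$4.50</span></td></tr>
  </tbody>
  <tbody class='result'>
    <tr><td><span class='price'>$7.25</span></td></tr>
  </tbody>
</table>";

    public static List<string> ExtractPrices(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var prices = new List<string>();

        // "//" searches the whole document; [@class='result'] keeps only
        // elements whose class attribute is exactly "result".
        var results = doc.DocumentNode.SelectNodes("//tbody[@class='result']");
        if (results == null)
        {
            return prices; // SelectNodes returns null when nothing matches
        }

        foreach (var result in results)
        {
            // The leading "." makes the search relative to this tbody,
            // and "//" after it reaches descendants at any depth.
            var priceNode = result.SelectSingleNode(".//span[@class='price']");
            if (priceNode != null)
            {
                prices.Add(priceNode.InnerText.Trim());
            }
        }
        return prices;
    }

    public static void Main()
    {
        foreach (var price in ExtractPrices(SampleHtml))
        {
            Console.WriteLine(price);
        }
    }
}
```

Without the leading "." in the inner query, "//span[@class='price']" would search the whole document again and return the first price on the page for every item.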

Jan 30, 2013 at 10:59 PM
Thanks for the response and code, Lee. I'm getting an error right now, though: "Object reference not set to an instance of an object." on this line:

var price_shipping = shipping_block.SelectSingleNode(".//span[@class='price_shipping']").InnerText;

Do you know why this is happening?
Jan 30, 2013 at 11:10 PM
It's probably that there isn't a shipping price for the item. You would need to check that the div with class='shipping_block' exists:

if (shipping_block != null)
{
    // Safe to read the shipping spans from shipping_block here.
}
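
"Object reference not set to an instance of an object" on that line means something to the left of a dot was null: either shipping_block itself, or the result of the span lookup inside it. One way to cover both cases is a small helper that returns a fallback string instead of throwing. This is only a sketch; SafeText and TextOrDefault are hypothetical names, not part of HtmlAgilityPack, and the inline HTML is an invented item with no shipping block.

```csharp
using System;
using HtmlAgilityPack;

public static class SafeText
{
    // Hypothetical helper: returns the trimmed text of the first node that
    // matches xpath under scope, or fallback when scope or the match is null.
    public static string TextOrDefault(HtmlNode scope, string xpath, string fallback = "")
    {
        if (scope == null)
        {
            return fallback;
        }
        var node = scope.SelectSingleNode(xpath);
        return node == null ? fallback : node.InnerText.Trim();
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        // Invented item that has a price but no shipping_block div.
        doc.LoadHtml("<table><tbody class='result'><tr><td>"
                   + "<span class='price'>$4.50</span>"
                   + "</td></tr></tbody></table>");

        var item = doc.DocumentNode.SelectSingleNode("//tbody[@class='result']");

        // SelectSingleNode returns null here because the div is missing.
        var shippingBlock = item.SelectSingleNode(".//div[@class='shipping_block']");

        string price = TextOrDefault(item, ".//span[@class='price']");
        string priceShipping = TextOrDefault(shippingBlock, ".//span[@class='price_shipping']", "n/a");

        Console.WriteLine(price + " / " + priceShipping);
    }
}
```

Inside the foreach loop, each of the three InnerText reads could go through the helper, so an item without shipping details is recorded with the fallback value rather than stopping the whole run.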

Lee
Jan 30, 2013 at 11:30 PM
It's working now. Now I have to look at the code and figure out exactly how it works. Thanks so much!