Beginner Help

Jan 16, 2013 at 5:54 AM

I’m trying to do something very simple with HtmlAgilityPack, but I don’t know anything about XML or nodes, and I'm having a lot of trouble pulling simple information out of an HTML page. Right now I'm using a VERY slow approach based on string manipulation of the raw HTML. I know I could be parsing it with XPath or something similar, but I don't have a clue how to get started. Can someone please help me with this?

Jan 25, 2013 at 6:50 AM

If you give us a sample, we might be able to help solve it and show you how.



Jan 28, 2013 at 2:00 AM

Here is an example URL from which I would be trying to pull information such as the price and shipping price of each item on the list. Currently I have accomplished this using very SLOW string manipulation and searching loops.

Jan 28, 2013 at 10:22 AM
Edited Jan 28, 2013 at 1:15 PM

Here you go. This is how I would do it:



    // needs: using System.IO; using HtmlAgilityPack;
    string lines = File.ReadAllText(@"C:\Development\sample.htm");

    HtmlDocument hp = new HtmlDocument();
    hp.LoadHtml(lines); // parse the saved page into a DOM

    // each search result sits in a <tbody class="result">
    var nodes = hp.DocumentNode.SelectNodes("//tbody[@class='result']");
    if (nodes != null) // SelectNodes returns null when nothing matches
    {
        foreach (var item in nodes)
        {
            // ".//" searches below the current result only, not the whole document
            var price = item.SelectSingleNode(".//span[@class='price']").InnerText;
            var shipping_block = item.SelectSingleNode(".//div[@class='shipping_block']");
            var price_shipping = shipping_block.SelectSingleNode(".//span[@class='price_shipping']").InnerText;
            var word_shipping = shipping_block.SelectSingleNode(".//span[@class='word_shipping']").InnerText;
        }
    }


Hope this points you in the right direction.

What I looked for was the table that held the results, so I selected each tbody whose class attribute was "result". Then, inside each one, I looked for a descendant span with the class attribute "price" and took its text, and did the same for the shipping details.



Jan 30, 2013 at 11:59 PM
Thanks for the response and code, Lee. I'm getting an error right now, though: "Object reference not set to an instance of an object." on this line:

    var price_shipping = shipping_block.SelectSingleNode(".//span[@class='price_shipping']").InnerText;

Do you know why this is happening?
Jan 31, 2013 at 12:10 AM
It's probably because there isn't a shipping price for that item, so SelectSingleNode returns null and calling .InnerText on it throws. You need to check that the div with class='shipping_block' exists before reading from it:

if (shipping_block != null)
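The same problem can hit any of the SelectSingleNode calls, not just the shipping_block one, because each returns null when that piece of markup is missing for a result. A minimal sketch, assuming the same class names as the snippet earlier in the thread, is to wrap the lookup in a small helper that returns null instead of throwing. `NodeHelpers` and `TextOrNull` are hypothetical names, and the inline HTML is made-up sample markup, not the real page.

```csharp
using System;
using HtmlAgilityPack;

static class NodeHelpers
{
    // Hypothetical helper: returns the trimmed inner text of the first node
    // matching `xpath` under `context`, or null if the context itself is null
    // or nothing matches.
    public static string TextOrNull(HtmlNode context, string xpath)
    {
        if (context == null)
            return null;

        HtmlNode match = context.SelectSingleNode(xpath);
        return match == null ? null : match.InnerText.Trim();
    }
}

class Program
{
    static void Main()
    {
        var doc = new HtmlDocument();
        // Hypothetical sample markup: the second result has no shipping_block div.
        doc.LoadHtml(
            "<table>" +
            "<tbody class='result'><tr><td><span class='price'>$10.00</span>" +
            "<div class='shipping_block'><span class='price_shipping'>$3.99</span></div>" +
            "</td></tr></tbody>" +
            "<tbody class='result'><tr><td><span class='price'>$12.50</span>" +
            "</td></tr></tbody>" +
            "</table>");

        HtmlNodeCollection results = doc.DocumentNode.SelectNodes("//tbody[@class='result']");
        if (results == null) // no results at all
            return;

        foreach (HtmlNode item in results)
        {
            string price = NodeHelpers.TextOrNull(item, ".//span[@class='price']");

            // May be null; TextOrNull handles a null context without throwing.
            HtmlNode shippingBlock = item.SelectSingleNode(".//div[@class='shipping_block']");
            string shipping = NodeHelpers.TextOrNull(shippingBlock, ".//span[@class='price_shipping']");

            Console.WriteLine("{0} | shipping: {1}", price, shipping ?? "(none)");
        }
    }
}
```

Returning null and printing "(none)" keeps one incomplete result from aborting the whole loop; whether you skip such items, log them, or substitute a default is a choice for your own code.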


Jan 31, 2013 at 12:30 AM
It's working now! Next I have to go through the code and figure out exactly how it works... Thanks so much!