Extracting a table from a page

Jul 10, 2012 at 10:27 AM
Edited Jul 10, 2012 at 11:38 AM

Hi all

I am trying to extract one table from a page that contains several tables. The table I am interested in has this opening HTML tag:

<table width="100%"  border="0" cellspacing="0" cellpadding="3" summary="Contains search results">

How can I get the node that has the summary="Contains search results" attribute?

Jul 11, 2012 at 4:13 PM

Here's one way to do it. I like to use LINQ, but there are other ways. This example loads the document from a file, but you can load it from other sources too. I haven't compiled it, so apologies in advance if it needs a tweak or two. Hopefully it gives you a push in the right direction.


    // Needs: using System.Linq; using HtmlAgilityPack;
    HtmlDocument htmlDoc = new HtmlDocument();
    htmlDoc.Load( "foo.html" );

    if ( htmlDoc.DocumentNode != null ) {
        var tableNode = ( from node in htmlDoc.DocumentNode.Descendants( "table" )
                         where node.GetAttributeValue( "summary", "" ) == "Contains search results"
                         select node ).FirstOrDefault();
        if ( tableNode != null ) {
            // Work with table node here
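            // Sketch only, not compiled: one possible next step is to walk the
            // rows and cells. It assumes the data sits in <td> cells directly
            // under each <tr>; header cells (<th>) are skipped.
            // Note: Descendants( "tr" ) also returns rows of any nested tables.
            foreach ( HtmlNode row in tableNode.Descendants( "tr" ) ) {
                foreach ( HtmlNode cell in row.Elements( "td" ) ) {
                    string text = cell.InnerText.Trim();
                    // Use the cell text here
                }
            }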