Help Locating This Table Please

Topics: User Forum
Sep 13, 2012 at 8:47 PM

Hi

Please can someone show me where I'm going wrong? I can't get the path right to retrieve the table data. I'm trying to get the rows and cell data from the tables with class sTable. sTable actually appears twice on the page. I need the data from both, but I don't mind running two queries if that's easier.

The page HTML looks like this:

<div class="col6">		
<table class="sTable">
<caption>Information</caption>
<tbody>
<tr>
<th scope="row">Type</th>
<td>Broken</td>
</tr>
</tbody>
<tbody>
<tr>
<th scope="row">Make</th>
<td>Fors</td>
</tr>
</tbody>
</table>

<table class="sTable">
<caption>Functions</caption>
<tbody>
<tr>
<th scope="row">Number of Functions</th>
<td>16</td>
</tr>
</tbody>
<tbody>
<tr>
<th scope="row">Timer</th>
<td>198 minutes</td>
</tr>
</tbody>
</table>
</div>

Here is my code so far, although it does not work:

string stockPath = "//div[@class=\'col6\']";
HtmlNodeCollection nct = hd.DocumentNode.SelectNodes(stockPath);
HtmlNodeCollection tbs = nct[0].SelectNodes(".//table");
foreach (HtmlNode hn in tbs[0].ChildNodes)
{
    if (hn.Name == "tbody")
    {
        HtmlNodeCollection rows = hn.SelectNodes(".//tr");

        for (int i = 0; i < rows.Count; ++i)
        {
            HtmlNodeCollection cols = rows[i].SelectNodes(".//td");

            string value = cols[0].InnerText;
            for (int l = 0; l < tempArray.Length / 2; l++)
            {
                if (tempArray[l, 0].Replace("_", " ") == value)
                {
                    tempArray[l, 1] = cols[1].InnerText;
                }
            }
        }

    }
}
specstr = GetArraySpec(tempArray);
    }
}

Not sure if this helps, but in another discussion I read about how to get the path using Firebug, and this is what I ended up with:

/html/body/div/div/section/div[5]/section[2]/div/div[2]/table

I'd really appreciate anyone's time and advice on this, as it's been doing my head in for hours :(


Sep 16, 2012 at 5:21 PM

Here is what I have:


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using HtmlAgilityPack;
using System.IO;

namespace HTMLParseExample
{
    class Program
    {
        static void Main(string[] args)
        {
            string[,] tempArray = { 
                                  { "Type", "" }, 
                                  { "Make", "" }, 
                                  { "Number_of_Functions", "" }, 
                                  { "Timer", "" } 
                                  };
            HtmlDocument hd = new HtmlDocument();

            string htmlPath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments), "Fubar.html");

            if (File.Exists(htmlPath)) Console.WriteLine("Found file 'Fubar.html'");
            else
            {
                Console.WriteLine("Unable to find file 'Fubar.html'");
                Console.WriteLine();
                Console.WriteLine("Pausing: ");
                Console.ReadLine();
                Environment.Exit(1);
            }

            hd.Load(htmlPath);

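            // Find the container div, then its direct child tables with class "sTable".
            string stockPath = "//div[@class='col6']";
            HtmlNode nct = hd.DocumentNode.SelectNodes(stockPath)[0];
            HtmlNodeCollection Tables = nct.SelectNodes("table[@class='sTable']");

            foreach (HtmlNode table in Tables)
            {
                // Rows sit inside the tbody elements of each table.
                HtmlNodeCollection Rows = table.SelectNodes("tbody/tr");

                foreach (HtmlNode row in Rows)
                {
                    // Select th and td together: TDs[0] is the label (th), TDs[1] is the value (td).
                    HtmlNodeCollection TDs = row.SelectNodes("th|td");

                    // Store the value against the matching label (underscores stand in for spaces).
                    for (int iTA = 0; iTA < tempArray.Length / 2; iTA++)
                    {
                        if (tempArray[iTA, 0].Replace("_", " ") == TDs[0].InnerText)
                            tempArray[iTA, 1] = TDs[1].InnerText;
                    }
                }
            }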

            for (int l = 0; l < tempArray.Length / 2; l++)
            {
                Console.WriteLine("{0} - {1}", tempArray[l, 0], tempArray[l, 1]);
            }
            Console.WriteLine();
            Console.Write("Pausing: ");
            Console.ReadLine();
        }
    }
}


And it works with this mockup:


<html>
    <head>
        <title>Fubar's Pavilion</title>
    </head>
    <body>
        <div class="col5">
            <table>
                <caption>Fubar</caption>
                <tbody>
                    <tr>
                        <th>FooBar1</th>
                    </tr>
                </tbody>
            </table>
        </div>
        <div class="col6">
            <table class="sTable">
                <caption>Information</caption>
                <tbody>
                    <tr>
                        <th scope="row">Type</th>
                        <td>Broken</td>
                    </tr>
                </tbody>

                <tbody>
                    <tr>
                        <th scope="row">Make</th>
                        <td>Fors</td>
                    </tr>
                </tbody>
            </table>

            <table class="sTable">
                <caption>Functions</caption>
                <tbody>
                    <tr>
                        <th scope="row">Number of Functions</th>
                        <td>16</td>
                    </tr>
                </tbody>

                <tbody>
                    <tr>
                        <th scope="row">Timer</th>
                        <td>198 minutes</td>
                    </tr>
                </tbody>
            </table>
        </div>
    </body>
</html>


And you should get an output of:


Found file 'Fubar.html'
Type - Broken
Make - Fors
Number_of_Functions - 16
Timer - 198 minutes

Pausing:


This will work with any number of tables, as long as each one has the attribute class="sTable" and sits directly inside the col6 div. Hope this helps. God Bless,

Macster
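
A possible variation on the code above, offered as a minimal sketch rather than a definitive version. The original attempt selected only the td cells, so cols[0] held the value instead of the label and cols[1] did not exist; it also read only the first table (tbs[0]). The sketch below reads both tables with a single XPath query and guards against SelectNodes returning null, which HtmlAgilityPack does when nothing matches. The names SpecReader and ReadSpecs are hypothetical, and collecting into a Dictionary instead of the two-dimensional array is an assumption about what is convenient, not something from the thread.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class SpecReader
{
    // Hypothetical helper: returns label -> value for every row of every
    // table with class "sTable" found inside the div with class "col6".
    public static Dictionary<string, string> ReadSpecs(string html)
    {
        var result = new Dictionary<string, string>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "//tr" (descendant axis) matches rows whether or not they are
        // wrapped in tbody elements. Note it would also match rows of
        // nested tables, which the sample page does not have.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes(
            "//div[@class='col6']//table[@class='sTable']//tr");

        // SelectNodes returns null, not an empty collection, on no match.
        if (rows == null) return result;

        foreach (HtmlNode row in rows)
        {
            HtmlNode label = row.SelectSingleNode("th");
            HtmlNode value = row.SelectSingleNode("td");

            // Skip rows that do not have both a th and a td.
            if (label == null || value == null) continue;

            // Decode entities such as &amp; and trim surrounding whitespace.
            string key = HtmlEntity.DeEntitize(label.InnerText).Trim();
            result[key] = HtmlEntity.DeEntitize(value.InnerText).Trim();
        }
        return result;
    }
}

static class Program
{
    static void Main()
    {
        // A cut-down version of the page from the question.
        string html =
            "<div class='col6'><table class='sTable'><tbody><tr>" +
            "<th scope='row'>Type</th><td>Broken</td>" +
            "</tr></tbody></table></div>";

        foreach (KeyValuePair<string, string> pair in SpecReader.ReadSpecs(html))
            Console.WriteLine("{0} - {1}", pair.Key, pair.Value);
    }
}
```

For the real page, the HTML string could come from File.ReadAllText on the saved file, and the dictionary can be copied back into tempArray if the rest of the program expects that shape.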

Sep 17, 2012 at 4:31 PM

Hi Macster

This is totally awesome, many thanks. I've just got back after being away for the weekend. I'll test the above code tonight. You really should see the smile on my face; this twisted my brain for ages :)

Thanks again. I'll let you know how I get on.

Jason