Extract/Parse basic html table without headers

Topics: Developer Forum, Project Management Forum, User Forum
Jun 29, 2012 at 8:40 PM
Edited Jul 4, 2012 at 4:41 PM

I have tried several examples here, but none of them has led me to a working solution. Here's what I'm working with:

customer.html

<html>
  <head>
    <title>TESTING</title>
  </head>
  <body>
    <table>
      <tr>
        <td>Phone</td>
        <td>Name</td>
        <td>City</td>
        <td>Address</td>
        <td>Postal</td>
      </tr>
      <tr>
        <td>905-222-2222</td>
        <td>Scott Tiger</td>
        <td>Toronto</td>
        <td>1 Yonge St.</td>
        <td>M5J 2J5</td>
      </tr>
      <tr>
        <td>416-222-2222</td>
        <td>Bill Gates</td>
        <td>Toronto</td>
        <td>2 Church St.</td>
        <td>M8J 5J5</td>
      </tr>
    </table>
  </body>
</html>

 

This is the output I'm looking for:

PHONE NUMBER : 905-222-2222
NAME         : Scott Tiger
ADDRESS      : 1 Yonge St.










PHONE NUMBER : 416-222-2222
NAME         : Bill Gates
ADDRESS      : 2 Church St.

Here's what I've done so far :

using System;
using System.IO;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using HtmlAgilityPack;

namespace Phonex
{
    class Program
    {
        static void Main(string[] args)
        {
            String htmlFile = "..\\..\\phlist.html";
            
            HtmlDocument doc = new HtmlDocument();
            doc.Load(htmlFile);

            HtmlNodeCollection tables = doc.DocumentNode.SelectNodes("//table");
            HtmlNodeCollection rows = tables[0].SelectNodes(".//tr");

            string makeSpace = "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n";

            for (int i = 1; i < rows.Count; ++i)
            {
                HtmlNodeCollection cols = rows[i].SelectNodes(".//td");
                for (int j = 0; j < cols.Count; ++j)
                {
                    string value = cols[j].InnerText;
                    //char[] delimiter1 = new char[] { '\r' };
                    //string[] array1 = value.Split(delimiter1, StringSplitOptions.None);

                    Console.WriteLine("PHONE NUMBER : " + value[0] + "\n");
                    Console.WriteLine("NAME         : " + value[1] + "\n");
                    Console.WriteLine("ADDRESS      : " + value[3] + ", " + value[2] + "\n");
                    Console.WriteLine(makeSpace);

                    //Console.WriteLine(value);

                    //foreach (string entry in array1)
                    //{
                    //    Console.WriteLine("PHONE NUMBER : " + value[0] + "\n");
                    //    Console.WriteLine("NAME         : " + value[1] + "\n");
                    //    Console.WriteLine("ADDRESS      : " + value[2] + "\n");

                    //    Console.WriteLine(makeSpace);
                    //}
                    
                }
            }

        }
    }
}

 

What am I doing wrong? Also, is there a way to ignore the table headers?

Jul 4, 2012 at 5:41 PM
Edited Jul 5, 2012 at 3:36 PM

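Figured it out! :) In my first attempt, value was the text of a single cell, so value[0] was only its first character. The fix is to index into cols instead, and starting the row loop at 1 skips the header row.

Here it is for newbies like me :)
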
using System;
using System.IO;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using HtmlAgilityPack;

namespace Phonex
{
    class Program
    {
        static void Main(string[] args)
        {
            String htmlFile = "..\\..\\phlist.html";
            
            HtmlDocument doc = new HtmlDocument();
            doc.Load(htmlFile);

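            // SelectNodes returns null (not an empty collection) when nothing matches.
            HtmlNodeCollection tables = doc.DocumentNode.SelectNodes("//table");
            if (tables == null) return;
            HtmlNodeCollection rows = tables[0].SelectNodes(".//tr");
            if (rows == null) return;

            string makeSpace = "\n\n\n\n\n";

            // Start at 1 to skip the first row, which holds the column captions.
            for (int i = 1; i < rows.Count; ++i)
            {
                HtmlNodeCollection cols = rows[i].SelectNodes(".//td");
                if (cols == null || cols.Count < 4) continue; // skip malformed rows

                // Column order in the table: Phone, Name, City, Address, Postal.
                string phoneNumber = cols[0].InnerText;
                string name = cols[1].InnerText;
                string city = cols[2].InnerText;
                string address = cols[3].InnerText;

                Console.WriteLine("PHONE NUMBER : " + phoneNumber + "\n");
                Console.WriteLine("NAME         : " + name + "\n");
                Console.WriteLine("ADDRESS      : " + address + ", " + city + "\n");
                Console.WriteLine(makeSpace);
            }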

        }
    }
}
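
For anyone who would rather drop the header row in the XPath itself instead of starting the loop at 1, here is a minimal sketch. It assumes the header is always the first <tr> of the first table and that the <tr> elements are direct children of <table> (no <tbody> in the markup). The class name SkipHeaderRowSketch and the inline HTML string are only for illustration and are not part of the original program.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical class name, used only for this sketch.
class SkipHeaderRowSketch
{
    static void Main()
    {
        // Inline copy of the sample table so the sketch needs no file on disk.
        const string html = @"<table>
  <tr><td>Phone</td><td>Name</td><td>City</td><td>Address</td><td>Postal</td></tr>
  <tr><td>905-222-2222</td><td>Scott Tiger</td><td>Toronto</td><td>1 Yonge St.</td><td>M5J 2J5</td></tr>
  <tr><td>416-222-2222</td><td>Bill Gates</td><td>Toronto</td><td>2 Church St.</td><td>M8J 5J5</td></tr>
</table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // position() > 1 leaves out the first <tr>, which holds the column captions.
        // Assumption: rows sit directly under <table>; adjust the path if there is a <tbody>.
        HtmlNodeCollection rows =
            doc.DocumentNode.SelectNodes("(//table)[1]/tr[position() > 1]");
        if (rows == null) return; // SelectNodes returns null when nothing matches

        foreach (HtmlNode row in rows)
        {
            HtmlNodeCollection cols = row.SelectNodes("./td");
            if (cols == null || cols.Count < 4) continue; // skip malformed rows

            Console.WriteLine("PHONE NUMBER : " + cols[0].InnerText.Trim());
            Console.WriteLine("NAME         : " + cols[1].InnerText.Trim());
            Console.WriteLine("ADDRESS      : " + cols[3].InnerText.Trim());
            Console.WriteLine();
        }
    }
}
```

Filtering in the XPath keeps the loop body free of index arithmetic, but it only helps when the header really is the first row. If the table used <th> cells for its captions, selecting rows that contain <td> cells would probably be the more robust choice.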