Return contents between two TR tags using HTMLAGILITYPACK

May 11, 2015 at 4:24 PM
I have been trying to scrape some data off a website. The page distinguishes the table header rows from the actual content rows by giving them different class names (tableHeader and content). Because I want to scrape all of the table information, I put all the headers into one array and all the contents into another. The problem is that when I write the arrays to a file, I can write a header, but the second array contains the content rows from every table, so I cannot tell where the contents of the first table end. HtmlAgilityPack returns every node that matches the selector, so I get the contents of all the tables at once. First, here is the relevant HTML to make this clear:
```html
<tr class=tableHeader>
<th width=16%>Caught</th>
<th width=16%><p><a href="/url">Normal Range</a></p></th>
</tr>
<TR class=content><TD><a href="/url"><i>Bluegill</i></a></TD>
<TD>trap net</TD>
<TD align=CENTER>4.05</TD>
<TD align=CENTER>    7.9 -    37.7</TD>
<TD align=CENTER>0.26</TD>
<TD align=CENTER>    0.1 -     0.2</TD>
</TR>
<TR class=content><TD><i></i></TD>
<TD>Gill net</TD>
<TD align=CENTER>1.50</TD>
<TD align=CENTER>N/A</TD>
<TD align=CENTER>0.07</TD>
<TD align=CENTER>N/A</TD>
</TR>
<tr class=tableHeader>
<th>0-5</th>
<th>6-8</th>
<th>9-11</th>
<th>12-14</th>
<th>15-19</th>
<th>20-24</th>
<th>25-29</th>
<th>30+</th>
<th>Total</th>
</tr>
<TR class=content><TD><i>bluegill</i></TD>
<TD align=CENTER>19</TD>
<TD align=CENTER>65</TD>
<TD align=CENTER>0</TD>
<TD align=CENTER>0</TD>
<TD align=CENTER>0</TD>
<TD align=CENTER>0</TD>
<TD align=CENTER>0</TD>
<TD align=CENTER>0</TD>
<TD align=CENTER>84</TD>
</TR>
```

Below is my code, which saves the headers and contents into arrays and tries to write them out the same way the website displays them:

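```csharp
// trTags4Pale: the header rows; trTags4Small: the content rows (both selected earlier).
// paleLake and smallText are string arrays allocated beforehand.
int count = 0;
foreach (var trTag4Pale in trTags4Pale)
{
    string trText4Pale = trTag4Pale.InnerText;
    paleLake[count] = trText4Pale;
    if (trTags4Small != null)
    {
        // This inner loop copies the content rows of ALL tables on every pass.
        int counter = 0;
        foreach (var trTag4Small in trTags4Small)
        {
            string trText4Small = trTag4Small.InnerText;
            smallText[counter] = trText4Small;
            counter++;
        }
    }
    File.AppendAllText(path, paleLake[count] + Environment.NewLine + smallText[count] + Environment.NewLine);
    count++;
}
```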
As you can see, when I append the array contents to the file, it writes the first header followed by the contents of all the tables. I only want the contents of the first table, and would then repeat the process for the second table and so on. If I could get the content rows between one tableHeader row and the next, each table's contents would end up in its own array. I don't know how to do this.
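One possible approach, offered as a sketch under assumptions rather than a definitive fix: instead of collecting headers and contents into two separate arrays, walk all the rows in document order and start a new group whenever a tableHeader row appears, so each content row is attached to the header above it. The sketch below is plain C# with hypothetical sample rows (a class name plus simplified text standing in for `InnerText`, taken from the HTML above) so it runs without HtmlAgilityPack; the names `rows` and `GroupByHeader` are made up for illustration. With HtmlAgilityPack, the input would presumably come from something like `doc.DocumentNode.SelectNodes("//tr")`, reading each node's `GetAttributeValue("class", "")` and `InnerText`. HtmlAgilityPack lowercases tag names, so `//tr` should match both `<tr>` and `<TR>`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical input: each row's class attribute and simplified text, in document order.
// In a real scraper these would be read from the <tr> nodes (assumption, see above).
var rows = new List<(string Class, string Text)>
{
    ("tableHeader", "Caught | Normal Range"),
    ("content", "Bluegill | trap net | 4.05 | 7.9 - 37.7 | 0.26 | 0.1 - 0.2"),
    ("content", "Gill net | 1.50 | N/A | 0.07 | N/A"),
    ("tableHeader", "0-5 | 6-8 | 9-11 | 12-14 | 15-19 | 20-24 | 25-29 | 30+ | Total"),
    ("content", "bluegill | 19 | 65 | 0 | 0 | 0 | 0 | 0 | 0 | 84"),
};

// Walk the rows once: every "tableHeader" row starts a new table, and each
// "content" row is added to the most recently started table.
List<(string Header, List<string> Contents)> GroupByHeader(
    IEnumerable<(string Class, string Text)> input)
{
    var result = new List<(string Header, List<string> Contents)>();
    foreach (var (cls, text) in input)
    {
        if (cls == "tableHeader")
        {
            result.Add((text, new List<string>()));
        }
        else if (cls == "content" && result.Count > 0)
        {
            // Content rows that appear before any header are skipped.
            result[result.Count - 1].Contents.Add(text);
        }
    }
    return result;
}

var tables = GroupByHeader(rows);

// Print each header followed only by its own content rows.
foreach (var (header, contents) in tables)
{
    Console.WriteLine(header);
    foreach (var line in contents)
    {
        Console.WriteLine(line);
    }
    Console.WriteLine();
}
```

Because the table boundaries come from document order rather than from two independent arrays, writing to the file becomes one loop over the groups, for example `File.AppendAllText(path, header + Environment.NewLine)` followed by that table's content lines, so each header is followed only by its own rows.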