Parsing HTML modified by JavaScript

Topics: Developer Forum, Project Management Forum, User Forum
May 4, 2011 at 7:30 PM

Hi, I need to parse HTML from www.allegro.pl. The problem is that I can't simply read the page source, because it is probably transformed by JavaScript, and I need the HTML as it looks "right now", the way Firebug shows it in the browser.

For example, on the page http://allegro.pl/show_user.php?search=MODULO2 I'm interested in getting this:


anusiak73 (403) (Kupujący) Pozytywny pon 25 kwi 2011 10:27:52 CEST
1047801221
Pokaż
Wszystko szybko i sprawnie, towar zgodny z opisem. POLECAM!

 

The source of that part looks like this:

<td class="list-color center_txt" style="text-align: right">
        <span style="float: left;">
            <td class="list-color center_txt" style="text-align: right">
                <span style="float: left;">
                    <span class="uname">
                        <a href="http://allegro.pl/show_user.php?uid=10461546" >anusiak73
                        </a>
                    </span> (403)
                        <a href="javascript:OpenHelp(41)">
                            <img src="http://static.allegrostatic.pl/site_images/1/0/stars/star125.gif" border="0" width="17" height="17" />
                            <img src="http://static.allegrostatic.pl/site_images/1/0/stars/star125.gif" border="0" width="17" height="17" />
                            <img src="http://static.allegrostatic.pl/site_images/1/0/stars/star125.gif" border="0" width="17" height="17" />
                        </a> (Kupujący)
                    </span>
                        <td class="list-color center_txt">
                            <span class="pos">Pozytywny
                            </span>
        </span>
        <td class="list-color center_txt">pon 25 kwi 2011 10:27:52 CEST
        </span>                           <!-- here td should be closed -->                     
        <td class="list-color center_txt">
            <div title="Aukcja jest już w archiwum i nie ma możliwości jej obejrzenia">1047801221
            </div>
            </span>
            <td class="list-color center_txt">
                <div id="info_10461546_1047801221" onclick="show_feedback('10461546', '1047801221', '16112332', 'fb_all');" style="cursor: pointer; text-decoration: underline;">Pokaż
                </div>
            </td>
                <tr>
                    <td colspan="6" class="list-color center_txt">
                        <div class="toleft">Wszystko szybko i sprawnie, towar zgodny z opisem. POLECAM!
                            <tr>
                                <td colspan="6" class="list-color center_txt" style="padding: 0px; text-align:left">
                                    <div id="fe_description_10461546_1047801221" style="display: none;  margin: 2px 10px ; padding-left: 5px; color: #666; border-left: 2px solid #aaa;">
                                    </div>
                                </td>
                            </tr>
                </tr>

 

The HTML view from Firebug for the same part:

<td style="text-align: right" class="list-color center_txt">
        <span style="float: left;">
        </span>
    </td>
    <td style="text-align: right" class="list-color center_txt">
        <span style="float: left;">
            <span class="uname">
                <a href="http://allegro.pl/show_user.php?uid=10461546">
                anusiak73
                </a>
            </span>
            (403)
            <a href="javascript:OpenHelp(41)">
                <img height="17" border="0" width="17" src="http://static.allegrostatic.pl/site_images/1/0/stars/star125.gif">
                <img height="17" border="0" width="17" src="http://static.allegrostatic.pl/site_images/1/0/stars/star125.gif">
                <img height="17" border="0" width="17" src="http://static.allegrostatic.pl/site_images/1/0/stars/star125.gif">
            </a>
            (Kupujący)
        </span>
    </td>
    <td class="list-color center_txt">
        <span class="pos">
        Pozytywny
        </span>
    </td>
    <td class="list-color center_txt">
    pon 25 kwi 2011 10:27:52 CEST
    </td>  <!-- here td is closed -->
    <td class="list-color center_txt">
        <div title="Aukcja jest już w archiwum i nie ma możliwości jej obejrzenia">
        1047801221
        </div>
    </td>
    <td class="list-color center_txt">
        <div style="cursor: pointer; text-decoration: underline;" onclick="show_feedback('10461546', '1047801221', '16112332', 'fb_all');" id="info_10461546_1047801221">
        Pokaż
        </div>
    </td>

 

The problem appears when I try to get the text of one cell, "pon 25 kwi 2011 10:27:52 CEST". I'm doing it like this:

...

n = doc.DocumentNode.SelectSingleNode("html/body/div/div[4]/div[4]/div/table[2]");

c_data = n.SelectSingleNode("tr[2]/td/td").InnerText;

...

but I get the text of this cell plus the text of some of the following rows, because the td isn't closed:

 

<td class="list-color center_txt">pon 25 kwi 2011 10:27:52 CEST
        </span>   
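A possible workaround for this one cell, as a minimal sketch under assumptions: instead of InnerText, which concatenates the text of every descendant, read only the first text node that is a direct child of the td. The inline HTML string and the class name `FirstTextDemo` are made up for illustration, and this may not hold for the full page, since it assumes the date is the first thing inside the cell.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class FirstTextDemo
{
    static void Main()
    {
        // Hypothetical, shortened fragment: the first td is never closed and is
        // followed by a stray </span>, like the date cell in the page source.
        string html =
            "<table><tr>" +
            "<td class='list-color center_txt'>pon 25 kwi 2011 10:27:52 CEST</span>" +
            "<td class='list-color center_txt'><div>1047801221</div></td>" +
            "</tr></table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // First td in document order, i.e. the unclosed date cell.
        HtmlNode cell = doc.DocumentNode.SelectSingleNode("//td");

        // Take only the first direct text child, not the text of nested nodes.
        HtmlNode firstText = cell.ChildNodes
            .FirstOrDefault(c => c.NodeType == HtmlNodeType.Text);

        Console.WriteLine(firstText != null ? firstText.InnerText.Trim() : "(no text node)");
    }
}
```

This only sidesteps the unclosed td for a single value; it does not produce the corrected tree that Firebug displays.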

 

Does anyone know a way to get the HTML as transformed by JavaScript, the way Firebug shows it?