InnerText problems

Jun 30, 2009 at 1:26 AM

Not sure if this is a bug or feature.

I'm trying to extract text from an HTML page that contains a form, and I'm using the InnerText property. Take this markup, for example:

<td>
<select>
<option>option1</option>
</select>
text
</td>

If the node is the select, the InnerText property returns 'option1', and for the td node it returns 'option1text'. What I'm trying to get is just 'text'.

Is there a way to make the InnerText property skip the inner text of child nodes?

Jun 30, 2009 at 2:06 AM

There should be a #text node after the select node among the td node's children. I tested with this document:

<HTML>
<HEAD>

</HEAD>
<BODY BGCOLOR="#FFFFFF" TOPMARGIN="0" LEFTMARGIN="0" MARGINWIDTH="0" MARGINHEIGHT="0" TEXT="#000000" onLoad="initPopup()" onUnload="exitPopup()">
<table><tr><td>
<select>
<option>option1</option>
</select>
text
</td>
    </tr>
    </table>
</BODY>
</HTML>

I found the text at:

doc.DocumentNode.ChildNodes[2].ChildNodes["body"].ChildNodes["table"].ChildNodes["tr"].ChildNodes["td"].ChildNodes["select"].NextSibling.InnerText
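
If you'd rather not hard-code the whole chain of ChildNodes lookups, here is a minimal sketch of another way to reach the same #text nodes: take the td node and keep only its direct children whose NodeType is HtmlNodeType.Text. The HTML string mirrors the sample above; the names Program and ownText are just for illustration, and I'm assuming the td is located with an XPath query. Treat it as a starting point rather than the one correct approach.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<table><tr><td>\n<select>\n<option>option1</option>\n</select>\ntext\n</td></tr></table>");

        // Locate the td (assumption: the first td in the document is the one we want).
        HtmlNode td = doc.DocumentNode.SelectSingleNode("//td");

        // Keep only the td's direct #text children, so the text inside
        // <select> and <option> is never visited.
        string ownText = string.Concat(
            td.ChildNodes
              .Where(n => n.NodeType == HtmlNodeType.Text)
              .Select(n => n.InnerText));

        // The #text nodes include the newlines around the tags, so trim them.
        Console.WriteLine(ownText.Trim());
    }
}
```

Because only direct children are inspected, text nested deeper (inside a span within the td, say) would also be skipped, which may or may not be what you want.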

Any text found within a tag is contained in a #text node. This includes the newlines before/after tags. It does make it a bit hard