exclude child node but include current node

Topics: User Forum
Nov 27, 2011 at 4:55 AM

hi,

I'm scraping a site and trying to grab only the text directly inside the h4 tag (Wed., November 30, 9:00pm), not the text in the nested a tag (VenueName). I'm searching for specific PeopleNames, and when there's a match I want to grab the date separately from the venue name.

<tr>
 <td colspan="2" class="upper">
  <h3>
   <a href="http://test.com/site/">PeopleName </a>
  </h3>
  <h4>
<a href="http://test.com/site/">VenueName </a>:
  Wed., November 30, 9:00pm
  </h4>
 </td>
</tr>
<tr>

I'm not even sure I'm doing this right. The plan was to take the XPath property of the a tag whose innerText matches PeopleName and append the relative path to the h4 tag.

I've tried //../../h4[not(self::a)], but it still returns the venue name, as shown below. Am I traversing properly? Once I'm sure I'm doing that right and can get to the h4 tag, do I have to regex my way to the date?

VenueName    :
     Wed., November 30, 9:00pm
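
A note on why the venue name still comes back, plus a rough sketch of one way around it. In XPath 1.0 the predicate [not(self::a)] is tested against the h4 element itself, and an h4 is never an a, so the predicate filters nothing and the nested link's text is still part of the h4's inner text. The text() node test selects only the text nodes that are direct children, so relative to the matched a tag something like ../../h4/text() should give the date without the venue, provided the library in use supports text(). The sketch below shows the same idea in Python with the standard library's xml.etree.ElementTree, which is an assumption on my part and not the library from the post. ElementTree has no text() node test, so the helper own_text (a made-up name, as is find_show) joins the element's text with the tail of each child instead. It also assumes well-formed markup: the href quotes are balanced and the row is wrapped in a table element.

```python
import xml.etree.ElementTree as ET

# Sample markup adapted from the post. Assumptions: the href quotes are
# balanced and the row is wrapped in <table> so the XML parser accepts it.
SAMPLE = """
<table>
<tr>
 <td colspan="2" class="upper">
  <h3>
   <a href="http://test.com/site/">PeopleName </a>
  </h3>
  <h4>
<a href="http://test.com/site/">VenueName </a>:
  Wed., November 30, 9:00pm
  </h4>
 </td>
</tr>
</table>
"""


def own_text(element):
    """Join the text sitting directly inside element, skipping child elements' content."""
    parts = [element.text or ""]
    # Text that follows a child element is stored on that child as .tail.
    parts.extend(child.tail or "" for child in element)
    return "".join(parts)


def find_show(root, person):
    """Return (venue, date) for the first cell whose h3 link matches person, else None."""
    for cell in root.iter("td"):
        link = cell.find("h3/a")
        if link is None or (link.text or "").strip() != person:
            continue
        h4 = cell.find("h4")
        if h4 is None:
            continue
        venue_link = h4.find("a")
        venue = (venue_link.text or "").strip() if venue_link is not None else ""
        # Drop surrounding whitespace and the colon that follows the venue link.
        date = own_text(h4).strip().lstrip(":").strip()
        return venue, date
    return None


root = ET.fromstring(SAMPLE.strip())
print(find_show(root, "PeopleName"))
```

With the date isolated this way, the only cleanup left is trimming whitespace and the leading colon, so a regex may not be needed for this particular layout.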