
exclude child node but include current node

Topics: User Forum
Nov 27, 2011 at 5:55 AM


I'm scraping a site and trying to grab only the text inside the h4 tag (Wed., November 30, 9:00pm), but not the content of the nested a tag (VenueName). I'm searching for specific PeopleNames, and when there's a match I want to grab the date separately from the venue name.

 <td colspan="2" class="upper">
   <a href="...">PeopleName </a>
   <h4><a href="...">VenueName </a>:
   Wed., November 30, 9:00pm</h4>

I'm not even sure I'm doing this right. The plan was to take the XPath property of the a tag whose innerText matches PeopleName and append to it the relative path to the h4 tag.

I've tried:  //../../h4[not(self::a)] but it returns the venue name as well, like below. Am I traversing properly? Once I'm sure I'm doing that right and can get to the h4 tag, do I have to regex my way to the date?

VenueName    :
     Wed., November 30, 9:00pm
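
One possible reading of that result: the predicate [not(self::a)] only tests whether the h4 itself is an a element, which it never is, so the whole h4 (nested link included) comes back. In XPath 1.0, h4/text() selects just the text-node children of the h4 and skips the text inside the nested a, so a regex may not be needed. Below is a minimal sketch of the same idea in Python's standard library, not a definitive implementation. The markup in `SAMPLE` is an assumed, well-formed stand-in for the real page (the actual hrefs and the exact nesting of the h4 are unknown), and `own_text` and `venue_and_date` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Assumption: a well-formed stand-in for the scraped fragment. The real
# hrefs and the exact position of the h4 are not known from the post.
SAMPLE = """
<table>
  <tr>
    <td colspan="2" class="upper">
      <a href="#">PeopleName </a>
      <h4><a href="#">VenueName </a>:
        Wed., November 30, 9:00pm</h4>
    </td>
  </tr>
</table>
"""


def own_text(element):
    """Text that belongs directly to `element`, ignoring text inside child elements."""
    parts = [element.text or ""]
    for child in element:
        # `tail` is the text that follows a child but still sits inside `element`.
        parts.append(child.tail or "")
    # Collapse runs of whitespace and newlines into single spaces.
    return " ".join("".join(parts).split())


def venue_and_date(root, person):
    """Return (venue, date) for the link whose text matches `person`, else None."""
    # ElementTree nodes have no parent pointer, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    for link in root.iter("a"):
        if (link.text or "").strip() != person:
            continue
        h4 = parents[link].find("h4")  # assumed: the h4 is a sibling of the matched link
        if h4 is None:
            continue
        venue_link = h4.find("a")
        venue = (venue_link.text or "").strip() if venue_link is not None else ""
        date = own_text(h4).lstrip(": ")  # drop the colon left after the venue link
        return venue, date
    return None


root = ET.fromstring(SAMPLE)
print(venue_and_date(root, "PeopleName"))
```

Real pages are rarely well-formed XML, so an HTML-aware parser would be needed in practice; the point here is only that the date is the h4's own text, separate from the text of the nested link.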