Link Extraction

Topics: User Forum
Jun 11, 2010 at 12:45 PM
Edited Jun 11, 2010 at 12:45 PM
Hello,

I just started using HAP today and can already see how powerful it is. Thank you for making it :)

Naturally, I have a question :) I was playing with table extraction and have already made some progress. I need to extract some data from (Movies). After a lot of experimenting, I found that the following piece of code displays the table that contains the actors.

/code
HtmlWeb web = new HtmlWeb();
HtmlDocument HD = web.Load("");
foreach (HtmlNode table in HD.DocumentNode.SelectNodes("//table[@class=\"cast\"]"))
{
    MessageBox.Show(" ACTORS:" + table.InnerText);
}
/code

Now I am facing a big roadblock. Let me try to explain. The link that I wrote above (Robocop, by the way) is the link that I received after performing a search. When I go to the site, I must do a search first. That is easy. After the search, I am given another page that has many links, both relevant and irrelevant. There is a pattern: if I search for Robocop, that second screen lists several versions of Robocop movies. I am interested in the first link. It even has the name Robocop in it.

How do I automatically click that link so that it takes me to the other page, where I can play with Agility Pack some more? I was thinking of extracting that link, but I failed miserably at that. Can someone help me with that?

Thank you very much. I am sorry if the text I wrote is hard to understand.
Jun 11, 2010 at 12:47 PM
OMG What happened with my spaces?!?! I don't usually write like that.
Jun 15, 2010 at 1:21 PM

You need to pick an exact XPath query. You can use different XPath expressions, using axes, functions, and indexes.

For example, in your case you can use:

Select the 'td' whose inner text is '1.', then select the next 'td' and get the 'a' node inside it.

It's not the best, but the first match will hit Robocop 1987.
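
Here is a minimal sketch of that query in C#, in case it helps. The markup in Main is made up to stand in for the search results page, and the FirstResultLink class, its Find method, and the example.com address are hypothetical names used only for illustration. The real page may be laid out differently, so the XPath will probably need adjusting.

```csharp
using System;
using HtmlAgilityPack;

public static class FirstResultLink
{
    // Returns the href of the first search hit, or null when nothing matches.
    public static string Find(string html)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        // td whose text is "1." -> the td right after it -> first <a> inside it
        HtmlNode link = doc.DocumentNode.SelectSingleNode(
            "//td[normalize-space(.)='1.']/following-sibling::td[1]//a[@href]");

        // SelectSingleNode returns null when there is no match.
        return link == null ? null : link.GetAttributeValue("href", "");
    }

    public static void Main()
    {
        // Made-up sample markup (assumption), not the real results page.
        string html =
            "<table>" +
            "<tr><td>1.</td><td><a href=\"/title/robocop-1987/\">Robocop</a> (1987)</td></tr>" +
            "<tr><td>2.</td><td><a href=\"/title/robocop-2/\">Robocop 2</a> (1990)</td></tr>" +
            "</table>";

        string href = Find(html);
        if (href != null)
        {
            // Links are often relative, so resolve against the page address
            // (example.com is a placeholder for the page you searched on).
            Uri target = new Uri(new Uri("http://example.com/find?q=robocop"), href);
            Console.WriteLine(target.AbsoluteUri);
        }
    }
}
```

To "click" the link, pass the resolved address to web.Load(...) the same way you loaded the first page, then run your table query on the document you get back.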

Did I understand your question right?


Jun 15, 2010 at 1:26 PM

Thank you very much for answering. Yes, I believe you did understand my question.


I will play with your code and let you know whether it worked :)



Thank you again