
Extract data from an HTML form using HTML Agility pack

Topics: Developer Forum, User Forum
Nov 9, 2011 at 2:44 PM

I'm trying to list all the nodes in an HTML form dynamically using HTML Agility Pack, meaning that I don't know the attribute names or the input names in advance. The problem comes when I want to get the label corresponding to each input.

<form name="input" action="html_form_action.asp" method="get">
Username: <input type="text" name="user" />
<input type="submit" value="Submit" />
</form>

Username  Input"user"

So here I want to output "Username" followed by the input. It seems obvious in this example, but sometimes the label and the input are not direct siblings: there may be many hidden inputs or other tags between them.

Another example:

   <input type=hidden name="startDate">

<TR>  <TD bgColor=#008088 colSpan=2 class="headfont">

<FONT color=#FFFFFF>  <B>* Enter ur username and password</B> </FONT>



<TD bgColor=#9ccdcd class="datafont"><FONT color=black>Username</FONT></TD>

<TD bgColor=#9ccdcd class="datafont">

<INPUT tabIndex=1 name=stuNum

="off" size="20"></TD></TR>


I'm using C# WinForms in my project.

I have a few ideas, but they would take a lot of time to implement. Since I'm new to HTML Agility Pack, I thought there might be a built-in way or a shortcut to do this. Any suggestions?

Nov 11, 2011 at 5:18 AM

I'm also trying to extract content from an HTML file, and I'm having difficulty with it too. Were you successful in doing what you wanted?

Nov 11, 2011 at 5:20 AM

What content exactly do you want to extract? I'm still writing the code to get the label corresponding to each input field! :(

Nov 12, 2011 at 5:49 AM

Hey dkilani,

I want to extract the links to all the files available on the page (JPGs and others) and store those links in a collection.


Nov 12, 2011 at 9:48 AM

1) Use HTML Agility Pack to transform the input HTML into XHTML.

2) Load the XHTML into an XDocument or XElement and use LINQ to XML to query for the href attributes of the a elements.

The Agility Pack's LINQ support is not as deep as LINQ to XML, so avoid it.
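Step 1 above could look roughly like the sketch below. It assumes the HtmlAgilityPack assembly is referenced; the class and method names (XhtmlConverter, ToXhtml, ToXDocument) are made up for illustration. OptionOutputAsXml, LoadHtml and Save are existing HtmlDocument members, but the XML the Agility Pack emits for very messy pages may still fail to parse, so treat this as a starting point rather than a guaranteed conversion.

```csharp
using System.IO;
using System.Xml.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of the Agility Pack itself.
public static class XhtmlConverter
{
    // Parses loosely formed HTML and re-serializes it as XML text.
    public static string ToXhtml(string html)
    {
        var doc = new HtmlDocument();
        doc.OptionOutputAsXml = true;   // ask the Agility Pack to emit XML when saving
        doc.LoadHtml(html);

        using (var writer = new StringWriter())
        {
            doc.Save(writer);
            return writer.ToString();
        }
    }

    // Step 2 entry point: hand the converted text to LINQ to XML.
    // XDocument.Parse throws XmlException if the output is still not well formed.
    public static XDocument ToXDocument(string html)
    {
        return XDocument.Parse(ToXhtml(html));
    }
}
```

If you already have the page source as a string (for example from the WebBrowser control), you would pass that string to XhtmlConverter.ToXDocument and run your LINQ to XML queries on the result.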

Nov 12, 2011 at 9:51 AM

XElement x = ...;

var hrefs =
    from element in x.Descendants("a")
    let href = (string)element.Attribute("href")
    where !String.IsNullOrWhiteSpace(href)
    select href;

You can select the element if you prefer.
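For anyone who wants to try the query on its own, here is a small self-contained sketch that wraps it in a method. It assumes the page source has already been converted to well-formed XHTML (step 1 in the earlier post) and that the elements carry no XML namespace; if the root declares xmlns="http://www.w3.org/1999/xhtml", Descendants("a") will match nothing and you would need an XNamespace. The names LinkExtractor and ExtractHrefs are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper wrapping the LINQ to XML query from the post above.
public static class LinkExtractor
{
    // Returns the non-empty href values of all <a> elements in the given XHTML.
    public static List<string> ExtractHrefs(string xhtml)
    {
        XElement root = XElement.Parse(xhtml);

        var hrefs =
            from element in root.Descendants("a")
            let href = (string)element.Attribute("href")   // null when the attribute is missing
            where !String.IsNullOrWhiteSpace(href)
            select href;

        return hrefs.ToList();
    }
}
```

Calling LinkExtractor.ExtractHrefs(xhtml) gives you the links as a List<string>. The same pattern with Descendants("img") and the src attribute should pick up image files that are embedded rather than linked.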

Nov 12, 2011 at 9:58 AM

Interesting! What do you mean by "not as deep"?
I have a few questions about C#, HTML Agility Pack and the C# WebBrowser control, if you don't mind.
I logged in to a page, but now when I take the link from an iframe to load its data, I seem to be running into a session problem: it won't take me to the page, since I'm basically just copying and pasting the link as a normal URL. Any suggestions, or a function I can use?

Nov 14, 2011 at 10:16 PM
Edited Nov 14, 2011 at 10:16 PM

Does the solution provided by softlion apply to my question too? If so, would you mind explaining how to extract the links with that approach, given that I pass the webpage source string to it?