Extracting Images And HTML From .html File

Topics: Developer Forum, Project Management Forum, User Forum
Apr 16, 2012 at 5:35 PM


I'm new to the Html Agility Pack and was wondering if someone could help me out.  I have a WPF C# project with an HTML string as shown below:

htmlString = "<HTML><HEAD></HEAD><BODY>Here are some images.</br>1) <IMG style='MARGIN-BOTTOM: 20px; MARGIN-LEFT: 20px' align=right src='images/sample001.jpg'>2) <IMG style='MARGIN-BOTTOM: 25px; MARGIN-LEFT: 25px' align=right src='images/sample002.png'></br> And some docs as well.</br>1) href='javascript:parent.POPUP({url:'testDoc001.htm',type:'shared',width:600,height:645})'></br>2) href='javascript:parent.POPUP({url:'testDoc002.html',type:'shared',width:700,height:712})'></br></BODY></HTML>";

I would like to parse this string and extract an array of all of the images and .htm/.html documents that it references.

In this particular example this array would be:

[0] = "images/sample001.jpg"
[1] = "images/sample002.png"
[2] = "testDoc001.htm"
[3] = "testDoc002.html"

Can someone send me a snippet of code or show me how to go about doing this?
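
For reference, here is a minimal sketch of one way this could be approached. It is not a confirmed or tested answer. It assumes the Html Agility Pack NuGet package is referenced, uses `SelectNodes` with an XPath query to collect the `src` attribute of each image, and falls back to a regular expression for the document names, because in the sample string the `href='javascript:...'` fragments are not inside a tag and so would not show up as attributes. The names `ReferenceExtractor` and `Extract` are hypothetical, and the regex assumes every document link has the `url:'name.htm'` or `url:'name.html'` shape shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack package is referenced

// Hypothetical helper class; the name is illustrative only.
public static class ReferenceExtractor
{
    // Assumption: document links always look like url:'something.htm' or
    // url:'something.html' inside the javascript:parent.POPUP({...}) call.
    private static readonly Regex PopupUrl =
        new Regex(@"url\s*:\s*'([^']+\.html?)'", RegexOptions.IgnoreCase);

    public static List<string> Extract(string html)
    {
        var results = new List<string>();

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Element and attribute names are matched in lowercase, so "//img"
        // also finds <IMG>. SelectNodes returns null when nothing matches.
        var images = doc.DocumentNode.SelectNodes("//img[@src]");
        if (images != null)
        {
            foreach (HtmlNode img in images)
            {
                results.Add(img.GetAttributeValue("src", string.Empty));
            }
        }

        // The javascript: fragments are scanned in the raw string, since
        // they are not parsed as attributes of any element in the sample.
        foreach (Match m in PopupUrl.Matches(html))
        {
            results.Add(m.Groups[1].Value);
        }

        return results;
    }
}
```

Calling `ReferenceExtractor.Extract(htmlString).ToArray()` on the string above should give the images first and the documents after them, in the order they appear. If the real pages put the `href` inside proper `<a>` tags, an XPath query such as `//a[@href]` could replace the raw-string scan.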