Jul 21, 2011 at 10:05 PM
Edited Jul 21, 2011 at 10:08 PM
Using HTML Agility Pack, it should be pretty easy to select the container element that holds the thread content with XPath.
So you might want to register your different forums/websites in the database and, for each one, register the XPath rules that extract the forum content as you want it, then use HTML Agility Pack to SelectNodes according to all the
Here are some examples of obtaining elements from HTML with XPath using HAP:
//table[contains(@id, '_tblProperty')] selects all TABLE elements whose ID contains '_tblProperty'
//div[@class='col3']/p selects every P whose parent is a DIV with class='col3' (use //div[@class='col3']/p[2] for only the second P of each such DIV)
//span[contains(@id, '_lblPrice')]/a selects the A elements that are children of a SPAN whose ID contains '_lblPrice' (SelectSingleNode returns just the first one)
The rest of the work is done by HTML Agility Pack.
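As a rough, untested sketch of the per-site rule idea: the ForumExtractor class, the XPathByHost table and the host name forum.example.com are all made up for illustration, and in practice the rules would come from your database. Only the HtmlAgilityPack calls (LoadHtml, SelectNodes, InnerText, HtmlEntity.DeEntitize) are the library's own.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class ForumExtractor
{
    // Hypothetical per-site rules: host name -> XPath that selects the post content.
    // A real application would load these from the database instead.
    static readonly Dictionary<string, string> XPathByHost =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "forum.example.com", "//div[@class='col3']/p" },
        };

    public static List<string> ExtractPosts(Uri pageUrl, string html)
    {
        var posts = new List<string>();

        string xpath;
        if (!XPathByHost.TryGetValue(pageUrl.Host, out xpath))
            return posts; // no rule registered for this site

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
            return posts;

        foreach (HtmlNode node in nodes)
        {
            // InnerText drops the tags; DeEntitize turns entities such as &amp; back into characters.
            posts.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
        }
        return posts;
    }
}
```

Calling ForumExtractor.ExtractPosts(new Uri("http://forum.example.com/thread/1"), html) would then give back the text of each matching P, or an empty list if the site has no rule or nothing matches.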
Hope it helps.
'dherbe', thanks for your excellent reply. I really appreciate it.
The problem is the way my application is designed. Basically it is a GUI application in which the user browses to 'ANY' web forum. I have
one textbox into which the HTML contents are loaded; regex then cleans the HTML contents of textbox1 and puts the result into
textbox2, ready for transfer to the database along with the web URL.
Regex only works up to a point: it struggles to cope if the right regexes aren't in the code.
Hence why I have turned to HtmlAgilityPack :D
After reading the above, is there anything HAP can help me with, given that it again has to work for 'ANY' forum?