Get string from HTML code

Jan 17, 2013 at 1:19 AM
Edited Jan 17, 2013 at 1:23 AM

Hi,

Can someone help me with how to use HtmlAgilityPack to get the text out of HTML code? For example, I have this HTML code:

"<font color=\"blue\">123456789<br><font color=\"black\">This book tells the story of her life from A to Z. <br>"

The string I want my function to return is:
123456789
This book tells the story of her life from A to Z.

(The format isn't constant, so using a regex doesn't help. I want to do something like what htmledit.squarefree.com does. If styles are hard to implement, I can skip them; I just want to get the strings.)
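A minimal sketch of the HtmlAgilityPack approach being asked about, assuming the HtmlAgilityPack NuGet package is referenced; `HtmlText` and `GetTextFragments` are hypothetical names, not part of the library. It parses the snippet (HtmlAgilityPack tolerates the unclosed `<font>` tags), walks the tree, and keeps only the text nodes, so all styling is ignored.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

static class HtmlText
{
    // Hypothetical helper: returns the non-blank text fragments of an HTML
    // snippet in document order, dropping all tags and styling.
    public static List<string> GetTextFragments(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // no exception on missing end tags such as </font>

        return doc.DocumentNode
            .Descendants()                                      // every node, depth-first
            .OfType<HtmlTextNode>()                             // keep text nodes only
            .Select(t => HtmlEntity.DeEntitize(t.Text).Trim())  // decode entities like &amp;
            .Where(s => s.Length > 0)                           // skip whitespace-only nodes
            .ToList();
    }
}
```

Joining the result with `Environment.NewLine` should give the two lines shown above. Collecting text nodes individually, rather than reading `doc.DocumentNode.InnerText`, keeps the fragments separate; `InnerText` concatenates them with nothing in between.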

Jan 25, 2013 at 6:48 AM
Edited Jan 25, 2013 at 6:48 AM

Can you post some more samples of the string? One problem I see is that this isn't well-formed HTML: there are no end tags for the font elements.

Using Regex, try this:

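        using System.Text.RegularExpressions;

        // number: the digits right after the first <font ...> tag
        // title: the text after the last <font ...> tag, up to the next '<' or the end
        Regex r = new Regex("<font [^>]*>(?<number>[0-9]*)<.*<font [^>]*>(?<title>[^<]*)");
        Match m = r.Match("<font color=\"blue\">123456789<br><font color=\"black\">This book tells the story of her life from A to Z. <br>");
        string title = m.Groups["title"].Value.Trim(); // Trim drops the trailing space before <br>
        string number = m.Groups["number"].Value;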

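This should work with different attributes on the font tags and with or without a <br> at the end, as long as the input keeps the same shape: a <font> tag, the number, then another <font> tag followed by the title.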
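To check that claim on other inputs, the pattern above can be wrapped in a small helper; this is a sketch, and `FontLineParser` and `Parse` are hypothetical names. It uses the pattern without its unused inner capture groups, which matches the same strings.

```csharp
using System.Text.RegularExpressions;

static class FontLineParser
{
    // number: digits right after the first <font ...> tag.
    // title: text after the last <font ...> tag on the line, up to the next '<' or the end.
    static readonly Regex Pattern =
        new Regex("<font [^>]*>(?<number>[0-9]*)<.*<font [^>]*>(?<title>[^<]*)");

    // Hypothetical helper: returns null when the input does not have that shape
    // (for example, when there is no second <font> tag).
    public static (string Number, string Title)? Parse(string html)
    {
        Match m = Pattern.Match(html);
        if (!m.Success) return null;
        return (m.Groups["number"].Value, m.Groups["title"].Value.Trim());
    }
}
```

For example, `FontLineParser.Parse("<font face=\"Arial\" color=\"blue\">42<br/><font size=\"2\">Another title")` gives `("42", "Another title")`, while an input with only one `<font>` tag gives `null`.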

Hope this helps, even though it doesn't use HtmlAgilityPack.

Lee