
Parsing HTML

Jul 19, 2011 at 9:49 AM


I'm looking for a way to extract the text content of an HTML page.
Currently I'm using the WebBrowser control, but it sometimes throws errors and it is also too slow.

So I'm looking for an open-source library that can extract the text without all the HTML tags.
Can HtmlAgilityPack do this?
Can you explain how I can get started doing something like this with the library?

The problem is that the HTML content is often not well formed, and this causes me a lot of problems.

Thanks a lot.
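For illustration only: HtmlAgilityPack is a .NET library, and the sketch below does not use it or reflect its API. It shows the general technique being asked about (run a parser that tolerates broken markup, keep the text nodes, drop `<script>` and `<style>` contents) using Python's standard-library `html.parser`, which does not reject unclosed or mismatched tags. The names `TextExtractor` and `extract_text` are hypothetical, chosen for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script>/<style> bodies."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True decodes entities such as &lt; and &amp; in text
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names in lowercase
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside script/style blocks
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        # Simplification: text nodes are joined with single spaces
        return " ".join(self._chunks)


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of the input
    return parser.text()


if __name__ == "__main__":
    broken = "<p>Hello <b>world</p><script>var x = 1;</script><div>Unclosed & more"
    print(extract_text(broken))
```

Joining text nodes with spaces is a deliberate simplification: markup such as `Hel<b>lo</b>` comes out as `Hel lo`, and block elements are not turned into line breaks.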