Extract text

Topics: Developer Forum, User Forum
Feb 15, 2007 at 10:53 AM
Is it possible to extract the plain text of an HTML page, i.e. strip the HTML tags and return only the text?

Feb 15, 2007 at 11:00 AM
To answer my own question:

using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.Load("document.html"); // HtmlDocument has no constructor that takes a path
string text = doc.DocumentNode.InnerText;

This seems to return the title and body text, but not the contents of any meta tags (their values are stored in attributes, which InnerText does not include).
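
If the meta values are wanted as well, something along these lines might work. This is only a sketch, assuming a reasonably recent Html Agility Pack build; the PlainText class and Extract method are made-up names for illustration and not part of the library. It reads the content attribute of each meta tag, drops script and style elements (their text would otherwise show up in InnerText), and decodes entities such as &amp;amp;.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of Html Agility Pack.
public static class PlainText
{
    public static string Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var parts = new List<string>();

        // Meta values live in attributes, so collect them separately.
        // SelectNodes returns null when nothing matches.
        var metas = doc.DocumentNode.SelectNodes("//meta[@content]");
        if (metas != null)
        {
            foreach (HtmlNode meta in metas)
            {
                parts.Add(meta.GetAttributeValue("content", ""));
            }
        }

        // Remove script and style elements so their contents are not
        // returned as text. ToList() takes a snapshot before removal.
        var unwanted = doc.DocumentNode.Descendants()
            .Where(n => n.Name == "script" || n.Name == "style")
            .ToList();
        foreach (HtmlNode node in unwanted)
        {
            node.Remove();
        }

        // InnerText leaves entities encoded; DeEntitize decodes them.
        parts.Add(HtmlEntity.DeEntitize(doc.DocumentNode.InnerText).Trim());

        return string.Join(" ", parts);
    }
}
```

Usage would be something like string text = PlainText.Extract(File.ReadAllText("document.html")); whitespace between elements is kept as it appears in the source, so the result may still need normalizing.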