Extract text only from HTML content using Html Agility Pack

Topics: Developer Forum, Project Management Forum, User Forum
Oct 16, 2010 at 5:22 PM
Edited Oct 17, 2010 at 5:26 PM

 

Hi there

I have been trying to extract text from HTML content, but the extracted text still contains some HTML tags. What should I do to extract only plain text, without leftover tags or scripts? I am testing on different web pages, so the HTML elements are unknown and change from time to time. I am also using Html Agility Pack. Here is some of the code:

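// requires: using System.IO; using HtmlAgilityPack;
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

// result holds the page's HTML as a string
doc.Load(new StringReader(result));

HtmlNode bodyNode = doc.DocumentNode.SelectSingleNode("//body");

// InnerText is already a string, so no ToString() call is needed
string results = bodyNode.InnerText;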

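Regex code:

// requires: using System.Text.RegularExpressions;
public string removehtmltags_Regex(string source)
{
    // Removes each "<...>" span (non-greedy match)
    return Regex.Replace(source, "<.*?>", string.Empty);
}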
  
Result from this code:

 

2:59:46 PM: var _GlobalNavHeaderUtf8Encoding=true;var includeHost="http://include.ebaystatic.com/";Skip to main contentBuyMy eBaySellCommunityContact usHelpBasketBasketvjo.darwin.globalnav.shoppingcart.ShoppingCart.Refresh("fullCart", "emptyCart")Enter your search keywordvjo.darwin.core.greetings.VjGreetingsClient.writePersonalHeader("Sign in", "https://signin.ebay.co.uk/ws/eBayISAPI.dll?SignIn", "register", "https://scgi.ebay.co.uk/ws/eBayISAPI.dll?RegisterEnterInfo", "Sign out", "https://signin.ebay.co.uk/ws/eBayISAPI.dll?SignIn", "Welcome! ##1## or ##2##", "Hello, ##3## (##1##). ##2##", "Hello, ##3## (##1##). (Not you?)", "Hello! Sign in/out.", "", "##1##", " | You have ##1## alert.", " | You have ##1## alerts.", "1", "", true) Must use Buy It Now and PayPal. See conditionsvjo.darwin.core.ebayheader.rebate.RebateBox.Refresh("rbt", "10", " You have vouchers available")CATEGORIES&nbsp;FASHIONMOTORSDAILY DEALSvar svrGMT = 1287323067907;ebay.oDocument._getControl("headerCommon")._exec("writeStyleSheet");vjo.Registry.put('bta', new vjo.darwin.globalnav.bta.BuyerTransactionAlert("bta", 60, 2, 2, "http://bmsgs.ebay.co.uk/ws/eBayISAPI.dll?GetBuyerTransactionAlerts", "http://q.ebaystatic.com/aw/pics/uk/", "http://cgi.ebay.co.uk/ws/eBayISAPI.dll?ViewItem", "Watched Item ending soon!", "You've been outbid!", "You've received a Second Chance Offer", "You've received a Transaction Confirmation Request."));

vjo.darwin.globalnav.util.EventReg.aggregate(vjo.Registry._bta.onRefreshHdl());

vjo.darwin.globalnav.util.EventReg.browseCategories("BrowseCategoriesMenu", "http://include.ebaystatic.com/categoryjs/87/en_GB/category_87en_GB0.js");

vjo.darwin.globalnav.util.EventReg.impression("");

var _oGlobalNavRTMInfo={};_oGlobalNavRTMInfo.aRTMPlacementData=[];_oGlobalNavRTMInfo.aRTMPlacementData=[{"ord":null,"maxWidth":"470","rtmUrl":"http://srx.uk.ebayrtm.com/rtm","htmlId":"rtm_html_433","userId":null,"isUserSignin":false,"GUid":null,"renderBeforeOnload":true,"maxHeight":"22","pid":"433"},{"ord":null,"maxWidth":"160","rtmUrl":"http://srx.uk.ebayrtm.com/rtm","htmlId":"rtm_html_876","userId":null,"isUserSignin":false,"GUid":null,"renderBeforeOnload":true,"maxHeight":"22","pid":"876"},{"ord":null,"maxWidth":"160","rtmUrl":"http://srx.uk.ebayrtm.com/rtm","htmlId":"rtm_html_912","userId":null,"isUserSignin":false,"GUid":null,"renderBeforeOnload":true,"maxHeight":"22","pid":"912"}];

var ieBackCompat=false;

if(document.compatMode&&document.compatMode!="BackCompat") ieBackCompat=true;

var css='';

document.write(css);

 

var ieBackCompat=false;

if(document.compatMode&&document.compatMode!="BackCompat") ieBackCompat=true;

var css='';

document.write(css);

 

var ieBackCompat=false;

if(document.compatMode&&document.compatMode!="BackCompat") ieBackCompat=true;

var css='';

document.write(css);

&nbsp;

 

 

Home &gt; Help &gt;&nbsp;Membership &amp; account&nbsp;&gt;&nbsp;Protecting your account&nbsp;&gt;&nbsp;Privacy PolicyHelp Thanks for your Comment!Community input helps improve eBay's Help pages.  We're sorry, but we are not able to post your comment at this time. Please try again later.
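
The output above is mostly JavaScript source and undecoded entities such as &nbsp; and &gt;. One direction that might help is sketched here, as a rough idea rather than a confirmed fix: remove the script, style, and comment nodes from the parsed document before reading InnerText, then decode the entities. The class name PlainTextExtractor and the method name ExtractText are hypothetical names chosen for illustration, and the sketch assumes the whole page is available as a string.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class PlainTextExtractor
{
    // Hypothetical helper (not part of Html Agility Pack): returns the text of an
    // HTML string with script, style, and comment content left out.
    public static string ExtractText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Fall back to the whole document when the page has no <body> element.
        HtmlNode root = doc.DocumentNode.SelectSingleNode("//body") ?? doc.DocumentNode;

        // Collect script and style elements plus HTML comments.
        // ToList() copies the sequence so nodes can be removed while looping.
        var unwanted = root.Descendants()
            .Where(n => n.Name == "script"
                     || n.Name == "style"
                     || n.NodeType == HtmlNodeType.Comment)
            .ToList();

        foreach (HtmlNode node in unwanted)
        {
            node.Remove();
        }

        // Decode entities such as &nbsp;, &gt; and &amp; into plain characters.
        return HtmlEntity.DeEntitize(root.InnerText).Trim();
    }
}
```

With this sketch, the earlier snippet would become string results = PlainTextExtractor.ExtractText(result);. Text from neighbouring elements is still joined without separators, so words may continue to run together as they do in the output above.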

 

 


 

Thanks in advance.