Stack overflow in LoadHtml

Topics: Developer Forum
Jul 18, 2012 at 12:35 PM
Edited Jul 18, 2012 at 12:37 PM

Hi. I hope someone can help me catch or work around this issue; it has already cost me days of lost work trying to debug it.

I have a number of sites that I am trying to parse. Take www.st-barnabas.org.uk as an example. I try to load the response stream into Html Agility Pack, but I get a stack overflow (which I can't catch), and this makes everything fail.

 

Can anyone offer any advice?

        protected string DocFromStream(string url, Stream stream, out HtmlDocument doc)
        {
            var htmlString = string.Empty;
            try
            {
                if (stream != null)
                {
                    // The using block disposes the reader, so no explicit Close() is needed.
                    using (var sr = new StreamReader(stream))
                    {
                        htmlString = sr.ReadToEnd();
                    }
                }
            }
            catch (Exception ex)
            {
                log.InfoFormat("could not load htmlstream from {0}: {1}", url, ex);
            }
            doc = new HtmlDocument();
            doc.LoadHtml(htmlString); // <--- breaks here (stack overflow)
            return htmlString;
        }

Aug 10, 2012 at 2:58 PM
Edited Aug 10, 2012 at 3:00 PM

I have the same issue in a multi-threaded Windows service. After days of debugging with the help of WinDbg, I traced the source of the problem to the exact same call: htmlDocument.LoadHtml(stringInput);

Does anyone have a fix for this?

I think it is the same issue as this one: http://htmlagilitypack.codeplex.com/workitem/30268, but the fix from there does not seem very reliable :) The Windows services consume a lot of memory after applying that fix.

It would be really great if someone who knows what they are doing could fix this.
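
For what it's worth, a StackOverflowException cannot be caught with try/catch in .NET 2.0 and later, so one workaround to try in the meantime is to run the parse on a dedicated thread with a larger stack. The sketch below is only that, a sketch: the LargeStack helper and the 16 MB figure are my own assumptions, not the fix from the work item above and not part of Html Agility Pack. It only raises the limit; a sufficiently deep document could still overflow.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not part of Html Agility Pack).
static class LargeStack
{
    // Runs 'work' on a dedicated thread created with a stack of 'stackBytes'
    // bytes, waits for it to finish, and returns its result.
    public static T Run<T>(Func<T> work, int stackBytes)
    {
        T result = default(T);
        Exception error = null;

        // Thread(ThreadStart, int maxStackSize) sets the stack size for this thread only.
        var thread = new Thread(() =>
        {
            try { result = work(); }
            catch (Exception ex) { error = ex; } // ordinary exceptions only
        }, stackBytes);

        thread.Start();
        thread.Join();

        // Rethrow ordinary failures on the calling thread, keeping the original as InnerException.
        if (error != null)
            throw new InvalidOperationException("Work failed on the large-stack thread.", error);

        return result;
    }
}

// Possible usage with the snippet from the first post (16 MB is an arbitrary guess):
//
//   doc = LargeStack.Run(() =>
//   {
//       var d = new HtmlDocument();
//       d.LoadHtml(htmlString);
//       return d;
//   }, 16 * 1024 * 1024);
```

Each call creates and joins a new thread, so in a busy service it may be better to keep one long-lived large-stack worker and queue documents to it.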

Sep 18, 2012 at 5:22 PM

I never get any error with this; maybe it will help you.

Good luck.

 

        public static string GetURLData(string URL)
        {
            try
            {
                HttpWebRequest request = (HttpWebRequest)WebRequest.Create(URL);
                request.UserAgent = "Omurcek";
                request.Timeout = 4000; // milliseconds

                // Dispose the response, stream and reader once the body has been read.
                using (WebResponse response = request.GetResponse())
                using (Stream stream = response.GetResponseStream())
                using (StreamReader reader = new StreamReader(stream))
                {
                    return reader.ReadToEnd();
                }
            }
            catch (Exception ex)
            {
                LogYaz("Receive DATA Error : " + URL + " " + ex.ToString());
                return "";
            }
        }

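        // WebRequest.Create needs an absolute URL including the scheme.
        string Data = GetURLData("http://www.abc.com");
        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(Data);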