EncodingFoundException

Topics: Developer Forum
May 30, 2010 at 9:00 AM

NCrawler.Demo.vshost.exe Information: 0 : Downloading http://bbs.ifeng.com/forumdisplay.php?fid=349&page=911
'NCrawler.Demo.vshost.exe' (Managed (v4.0.21006)): Loaded 'E:\ncrawler\Net 4.0\NCrawler.Demo\bin\Release\HtmlAgilityPack.dll', Symbols loaded.
A first chance exception of type 'HtmlAgilityPack.EncodingFoundException' occurred in HtmlAgilityPack.dll

NCrawler.Demo.vshost.exe Information: 0 : Downloading http://bbs.ifeng.com/forumdisplay.php?fid=349&page=2
A first chance exception of type 'HtmlAgilityPack.EncodingFoundException' occurred in HtmlAgilityPack.dll

NCrawler.Demo.vshost.exe Information: 0 : Downloading http://bbs.ifeng.com/forumdisplay.php?fid=349&page=3
A first chance exception of type 'HtmlAgilityPack.EncodingFoundException' occurred in HtmlAgilityPack.dll

NCrawler.Demo.vshost.exe Information: 0 : Downloading http://bbs.ifeng.com/forumdisplay.php?fid=349&page=909
A first chance exception of type 'HtmlAgilityPack.EncodingFoundException' occurred in HtmlAgilityPack.dll

NCrawler.Demo.vshost.exe Information: 0 : Downloading http://bbs.ifeng.com/forumdisplay.php?fid=349&page=4
A first chance exception of type 'HtmlAgilityPack.EncodingFoundException' occurred in HtmlAgilityPack.dll

NCrawler.Demo.vshost.exe Information: 0 : Downloading http://bbs.ifeng.com/forumdisplay.php?fid=349&page=908
A first chance exception of type 'HtmlAgilityPack.EncodingFoundException' occurred in HtmlAgilityPack.dll

NCrawler.Demo.vshost.exe Information: 0 : Downloading http://bbs.ifeng.com/forumdisplay.php?fid=349&page=5
A first chance exception of type 'HtmlAgilityPack.EncodingFoundException' occurred in HtmlAgilityPack.dll

NCrawler.Demo.vshost.exe Information: 0 : Downloading http://bbs.ifeng.com/forumdisplay.php?fid=349&page=907
A first chance exception of type 'HtmlAgilityPack.EncodingFoundException' occurred in HtmlAgilityPack.dll

NCrawler.Demo.vshost.exe Information: 0 : Downloading http://bbs.ifeng.com/forumdisplay.php?fid=349&page=6
A first chance exception of type 'HtmlAgilityPack.EncodingFoundException' occurred in HtmlAgilityPack.dll

NCrawler.Demo.vshost.exe Information: 0 : Downloading http://bbs.ifeng.com/forumdisplay.php?fid=349&page=906
A first chance exception of type 'HtmlAgilityPack.EncodingFoundException' occurred in HtmlAgilityPack.dll

NCrawler.Demo.vshost.exe Information: 0 : Downloading http://bbs.ifeng.com/forumdisplay.php?fid=349&page=7
A first chance exception of type 'HtmlAgilityPack.EncodingFoundException' occurred in HtmlAgilityPack.dll

NCrawler.Demo.vshost.exe Information: 0 : Downloading http://bbs.ifeng.com/forumdisplay.php?fid=349&page=8
A first chance exception of type 'HtmlAgilityPack.EncodingFoundException' occurred in HtmlAgilityPack.dll

NCrawler.Demo.vshost.exe Information: 0 : Downloading http://bbs.ifeng.com/forumdisplay.php?fid=349&page=905
A first chance exception of type 'HtmlAgilityPack.EncodingFoundException' occurred in HtmlAgilityPack.dll

NCrawler.Demo.vshost.exe Information: 0 : Downloading http://bbs.ifeng.com/forumdisplay.php?fid=349&page=9
A first chance exception of type 'HtmlAgilityPack.EncodingFoundException' occurred in HtmlAgilityPack.dll

NCrawler.Demo.vshost.exe Information: 0 : Downloading http://bbs.ifeng.com/forumdisplay.php?fid=349&page=904
A first chance exception of type 'HtmlAgilityPack.EncodingFoundException' occurred in HtmlAgilityPack.dll

NCrawler.Demo.vshost.exe Information: 0 : Downloading http://bbs.ifeng.com/forumdisplay.php?fid=349&page=10
A first chance exception of type 'HtmlAgilityPack.EncodingFoundException' occurred in HtmlAgilityPack.dll

NCrawler.Demo.vshost.exe Information: 0 : Downloading http://bbs.ifeng.com/forumdisplay.php?fid=349&page=903
A first chance exception of type 'HtmlAgilityPack.EncodingFoundException' occurred in HtmlAgilityPack.dll

NCrawler.Demo.vshost.exe Information: 0 : Downloading http://bbs.ifeng.com/forumdisplay.php?fid=349&page=11
A first chance exception of type 'HtmlAgilityPack.EncodingFoundException' occurred in HtmlAgilityPack.dll

May 30, 2010 at 9:02 AM

NCrawler.Demo.vshost.exe Information: 0 : Downloading http://bbs.ifeng.com/forumdisplay.php?fid=349&page=489
A first chance exception of type 'HtmlAgilityPack.EncodingFoundException' occurred in HtmlAgilityPack.dll
A first chance exception of type 'HtmlAgilityPack.EncodingFoundException' occurred in HtmlAgilityPack.dll
NCrawler.Demo.vshost.exe Information: 0 : Crawl ended @ http://bbs.ifeng.com/forumdisplay.php?fid=349&page=911 in 00:27:14.3754717
The thread 'vshost.RunParkingWindow' (0x1260) has exited with code 0 (0x0).
The thread '<No Name>' (0x1018) has exited with code 0 (0x0).
The program '[5888] NCrawler.Demo.vshost.exe: Managed (v4.0.21006)' has exited with code 0 (0x0).

May 30, 2010 at 5:06 PM

First-chance exceptions are nothing to worry about. The debugger reports them as an FYI: they are exceptions that are thrown but then caught inside the library before they ever reach your code. You'll see plenty of them when working with System.IO. For example, if you call File.AppendAllText() on a file that doesn't exist, you may see a first-chance exception because the file was not found; the .NET code then creates the file and writes to it, and everything still works properly.
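
To make that concrete, here is a minimal sketch that is not taken from NCrawler or HtmlAgilityPack. It assumes .NET 4.0 or later, where AppDomain.FirstChanceException is available; the names FirstChanceDemo, CountFirstChance and ParseOrDefault are made up for illustration. It counts exceptions at the moment they are thrown, even though the code that throws them also catches them, which is roughly what the debugger output above is reporting.

```csharp
using System;
using System.Runtime.ExceptionServices;

static class FirstChanceDemo
{
    // Counts exceptions at the moment they are thrown, before any catch block runs.
    // Note: the event is AppDomain-wide, so exceptions on other threads are counted too.
    public static int CountFirstChance(Action action)
    {
        int count = 0;
        EventHandler<FirstChanceExceptionEventArgs> handler = (sender, e) => count++;
        AppDomain.CurrentDomain.FirstChanceException += handler;
        try
        {
            action();
        }
        finally
        {
            AppDomain.CurrentDomain.FirstChanceException -= handler;
        }
        return count;
    }

    // Hypothetical stand-in for library code that throws and catches internally.
    public static int ParseOrDefault(string text)
    {
        try
        {
            return int.Parse(text);
        }
        catch (FormatException)
        {
            // Handled here; the caller never sees the exception.
            return -1;
        }
    }
}
```

Calling FirstChanceDemo.CountFirstChance(() => FirstChanceDemo.ParseOrDefault("abc")) should report at least one first-chance exception, while ParseOrDefault itself simply returns -1 to the caller.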

May 31, 2010 at 8:23 AM
            Stream s = resp.GetResponseStream();
            if (s != null)
            {

                if (UsingCache)
                {
                    // NOTE: LastModified does not contain milliseconds, so we remove them from the file's timestamp
                    SaveStream(s, cachePath, RemoveMilliseconds(resp.LastModified), _streamBufferSize);

                    // save headers
                    SaveCacheHeaders(req.RequestUri, resp);

                    if (path != null)
                    {
                        // copy and touch the file
                        IOLibrary.CopyAlways(cachePath, path);
                        File.SetLastWriteTime(path, File.GetLastWriteTime(cachePath));
                    }
                }
                else
                {
                    // try to work in-memory
                    if ((doc != null) && (html))
                    {
                        if (respenc != null)
                        {
                            doc.Load(s, respenc);
                        }
                        else
                        {
                            using (MemoryStream reader = new MemoryStream())
                            {
                                const int bufferSize = 1024;
                                byte[] buffer = new byte[bufferSize];
                                int bytesRead;
                                // copy the whole response into memory so it can be parsed twice
                                while ((bytesRead = s.Read(buffer, 0, bufferSize)) > 0)
                                {
                                    reader.Write(buffer, 0, bytesRead);
                                }
                                reader.Seek(0, SeekOrigin.Begin);
                                doc.Load(reader, true);

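                                // Example: <meta http-equiv="Content-Type" content="text/html; charset=gb2312" />
                                // The pattern also accepts unquoted and single-quoted charset values, in any letter case.
                                string pattern = "<meta[^>]*charset\\s*=\\s*[\"']?([\\w-]+)";
                                System.Text.RegularExpressions.Regex r = new System.Text.RegularExpressions.Regex(pattern, System.Text.RegularExpressions.RegexOptions.IgnoreCase);
                                System.Text.RegularExpressions.Match m = r.Match(doc.DocumentNode.OuterHtml);
                                if (m.Success)
                                {
                                    string charSet = m.Groups[1].Value;
                                    Encoding documentEncoding = null;
                                    try
                                    {
                                        // GetEncoding throws ArgumentException for an unknown name; it does not return null
                                        documentEncoding = Encoding.GetEncoding(charSet);
                                    }
                                    catch (ArgumentException)
                                    {
                                        // unknown charset name: keep the document as parsed above
                                    }
                                    if (documentEncoding != null)
                                    {
                                        // parse again from the start with the declared encoding
                                        reader.Seek(0, SeekOrigin.Begin);
                                        doc.Load(reader, documentEncoding, true);
                                    }
                                }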
                            }
                        }
                    }
                }

                resp.Close();
            }
I changed the code of the HtmlWeb class; the code above is my modified version of its Get function.
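
For anyone who wants to try the idea outside of HtmlWeb, here is a rough standalone sketch of the same approach: buffer the response, look for a charset in a meta tag, then decode with that encoding. It does not use HtmlAgilityPack at all, and the names CharsetSniffer, Detect and ReadHtml are made up for this example. It assumes the meta tag appears in the first 4096 bytes and that the charset name is one the runtime knows; on .NET Core, code-page encodings such as gb2312 are only available after registering CodePagesEncodingProvider, otherwise the sketch falls back to the encoding you pass in.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

static class CharsetSniffer
{
    // Matches charset=... inside a <meta> tag, quoted or unquoted.
    static readonly Regex MetaCharset = new Regex(
        "<meta[^>]*charset\\s*=\\s*[\"']?([\\w-]+)", RegexOptions.IgnoreCase);

    // Returns the encoding named in a <meta ... charset=...> tag, or the fallback.
    public static Encoding Detect(byte[] html, Encoding fallback)
    {
        // Meta tags are plain ASCII, so a lossy ASCII decode is enough for sniffing.
        string head = Encoding.ASCII.GetString(html, 0, Math.Min(html.Length, 4096));
        Match m = MetaCharset.Match(head);
        if (!m.Success)
        {
            return fallback;
        }
        try
        {
            return Encoding.GetEncoding(m.Groups[1].Value);
        }
        catch (ArgumentException)
        {
            // Unknown or unsupported charset name.
            return fallback;
        }
    }

    // Buffers the whole stream in memory, then decodes it with the sniffed encoding.
    public static string ReadHtml(Stream s, Encoding fallback)
    {
        using (MemoryStream buffer = new MemoryStream())
        {
            s.CopyTo(buffer);
            byte[] bytes = buffer.ToArray();
            return Detect(bytes, fallback).GetString(bytes);
        }
    }
}
```

Sniffing the raw bytes first means the page is decoded only once, which may be simpler than loading the document twice as in my change to Get, but it is only a sketch and I have not tried it inside NCrawler.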