ISO-8859-2

Topics: User Forum
Oct 19, 2006 at 10:52 PM
Hi
I tried to load a page encoded in ISO-8859-2 with HtmlWeb, but the special characters were dropped. Setting AutoDetectEncoding to false made no difference.

In the method
private HttpStatusCode Get(Uri uri, string method, string path, HtmlDocument doc)
of HtmlWeb, there is this piece of code:

if ((resp.ContentEncoding != null) && (resp.ContentEncoding.Length > 0))
{
    respenc = System.Text.Encoding.GetEncoding(resp.ContentEncoding);
}

However, resp.ContentEncoding is always string.Empty.

I've added:
else if (respenc == null && resp.CharacterSet != null && resp.CharacterSet.Length > 0)
{
    respenc = System.Text.Encoding.GetEncoding(resp.CharacterSet);
}

Now it works fine.
Is there another way to make HtmlWeb work properly with such pages?
Examples: www.onet.pl, www.wp.pl (special characters: ń, ś, ź, ż, ó, ę)
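
A note on why the original check never fires: HttpWebResponse.ContentEncoding reflects the Content-Encoding header (typically compression such as gzip), while the charset declared in Content-Type is exposed through CharacterSet. The fallback above can be made a bit more defensive with a small helper. This is only a sketch: ResponseEncoding and Resolve are hypothetical names, not part of HtmlAgilityPack, and the choice of fallback encoding is left to the caller.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of HtmlAgilityPack): turns the charset
// reported by HttpWebResponse.CharacterSet into an Encoding.
static class ResponseEncoding
{
    public static Encoding Resolve(string characterSet, Encoding fallback)
    {
        // No charset reported: keep whatever the caller considers a sane default.
        if (string.IsNullOrEmpty(characterSet))
            return fallback;

        try
        {
            // Some servers quote the value, e.g. charset="iso-8859-2".
            return Encoding.GetEncoding(characterSet.Trim().Trim('"', '\''));
        }
        catch (ArgumentException)
        {
            // Unknown or unsupported charset name. On .NET Core this can also
            // happen for ISO-8859-2 unless the code pages provider is registered.
            return fallback;
        }
    }
}
```

Inside HtmlWeb this would be used roughly as respenc = ResponseEncoding.Resolve(resp.CharacterSet, respenc); so an unrecognized charset leaves respenc unchanged instead of throwing.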
Coordinator
Nov 9, 2006 at 11:25 AM
Hmmm... If I remember correctly, there may be some encoding-related bugs in the HtmlWeb class.
HtmlWeb was only meant to be a helper class, but it turns out many people use it :-)
Dec 5, 2007 at 8:24 PM
Edited Dec 5, 2007 at 9:51 PM
I used System.Net.WebClient and HtmlDocument instead of HtmlWeb, and got the right results.
See my discussion at:

http://www.codeplex.com/htmlagilitypack/Thread/View.aspx?ThreadId=18735
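
For readers who cannot reach that thread, the WebClient approach might look roughly like the sketch below. It is not the code from the linked discussion: PageLoader is a hypothetical name, and the caller is assumed to know the page's encoding in advance.

```csharp
using System.IO;
using System.Net;
using System.Text;
using HtmlAgilityPack;

// Hypothetical wrapper: download the raw bytes with WebClient, then let
// HtmlDocument decode them with an explicitly chosen encoding.
static class PageLoader
{
    public static HtmlDocument Load(string url, Encoding encoding)
    {
        using (var client = new WebClient())
        {
            // DownloadData returns undecoded bytes, so nothing is lost
            // to a wrong default encoding before parsing.
            byte[] raw = client.DownloadData(url);

            var doc = new HtmlDocument();
            using (var stream = new MemoryStream(raw))
            {
                doc.Load(stream, encoding);
            }
            return doc;
        }
    }
}
```

For the Polish sites mentioned above, the call would be something like PageLoader.Load("http://www.onet.pl", Encoding.GetEncoding("iso-8859-2")), assuming the site still serves that encoding.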