Throw ArgumentException If Charset Is Invalid


For an HTML page like the one below, where the charset is empty or an invalid string:
<head> <meta http-equiv="Content-Type" content="text/html; charset="> </head> ...
Parsing the document throws an ArgumentException. This is not user-friendly; it would be preferable for HtmlAgility to ignore an invalid charset value.
The root cause is in HtmlDocument.cs, in the function ReadDocumentEncoding: Encoding.GetEncoding(charset) throws an ArgumentException if the argument is not a valid charset name.
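
One possible way to avoid the exception is sketched below. This is only an illustration, not the library's actual code: the class CharsetHelper and the method GetEncodingOrDefault are hypothetical names, and the sketch assumes the caller supplies a fallback encoding. It trims the charset value (which also covers a stray leading space) and catches the ArgumentException thrown by Encoding.GetEncoding.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class CharsetHelper
{
    // Returns the encoding named by charset, or fallback when the
    // name is null, empty, whitespace-only, or not recognized.
    public static Encoding GetEncodingOrDefault(string charset, Encoding fallback)
    {
        if (string.IsNullOrWhiteSpace(charset))
        {
            return fallback;
        }

        try
        {
            // Trim so that a value such as " ISO-8859-1" is looked up
            // without the surrounding whitespace.
            return Encoding.GetEncoding(charset.Trim());
        }
        catch (ArgumentException)
        {
            // Encoding.GetEncoding throws ArgumentException for names
            // that are not valid code page names.
            return fallback;
        }
    }
}
```

A caller such as ReadDocumentEncoding could, under these assumptions, use CharsetHelper.GetEncodingOrDefault(charset, Encoding.UTF8) in place of a direct Encoding.GetEncoding(charset) call, so that a malformed meta tag falls back to a default rather than aborting the parse.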


alekz wrote Sep 15, 2009 at 3:38 PM

Successfully parses encoding in:
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">

but fails to parse
<meta http-equiv="Content-Type" content="text/html; charset= ISO-8859-1">
(with the whitespace preceding the encoding name)

DarthObiwan wrote Jan 1, 2010 at 3:37 AM

This appears to be partially fixed in the current build. However, I found that if the charset is set to a string that is not a valid encoding, it still throws the exception. I have added a default of utf-8 and a property that allows the default to be changed if needed.
Usage would be:
HtmlDocument.DefaultEncodingCharSet = "ISO-8859-1";
var doc = new HtmlDocument();

