Encoding and HTML entities

Topics: User Forum
Jan 10, 2010 at 7:15 PM

Hi all,

I am new to Html Agility Pack and I have got some basic questions for which I could not find any simple answers.

First of all, I would like to load the content of a page as Unicode, so I'm doing something like this:

HtmlAgilityPack.HtmlWeb htmlWeb = new HtmlAgilityPack.HtmlWeb();
htmlWeb.AutoDetectEncoding = true;
HtmlAgilityPack.HtmlDocument htmlDoc = htmlWeb.Load("http://www.google.co.jp/");
tbLog.Text = htmlDoc.GetElementbyId("ghead").InnerText;

The trouble is that what is displayed is not Unicode:

�E�F�u � ���� �n�} �j���[�X ���� Gmail ���̑� ▼�|�� �u���O YouTube �J�����_�[ �ʐ^ �h�L�������g ���[�_�[ �T�C�g �O���[�v �T�[�r�X�ꗗ » iGoogle | �����ݒ� | ���O�C��

What’s wrong with what I am doing?

Also, and this is my second question: how can I get the HTML entities (▼ and ») replaced by their actual characters?

Thanks in advance!

Ben