How to use htmlagilitypack in Windows Phone 7

Topics: Developer Forum
Aug 27, 2010 at 8:06 PM

Is there a way to make HtmlAgilityPack work for Windows Phone 7 development using the VS2010 WP7 edition? I was unable to add a reference in the IDE because of the lack of WP7 runtime support. Are there any better ideas, or other alternatives available for WP7?

 

Aug 27, 2010 at 8:53 PM

Have you tried using the HAPLight project that is in the SVN repository right now? It is a Silverlight version of Html Agility Pack and should work on WP7 with little modification (it targets Silverlight 4, so you may need to remove a few things).

Sep 2, 2010 at 4:53 PM

Windows Phone 7 doesn't support System.Xml.XPath, so HtmlAgilityPack won't work on WP7.

 

Sep 2, 2010 at 4:55 PM

Yes it will, if you modify the source and comment out the XPath files. This has already been done for .NET CF. The LINQ interfaces can be used instead.

Sep 3, 2010 at 1:04 PM

Not to sound ridiculous, but is there a download available for the binaries of the .NET CF version? Like the originator of this thread, I would like to use this utility on Windows Phone 7.

Sep 3, 2010 at 1:17 PM

The .NET CF binaries won't work on Windows Phone 7 even if they were available. Right now they are only in the SVN repo. The .NET CF and Silverlight versions were made after the last release. I just did a quick look over the solution. For the HAPLight project, if you remove the HtmlNodeNavigator.cs file from the project, remove the System.Xml.XPath reference, and change the project over to an SL 3 library, it should work. I don't have the Windows Phone 7 tools installed on this computer right now, so I can't be 100% sure.

Sep 5, 2010 at 12:30 AM

I made the code corrections to get it working on WP7.

I started with the .NET CF solution from SVN, and with minimal changes it works on WP7.

And yes, there is no XPath on WP7, but LINQ to Objects works great :)

Thanks for all

Sep 17, 2010 at 9:34 PM

Yes, thanks. I finally got around to following your instructions and had no problems compiling, but now comes the next question.

I can load a document:

 HtmlAgilityPack.HtmlDocument htmDoc = new HtmlAgilityPack.HtmlDocument();
 htmDoc.LoadHtml(responseData);

What do I substitute for searching the document?  I was using calls like :

 HtmlAgilityPack.HtmlNodeCollection NoMatchingName = htmDoc.DocumentNode.SelectNodes("//span[@id='ctl00_ctl68_lblPersonNoResults']");

I understand that the XPath I was using is not supported, but could you provide an example of how I would use LINQ in place of these calls?

Thanks,

Sid

 

 

Sep 17, 2010 at 10:43 PM

basically

 

var nodeList = htmDoc.DocumentNode.Descendants("span").Where(x=>x.Id == "ctl00_ctl68_lblPersonNoResults");

//or just for one node
var node = htmDoc.DocumentNode.Descendants("span").FirstOrDefault(x=>x.Id == "ctl00_ctl68_lblPersonNoResults");

//node will be null if it's not found

Sep 21, 2010 at 6:44 PM

Thanks for that. Could you point me in the right direction on how to do this with LINQ:

HtmlAgilityPack.HtmlNodeCollection personPages = htmDoc.DocumentNode.SelectNodes("//table[@id='ctl00_ctl68_dgPersonSearchResults']/tr[last()]/td/a");

This was providing me with all the <a> tags contained within the named table on the last row.

Is there anything you know of which has sample usage of HAP and LINQ?

thanks, Sid

 

 

Sep 26, 2010 at 4:06 AM

I've checked into SVN a HAPPhone project and solution for what will be the official Html Agility Pack for WP7.

As for your question on the LINQ statement:

var links = htmDoc.DocumentNode.Descendants("table").First(n => n.Id == "ctl00_ctl68_dgPersonSearchResults").Elements("tr").Last().Descendants("a");

One thing to watch out for is First and Last: they throw an exception if there is nothing to return. FirstOrDefault would be a good intermediate assignment to handle the possibility that no table nodes are found.
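The advice above about First and Last can be sketched as a null-safe helper. This is only an illustration, not part of HAP: the class name LastRowLinks and the method Find are hypothetical, and it assumes the rows are direct children of the table (no tbody in the markup). It also follows the td/a step of the original XPath more literally than Descendants("a") does.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of Html Agility Pack.
public static class LastRowLinks
{
    // Returns the <a> nodes that are direct children of a <td> in the last <tr>
    // of the table with the given id, or an empty sequence when the table
    // or its rows cannot be found (instead of throwing like First/Last).
    public static IEnumerable<HtmlNode> Find(HtmlDocument doc, string tableId)
    {
        HtmlNode table = doc.DocumentNode
            .Descendants("table")
            .FirstOrDefault(n => n.Id == tableId);
        if (table == null)
            return Enumerable.Empty<HtmlNode>();

        // Assumes <tr> elements sit directly under <table>.
        HtmlNode lastRow = table.Elements("tr").LastOrDefault();
        if (lastRow == null)
            return Enumerable.Empty<HtmlNode>();

        return lastRow.Elements("td").SelectMany(td => td.Elements("a"));
    }
}
```

Usage would look like LastRowLinks.Find(htmDoc, "ctl00_ctl68_dgPersonSearchResults"); an empty result then means "not found" rather than an exception.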

Oct 18, 2010 at 4:50 PM

Thanks for porting HAP to Windows Phone. I have a question regarding parsing the InnerText.

How do I get the correct special characters? I changed the HAPPhoneTest Project like this to demonstrate my problem:

private void FetchSite(object sender, RoutedEventArgs e)
{
    HtmlWeb.LoadAsync("http://oeffentlicher-dienst.info", (s, args) =>
    {
        Results.Text = String.Join(Environment.NewLine,
                                   args.Document.DocumentNode.Descendants("a")
                                       .Select(x => x.InnerText)
                                       .ToArray());
    });
}
You should see three characters replaced by <?>. The problem is not specific to this site; it seems to affect every site that uses special characters. I tried quite a few approaches but can't find the right answer.

Thank You in advance.
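The thread does not resolve this question. One possible cause, which is an assumption and not a confirmed diagnosis, is that the page is served in a single-byte encoding such as ISO-8859-1 while the response is decoded as UTF-8, so bytes like 0xF6 (ö) become replacement characters. If that is the case and the raw response bytes are available (how to obtain them from HtmlWeb is not covered here), a minimal sketch of decoding them by hand before calling LoadHtml could look like this. The class name Latin1 is hypothetical.

```csharp
using System.Text;

// Hypothetical helper; assumes the input really is ISO-8859-1.
public static class Latin1
{
    // ISO-8859-1 maps every byte value to the Unicode code point
    // with the same number, so a plain cast is a valid decoder.
    public static string Decode(byte[] bytes)
    {
        var sb = new StringBuilder(bytes.Length);
        foreach (byte b in bytes)
        {
            sb.Append((char)b);
        }
        return sb.ToString();
    }
}
```

The decoded string can then be passed to HtmlDocument.LoadHtml as in the earlier posts. If the page uses a different encoding, this mapping will not be correct.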

Oct 27, 2010 at 12:24 AM

Dear darthobiwan !

Please help me convert these to LINQ:

XPath("//div[@class='iconadd']/a")

XPath("//div[@class='leftInfo']/h1/a")

 

Thanks !

 

Oct 29, 2010 at 6:36 PM
Edited Oct 29, 2010 at 6:37 PM
tom_codon wrote:

Dear darthobiwan !

Please help me convert it to Linq :

XPath("//div[@class='iconadd']/a")

XPath("//div[@class='leftInfo']/h1/a")

 

Thanks !

 

 

Any help ?
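This question is not answered later in the thread. Following the same Descendants pattern shown earlier, one possible translation is sketched below. The class and method names are hypothetical, and the exact string comparison on the class attribute mirrors what [@class='...'] does in XPath (it will not match a div with several classes).

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of Html Agility Pack.
public static class ClassQueries
{
    // All <div> elements whose class attribute is exactly cssClass.
    private static IEnumerable<HtmlNode> DivsWithClass(HtmlDocument doc, string cssClass)
    {
        return doc.DocumentNode
            .Descendants("div")
            .Where(d => d.GetAttributeValue("class", "") == cssClass);
    }

    // Roughly //div[@class='iconadd']/a
    public static IEnumerable<HtmlNode> IconAddLinks(HtmlDocument doc)
    {
        return DivsWithClass(doc, "iconadd")
            .SelectMany(d => d.Elements("a"));
    }

    // Roughly //div[@class='leftInfo']/h1/a
    public static IEnumerable<HtmlNode> LeftInfoLinks(HtmlDocument doc)
    {
        return DivsWithClass(doc, "leftInfo")
            .SelectMany(d => d.Elements("h1"))
            .SelectMany(h => h.Elements("a"));
    }
}
```

Elements only looks at direct children, which matches the single slash in the XPath; Descendants would correspond to a double slash.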

Oct 29, 2010 at 6:41 PM
Edited Oct 29, 2010 at 6:42 PM

I'm not going to be much help for a while. The president of my company and 3 of his sons disappeared Monday. His plane fell off radar in Wyoming. It's been a very stressful week and I'm not much help to anyone. I've been at Sierra Bravo since we were a small 10-man shop; we just reached 160, 6 years later.

http://lukeandginger.com/blog/2

Here's a great public radio article about what's going on http://minnesota.publicradio.org/collections/special/columns/news_cut/archive/2010/10/an_emotional_investment_in_a_n.shtml

 

 

Oct 30, 2010 at 9:38 AM
darthobiwan wrote:

I'm not going to be much help for a while. The president of my company and 3 of his sons disappeared Monday. His plane fell off radar in Wyoming. It's been a very stressful week and I'm not much help to anyone. I've been at Sierra Bravo since we were a small 10-man shop; we just reached 160, 6 years later.

http://lukeandginger.com/blog/2

Here's a great public radio article about what's going on http://minnesota.publicradio.org/collections/special/columns/news_cut/archive/2010/10/an_emotional_investment_in_a_n.shtml

 

 

I'm very sorry about that. I hope everything will be well for you, darthobiwan!

Best Regards

Tom

Mar 9, 2012 at 2:50 AM

If you cannot find the System.Xml.XPath reference in HAPPhone.7.1.csproj when compiling Trunk\HAPPhone.7.1.sln, add a reference to %ProgramFiles%\Microsoft SDKs\Silverlight\v4.0\Libraries\Client\System.Xml.XPath.dll.

Mar 9, 2012 at 3:42 AM

Hey! You can also use the XPath of a node to get the data you want.

You could get the data like this:

// ---------- if you want to get a single node

          var node2 = doc.DocumentNode.Descendants("span")
                 .FirstOrDefault(x => x.XPath == "/html[1]/body[1]/div[2]/div[2]/div[1]/div[1]/div[2]/p[1]/span[1]");

// then you can do something like this (node2 is null if nothing matched)

       if (node2 != null) MessageBox.Show(node2.InnerText + node2.GetAttributeValue("class", ""));

  e-mail   hzdgjb@126.com

 

 

    

Mar 9, 2012 at 6:22 AM

Actually, HtmlAgilityPack exposes each node's XPath.

You can get a node this way:

var node1 = doc.DocumentNode.Descendants("div").FirstOrDefault(x => x.XPath == "/html[1]/body[1]/div[2]/div[2]");

 

 

 


May 25, 2013 at 4:27 PM
As I found on another thread, just add a reference to the Silverlight assembly it's complaining about.
It's usually located here:
C:\Program Files (x86)\Microsoft SDKs\Silverlight\v4.0\Libraries\Client
Apr 14, 2014 at 8:42 AM
Like weiser said, just adding the System.Xml.XPath reference typically works!

As he said, it's typically located here: C:\Program Files (x86)\Microsoft SDKs\Silverlight\v4.0\Libraries\Client

However, it can cause problems. The only problem I ever found was the inability to navigate to a page inside a class library, using the /Namespace;component/Page.xaml format. This problem exists if...
  • You reference System.Xml.XPath in the given class library
  • You try to navigate to a page inside that class library.
Nevertheless, the fix is easy. Simply create a separate class library containing the page files, without any reference to System.Xml.XPath. If your page requires XPath, then I suppose you'll be screwed. But in my case, the page was just a fullscreen photo viewer that was unrelated to my HTML parser/viewer.