LINQ Support

Jun 8, 2009 at 2:41 PM

This post is more of a heads-up. I'm working on a project that requires searching through the entire hierarchy of a page and changing it. Currently the Agility Pack doesn't support LINQ all that well. While the navigator is fine for XPath queries, it is still too limited for what I need to do (like finding tags that are missing a certain attribute). So I'm updating the Agility Pack to use generic collections and am experimenting with having it inherit from the LINQ to XML classes (while keeping the current functionality).
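For comparison, here is the missing-attribute case written against the standard System.Xml.Linq and System.Xml.XPath classes rather than the Agility Pack, so it assumes the markup is already well-formed XML (the sample markup is made up). XPath 1.0 can express this simple case with not(@class); the LINQ version is an ordinary C# predicate, which is easier to extend once the condition involves your own logic.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Assumption: well-formed markup; real-world HTML often is not.
var doc = XDocument.Parse(
    "<body><a href='/home' class='nav'>Home</a><a href='/about'>About</a><a href='/faq'>FAQ</a></body>");

// XPath 1.0 handles the simple case: <a> elements with no class attribute.
var viaXPath = doc.XPathSelectElements("//a[not(@class)]").ToList();

// LINQ expresses the same test as a plain C# predicate, which can grow to
// include checks that are awkward in XPath (e.g. calls to your own methods).
var viaLinq = doc.Descendants("a").Where(a => a.Attribute("class") == null).ToList();

Console.WriteLine(string.Join(",", viaLinq.Select(a => (string)a.Attribute("href")))); // /about,/faq
Console.WriteLine(viaXPath.SequenceEqual(viaLinq)); // True
```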

Hopefully when I'm done it will be something the project can use. Even if it's not, some of the changes I've already made will help quite a bit, the biggest being changing all the collections to implement IList<T>. If I can't get the LINQ to XML part working correctly, my next step will be to fully implement methods like Descendants(), DescendantNodes() and Ancestors(), which will help quite a bit with running LINQ queries against a document.
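A minimal sketch of what Descendants() and Ancestors() could look like over a tree whose child collections implement IList<T>. DescendantsOf and AncestorsOf are hypothetical names, and the node type is hidden behind two delegates (children and parent) so the sketch does not assume any particular Agility Pack class. To check the ordering, it is run on a LINQ to XML tree and compared with the built-in XElement.Descendants().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

var doc = XDocument.Parse("<html><body><div><a/></div><p><a/></p></body></html>");
var body = doc.Root.Element("body");

// Run the sketch on an XElement tree, adapting it through the two delegates.
var mine = DescendantsOf(body, e => e.Elements().ToList()).Select(e => e.Name.LocalName).ToList();
var builtIn = body.Descendants().Select(e => e.Name.LocalName).ToList();
var link = body.Descendants("a").First();
var ancestors = AncestorsOf(link, e => e.Parent).Select(e => e.Name.LocalName).ToList();

Console.WriteLine(string.Join(",", mine));      // div,a,p,a
Console.WriteLine(mine.SequenceEqual(builtIn)); // True
Console.WriteLine(string.Join(",", ancestors)); // div,body,html

// Hypothetical Descendants(): depth-first, pre-order, so nodes come out in
// document order; the starting node itself is not included.
static IEnumerable<T> DescendantsOf<T>(T root, Func<T, IList<T>> children)
{
    var stack = new Stack<T>();
    var kids = children(root);
    // Push children in reverse so the first child is popped (and yielded) first.
    for (int i = kids.Count - 1; i >= 0; i--) stack.Push(kids[i]);
    while (stack.Count > 0)
    {
        var node = stack.Pop();
        yield return node;
        kids = children(node);
        for (int i = kids.Count - 1; i >= 0; i--) stack.Push(kids[i]);
    }
}

// Hypothetical Ancestors(): walks the parent chain, nearest ancestor first,
// stopping when the parent delegate returns null.
static IEnumerable<T> AncestorsOf<T>(T node, Func<T, T> parent) where T : class
{
    for (var p = parent(node); p != null; p = parent(p))
        yield return p;
}
```

Using an explicit stack instead of a recursive yield return keeps the cost per yielded node constant; nested iterators pass every element back up through each level of nesting.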

I already have the following working:

var aspLinks = body.Descendants().Where(x => x.Name.Equals("asp:hyperlink"));

This will get all the ASP.NET hyperlinks in the body of the document. I haven't tried it yet, but I think this can also be used with XAML.
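Roughly the same query with LINQ to XML, as a self-contained sketch. Two assumptions: the page is well-formed, and the asp prefix is bound to a namespace URI (urn:example-asp is made up; real .aspx files don't declare one, which is part of why a tolerant parser is needed). LINQ to XML compares names case-sensitively, so the element name must be spelled as in the markup. Because XAML is XML, the same approach can be applied to it.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Assumption: well-formed page with the asp prefix bound to a made-up namespace URI.
var page = XDocument.Parse(
    "<html xmlns:asp='urn:example-asp'><body>" +
    "<asp:HyperLink ID='home' runat='server' />" +
    "<p><asp:HyperLink ID='about' runat='server' /></p>" +
    "<asp:Label ID='lbl' runat='server' />" +
    "</body></html>");

XNamespace asp = "urn:example-asp";
var body = page.Root.Element("body");

// Same shape as the query in the post: every descendant of <body> with a given name,
// including ones nested deeper (the link inside <p>).
var aspLinks = body.Descendants().Where(x => x.Name == asp + "HyperLink").ToList();

Console.WriteLine(aspLinks.Count); // 2
```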

Jun 22, 2009 at 5:00 PM

I only found out about HtmlAgilityPack a couple of days ago. How did I miss it?

But you are right, it is missing LINQ support, and in this world of invalid HTML, LINQ support would be handy to have.

I have yet to contribute much to open source, but CodePlex says it is open source, so can't you start your own branch?

I would be most interested in using LINQ to work with web pages whose authors didn't take the time to write them properly in the first place.

cheers

Jun 22, 2009 at 5:18 PM

I'm hoping to contribute to this project, but I need the maintainer's permission to do that, and it seems he only checks in here every few months. Right now my only option is to fork and start a new project, and I do not want to do that. If I get permission, I'll probably start a new branch for these changes.

I do have LINQ working rather well and have made other changes for formatting. When I parsed an .aspx page and then saved it, all the original capitalization was gone, so I added a new output mode and modified the parsing routine to preserve the original capitalization. It took a bit of work, since the parser lowercased everything in many places.
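One way to get this behavior is to keep tag names exactly as written and only compare case-insensitively when querying. This is a sketch of that idea, not the change made to the Agility Pack; the regular expression is just a stand-in to keep the example self-contained and is not a real HTML parser.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input: an .aspx fragment with mixed-case control names.
var source = "<asp:HyperLink ID=\"home\" runat=\"server\" /><DIV><asp:Label ID=\"lbl\" runat=\"server\" /></DIV>";

// Capture opening-tag names exactly as written (closing tags like </DIV> are skipped).
var names = Regex.Matches(source, @"<([A-Za-z][\w:.-]*)")
                 .Cast<Match>()
                 .Select(m => m.Groups[1].Value)
                 .ToList();

// Match case-insensitively at query time instead of lowercasing at parse time,
// so the stored name can still be written back with its original spelling.
var hyperlinks = names.Where(n => n.Equals("asp:hyperlink", StringComparison.OrdinalIgnoreCase)).ToList();

Console.WriteLine(string.Join(",", names)); // asp:HyperLink,DIV,asp:Label
Console.WriteLine(hyperlinks.Single());     // asp:HyperLink
```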

Right now I'm working on the parser itself so it can keep the formatting information of the original document. I'm one of those people who, when I put a newline between two attributes, want it to stay that way.
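A sketch of one way to keep that kind of formatting: store the whitespace that precedes each attribute together with the attribute, and write it back out on save. The regex-based attribute scanner, the tuple layout and the hard-coded tag name are assumptions for illustration only, not how the Agility Pack parser works.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical tag whose author put each attribute on its own line.
var tag = "<asp:HyperLink ID=\"home\"\n    NavigateUrl=\"~/Default.aspx\"\n    runat=\"server\" />";

// Keep the whitespace that precedes each attribute alongside its name and value.
var attrs = Regex.Matches(tag, "(\\s+)([\\w:-]+)=\"([^\"]*)\"")
                 .Cast<Match>()
                 .Select(m => (Space: m.Groups[1].Value, Name: m.Groups[2].Value, Value: m.Groups[3].Value))
                 .ToList();

// Edit one value; the stored whitespace is left untouched.
attrs = attrs.Select(a => a.Name == "NavigateUrl" ? (a.Space, a.Name, Value: "~/Home.aspx") : a).ToList();

// Re-emit with the stored whitespace so the line breaks survive the round trip
// (tag name and closing " />" are hard-coded for brevity).
var rebuilt = "<asp:HyperLink" + string.Concat(attrs.Select(a => $"{a.Space}{a.Name}=\"{a.Value}\"")) + " />";
Console.WriteLine(rebuilt);
```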

The project I'm working on may end up being open source as well. Part of it is a library that will provide a better query interface over HTML/XAML/XML. With it you could write var results = document.GetTag("asp:hyperlink").MissingAttribute("class"); and then run a replace/modify routine such as document.ReplaceWithResults(results).ReplaceInsertAttribute("class","nav_link"). (Note: I haven't settled on a syntax yet; this is just to demonstrate a possibility.)
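Since that syntax is explicitly unsettled, here is the same query-then-modify pattern using LINQ to XML methods that already exist (Descendants, Attribute, SetAttributeValue) instead of the proposed GetTag/MissingAttribute names. It assumes well-formed markup and binds the asp prefix to a made-up namespace URI.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Assumption: well-formed markup with the asp prefix bound to a made-up namespace URI.
var document = XDocument.Parse(
    "<body xmlns:asp='urn:example-asp'>" +
    "<asp:HyperLink ID='home' class='nav_link' runat='server' />" +
    "<asp:HyperLink ID='about' runat='server' />" +
    "</body>");
XNamespace asp = "urn:example-asp";

// Query: hyperlinks with no class attribute. ToList() snapshots the results before editing.
var results = document.Descendants(asp + "HyperLink")
                      .Where(e => e.Attribute("class") == null)
                      .ToList();

// Modify: SetAttributeValue adds the attribute, or overwrites it if it already exists.
foreach (var link in results)
    link.SetAttributeValue("class", "nav_link");

Console.WriteLine(document.Descendants(asp + "HyperLink")
                          .All(e => (string)e.Attribute("class") == "nav_link")); // True
```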