Html Agility Pack Beta 2 is a minor update to Beta 1 that adds supporting documentation and a few more bug fixes. The two major additions are newly compiled help documentation and the Html Agility Pack Explorer. HAP Explorer is meant to help visualize the node tree of the HtmlDocument object; it can open either a static file or a URL.
Release Notes
- Added a Sandcastle/DocProject documentation project. This will be used to generate CHM and HxS documentation files.
- Added a new Html Agility Pack Explorer project. This is a WPF application that can be used to explore the HtmlDocument node tree.
- Major cleanup of the code base: ran an aggressive ReSharper code cleanup across the library, updated XML comments, and made other minor tweaks for smaller, more concise code.
- Included a patch that enables proxies when fetching a URL for parsing.
- Fixed the XPath property so that it no longer includes the #document node.
The documentation project requires Sandcastle, DocProject, and the Visual Studio 2008 SDK. For this reason, it is included only in a separate solution.
I evaluated this against a collection of 30,000 HTML pages from the wild. It did very well at converting them all to reasonable XML for storage in a SQL Server 2005 xml column. The minor bugs I found were easy to identify and fix in the source code.
The parsing speed was more than 100 times faster than another project on CodePlex, the System.Html software.
by publius on Mar 31 2010 at 2:30 PM
Very easy to use and a useful API.
by hasankhan on Mar 26 2010 at 5:53 AM
After some initial effort to grasp the idea, I found that this library is simple enough and a very good time saver. Thank you very much for an excellent product. Please keep up the effort.
by agirenko on Mar 12 2010 at 7:57 PM
Excellent product. Saved my day. Easy to use and really fast to learn. Had to parse some really bad HTML (it even had two body tags) and it worked flawlessly.
by carlescs on Mar 11 2010 at 3:31 PM
Fantastic! I have a complex web scraping app that used the WebBrowser object (MSHTML DOM engine). Apart from some threading issues (it's based on COM, remember), it worked OK until I had to deploy it to a remote ASP.NET server (security privileges). The HtmlAgilityPack saved the day. The object model is very similar; all I had to do was replace the API calls, and my parsing logic remained intact!
by andychops on Mar 9 2010 at 6:52 PM
Great library, saves me so much time.
by rsoeteman on Mar 5 2010 at 2:57 PM
Brilliant package. It takes some time to get familiar with, but it is ideal for HTML parsing. I'm currently working on a parser for imdb.com pages because imdb-api is way too complex and buggy.
by loekf on Mar 2 2010 at 2:43 PM
In response to some of the people out there who have said Html Agility Pack is useless: this might not be the very best HTML editor/parser out there, but it is definitely one of the best, and it saved me tons of time. You guys deserve appreciation for your work. Keep up the good work, guys :) Kudos!
by acharyapank on Feb 24 2010 at 3:44 PM
Great piece of work. A time-saver. I look forward to its continued development.
by cwford on Feb 10 2010 at 7:39 PM
Brilliant! I needed to parse a hierarchy of hundreds of linked web pages, absolute piece of cake with this library. Got my hacky little project done in under an hour, and can move on with my life. Thank you, thank you, thank you.
by specialBobby on Feb 3 2010 at 11:21 PM
Great project, makes html parsing a breeze. Thanks!!
by mausch on Jan 27 2010 at 8:32 PM
Love the direction this library is going. It still has a few rough edges and some land mines (stack overflows), but definitely on course.
by SMHoff on Jan 22 2010 at 2:49 PM
Awesome set of classes for going after online content. It really does turn any bit of HTML into an XPath-enabled, surfable DOM.
Awesome work, guys.
by inestyne on Dec 12 2009 at 9:56 PM
Due to how close this API is to System.Xml, I felt immediately comfortable with it. It did the job I wanted and saved me lots of time on a personal project.
I didn't immediately see anything to help manipulate the style attribute, but I certainly won't hold that against this release. Nice Job!
by cooperpx on Dec 12 2009 at 7:26 PM
Excellent, thanks for the great work!
by reteep on Nov 3 2009 at 12:52 PM