Make HtmlNode.WriteTo virtual

Topics: Developer Forum, User Forum
Mar 7, 2013 at 9:54 AM
We have a CMS with a rich editor. The editor inserts special HTML elements that serve as placeholders for videos, audio, images, or blocks of text. We are migrating from regular expressions to HAP to detect these elements. The intended workflow is:
  1. Load the document via HAP
  2. Detect special placeholders in the document and replace them with custom HtmlNode-derived objects
  3. Split the document into an array of objects of two kinds: HTML and special
To do this I call HtmlDocument.Save, passing a special TextWriter-derived class. I need the special HtmlNode-derived class to interact with that TextWriter-derived class in a special way (it basically closes the current underlying StringBuilder, creates a special placeholder object, and starts a new StringBuilder). For that I need HtmlNode.WriteTo to be virtual, so I can override it.
I made it virtual for myself and compiled a custom version of HAP, but it would be nice if this change were made in the official HAP so we can use it from NuGet.
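To make the idea concrete, here is a minimal, self-contained sketch of the writer/node interaction described above. It deliberately does not use HAP: `Node` is a stand-in for HtmlNode with an already-virtual WriteTo, and `Segment`, `SegmentingWriter`, `PlaceholderNode`, and `Demo` are hypothetical names invented for illustration. It only shows why a virtual WriteTo would be enough, not how HAP is actually implemented.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// One piece of the split document: either literal HTML or a placeholder.
public sealed class Segment
{
    public bool IsPlaceholder { get; }
    public string Value { get; } // HTML text, or the placeholder kind (e.g. "video")

    public Segment(bool isPlaceholder, string value)
    {
        IsPlaceholder = isPlaceholder;
        Value = value;
    }
}

// Hypothetical TextWriter that collects output into segments instead of one string.
public sealed class SegmentingWriter : TextWriter
{
    private readonly List<Segment> _segments = new List<Segment>();
    private StringBuilder _current = new StringBuilder();

    public override Encoding Encoding => Encoding.UTF8;

    // The base TextWriter routes its other Write overloads through Write(char).
    public override void Write(char value) => _current.Append(value);

    // Close the current HTML run, record the placeholder, start a new run.
    public void WritePlaceholder(string kind)
    {
        FlushHtml();
        _segments.Add(new Segment(true, kind));
    }

    public IReadOnlyList<Segment> Finish()
    {
        FlushHtml();
        return _segments;
    }

    private void FlushHtml()
    {
        if (_current.Length == 0) return;
        _segments.Add(new Segment(false, _current.ToString()));
        _current = new StringBuilder();
    }
}

// Stand-in for a DOM node. This is NOT HtmlAgilityPack's HtmlNode.
public class Node
{
    public string Html { get; }
    public Node(string html) { Html = html; }

    // Virtual, so subclasses can change how they serialize themselves.
    public virtual void WriteTo(TextWriter writer) => writer.Write(Html);
}

public sealed class PlaceholderNode : Node
{
    public string Kind { get; }
    public PlaceholderNode(string kind) : base(string.Empty) { Kind = kind; }

    public override void WriteTo(TextWriter writer)
    {
        if (writer is SegmentingWriter segmenting)
            segmenting.WritePlaceholder(Kind);     // cooperate with the special writer
        else
            writer.Write("<!-- " + Kind + " -->"); // fallback for ordinary writers
    }
}

public static class Demo
{
    // Serializes the nodes in order and returns the resulting segments.
    public static IReadOnlyList<Segment> Split(IEnumerable<Node> nodes)
    {
        using (var writer = new SegmentingWriter())
        {
            foreach (var node in nodes) node.WriteTo(writer);
            return writer.Finish();
        }
    }
}
```

Calling `Demo.Split` with an HTML node, a `PlaceholderNode("video")`, and another HTML node yields three segments in that order: HTML, placeholder, HTML. The type check inside `PlaceholderNode.WriteTo` is one possible design; it keeps the node usable with ordinary writers as well.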

Nov 18, 2013 at 3:58 PM
Edited Nov 18, 2013 at 3:59 PM
Do you have any idea how to reduce memory usage? I think it is the major complaint about HAP when crawling on multiple threads at scale.

If I'm pointed in the right direction for a quick start, I plan to work on such a perf improvement.
Nov 18, 2013 at 11:43 PM
Honestly, I have no idea. I haven't seen any memory issues with HAP in our use case, but we don't use it for anything as intense as crawling. Our CMS may have at most 100 concurrent users, and we parse relatively short HTML (news articles).
Basically, there should be no memory leaks in HAP, as it's pure C# code. That does not mean there is no excessive memory usage, though.