I'd be interested in seeing the patch and getting some performance metrics. The memory footprint reduction will vary widely, since HAP keeps a reference to the original document and takes substrings of it to get the tag name, inner text, and inner HTML.
Comments are handled a little differently: the comment text is copied. So the patch might help most if you had a document with a large number of comments.
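
To make the difference concrete, here's a rough sketch of the two storage strategies. This is illustrative Python, not HAP's actual code; `LazyNode` and `CopiedComment` are made-up names, and I'm only assuming the general shape of "offsets into a shared document" versus "own copy of the text".

```python
from dataclasses import dataclass


@dataclass
class LazyNode:
    """Hypothetical node that stores offsets into the shared source text."""
    source: str  # reference to the full document, not a copy
    start: int
    length: int

    @property
    def text(self) -> str:
        # The substring is only materialized when someone asks for it.
        return self.source[self.start:self.start + self.length]


@dataclass
class CopiedComment:
    """Hypothetical comment node that holds its own copy of the text."""
    text: str


document = "<p>hello</p><!-- a note -->"

# Element content: remember where it lives, copy nothing up front.
inner = LazyNode(document, document.index("hello"), len("hello"))

# Comment: the text is sliced out (copied) at parse time.
begin = document.index("<!--")
comment = CopiedComment(document[begin:])
```

With this layout, dropping comments only saves the copied strings, so a document with few comments would see almost no change in memory.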
As for parsing speed, I wouldn't expect much improvement, since the parser still needs to keep reading until it finds the end of the comment. Although the parser is pretty complex, I've found it to be pretty darn efficient. Every time I think I've found a way to
"fix" something that looks ugly, I find that it is ugly because it is avoiding costly operations.
With that said, I'm always open to other ideas.
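
On the parsing speed point, here's a minimal sketch of why skipping a comment still costs a scan. Again, this is illustrative Python rather than HAP's parser; `skip_comment` is a hypothetical helper, and I'm assuming plain `<!-- ... -->` comments.

```python
def skip_comment(html: str, start: int) -> int:
    """Return the index just past the comment that begins at `start`."""
    if not html.startswith("<!--", start):
        raise ValueError("no comment at this position")
    # Even if the comment is discarded, we still have to search for its end.
    end = html.find("-->", start + 4)
    # An unterminated comment swallows the rest of the input.
    return len(html) if end == -1 else end + 3


html = "<a><!-- skipped --><b>"
resume = skip_comment(html, html.index("<!--"))
```

The search for `-->` happens whether or not the comment is kept, so the only work saved by ignoring comments is the copy itself.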