Inconsistent parsing of whitespace within tags

Feb 3, 2011 at 2:04 PM

I'm working on an HTML editor and using HAP to parse the HTML. Bearing in mind that I'm parsing while a user is typing, I have come across the following issues:

  • When parsing tags, if the tag contains spaces, the spaces are stripped, e.g. "<div >" is stripped to "<div>"
  • When parsing self-closing tags, e.g. "<br/>", a space is added before the slash, like so: "<br />"

This is causing me no end of issues because I need the exact Length of the InnerHtml of an HtmlNode as the user has typed it, not as HAP has parsed it.

So ultimately, my question is how on Earth do I stop HAP from mutating my HTML during parsing? I'm not using HAP for output.

Thanks in advance,

A.

Feb 7, 2011 at 2:18 PM

This is also the reason I started using HAP. I was writing a VS2010 extension and this "issue" stopped me from completing it. The thing is, it's not the parsing at all. The first time you call InnerHtml, it dynamically generates the HTML from the object model and not from the original text document. Thus it's writing proper HTML rather than echoing what was typed.

There are some internal fields (_innerstartindex and _innerlength) that track where a node's inner content sits in the source, and HtmlDocument has an internal Text property that holds the original document string. You can access these with reflection or modify the source code to give you access to them.
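
To illustrate the idea of slicing the original text by tracked offsets instead of re-serializing the object model, here is a minimal sketch in Python. It does not use HAP at all: OffsetTracker, spans and inner_html are hypothetical names built on the standard library's html.parser, and they only stand in for what _innerstartindex, _innerlength and the document's Text would give you.

```python
from html.parser import HTMLParser


class OffsetTracker(HTMLParser):
    """Hypothetical helper: records where each element's inner content
    sits in the ORIGINAL text, so it can be sliced back out verbatim."""

    def __init__(self, source):
        super().__init__(convert_charrefs=False)
        self.source = source
        # Absolute offset of the start of every line, used to turn the
        # (line, column) pair from getpos() into a single index.
        self.line_starts = [0]
        for i, ch in enumerate(source):
            if ch == "\n":
                self.line_starts.append(i + 1)
        self.open_tags = []  # stack of (tag, inner_start)
        self.spans = []      # finished (tag, inner_start, inner_length)
        self.feed(source)
        self.close()

    def _offset(self):
        line, col = self.getpos()  # position where the current tag starts
        return self.line_starts[line - 1] + col

    def handle_starttag(self, tag, attrs):
        # Inner content begins right after the start tag exactly as typed,
        # e.g. "<div >" is 6 characters long, not 5.
        inner_start = self._offset() + len(self.get_starttag_text())
        self.open_tags.append((tag, inner_start))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as "<br/>" have no inner content to track.
        pass

    def handle_endtag(self, tag):
        # Close the nearest matching open tag, dropping unclosed children.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                inner_start = self.open_tags[i][1]
                del self.open_tags[i:]
                self.spans.append((tag, inner_start, self._offset() - inner_start))
                break

    def inner_html(self, tag):
        """Return the first recorded inner span for `tag`, sliced from the source."""
        for name, start, length in self.spans:
            if name == tag:
                return self.source[start:start + length]
        raise KeyError(tag)


tracker = OffsetTracker('<div ><b>hi</b><br/></div>')
print(tracker.inner_html("div"))       # <b>hi</b><br/>
print(len(tracker.inner_html("div")))  # 14
```

Because the result is a slice of the source string, "<div >" and "<br/>" keep the exact lengths the user typed. The sketch ignores malformed markup and is only meant to show the offset approach.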

In a future release I will probably make these public readonly properties so it is easier to get to this information.

Feb 18, 2011 at 12:51 AM

Thanks for the response. Any ETA on the next release?

Feb 18, 2011 at 12:37 PM

Not at the moment. All the time I set aside for working on 2.0 has been eaten up by large projects (I've been working on a site millions of people use).

I'm hoping sometime this year I can get 2.0 up and rolling.