Tag mixing *generates* invalid html

description

Try this program:
using HtmlAgilityPack;
using System;

class Program
{
  // Deliberately mis-nested input: </span> appears before the inner </p>.
  const string test = @"
<html>
<body>
<span>
<p>Foo</span></p>
<p>Bar</p>
</body></html>";
  static void Main(string[] args)
  {
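    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(test);
    // Serialize the parsed tree back to markup to see what the parser produced.
    Console.WriteLine(doc.DocumentNode.OuterHtml);
    // Keep the console window open until Enter is pressed.
    Console.ReadLine();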
  }
}
It outputs the following HTML:

<html>
<body>
<span>
<p>Foo</span><p>
<p>Bar</p>
</body></html>
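Note that the end tag of the first p element has become a start tag: the input contains </span></p>, but the output contains </span><p>. HtmlAgilityPack is not just passing the mis-nested input through; it is generating new malformed HTML in this case.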
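
The change can be confirmed mechanically by counting p start and end tags in the two strings from this report. The following is a minimal sketch, not a parser: TagCount is a hypothetical helper written for this check (it is not part of HtmlAgilityPack), it relies on simple regular expressions, and the Output string is copied by hand from the output shown above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for this report; a rough regex count, not real HTML parsing.
static class TagCount
{
    // Counts start tags such as <p> or <p class="x">, but not <pre> or </p>.
    public static int Open(string html, string tag) =>
        Regex.Matches(html, $@"<{Regex.Escape(tag)}(\s[^>]*)?>", RegexOptions.IgnoreCase).Count;

    // Counts end tags such as </p>.
    public static int Close(string html, string tag) =>
        Regex.Matches(html, $@"</{Regex.Escape(tag)}\s*>", RegexOptions.IgnoreCase).Count;
}

class Demo
{
    // The input string from the report.
    const string Input = "<html>\n<body>\n<span>\n<p>Foo</span></p>\n<p>Bar</p>\n</body></html>";
    // The output reported above, copied by hand.
    const string Output = "<html>\n<body>\n<span>\n<p>Foo</span><p>\n<p>Bar</p>\n</body></html>";

    static void Main()
    {
        Console.WriteLine($"input:  {TagCount.Open(Input, "p")} open, {TagCount.Close(Input, "p")} close");
        Console.WriteLine($"output: {TagCount.Open(Output, "p")} open, {TagCount.Close(Output, "p")} close");
        // prints:
        // input:  2 open, 2 close
        // output: 3 open, 1 close
    }
}
```

The input has matching counts (it is mis-nested, but every p start tag has an end tag), while the output has three start tags and only one end tag, so the imbalance is introduced by the serialization step.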

comments

poizan42 wrote Feb 12, 2016 at 3:24 PM

Note that the DOM Chrome generates for this input is:
<html><head></head><body>
<span>
<p>Foo</p>
<p>Bar</p>

</span></body></html>
Also note that if we use font instead of span, Chrome parses it as:
<html><head></head><body>
<font>
</font><p><font>Foo</font></p>
<p>Bar</p>

</body></html>

Tyf0x wrote Mar 29, 2016 at 2:44 AM

Just wanted to point out that the HTML in the input string is itself invalid. That doesn't change the fact that the output is incorrect, though.