
Incorrect Parsing of Malformed Html

description

The following HTML causes the parser to skip the body element.

Input Html:
<html>
<head><script type="text/javascript" src="xss.js" /></head><body><script>alert('XSS')</script></body></html>

DocumentNode.DescendantsAndSelf():
  • [0] {Name: "#document"} HtmlAgilityPack.HtmlNode
  • [1] {Name: "html"} HtmlAgilityPack.HtmlNode
  • [2] {Name: "#text"} HtmlAgilityPack.HtmlNode {HtmlAgilityPack.HtmlTextNode}
  • [3] {Name: "head"} HtmlAgilityPack.HtmlNode
  • [4] {Name: "script"} HtmlAgilityPack.HtmlNode
  • [5] {Name: "#text"} HtmlAgilityPack.HtmlNode {HtmlAgilityPack.HtmlTextNode}
As you can see, the body is missing from the descendants because the parser does not treat the self-closing script tag in the head as closed. However, a lot of sites do use <script /> as a closed script tag.
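
One possible workaround, which is not part of the original report, is to rewrite self-closing script tags into an explicit open/close pair before handing the markup to the parser. The sketch below uses only System.Text.RegularExpressions. The class and method names (ScriptTagNormalizer, ExpandSelfClosingScripts) are hypothetical. The regular expression is a simplification that assumes attribute values contain no ">" character.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of HtmlAgilityPack.
public static class ScriptTagNormalizer
{
    // Matches "<script ... />". Group 1 captures the attribute text.
    // Assumption: no attribute value contains a '>' character.
    private static readonly Regex SelfClosingScript = new Regex(
        @"<script\b([^>]*?)\s*/>",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Rewrites every "<script ... />" as "<script ...></script>".
    // Script tags that are already explicitly closed are left untouched.
    public static string ExpandSelfClosingScripts(string html)
    {
        return SelfClosingScript.Replace(html, "<script$1></script>");
    }
}
```

The normalized string can then be passed to HtmlDocument.LoadHtml in place of the raw input. This only sidesteps the problem for script tags and does not change how the parser itself handles other self-closing elements.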
