Hi everyone (yes, I'm still alive :)
bigpilot, the Html Agility Pack *is* cleaning the HTML you gave.
But... by default, it's tailored for HTML 3.x, and in HTML 3.x you *don't always have* to close tags. This means a lone <p> is perfectly valid, so it's closed automatically when no corresponding </p> is found. If you try the same HTML in a browser, you will see that browsers behave exactly like this (unless you use a DOCTYPE that triggers stricter parsing).
So the parsed tree is like this:
Here, the <select> is not a child of the <p> but its next sibling. You can get the <select> with this XPath: //div[@name='locations']/select, or use what kurtnelle suggested.
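A minimal C# sketch of this situation. The markup is hypothetical (a stand-in for the HTML from the question, which isn't reproduced here), and the sketch sets the Empty | Closed flags for "p" explicitly instead of relying on the defaults of a particular library version. With those flags, the <select> should come back as a direct child of the <div>, reachable without going through the <p>.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical markup standing in for the HTML from the question:
// a <p> that is never closed, followed by a <select>.
const string html = "<div name='locations'><p>Location:<select></select></div>";

// Spell out the HTML 3.x-style rule for <p> (Empty | Closed) so the sketch
// does not depend on a given version's defaults. This table is read while parsing.
HtmlNode.ElementsFlags["p"] = HtmlElementFlag.Empty | HtmlElementFlag.Closed;

var doc = new HtmlDocument();
doc.LoadHtml(html);

// The <p> is closed right away, so the <select> lands directly under the <div>.
HtmlNode select = doc.DocumentNode.SelectSingleNode("//div[@name='locations']/select");
HtmlNode throughP = doc.DocumentNode.SelectSingleNode("//div[@name='locations']/p/select");

Console.WriteLine($"sibling path: {(select != null ? "match" : "no match")}");
Console.WriteLine($"path through <p>: {(throughP != null ? "match" : "no match")}");
```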
Now, you can tweak the Html Agility Pack to better suit what you expect using the HtmlNode.ElementsFlags static property (search this forum for more information, or have a look at HtmlNode.cs). It is read while parsing, so change it before loading the document. What you can do is tell it you don't want to support unclosed <p> tags:
HtmlNode.ElementsFlags.Remove("p"); // remove the Empty and Closed flags for <p>
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html); // html: placeholder for the markup from your question
And bingo: since an unclosed <p> is no longer considered valid, the pack closes the malformed <p> itself, and your original XPath works, because the parsed tree is now:
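Here is a hedged C# sketch of the tweak, using the same hypothetical markup as a stand-in for the question's HTML. The path //div[@name='locations']/p/select is an assumption standing in for the original XPath, which isn't shown here. With <p> treated as an ordinary element, the <select> should now sit inside it.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical markup standing in for the HTML from the question:
// a <p> that is never closed, followed by a <select>.
const string html = "<div name='locations'><p>Location:<select></select></div>";

// Drop the special handling of <p> so it behaves like an ordinary container.
// ElementsFlags is static and read during parsing: do this before LoadHtml.
HtmlNode.ElementsFlags.Remove("p");

var doc = new HtmlDocument();
doc.LoadHtml(html);

// Assumed XPath: reach the <select> through the <p>.
HtmlNode select = doc.DocumentNode.SelectSingleNode("//div[@name='locations']/p/select");

Console.WriteLine(select == null
    ? "no match"
    : $"match, parent is <{select.ParentNode.Name}>");
```

Because ElementsFlags is static, the change also applies to every other HtmlDocument parsed later in the same process; if only one document needs it, you could re-add the entry afterwards (HtmlElementFlag.Empty | HtmlElementFlag.Closed for "p").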