Bug in XPath engine

Topics: Developer Forum, Project Management Forum, User Forum
Feb 7, 2010 at 2:58 PM

I just found a bug in the XPath engine.

I want to extract all input elements from a specific HTML form.

doc.SelectNodes("//form[@name='name']//input");

This returns null. If I load a well-formed test page into an XmlDocument and use the query above, it works perfectly; if I load the same well-formed test page into an HtmlDocument, it returns null.

I cannot use an XmlDocument because the page I'm going to load does not load into an XmlDocument.

Is there a workaround for this?
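
For reference, a minimal sketch of the XmlDocument side of that comparison. The page markup here is hypothetical (the actual test page is not shown in the post), and the class and method names are made up for illustration. It only demonstrates that the query matches nested inputs when the markup is well-formed XML.

```csharp
using System;
using System.Xml;

public static class XmlXPathCheck
{
    // Hypothetical well-formed test page; not the poster's real page.
    public const string Page =
        "<html><body>" +
        "<form name='name'><span><input id='a'/></span></form>" +
        "<form name='other'><input id='b'/></form>" +
        "</body></html>";

    // Counts the input elements nested anywhere inside the form named 'name'.
    public static int CountFormInputs(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml); // throws XmlException if the markup is not well-formed

        // XmlDocument.SelectNodes returns an empty list (not null) when nothing matches.
        XmlNodeList nodes = doc.SelectNodes("//form[@name='name']//input");
        return nodes.Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountFormInputs(Page));
    }
}
```

Note that LoadXml throws on markup that is not well-formed, which is why this route is not an option for the real page.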

Feb 8, 2010 at 11:34 AM
Edited Feb 9, 2010 at 11:49 AM

Interesting... can you post sample text, please?

Feb 8, 2010 at 1:17 PM
Edited Feb 8, 2010 at 1:17 PM

Hi.

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

doc.LoadHtml(@"<body><form id=""name"">
    <span>
        <input id=""test""></input>
    </span>
</form>
<form id=""test"">
    <input id=""test2""></input>
</form></body>");

HtmlNodeCollection n = doc.DocumentNode.SelectNodes("//form[@id='name']//input");

I fixed it by removing this line from the HtmlNode class:

ElementsFlags.Add("form", HtmlElementFlag.CanOverlap | HtmlElementFlag.Empty);
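
A possible alternative that avoids editing the library source, sketched under the assumption that the Html Agility Pack build in use exposes HtmlNode.ElementsFlags as a public static dictionary: remove the "form" entry at runtime before loading the document. The class and method names below are hypothetical.

```csharp
using System;
using HtmlAgilityPack;

public static class FormFlagWorkaround
{
    // Same sample markup as in the post above.
    public const string Sample = @"<body><form id=""name"">
    <span>
        <input id=""test""></input>
    </span>
</form>
<form id=""test"">
    <input id=""test2""></input>
</form></body>";

    // Counts inputs nested inside the form with id 'name'.
    public static int CountFormInputs(string html)
    {
        // Assumption: ElementsFlags is a public static dictionary in this version.
        // Removing the entry makes the parser treat <form> as an ordinary container.
        // Remove is a no-op if the key is not present.
        HtmlNode.ElementsFlags.Remove("form");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes =
            doc.DocumentNode.SelectNodes("//form[@id='name']//input");
        return nodes == null ? 0 : nodes.Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountFormInputs(Sample));
    }
}
```

Because ElementsFlags is static, the change affects every document parsed afterwards in the same process, so it is best done once at startup.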

Feb 8, 2010 at 6:18 PM

Are those double quotes in there on purpose?

Feb 9, 2010 at 7:55 AM

Yes, but inside a verbatim string (@"...") the C# compiler interprets each doubled quote ("") as a single double-quote character.

Feb 12, 2010 at 1:43 AM

The XPath engine is okay. Your expression is wrong. It should be "//form//input" to select any input element that is a descendant of a form element.

In your HTML document, there is no form with a name attribute! Maybe you meant "//form[@id='name']//input"?