Problem parsing children

Topics: Developer Forum, User Forum
Jun 23, 2010 at 2:18 AM
Edited Jun 23, 2010 at 2:19 AM

Hi All,

I'm having a problem parsing the input elements that are children of a form in HTML. I can select them from the document root using //input[@type], but not as children of a specific form node.

 Here's some code that illustrates the problem:

private const string HTML_CONTENT =
    "<html>" +
    "<head>" +
    "<title>Test Page</title>" +
    "<link href='site.css' rel='stylesheet' type='text/css' />" +
    "</head>" +
    "<body>" +
    "<form id='form1' method='post' action='http://www.someplace.com/input'>" +
    "<input type='hidden' name='id' value='test' />" +
    "<input type='text' name='something' value='something' />" +
    "</form>" +
    "<a href='http://www.someplace.com'>Someplace</a>" +
    "<a href='http://www.someplace.com/other'><img src='http://www.someplace.com/image.jpg' alt='Someplace Image'/></a>" +
    "<form id='form2' method='post' action='/something/to/do'>" +
    "<input type='text' name='secondForm' value='this should be in the second form' />" +
    "</form>" +
    "</body>" +
    "</html>";

// Namespaces used: System.Diagnostics, System.IO, System.Text, HtmlAgilityPack.
public void Parser_Test()
{
    var htmlDoc = new HtmlDocument
    {
        OptionFixNestedTags = true,
        OptionUseIdAttribute = true,
        OptionAutoCloseOnEnd = true,
        OptionAddDebuggingAttributes = true
    };

    // Load the test markup from an in-memory UTF-8 stream.
    byte[] byteArray = Encoding.UTF8.GetBytes(HTML_CONTENT);
    using (var stream = new MemoryStream(byteArray))
    {
        htmlDoc.Load(stream, Encoding.UTF8, true);
    }

    // SelectNodes returns null (not an empty collection) when nothing matches.
    var nodeCollection = htmlDoc.DocumentNode.SelectNodes("//form");
    if (nodeCollection != null && nodeCollection.Count > 0)
    {
        foreach (var form in nodeCollection)
        {
            var id = form.GetAttributeValue("id", string.Empty);
            if (!form.HasChildNodes)
            {
                Debug.WriteLine(string.Format("Form {0} has no children", id));
            }

            // Relative XPath: input elements that are direct children of this form.
            var childCollection = form.SelectNodes("input[@type]");
            if (childCollection != null && childCollection.Count > 0)
            {
                Debug.WriteLine("Got some child nodes");
            }
            else
            {
                Debug.WriteLine("Unable to find input nodes as children of Form");
            }
        }

        // Absolute XPath: every input element anywhere in the document.
        var inputNodes = htmlDoc.DocumentNode.SelectNodes("//input");
        if (inputNodes != null && inputNodes.Count > 0)
        {
            Debug.WriteLine(string.Format("Found {0} input nodes when parsed from root", inputNodes.Count));
        }
    }
    else
    {
        Debug.WriteLine("Found no forms");
    }
}

The output is:

Form form1 has no children
Unable to find input nodes as children of Form
Form form2 has no children
Unable to find input nodes as children of Form
Found 3 input nodes when parsed from root

I would expect both form1 and form2 to have children, and input[@type] to find two nodes for form1 and one for form2.

Is there a specific configuration setting or method that I should be using but am not? Any ideas?

Thanks,

Steve