There are at least 2 bugs in the library (example inside)

Topics: Developer Forum, Project Management Forum, User Forum
Jan 17, 2010 at 9:16 PM

Hello,

Maybe I'm wrong, but I believe there are 2 bugs in the library. This is how I discovered them.

I need a simple method that strips whitespace and comments from HTML text, so I wrote this code:

 

List<HtmlNode> _nodiToRemoteve = new List<HtmlNode>();

        /// <summary>
        /// ...
        /// </summary>
        /// <param name="doc"></param>
        public void RemoveWhAndComments(HtmlDocument doc)
        {
            for (int i = 0; i < doc.DocumentNode.ChildNodes.Count; i++)
            {
                HtmlNode docDocumentNodeChildNodes = doc.DocumentNode.ChildNodes[i];
                RemoveWhAndCommentsFromNodes(docDocumentNodeChildNodes);
            }

            for (int i = 0; i < _nodiToRemoteve.Count; i++)
                _nodiToRemoteve[i].Remove();
            _nodiToRemoteve.Clear();
        }

        /// <summary>
        ///...
        /// </summary>
        /// <param name="nodoCorrente"></param>
        public void RemoveWhAndCommentsFromNodes(HtmlNode nodoCorrente)
        {
            if (
                    nodoCorrente.NodeType == HtmlNodeType.Text ||
                    nodoCorrente.NodeType == HtmlNodeType.Comment
                    )
            {
                //nodoCorrente.Remove();
                //System.Diagnostics.Trace.WriteLine(nodoCorrente.InnerHtml);
                _nodiToRemoteve.Add(nodoCorrente);
                return;
            }

            for (int i = 0; i < nodoCorrente.ChildNodes.Count; i++)
            {
                HtmlNode docDocumentNodeChildNodes = nodoCorrente.ChildNodes[i];
                if (
                    // Test the child node here; nodoCorrente was already checked above.
                    docDocumentNodeChildNodes.NodeType == HtmlNodeType.Text ||
                    docDocumentNodeChildNodes.NodeType == HtmlNodeType.Comment
                    )
                {
                    //docDocumentNodeChildNodes.Remove();
                    _nodiToRemoteve.Add(docDocumentNodeChildNodes);
                    continue;
                }
                RemoveWhAndCommentsFromNodes(docDocumentNodeChildNodes);
            }
        }

 

The first bug: if I call _nodiToRemoteve[i].RemoveAll(); the library throws a StackOverflowException. I worked around this one by using the Remove method instead of RemoveAll.
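
For reference, here is a minimal sketch of the same collect-then-remove idea, narrowed so that only comments and whitespace-only text nodes are dropped (the code above collects every text node, not just the empty ones). It assumes an Html Agility Pack version that provides Descendants(); HtmlCleaner and RemoveWhitespaceAndComments are names made up for illustration and are not part of the library. As far as I understand, Remove detaches the node itself from its parent, while RemoveAll is meant to clear a node's children and attributes, so the two are not interchangeable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper for illustration; not part of Html Agility Pack.
public static class HtmlCleaner
{
    public static void RemoveWhitespaceAndComments(HtmlDocument doc)
    {
        // Take a snapshot first so the tree is not modified while it is enumerated.
        List<HtmlNode> toRemove = doc.DocumentNode
            .Descendants()
            .Where(n => n.NodeType == HtmlNodeType.Comment ||
                        (n.NodeType == HtmlNodeType.Text &&
                         n.InnerText.Trim().Length == 0))
            .ToList();

        foreach (HtmlNode node in toRemove)
            node.Remove(); // detach the node itself from its parent
    }
}

public static class Program
{
    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<div> <!-- note --> <p>text</p> </div>");

        HtmlCleaner.RemoveWhitespaceAndComments(doc);
        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```

Text nodes that contain anything other than whitespace are left alone in this version, which is closer to what I actually wanted.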

The other bug is that the parser classifies this tag as a comment:

<style type="text/css">
body {


        background-image: url(./images/casper_black.png);
     background-color: black;
        background-position: center;
        background-repeat: no-repeat;
        background-attachment: fixed;

}

a {
    color:white;
    position: fixed; right: 525px; bottom: 35px;z-index:9
}

</style>

 

Now, it's clear that this is not a comment!

What do you think about these two problems?
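
To see what the parser really produces for this tag, something like the following sketch could be used. It loads a shortened version of the style block and prints the node type of each child of the style element. StyleNodeDump is a made-up name for illustration, and the output may differ between library versions, so none is shown here.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical diagnostic for illustration; not part of Html Agility Pack.
public static class StyleNodeDump
{
    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<style type=\"text/css\">\nbody { background-color: black; }\n</style>");

        HtmlNode style = doc.DocumentNode.SelectSingleNode("//style");
        if (style == null)
        {
            Console.WriteLine("No style element found.");
            return;
        }

        // Print how the parser classified each child of the style element.
        foreach (HtmlNode child in style.ChildNodes)
            Console.WriteLine("{0}: {1}", child.NodeType, child.GetType().Name);
    }
}
```

Note that the removal code above collects both Text and Comment nodes, so a node ending up in the list does not by itself show which of the two types it has; printing NodeType settles that.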

 

Thank you

 

Jan 29, 2010 at 12:27 PM

Hmm...

Can you tell me why you need to remove the comments and whitespace from the HTML?