A code modification needed in HtmlToText.ConvertTo

Nov 26, 2010 at 7:04 AM
I have noticed that the ConvertTo(HtmlNode node, TextWriter outText) method in the HtmlToText class runs the items of a list together, so they come out as a single word.
So I modified the code slightly, and it works fine for me. Here is the modified method:
public void ConvertTo(HtmlNode node, TextWriter outText)
{
    string html;
    switch (node.NodeType)
    {
        case HtmlNodeType.Comment:
            // don't output comments
            break;

        case HtmlNodeType.Document:
            ConvertContentTo(node, outText);
            break;

        case HtmlNodeType.Text:
            // script and style must not be output
            string parentName = node.ParentNode.Name;
            if ((parentName == "script") || (parentName == "style"))
                break;

            // get text
            html = ((HtmlTextNode)node).Text;

            // is it in fact a special closing node output as text?
            if (HtmlNode.IsOverlappedClosingElement(html))
                break;

            // check the text is meaningful and not a bunch of whitespaces
            if (html.Trim().Length > 0)
            {
                outText.Write(HtmlEntity.DeEntitize(html));
            }
            break;

        case HtmlNodeType.Element:
            switch (node.Name)
            {
                case "p":
                    // treat paragraphs as crlf
                    outText.Write("\r\n");
                    break;

                // --- my additions start here ---
                case "div":
                    // treat div as white space
                    outText.Write(" ");
                    break;
                case "br":
                    // treat break as crlf
                    outText.Write("\r\n");
                    break;
                case "li":
                    // treat list items as crlf
                    outText.Write("\r\n");
                    break;
                // --- my additions end here ---
            }

            if (node.HasChildNodes)
            {
                ConvertContentTo(node, outText);
            }
            break;
    }
}
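To see the before/after effect without pulling in HtmlAgilityPack, here is a minimal self-contained sketch. It swaps in System.Xml.Linq as the parser, so it only accepts well-formed XHTML (HTML entities such as &nbsp; will not parse). The class name XhtmlToTextSketch and the withFix flag are hypothetical names for illustration only; the switch mirrors the method above, with withFix toggling the added div, br and li cases while p keeps its original behaviour.

```csharp
using System.IO;
using System.Xml.Linq;

// Hypothetical stand-in: System.Xml.Linq replaces HtmlAgilityPack here,
// so input must be well-formed XHTML. Mirrors the modified ConvertTo switch.
public static class XhtmlToTextSketch
{
    public static string Convert(string xhtml, bool withFix)
    {
        // Wrap the fragment so several top-level elements still parse.
        XElement root = XElement.Parse("<root>" + xhtml + "</root>");
        var outText = new StringWriter();
        ConvertTo(root, outText, withFix);
        return outText.ToString();
    }

    static void ConvertTo(XNode node, TextWriter outText, bool withFix)
    {
        switch (node)
        {
            case XComment _:
                // don't output comments
                break;

            case XText text:
                // script and style must not be output
                string parentName = text.Parent?.Name.LocalName;
                if (parentName == "script" || parentName == "style")
                    break;
                // skip whitespace-only text; XLinq has already decoded entities
                if (text.Value.Trim().Length > 0)
                    outText.Write(text.Value);
                break;

            case XElement element:
                string name = element.Name.LocalName;
                if (name == "p")
                    outText.Write("\r\n");            // original behaviour
                else if (withFix && (name == "br" || name == "li"))
                    outText.Write("\r\n");            // added: line break
                else if (withFix && name == "div")
                    outText.Write(" ");               // added: white space

                // recurse into children, like ConvertContentTo
                foreach (XNode child in element.Nodes())
                    ConvertTo(child, outText, withFix);
                break;
        }
    }
}
```

With withFix set to false, "<ul><li>One</li><li>Two</li></ul>" comes out as "OneTwo", the single-word symptom described above; with withFix set to true, each item starts on its own line ("\r\nOne\r\nTwo").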

If this modification (small as it is) helps anyone, please let me know.
If you think I have done anything wrong, let me know as well.
Thank you