Expanding HtmlConvert.cs to handle List Items

Topics: Developer Forum
Jul 12, 2012 at 12:49 PM

Hello there

We've tried using the excellent HtmlToTxt sample:

http://htmlagilitypack.codeplex.com/SourceControl/changeset/view/98677#1010173

Whilst it converts text well and removes the CSS etc., it fails to handle <li> tags: the list items all appear on one line. How would one adapt HtmlConvert.cs to replace <li></li> with a line break?

This would also need to handle a missing </li>.

Any ideas?

Many thanks

Regards

Jordon

Apr 28 at 7:53 PM
Edited Apr 28 at 7:54 PM
This works; however, every ordered list is numbered with decimals. If you can figure out how to make the style change (decimal, then alpha, then roman...) for nested ordered lists, let me know.

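Add the following cases to the switch statement in internal static void ConvertTo(HtmlNode node, TextWriter outText, PreceedingDomTextInfo textInfo):

```csharp
// Assumes PreceedingDomTextInfo has an int ListIndex property, and that ConvertTo
// declares a local "int listIndex = 0;" which becomes the ListIndex of the
// PreceedingDomTextInfo passed to this node's children.
case "li":
    if (textInfo.ListIndex > 0)
    {
        // Inside an <ol>: new line, tab, then the item number. The counter is
        // shared by the sibling <li> elements, so advance it for the next one.
        outText.Write("\r\n\t{0}.", textInfo.ListIndex++);
    }
    else
    {
        // Inside a <ul>: new line, tab, then a bullet ('•' is U+2022).
        // Any marker works here, e.g. "*" or "->".
        outText.Write("\r\n\t•");
    }
    isInline = false;
    break;
case "ol":
    listIndex = 1; // the children of this <ol> start numbering at 1
    goto case "ul";
case "ul": // nested lists are not handled any differently at this stage; that gets close to real rendering
    endElementString = "\r\n";
    isInline = false;
    break;
```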
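On the decimal/alpha/roman question: one possible approach is to track how deeply each <ol> is nested and pick the marker style from that depth. Below is a minimal sketch of just the formatting part, as a hypothetical ListMarkers helper (it is not part of HtmlAgilityPack or the HtmlToTxt sample). It assumes depth 0 is the outermost <ol> and cycles decimal, lower-alpha, lower-roman, then repeats.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of HtmlAgilityPack or the sample): picks the
// marker for an ordered-list item from how deeply its <ol> is nested.
// depth 0 -> decimal (1. 2. 3.), depth 1 -> lower-alpha (a. b. c.),
// depth 2 -> lower-roman (i. ii. iii.), then the cycle repeats.
public static class ListMarkers
{
    public static string Format(int index, int depth)
    {
        if (index < 1) throw new ArgumentOutOfRangeException(nameof(index));
        if (depth < 0) throw new ArgumentOutOfRangeException(nameof(depth));

        switch (depth % 3)
        {
            case 0: return index + ".";
            case 1: return ToAlpha(index) + ".";
            default: return ToRoman(index) + ".";
        }
    }

    // 1 -> "a", 26 -> "z", 27 -> "aa", 28 -> "ab" (spreadsheet-column style).
    public static string ToAlpha(int n)
    {
        var sb = new StringBuilder();
        while (n > 0)
        {
            n--; // shift to 0-based so that 26 maps to 'z' instead of rolling over
            sb.Insert(0, (char)('a' + n % 26));
            n /= 26;
        }
        return sb.ToString();
    }

    // Lower-case Roman numerals using the subtractive forms (4 -> "iv", 9 -> "ix").
    public static string ToRoman(int n)
    {
        int[] values = { 1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1 };
        string[] symbols = { "m", "cm", "d", "cd", "c", "xc", "l", "xl", "x", "ix", "v", "iv", "i" };
        var sb = new StringBuilder();
        for (int i = 0; i < values.Length; i++)
        {
            while (n >= values[i])
            {
                sb.Append(symbols[i]);
                n -= values[i];
            }
        }
        return sb.ToString();
    }
}
```

To wire it in, the children's PreceedingDomTextInfo would also need to carry the nesting depth, for example through a hypothetical ListDepth property set one level deeper for the children of each <ol>. The "li" case could then write outText.Write("\r\n\t{0}", ListMarkers.Format(textInfo.ListIndex++, textInfo.ListDepth)); in place of the plain number.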