Li element InnerText is not returned correctly

Topics: Developer Forum, User Forum
Jul 14, 2011 at 4:45 AM

If the HTML source contains the text below:
<ul>
<li>Testing</li>
<li>Testing1</li>
<li>Testing2</li>
<li>Testing3</li>
</ul>

HtmlAgilityPack.HtmlDocument agDoc = new HtmlAgilityPack.HtmlDocument();
agDoc.LoadHtml(source);
// Assumption: docNode is the root node of the loaded document
HtmlAgilityPack.HtmlNode docNode = agDoc.DocumentNode;
// Walk every node under the root and keep only the <li> elements
IEnumerable<HtmlAgilityPack.HtmlNode> q = docNode.DescendantNodes();
IEnumerable<HtmlAgilityPack.HtmlNode> elements = q.Where(p => p.OriginalName == "li");
List<HtmlAgilityPack.HtmlNode> allElementLst = elements.ToList();


With the code above we get 4 HtmlNode objects whose OriginalName is "li".

When we look at the InnerText of each node, we see:
first Node: "Testing \r\nTesting1 \r\nTesting2 \r\nTesting3 "
second Node: "Testing1 \r\nTesting2 \r\nTesting3 "
third Node: "Testing2 \r\nTesting3 "
fourth Node: "Testing3 "

But the expected result is:

first Node: "Testing"
second Node: "Testing1"
third Node: "Testing2"
fourth Node: "Testing3 "

Can you please help me solve this problem?

Thanks & Regards,

Sai ...
Jul 14, 2011 at 2:49 PM
Edited Jul 14, 2011 at 2:50 PM

I used LINQPad to test this, with a reference to the HTML Agility Pack.

var html=@"<ul>
<li>Testing</li>
<li>Testing1</li>
<li>Testing2</li>
<li>Testing3</li>
</ul>";

// Assumes the HtmlAgilityPack namespace is imported in the query
HtmlAgilityPack.HtmlDocument agDoc = new HtmlAgilityPack.HtmlDocument();
agDoc.LoadHtml(html);
// Collect every <li> element under the document root
IEnumerable<HtmlNode> q = agDoc.DocumentNode.DescendantNodes();
IEnumerable<HtmlNode> elements = q.Where(p => p.OriginalName == "li");
List<HtmlNode> allElementLst = elements.ToList();
// Project each <li> node to its InnerText
var result = allElementLst.Select(c => c.InnerText);
result.Dump(); // Dump() is specific to LINQPad; remove it when compiling outside LINQPad