A parse bug?

Topics: User Forum
Jun 26, 2010 at 7:10 AM

I am using Html Agility Pack (HAP) to parse an HTML page from an ASP.NET Web Forms project. The page's HTML is:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><title>
 
</title><link href="Styles/Site.css" rel="stylesheet" type="text/css" />
</head>
<body>
    <form method="post" action="G1.aspx" id="ctl01">
<div class="aspNetHidden">
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUKMTY1NDU2MTA1MmRkNjIzt9U3S0Qq5Vae9T2dSOGuayLmn1JWS6o31dNezFA=" />
</div>
 
 
<script src="/WebSite1/WebResource.axd?d=xfbBYE4wQsueirpLsCWgxQVrmhLNJczvEu8-H_BOwlQ1&amp;t=634103057810714000" type="text/javascript"></script>
    <div class="page">
        <div class="header">
            <div class="title">
                <h1>
                    我的 ASP.NET 应用程序
                </h1>
            </div>
            <div class="loginDisplay">
                
                        [ <a href="Account/Login.aspx" id="HeadLoginView_HeadLoginStatus">登录</a> ]
                    
            </div>
            <div class="clear hideSkiplink">
                <a href="#NavigationMenu_SkipLink"><img alt="跳过导航链接" src="/WebSite1/WebResource.axd?d=_V9kr3rfsoIxNEQid9kj6A2&amp;t=634103057810714000" width="0" height="0" style="border-width:0px;" /></a><div class="menu" id="NavigationMenu">
	<ul class="level1">
		<li><a class="level1" href="Default.aspx">主页</a></li><li><a class="level1" href="About.aspx">关于</a></li>
	</ul>
</div><a id="NavigationMenu_SkipLink"></a>
            </div>
        </div>
        <div class="main">
            
 
 
        </div>
        <div class="clear">
        </div>
    </div>
    <div class="footer">
        ssss
    </div>
    
<script type='text/javascript'>new Sys.WebForms.Menu({ element: 'NavigationMenu', disappearAfter: 500, orientation: 'horizontal', tabIndex: 0, disabled: false });</script></form>
</body>
</html>


I wrote a small XPath generator with jQuery, similar to HAPExplorer. Clicking an element shows its XPath:

$(document).ready(function () {
    var doc = $(document);
    // Highlight the element under the mouse pointer.
    doc.bind("mouseover", function (e) {
        $(e.target).addClass("beselect");
    });
    // Remove the highlight when the pointer leaves the element.
    doc.bind("mouseout", function (e) {
        $(e.target).removeClass("beselect");
    });
    // Show the XPath of the clicked element.
    doc.bind("click", function (e) {
        var src = e.target,
            path = getNodeXpath(src);
        alert(path);
    });
});

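// Build an absolute, index-based XPath such as /html[1]/body[1]/div[2] for a DOM element.
function getNodeXpath(node) {
    var parent = node.parentNode;
    // The root element's parent is the document itself.
    if (parent === document)
        return "/html[1]";
    var pchildren = parent.children,
        nodeName = node.nodeName,
        i = 1, j = 0, plen = pchildren.length;
    // XPath positions are 1-based and count only preceding siblings with the same name.
    for (; j < plen; j++) {
        if (pchildren[j].nodeName !== nodeName) continue;
        if (pchildren[j] === node) break;
        i++;
    }
    // nodeName is upper case in HTML documents; lowercase it to match HAP's lowercase element names.
    return getNodeXpath(parent) + "/" + nodeName.toLowerCase() + "[" + i + "]";
}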

The problem is that the XPath generated in the browser does not match Html Agility Pack's XPath for the same node. I found that HAP parses the form node incorrectly. What can I do to fix this?
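
To make the mismatch concrete, here is a minimal sketch that runs without a browser. It assumes (this is not stated in the post) that HAP, with its default settings, treats form as an empty element, so the form's contents end up as siblings of the form under body. The helpers el and getPath and the two small trees are hypothetical stand-ins for illustration, not part of HAP or jQuery.

```javascript
// Hypothetical stand-in for a DOM element: only the fields getPath reads.
function el(name, children) {
    var node = { nodeName: name.toUpperCase(), children: children || [], parentNode: null };
    for (var i = 0; i < node.children.length; i++) {
        node.children[i].parentNode = node;
    }
    return node;
}

// Same indexing rule as getNodeXpath: 1-based position among same-named siblings.
function getPath(node) {
    var name = node.nodeName.toLowerCase();
    if (!node.parentNode) return "/" + name + "[1]";
    var siblings = node.parentNode.children, index = 1;
    for (var i = 0; i < siblings.length; i++) {
        if (siblings[i] === node) break;
        if (siblings[i].nodeName === node.nodeName) index++;
    }
    return getPath(node.parentNode) + "/" + name + "[" + index + "]";
}

// Tree as a browser builds it: the form contains the page content.
var h1InBrowser = el("h1");
el("html", [el("head"), el("body", [
    el("form", [el("div"), el("script"), el("div", [h1InBrowser])])
])]);

// Tree under the assumption above: the form is empty and its content is hoisted into body.
var h1Hoisted = el("h1");
el("html", [el("head"), el("body", [
    el("form"), el("div"), el("script"), el("div", [h1Hoisted])
])]);

var browserPath = getPath(h1InBrowser); // "/html[1]/body[1]/form[1]/div[2]/h1[1]"
var hoistedPath = getPath(h1Hoisted);   // "/html[1]/body[1]/div[2]/h1[1]"
console.log(browserPath);
console.log(hoistedPath);
```

If that assumption holds, every path below the form loses its form[1] step on the HAP side. A widely cited workaround is to call HtmlNode.ElementsFlags.Remove("form") in the C# code before loading the document, so that HAP keeps the form's children nested inside it.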

Jul 3, 2010 at 8:25 AM

I found the answer at http://htmlagilitypack.codeplex.com/Thread/View.aspx?ThreadId=17922

Thanks, HAP.