Attributes.Remove on Image Only Removes One, When There Are Two

description

I'm using HtmlAgilityPack in our project to display some HTML from another one of our systems. I ran across this issue in my unit testing and posted to StackOverflow to check whether it was indeed a bug. Another developer there was able to reproduce it.

Basically, when I call Attributes.Remove, it should remove all instances of that attribute if there is more than one, but it only removes one.

If I have an image, and it has 2 "src" values, I'd like to pick one, remove them both, and add one back in with the right path.

So, here's an example image tag:
    <img align="left" alt="" src="/blah.jpg" src="/knowledge/blah.jpg" border="0" />
Here's the code to manipulate the HTML:
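    // Rewrites the "src" of every <img> in the given HTML.
    // The upper-case names below are constants defined elsewhere in the project (not shown here).
    public static string FixHtmlLinks(this string html)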
    {
        var htmlDoc = new HtmlDocument()
        {
            OptionWriteEmptyNodes = true
        };
        htmlDoc.LoadHtml(html);

        var imagesToCheck = htmlDoc.DocumentNode.SelectNodes("//img[@src!='']");

        if (null != imagesToCheck)
        {
            foreach (var image in imagesToCheck.ToList())
            {
                var src = image.GetAttributeValue("src", string.Empty);
                if (Uri.IsWellFormedUriString(src, UriKind.Relative))
                {
                    image.Attributes.Remove("src");
                    image.SetAttributeValue("src", string.Format(RELATIVE_IMAGE_PROTOCOL_AND_HOST, src));
                }
                else if (Uri.IsWellFormedUriString(src, UriKind.Absolute))
                {
                    image.Attributes.Remove("src");
                    image.SetAttributeValue("src", src.Replace(ABSOLUTE_IMAGE_HOST_TO_REPLACE, IMAGE_PROTOCOL_AND_HOST));
                }
            }
        }

        return htmlDoc.DocumentNode.OuterHtml;
    }
When I step through this in the debugger and reach the line image.Attributes.Remove("src");, the image has 2 "src" attributes, as expected. After that line runs, 1 "src" attribute is still there: the one that starts with "/knowledge". I would expect both to be removed, since the summary for Remove says:
Removes an attribute from the list, using its name. If there are more than one attributes with this name, they will all be removed.
I checked the source code for HtmlAttributeCollection, and the Remove method runs a loop to remove the matching attributes, so it looks like it should work, but it doesn't.
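
The behavior described above can be checked in isolation with a small console program. This is only a sketch: the class name DuplicateSrcRepro and the helper CountByName are made up for illustration, and the counts it prints will depend on the HtmlAgilityPack version in use, so no expected output is shown.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical repro class; not part of HtmlAgilityPack or of the project above.
public static class DuplicateSrcRepro
{
    // Counts the attributes on a node whose name matches, ignoring case.
    private static int CountByName(HtmlNode node, string name)
    {
        return node.Attributes.Count(
            a => string.Equals(a.Name, name, StringComparison.OrdinalIgnoreCase));
    }

    public static void Main()
    {
        var htmlDoc = new HtmlDocument();
        // Same example tag as in the report, with two "src" attributes.
        htmlDoc.LoadHtml("<img align=\"left\" alt=\"\" src=\"/blah.jpg\" src=\"/knowledge/blah.jpg\" border=\"0\" />");

        var image = htmlDoc.DocumentNode.SelectSingleNode("//img");

        Console.WriteLine("src attributes before Remove: " + CountByName(image, "src"));
        image.Attributes.Remove("src");
        Console.WriteLine("src attributes after Remove: " + CountByName(image, "src"));
    }
}
```

If the second count is not zero, the Remove summary quoted above is not being honored.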

I'm reporting this now, and I hope to get a chance soon to find the cause and maybe submit a fix.
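
In the meantime, one possible workaround is to remove the duplicates by index instead of by name. The sketch below assumes that walking the collection backwards and calling RemoveAt is not affected by the same problem; that has not been verified here. RemoveAllAttributes and HtmlNodeAttributeExtensions are hypothetical names, not part of HtmlAgilityPack.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper class; the names here are illustrative only.
public static class HtmlNodeAttributeExtensions
{
    // Removes every attribute with the given name (case-insensitive)
    // and returns how many were removed.
    public static int RemoveAllAttributes(this HtmlNode node, string name)
    {
        int removed = 0;

        // Walk backwards so that removing an item does not shift
        // the indexes of the items still to be visited.
        for (int i = node.Attributes.Count - 1; i >= 0; i--)
        {
            if (string.Equals(node.Attributes[i].Name, name, StringComparison.OrdinalIgnoreCase))
            {
                node.Attributes.RemoveAt(i);
                removed++;
            }
        }

        return removed;
    }
}
```

In FixHtmlLinks, the two image.Attributes.Remove("src") calls could then be replaced with image.RemoveAllAttributes("src"), leaving the following SetAttributeValue call to add the single corrected "src" back.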
