Data In Malformed Tags

Topics: User Forum
Sep 3, 2006 at 4:48 AM
I am working with files that contain both plain-text and HTML components. I am using the Html Agility Pack to convert the HTML components to text, but I also want to keep the href/img attribute values in the resulting text.

In these files I am running across malformed tags with links in them, like
<http://linkwithoutatag>, and I want to capture the data in the tag if it contains http. What is the best way to capture this data? When this happens, the node does not seem to have any InnerText, InnerHtml, etc.

Sep 4, 2006 at 9:25 AM

<http://blabla> is really badly formed HTML :-)
Although the Html Agility Pack does not break or choke on this, it is not really equipped to read it or expose it programmatically.

One solution could be to run a regex over the text before running the HTML parser; this type of strange tag is quite easy to match.
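
A minimal sketch of that regex pre-pass, written in Python only to show the idea; the pattern should carry over to .NET's Regex class largely unchanged. The pattern and the name extract_bare_links are hypothetical, not part of the Html Agility Pack, and the sketch assumes a "bare link" is an angle-bracketed token that starts with http:// or https:// and contains no whitespace.

```python
import re

# Hypothetical pattern (an assumption, not from the library): "<" immediately
# followed by http:// or https://, then anything up to the closing ">".
BARE_LINK = re.compile(r"<(https?://[^\s<>]+)>", re.IGNORECASE)

def extract_bare_links(text):
    """Return (cleaned_text, links).

    Each bare <http://...> token is replaced by the URL itself, so the
    HTML parser later sees it as ordinary text rather than a broken tag.
    """
    links = BARE_LINK.findall(text)
    cleaned = BARE_LINK.sub(lambda m: m.group(1), text)
    return cleaned, links

sample = 'See <http://linkwithoutatag> and <a href="http://example.com/page">this page</a>.'
cleaned, links = extract_bare_links(sample)
print(links)
print(cleaned)
```

Real tags such as <a href="..."> are left alone because the pattern requires the URL to follow the opening bracket directly. The cleaned text can then be handed to the parser as usual.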

Hope this helps :P