Doctypes and tags

Sep 9, 2011 at 3:39 PM

Hello all, this is my first post here. We just recently started using Html Agility Pack. Hopefully I can get some advice. =)

I am extracting HTML from an HTML editor on a website and injecting some code into it before saving it to Azure storage, where the user can then view the site they created. I need to inject code because we let the user put a placeholder in the page they create to indicate where they want a web form. When the user saves, we parse the HTML for the placeholder tag. If there is one, we get the HTML for that form, which includes everything it needs to function and the CSS to make it look good. That's where part of the problem comes in...
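
To make the placeholder step concrete, here is a minimal sketch of the lookup. The `<webform id="..." />` syntax, the `PlaceholderDemo` class and the `FindFormId` helper are all hypothetical names made up for illustration; the real placeholder format is not shown in this post. The only thing taken from the code further down is that the form ID comes from the first capture group of the first match.

```csharp
using System;
using System.Text.RegularExpressions;

static class PlaceholderDemo
{
    // Hypothetical placeholder syntax; the real tag format is not shown in this post.
    static readonly Regex Placeholder =
        new Regex(@"<webform\s+id=""([0-9a-fA-F-]{36})""\s*/>", RegexOptions.IgnoreCase);

    // Returns the captured form ID, or null when the page has no placeholder.
    public static string FindFormId(string pageHtml)
    {
        MatchCollection matches = Placeholder.Matches(pageHtml);
        return matches.Count > 0 ? matches[0].Groups[1].Value : null;
    }

    static void Main()
    {
        string page = "<p>Sign up:</p><webform id=\"0f8fad5b-d9cb-469f-a165-70867728950e\" />";
        Console.WriteLine(FindFormId(page) ?? "no placeholder");
    }
}
```

Checking `matches.Count` before indexing avoids an exception on pages that have no placeholder at all.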

When we look at just the form, it looks fine. When it's inserted into the newly created page, the form looks pretty ugly. We took a look at the CSS in IE's debugger, and the applied CSS is quite different between the original form and the page with the form inserted, even though both point to the exact same stylesheet. On further investigation we found that the doctype is missing, the body tag is missing a class, etc. When I manually add these back into the page, everything works fine.

I've simplified the code below a bit to take out some items unrelated to the problem.

	if (matches.Count > 0) // there is a placeholder
	{
		// Set up the information needed to insert the web form into the landing page
		formId = matches[0].Groups[1].Value;
		var repository = new SimpleRepository(AzureUtil.GetSubSonicDataProvider(), SimpleRepositoryOptions.None);
		var clientConnection = Manager.ValidateInstance(this.ClientConnection.Customer.CustomerId, this.ClientConnection.Configuration.ConnectionId);
		var webForm = clientConnection.ValidateWebForm(repository, formId.ToGuid(), true);
		var formHtml = clientConnection.GenerateWebFormHtml(webForm, WebFormHtmlType.Full);

		// Get the form HTML
		var formHtmlDocument = new HtmlDocument();
		formHtmlDocument.LoadHtml(formHtml);
		var formHtmlNode = formHtmlDocument.DocumentNode.SelectSingleNode("/html");
		if (formHtmlNode == null)
		{
			formHtmlNode = formHtmlDocument.CreateElement("html");
			formHtmlDocument.DocumentNode.AppendChild(formHtmlNode);
		}
		var formHeadNode = formHtmlNode.SelectSingleNode("head");
		if (formHeadNode == null)
		{
			formHeadNode = formHtmlDocument.CreateElement("head");
			formHtmlNode.AppendChild(formHeadNode);
		}
		var formBodyNode = formHtmlNode.SelectSingleNode("body");
		if (formBodyNode == null)
		{
			formBodyNode = formHtmlDocument.CreateElement("body");
			formHtmlNode.AppendChild(formBodyNode);
		}

		// Copy nodes into the resulting HTML (such as stylesheets, javascript files, etc.)
		if (!formHeadNode.ChildNodes.IsNullOrEmpty())
		{
			foreach (HtmlNode sourceHeadChildNode in formHeadNode.ChildNodes.Where(x => x.NodeType == HtmlNodeType.Element))
			{
				if (sourceHeadChildNode.Name != "title")
				{
					try
					{
						var destinationNode = PageHtmlDocument.CreateElement(sourceHeadChildNode.Name);
						if (sourceHeadChildNode.HasAttributes)
						{
							foreach (var attribute in sourceHeadChildNode.Attributes)
							{
								destinationNode.SetAttributeValue(attribute.Name, attribute.Value);
							}
						}
						if (sourceHeadChildNode.InnerText.IsNotEmpty())
						{
							destinationNode.InnerHtml = sourceHeadChildNode.InnerText;
						}
						PageHeadNode.AppendChild(destinationNode);
					}
					catch { }
				}
			}
		}

		// Replace the placeholder with the appropriate HTML
		var PageHtml = PageHtmlNode.OuterHtml;
		blobHtmlContent = PageHtml.Replace(matches[0].Value, formBodyNode.InnerHtml);
	}
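
As a possible simplification of the head-copy loop above, here is a sketch that deep-clones each node instead of rebuilding it from its name, attributes and inner text. `HeadCopyDemo`, `CopyHeadElements` and `form.css` are hypothetical names for illustration, and the sketch assumes that appending a clone taken from one `HtmlDocument` to a node in another serializes as expected in the Html Agility Pack version in use.

```csharp
using System;
using HtmlAgilityPack;

static class HeadCopyDemo
{
    // Copies every element under the form's <head>, except <title>, into the page's <head>.
    public static void CopyHeadElements(HtmlNode formHead, HtmlNode pageHead)
    {
        foreach (HtmlNode child in formHead.ChildNodes)
        {
            if (child.NodeType != HtmlNodeType.Element || child.Name == "title")
                continue;

            // A deep clone carries the attributes and any inline script or style text.
            pageHead.AppendChild(child.CloneNode(true));
        }
    }

    static void Main()
    {
        // Stand-in documents; the real ones come from the editor and the form generator.
        var formDoc = new HtmlDocument();
        formDoc.LoadHtml("<html><head><title>Form</title>" +
                         "<link rel=\"stylesheet\" href=\"form.css\" /></head><body></body></html>");
        var pageDoc = new HtmlDocument();
        pageDoc.LoadHtml("<html><head><title>Page</title></head><body></body></html>");

        CopyHeadElements(formDoc.DocumentNode.SelectSingleNode("/html/head"),
                         pageDoc.DocumentNode.SelectSingleNode("/html/head"));

        Console.WriteLine(pageDoc.DocumentNode.OuterHtml);
    }
}
```

Without the empty `catch`, a node that fails to copy would surface as an exception instead of silently going missing from the page.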

 

This works perfectly except for the doctype, the html tag and the body tag. The doctype is dropped completely, and the html and body tags lose anything that was included in the tag (attributes such as xmlns and class), so to fix the issue I have to add these lines.

	blobHtmlContent = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n" + blobHtmlContent;
	blobHtmlContent = blobHtmlContent.Replace("<html>", "<html xmlns=\"http://www.w3.org/1999/xhtml\">");
	blobHtmlContent = blobHtmlContent.Replace("<body>", "<body class=\"noI\">");

There has to be a better way that I'm unaware of because of my inexperience.
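
One possible direction, sketched under assumptions: if `PageHtmlNode` is the `html` element (selected the same way as `formHtmlNode`), its `OuterHtml` starts at the `<html>` tag, so a doctype sitting next to it under `DocumentNode` would not be included. Serializing from `DocumentNode` instead should keep it. `SerializeDemo`, `SerializeWholePage` and the stand-in markup are hypothetical and only there for illustration.

```csharp
using System;
using HtmlAgilityPack;

static class SerializeDemo
{
    // Serializes from the document root so nodes outside <html>, such as the doctype, are kept.
    public static string SerializeWholePage(HtmlDocument pageDoc)
    {
        return pageDoc.DocumentNode.OuterHtml;
    }

    static void Main()
    {
        // Stand-in markup; the real page comes from the HTML editor.
        const string pageHtml =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" " +
            "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n" +
            "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><title>Page</title></head>" +
            "<body class=\"noI\"><p>content</p></body></html>";

        var pageDoc = new HtmlDocument();
        pageDoc.LoadHtml(pageHtml);

        // Serializing only the <html> element leaves out its doctype sibling.
        var htmlNode = pageDoc.DocumentNode.SelectSingleNode("/html");
        Console.WriteLine(htmlNode.OuterHtml.Contains("<!DOCTYPE"));

        // Serializing the whole document keeps everything that was parsed.
        Console.WriteLine(SerializeWholePage(pageDoc).Contains("<!DOCTYPE"));
    }
}
```

This only helps if the doctype and the attributes are present in the HTML that gets loaded. If the editor strips them before the page reaches the parser, they would still have to be added back, for example with `SetAttributeValue` on the html and body nodes rather than with string replacement.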