Bug in HtmlDocument.Save(outStream as System.IO.Stream)?

Topics: Developer Forum
Sep 6, 2006 at 7:32 PM
Hi, Simon.

I've got the following code (annotated a bit for you). The problem is that I can't get the output stream to receive the entire output of a transformed HtmlDocument. It gets cut off about 500 characters or so from the end of the stream.

Any ideas? I've tried changing options on the HtmlDocument, as well as messing with different types of Stream objects, and declaring the capacity of the MemoryStreams, and specifying the encoding, with no luck. This code doesn't look like it will print very well, but hopefully the format will be okay when it posts.

Thank you very much for the awesome tool, and for any help you can provide. Please feel free to email me if you'd like me to email a zip of the code. iamthepants [at] g m a i l

Thanks!
STA

Public Shared Function getFeedStream(ByVal FeedURL As String, ByVal FeedXSLT As String, ByVal FeedUser As String, ByVal FeedPwd As String, Optional ByVal ApplyXHTMLConverter As Boolean = False, Optional ByVal isDebug As Boolean = False) As Stream

    Dim oWorkStream As New MemoryStream
    Dim oOutputStream As New MemoryStream
    oWorkStream = getPageContent(FeedURL)
    ' getPageContent returns a memorystream filled with the httpwebresponse

    If ApplyXHTMLConverter Then
        Dim oXFormedDoc As New HtmlDocument
        oXFormedDoc.OptionOutputAsXml = True
        oXFormedDoc.OptionWriteEmptyNodes = False
        Try
            oXFormedDoc.Load(oWorkStream)
            ' This Save works fine: output is exactly as expected
            oXFormedDoc.Save("C:\External\FeedApps\PFeedV2\DATA\LOGS\DirectOutput.xml")
            ' This Save is short by 500-1000 chrs (varies depending on page/HTMLDocument options)
            oXFormedDoc.Save(oOutputStream)
        Catch ex As Exception
            Call AppendToGeneralErrorLog(Date.Now.ToShortTimeString & ": Error Line 194 (oXFormedDoc.Load) " & ex.Message)
        End Try
        oOutputStream.Position = 0
    End If
    If isDebug Then
        Dim oSR As StreamReader = New StreamReader(oOutputStream)
        Dim oSW As StreamWriter = System.IO.File.CreateText(ConfigurationManager.AppSettings("LogFileDirectory") & "\feeddebug.txt")
        oSW.Write(oSR.ReadToEnd())
        oSW.Flush()
        'oSW.Close()
        oOutputStream.Position = 0
    End If
    Dim oFeedDoc As XPath.XPathDocument
    If Not FeedXSLT = vbNullString Then
        Dim oXSLT As XslCompiledTransform = New XslCompiledTransform
        Try
            Dim oFeedXSLT As XmlReader = XmlReader.Create(New StringReader(FeedXSLT))
            oXSLT.Load(oFeedXSLT)
            oFeedDoc = New XPath.XPathDocument(oOutputStream)
            oWorkStream = New MemoryStream
            oXSLT.Transform(oFeedDoc, Nothing, oOutputStream)
            oOutputStream.Position = 0
        Catch ex As Exception
            ' Result = "Error transforming responseStream with xslt"
        End Try
    End If
    Return oOutputStream
End Function

Coordinator
Sep 7, 2006 at 8:09 PM
Hi,

If you look at the code, HtmlDocument.Save(Stream) is just a wrapper. All Save methods end up calling DocumentNode.WriteTo.

Have you tried using Flush? Are you sure the input HTTP stream is ok? Do you get the same problems if you replace HTTP streams by file streams?

Just ideas off the top of my head :-)
Simon.
Sep 7, 2006 at 11:36 PM
Yes, I'm sure the HTTP input stream is fine, because if I output from HtmlDocument using oXFormedDoc.Save("C:\External\FeedApps\PFeedV2\DATA\LOGS\DirectOutput.xml") the output is fine. It's only when I output it as a stream that it spits out incomplete xhtml.

I just tried writing directly to a string using
sFoo = oXFormedDoc.DocumentNode.WriteContentTo()
and that worked fine, too. I'll try writing that string into a new memorystream and then write it to file the same way as I'm doing now to try to rule out the agility pack as the source of this issue.

Thanks again,
STA
Sep 7, 2006 at 11:58 PM
Hmmm... must be something in your code.

When I do this (write to string, then write that string back into my oOutputStream object), my outputstream object has all the data I'm looking for. So oXFormedDoc.Save(oOutputStream) apparently isn't flushing? Anyway, I'll either do what works below, or I'll modify your source code to try to fix this.

If ApplyXHTMLConverter Then
    Dim oXFormedDoc As New HtmlDocument
    oXFormedDoc.OptionOutputAsXml = True
    oXFormedDoc.OptionWriteEmptyNodes = False
    Try
        Dim sFoo As String
        oXFormedDoc.Load(oWorkStream)
        sFoo = oXFormedDoc.DocumentNode.WriteTo()
        oOutputStream = New MemoryStream(Encoding.UTF8.GetBytes(sFoo))
    Catch ex As Exception
        Call AppendToGeneralErrorLog(Date.Now.ToShortTimeString & ": Error Line 194 (oXFormedDoc.Load) " & ex.Message)
    End Try
    oOutputStream.Position = 0
End If


Jan 4, 2007 at 9:55 AM
Hi

I can confirm that there is a bug in the Save(Stream) method. More precisely, the bug is in Save(TextWriter writer). The current implementation is:

public void Save(TextWriter writer)
{
    if (writer == null)
    {
        throw new ArgumentNullException("writer");
    }
    DocumentNode.WriteTo(writer);
}

The correct implementation should be:

public void Save(TextWriter writer)
{
    if (writer == null)
    {
        throw new ArgumentNullException("writer");
    }
    DocumentNode.WriteTo(writer);
    writer.Flush();
}

This is how it's done in Save(XmlWriter writer).
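
To illustrate why a missing Flush would cut off the end of the output, here is a minimal sketch. It assumes that Save(Stream) wraps the stream in a StreamWriter, which buffers characters and only writes them to the stream when its buffer fills or when it is flushed. FlushDemo and BytesReachingStream are hypothetical names for this example and are not part of the Html Agility Pack.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only; not part of the Html Agility Pack.
public static class FlushDemo
{
    // Writes text through a StreamWriter and returns how many bytes
    // have actually reached the underlying MemoryStream.
    public static long BytesReachingStream(string text, bool flush)
    {
        MemoryStream stream = new MemoryStream();
        // UTF-8 without a byte order mark, so ASCII text is one byte per character.
        StreamWriter writer = new StreamWriter(stream, new UTF8Encoding(false));
        writer.Write(text);
        if (flush)
        {
            // Pushes whatever is still sitting in the writer's buffer into the stream.
            writer.Flush();
        }
        // The writer is deliberately not disposed here, because disposing it
        // would flush the buffer and hide the problem.
        return stream.Length;
    }
}
```

For a 5000-character ASCII string, BytesReachingStream(text, true) returns 5000, while BytesReachingStream(text, false) returns fewer bytes because the last partial buffer never reaches the stream. That would match the symptom reported above, where only the tail of the document goes missing.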

Regards
Rodion
Mar 27, 2007 at 6:16 PM

I was having the same problems. I made the modifications to the HtmlDocument.cs file that rodion suggested (adding writer.Flush()) and it worked perfectly. Thanks rodion!
Aug 26, 2010 at 7:42 PM

I had exactly the same issue. I applied Rodion's fix and it works perfectly now. Thanks Rodion!

Would it be possible to include this fix in the release?