PDFsharp & MigraDoc Foundation
https://forum.pdfsharp.net/

PdfDocument memory leaking
https://forum.pdfsharp.net/viewtopic.php?f=2&t=2177
Page 1 of 1

Author:  rkawano [ Wed Oct 24, 2012 8:18 pm ]
Post subject:  PdfDocument memory leaking

I am using this method to get the total page count of a PDF file:

Code:
public static Int32 CountPages(String filename)
{
    using(PdfSharp.Pdf.PdfDocument inputDocument = PdfSharp.Pdf.IO.PdfReader.Open(filename, PdfSharp.Pdf.IO.PdfDocumentOpenMode.InformationOnly))
    {
        return inputDocument.PageCount;
    }
}


The "InformationOnly" parameter works fine. This is the first free library I have tested that can count pages of large PDF files (>300MB).

But when I run my application, memory usage increases at the first line of the method and does not go down after the using block, and within a few seconds my application throws an OutOfMemoryException (in another part of my app).

So I looked at the Dispose method of PdfDocument and found this:

Code:
public void Dispose()
{
  Dispose(true);
  //GC.SuppressFinalize(this);
}
void Dispose(bool disposing)
{
  if (this.state != DocumentState.Disposed)
  {
    if (disposing)
    {
      // Dispose managed resources.
    }
    //PdfDocument.Gob.DetatchDocument(Handle);
  }
  this.state = DocumentState.Disposed;
}
(PdfDocument.cs lines 151..168)

It appears that it is not disposing anything. So I debugged the Open() method of the PdfReader class and saw memory increasing in this loop:

Code:
// Read all indirect objects
for (int idx = 0; idx < count; idx++)
{
    PdfReference iref = irefs[idx];
    if (iref.Value == null)
    {
        try
        {
            Debug.Assert(document.irefTable.Contains(iref.ObjectID));
            PdfObject pdfObject = parser.ReadObject(null, iref.ObjectID, false);
            Debug.Assert(pdfObject.Reference == iref);
            pdfObject.Reference = iref;
            Debug.Assert(pdfObject.Reference.Value != null, "something got wrong");
        }
        catch (Exception ex)
        {
            Debug.WriteLine(ex.Message);
        }
    }
    else
    {
        Debug.Assert(document.irefTable.Contains(iref.ObjectID));
        iref.GetType();
    }
    // Set maximum object number
    document.irefTable.maxObjectNumber = Math.Max(document.irefTable.maxObjectNumber, iref.ObjectNumber);
}
(PdfReader.cs lines 346..372)

It is not clear to me which object is retaining data in memory.

Does anyone know how to correctly dispose of a PdfDocument?

Author:  rkawano [ Thu Nov 08, 2012 1:47 pm ]
Post subject:  Re: PdfDocument memory leaking

After some days trying to figure out this problem, I found a workaround for our application.

First, I needed to change the PdfDocument class to set all private members to null in Dispose and recompile the library:

Code:
void Dispose(bool disposing)
{
    if (this.state != DocumentState.Disposed)
    {
        if (disposing)
        {
            // Dispose managed resources.
            this.info = null;
            this.pages = null;
            this.fontTable = null;
            this.catalog = null;
            this.trailer = null;
            this.iref = null;
            this.irefTable = null;
        }
        //PdfDocument.Gob.DetatchDocument(Handle);
    }
    this.state = DocumentState.Disposed;
}
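
The idea behind this change can be illustrated with a toy class. This is only a sketch, not PDFsharp code: ToyDocument and its objectTable field are hypothetical stand-ins for PdfDocument and its private members. The assumption is that something may still hold a reference to the document after Dispose, so clearing the fields is what lets the large object graph become eligible for collection.

```csharp
using System;

// Toy stand-in for a document class; NOT PDFsharp's PdfDocument.
sealed class ToyDocument : IDisposable
{
    // Stands in for the large parsed data a real document would hold.
    private byte[] objectTable = new byte[1024 * 1024];
    private bool disposed;

    public bool IsDisposed { get { return disposed; } }
    public bool HoldsData { get { return objectTable != null; } }

    public void Dispose()
    {
        if (!disposed)
        {
            // Drop the reference so the buffer can be collected even if
            // the ToyDocument instance itself is still reachable.
            objectTable = null;
            disposed = true;
        }
    }
}
```

After Dispose, HoldsData is false, so the 1 MB buffer is no longer reachable through the document even when the document object itself is still referenced somewhere.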


And according to Thomas Hoevel's comment on this post, I need to call the GC after the read operation:

Code:
public static Int32 CountPages(String filename)
{
    try
    {
        using(PdfSharp.Pdf.PdfDocument inputDocument = PdfSharp.Pdf.IO.PdfReader.Open(filename, PdfSharp.Pdf.IO.PdfDocumentOpenMode.InformationOnly))
        {
            return inputDocument.PageCount;
        }
    }
    finally
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
    }
}


It is a workaround and not a definitive fix. I have noticed that in some cases the memory is not freed after calling the garbage collector, but my application can "survive" without memory exceptions while reading a sequence of large PDF files (tested with a sequence of 10 files of 300MB each). If I don't change the Dispose method or don't call the GC, memory exceptions are raised when reading the third or fourth file.
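
To check whether memory is really released after a read, one option is to compare the managed heap size before and after the call. The following is a minimal sketch under stated assumptions: MemoryProbe and MeasureRetainedBytes are hypothetical names, not part of PDFsharp, and GC.GetTotalMemory only reports managed memory, so the number is a rough indication rather than an exact measurement.

```csharp
using System;

// Hypothetical helper; not part of PDFsharp.
static class MemoryProbe
{
    // Runs the action and returns the change in managed heap size (bytes)
    // measured after a forced, blocking collection on both sides.
    public static long MeasureRetainedBytes(Action action)
    {
        long before = GC.GetTotalMemory(true);

        action();

        // Collect, let finalizers run, then collect what they released.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        long after = GC.GetTotalMemory(true);
        return after - before;
    }
}
```

It could be called as MemoryProbe.MeasureRetainedBytes(() => CountPages(filename)), where filename is whatever large PDF is being tested. A value that keeps growing across repeated calls would suggest that something is still holding on to the document data.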

Thanks for maintaining this fantastic library for free.

All times are UTC