PDFsharp & MigraDoc Foundation https://forum.pdfsharp.net/ |
PdfDocument memory leaking https://forum.pdfsharp.net/viewtopic.php?f=2&t=2177 |
Page 1 of 1 |
Author: | rkawano [ Wed Oct 24, 2012 8:18 pm ] |
Post subject: | PdfDocument memory leaking |
I am using this method to get the total number of pages of a PDF file:

Code:
    public static Int32 CountPages(String filename)
    {
        using (PdfSharp.Pdf.PdfDocument inputDocument = PdfSharp.Pdf.IO.PdfReader.Open(filename, PdfSharp.Pdf.IO.PdfDocumentOpenMode.InformationOnly))
        {
            return inputDocument.PageCount;
        }
    }

The "InformationOnly" parameter works fine. This is the first free library I have tested that can count the pages of large PDF files (>300MB).

But when I run my application, memory usage rises at the first line of the method and does not drop after the using block. Within a few seconds my application throws an OutOfMemoryException (in another part of my app).

So I looked for the Dispose method on PdfDocument and found this:

Code (PdfDocument.cs lines 151..168):
    public void Dispose()
    {
        Dispose(true);
        //GC.SuppressFinalize(this);
    }

    void Dispose(bool disposing)
    {
        if (this.state != DocumentState.Disposed)
        {
            if (disposing)
            {
                // Dispose managed resources.
            }
            //PdfDocument.Gob.DetatchDocument(Handle);
        }
        this.state = DocumentState.Disposed;
    }

It appears that it does not dispose anything. So I debugged the Open() method of the PdfReader class and saw memory increasing in this loop:

Code (PdfReader.cs lines 346..372):
    // Read all indirect objects
    for (int idx = 0; idx < count; idx++)
    {
        PdfReference iref = irefs[idx];
        if (iref.Value == null)
        {
            try
            {
                Debug.Assert(document.irefTable.Contains(iref.ObjectID));
                PdfObject pdfObject = parser.ReadObject(null, iref.ObjectID, false);
                Debug.Assert(pdfObject.Reference == iref);
                pdfObject.Reference = iref;
                Debug.Assert(pdfObject.Reference.Value != null, "something got wrong");
            }
            catch (Exception ex)
            {
                Debug.WriteLine(ex.Message);
            }
        }
        else
        {
            Debug.Assert(document.irefTable.Contains(iref.ObjectID));
            iref.GetType();
        }
        // Set maximum object number
        document.irefTable.maxObjectNumber = Math.Max(document.irefTable.maxObjectNumber, iref.ObjectNumber);
    }

It is not clear to me which object is retaining the data in memory. Does anyone know how to correctly dispose a PdfDocument? |
Author: | rkawano [ Thu Nov 08, 2012 1:47 pm ] |
Post subject: | Re: PdfDocument memory leaking |
After some days of trying to figure out this problem, I found a workaround for our application.

First, I had to change the PdfDocument class to set all private members to null in Dispose, and recompile the library:

Code:
    void Dispose(bool disposing)
    {
        if (this.state != DocumentState.Disposed)
        {
            if (disposing)
            {
                // Dispose managed resources.
                this.info = null;
                this.pages = null;
                this.fontTable = null;
                this.catalog = null;
                this.trailer = null;
                this.iref = null;
                this.irefTable = null;
            }
            //PdfDocument.Gob.DetatchDocument(Handle);
        }
        this.state = DocumentState.Disposed;
    }

Then, following Thomas Hoevel's comment on this post, I call the GC after the read operation:

Code:
    public static Int32 CountPages(String filename)
    {
        try
        {
            using (PdfSharp.Pdf.PdfDocument inputDocument = PdfSharp.Pdf.IO.PdfReader.Open(filename, PdfSharp.Pdf.IO.PdfDocumentOpenMode.InformationOnly))
            {
                return inputDocument.PageCount;
            }
        }
        finally
        {
            GC.Collect();
            GC.WaitForPendingFinalizers();
        }
    }

This is a workaround and not a definitive fix. I have noticed that in some cases the memory is not freed after calling the garbage collector, but my application can "survive" without memory exceptions while reading a sequence of large PDF files (tested with a sequence of 10 files of 300MB each).

If I don't change the Dispose method or don't call the GC, the memory exceptions are raised when reading the third or fourth file.

Thanks for maintaining this fantastic library for free. |
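To check whether a workaround like the one above actually releases memory, one option is to compare the managed heap size before and after the call. The following is only a minimal sketch: MemoryProbe and MeasureRetainedBytes are hypothetical names, not part of PDFsharp, and the figure reported by GC.GetTotalMemory covers the managed heap only, so it is an approximation.

```csharp
using System;

// Hypothetical helper for illustration; not part of PDFsharp.
public static class MemoryProbe
{
    // Runs 'action' and returns the approximate number of managed bytes
    // still allocated afterwards, relative to a baseline taken before the call.
    public static long MeasureRetainedBytes(Action action)
    {
        // Passing true forces a full collection so the baseline is stable.
        long before = GC.GetTotalMemory(true);

        action();

        // Collect, let pending finalizers run, then collect again to
        // reclaim objects that were only released by those finalizers.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        long after = GC.GetTotalMemory(false);
        return after - before;
    }
}
```

A call such as MemoryProbe.MeasureRetainedBytes(() => CountPages(path)), where path is a placeholder for a large test file, could be run once with the stock library and once with the modified Dispose method. A much larger result in the first run would suggest that references inside the document are still reachable after the using block.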
All times are UTC |