PDFsharp & MigraDoc Foundation
https://forum.pdfsharp.net/

Memory limitations?
https://forum.pdfsharp.net/viewtopic.php?f=2&t=269

Author:  ACS [ Sat Dec 01, 2007 7:54 pm ]
Post subject:  Memory limitations?

I'm using PdfSharp to generate quite a large PDF (over 1,500 pages). This is done dynamically from a large dataset in a local file. Each page has some text with an image.

What I'm finding is that during PDF generation my program crashes after a certain number of pages have been generated (always somewhere past 1,000), with different exceptions (so far I've counted three of them, two relating to memory).
The problem is, this happens on a DIFFERENT image each time. I've checked the images and they all seem to be in supported formats, with no corrupted images.
My guess is that PdfSharp has a memory limitation of some sort, although that doesn't explain the generic GDI+ exception.
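
For reference, each page is drawn roughly like this (a simplified sketch; the entry class, file paths and layout values are just placeholders for my actual code):

Code:
// Simplified per-page loop (MovieEntry, entries and the layout values are placeholders).
PdfDocument document = new PdfDocument();
foreach (MovieEntry entry in entries)
{
    PdfPage page = document.AddPage();
    using (XGraphics gfx = XGraphics.FromPdfPage(page))
    {
        // Some text at the top of the page...
        XFont font = new XFont("Verdana", 10);
        gfx.DrawString(entry.Title, font, XBrushes.Black, 40, 40);

        // ...and the image below it.
        using (XImage image = XImage.FromFile(entry.ImagePath))
        {
            gfx.DrawImage(image, 40, 80, 200, 300);
        }
    }
}
document.Save("export.pdf");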

I'm using the precompiled DLL I got from these forums (posted by Thomas Hövel), which seems to be an older version (0.9.653.0), so I'm not sure whether that has any bearing on this problem.

Someone please help, as this is driving me crazy!

Here are the exceptions...

Exception 1:
Quote:
System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at PdfSharp.Pdf.Advanced.PdfImage.ReadTrueColorMemoryBitmap(Int32 components, Int32 bits, Boolean hasAlpha)
at PdfSharp.Pdf.Advanced.PdfImage.InitializeNonJpeg()
at PdfSharp.Pdf.Advanced.PdfImage..ctor(PdfDocument document, XImage image)
at PdfSharp.Pdf.Advanced.PdfImageTable.GetImage(XImage image)
at PdfSharp.Pdf.PdfPage.GetImageName(XImage image)
at PdfSharp.Drawing.Pdf.XGraphicsPdfRenderer.GetImageName(XImage image)
at PdfSharp.Drawing.Pdf.XGraphicsPdfRenderer.Realize(XImage image)
at PdfSharp.Drawing.Pdf.XGraphicsPdfRenderer.DrawImage(XImage image, Double x, Double y, Double width, Double height)
at PdfSharp.Drawing.XGraphics.DrawImage(XImage image, Double x, Double y, Double width, Double height)
at moviedb.ExportDatabaseDialog.DrawImage(String image, XGraphics gfx, Double y, Double maxWidth, Double maxHeight)
at moviedb.ExportDatabaseDialog.BtnStart_Click(Object sender, EventArgs e)
at System.Windows.Forms.Control.OnClick(EventArgs e)
at System.Windows.Forms.Button.OnClick(EventArgs e)
at System.Windows.Forms.Button.OnMouseUp(MouseEventArgs mevent)
at System.Windows.Forms.Control.WmMouseUp(Message& m, MouseButtons button, Int32 clicks)
at System.Windows.Forms.Control.WndProc(Message& m)
at System.Windows.Forms.ButtonBase.WndProc(Message& m)
at System.Windows.Forms.Button.WndProc(Message& m)
at System.Windows.Forms.Control.ControlNativeWindow.OnMessage(Message& m)
at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m)
at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)



Exception 2:
Quote:
System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at System.IO.MemoryStream.set_Capacity(Int32 value)
at System.IO.MemoryStream.EnsureCapacity(Int32 value)
at System.IO.MemoryStream.Write(Byte[] buffer, Int32 offset, Int32 count)
at PdfSharp.SharpZipLib.Zip.Compression.Streams.DeflaterOutputStream.Deflate()
at PdfSharp.SharpZipLib.Zip.Compression.Streams.DeflaterOutputStream.Write(Byte[] buf, Int32 off, Int32 len)
at PdfSharp.Pdf.Filters.FlateDecode.Encode(Byte[] data)
at PdfSharp.Pdf.Advanced.PdfImage.ReadTrueColorMemoryBitmap(Int32 components, Int32 bits, Boolean hasAlpha)
at PdfSharp.Pdf.Advanced.PdfImage.InitializeNonJpeg()
at PdfSharp.Pdf.Advanced.PdfImage..ctor(PdfDocument document, XImage image)
at PdfSharp.Pdf.Advanced.PdfImageTable.GetImage(XImage image)
at PdfSharp.Drawing.Pdf.XGraphicsPdfRenderer.GetImageName(XImage image)
at PdfSharp.Drawing.Pdf.XGraphicsPdfRenderer.Realize(XImage image)
at PdfSharp.Drawing.Pdf.XGraphicsPdfRenderer.DrawImage(XImage image, Double x, Double y, Double width, Double height)
at PdfSharp.Drawing.XGraphics.DrawImage(XImage image, Double x, Double y, Double width, Double height)
at moviedb.ExportDatabaseDialog.DrawImage(String image, XGraphics gfx, Double y, Double maxWidth, Double maxHeight)
at moviedb.ExportDatabaseDialog.BtnStart_Click(Object sender, EventArgs e)
at System.Windows.Forms.Control.OnClick(EventArgs e)
at System.Windows.Forms.Button.OnClick(EventArgs e)
at System.Windows.Forms.Button.OnMouseUp(MouseEventArgs mevent)
at System.Windows.Forms.Control.WmMouseUp(Message& m, MouseButtons button, Int32 clicks)
at System.Windows.Forms.Control.WndProc(Message& m)
at System.Windows.Forms.ButtonBase.WndProc(Message& m)
at System.Windows.Forms.Button.WndProc(Message& m)
at System.Windows.Forms.Control.ControlNativeWindow.OnMessage(Message& m)
at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m)
at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)


Exception 3:
Quote:
System.Runtime.InteropServices.ExternalException: A generic error occurred in GDI+.
at System.Drawing.Image.Save(Stream stream, ImageCodecInfo encoder, EncoderParameters encoderParams)
at System.Drawing.Image.Save(Stream stream, ImageFormat format)
at PdfSharp.Pdf.Advanced.PdfImage.ReadTrueColorMemoryBitmap(Int32 components, Int32 bits, Boolean hasAlpha)
at PdfSharp.Pdf.Advanced.PdfImage.InitializeNonJpeg()
at PdfSharp.Pdf.Advanced.PdfImage..ctor(PdfDocument document, XImage image)
at PdfSharp.Pdf.Advanced.PdfImageTable.GetImage(XImage image)
at PdfSharp.Pdf.PdfPage.GetImageName(XImage image)
at PdfSharp.Drawing.Pdf.XGraphicsPdfRenderer.GetImageName(XImage image)
at PdfSharp.Drawing.Pdf.XGraphicsPdfRenderer.Realize(XImage image)
at PdfSharp.Drawing.Pdf.XGraphicsPdfRenderer.DrawImage(XImage image, Double x, Double y, Double width, Double height)
at PdfSharp.Drawing.XGraphics.DrawImage(XImage image, Double x, Double y, Double width, Double height)
at moviedb.ExportDatabaseDialog.DrawImage(String image, XGraphics gfx, Double y, Double maxWidth, Double maxHeight)
at moviedb.ExportDatabaseDialog.BtnStart_Click(Object sender, EventArgs e)
at System.Windows.Forms.Control.OnClick(EventArgs e)
at System.Windows.Forms.Button.OnClick(EventArgs e)
at System.Windows.Forms.Button.OnMouseUp(MouseEventArgs mevent)
at System.Windows.Forms.Control.WmMouseUp(Message& m, MouseButtons button, Int32 clicks)
at System.Windows.Forms.Control.WndProc(Message& m)
at System.Windows.Forms.ButtonBase.WndProc(Message& m)
at System.Windows.Forms.Button.WndProc(Message& m)
at System.Windows.Forms.Control.ControlNativeWindow.OnMessage(Message& m)
at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m)
at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)

Author:  ACS [ Sun Dec 02, 2007 4:13 pm ]
Post subject: 

OK, well, it's clear why the program keeps crashing: after just over 700 entries, the process is taking up one GIGABYTE of memory.

This seems like an awfully large memory footprint, so I'm assuming it's because PdfSharp works with everything in memory uncompressed and then compresses to PDF when it is saved.

OK, so fine: my solution is to dump every 100 or so entries to a separate PDF and then merge them all together. I tried a little experiment: I wrote 300 entries to the PDF and stopped, but even after invoking the .Close() method on my PdfDocument object, the memory footprint of my program was still HUGE! I tried the .Dispose() method as well, but it had no effect.

Is this a bug in the library? How can I release the completed PDF from memory?

I'm using the latest version of PdfSharp, 1.0.898.0.
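
The chunk-and-merge plan I have in mind is roughly this (an untested sketch; WriteEntry stands in for my actual drawing code):

Code:
// Write every 100 or so entries to its own chunk file...
List<string> chunkFiles = new List<string>();
for (int start = 0; start < entries.Count; start += 100)
{
    PdfDocument chunk = new PdfDocument();
    int end = Math.Min(start + 100, entries.Count);
    for (int i = start; i < end; i++)
        WriteEntry(chunk, entries[i]);          // one page with text + image
    string file = "chunk" + start + ".pdf";
    chunk.Save(file);
    chunk.Close();
    chunkFiles.Add(file);
}

// ...then merge the chunks into the final document.
PdfDocument output = new PdfDocument();
foreach (string file in chunkFiles)
{
    using (PdfDocument part = PdfReader.Open(file, PdfDocumentOpenMode.Import))
    {
        for (int idx = 0; idx < part.PageCount; idx++)
            output.AddPage(part.Pages[idx]);
    }
}
output.Save("export.pdf");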

Author:  Thomas Hoevel [ Mon Dec 03, 2007 10:28 am ]
Post subject: 

ACS wrote:
OK, so fine: my solution is to dump every 100 or so entries to a separate PDF and then merge them all together. I tried a little experiment: I wrote 300 entries to the PDF and stopped, but even after invoking the .Close() method on my PdfDocument object, the memory footprint of my program was still HUGE!

I don't know if there's a memory leak in PDFsharp.

But for a real test you should display a message box after writing each file (with 100 or 300 pages) and note the process size, creating several files within one process.

Theoretically the memory should be freed after closing the file, but the process will not shrink immediately.
However, freed memory will be re-used, so the process size shouldn't change much while creating files #2, #3, ...

It's by design that PDFsharp keeps the complete PDF file in memory; only the items actually used are written to the PDF file at the end.
This works fine unless there are many pages and many images.
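
Roughly like this (a sketch only; CreateChunkFile stands in for your generation code):

Code:
// Create several files within one process and note the process size after each.
for (int i = 0; i < 5; i++)
{
    CreateChunkFile("test" + i + ".pdf", 300);   // your generation code goes here

    long workingSet = System.Diagnostics.Process.GetCurrentProcess().WorkingSet64;
    System.Windows.Forms.MessageBox.Show(
        "File #" + i + " written, working set: " + workingSet / (1024 * 1024) + " MB");
}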

Author:  ACS [ Tue Dec 04, 2007 2:33 am ]
Post subject: 

Yes, there are many pages and images (1,500+ pages, and almost as many images).

With a bit of tinkering I managed to optimize the code by forcing the garbage collector. One thing I found was that forcing the garbage collector within the same method had no effect, but as soon as I moved the generation code into a separate method and forced the garbage collector after that method returned, memory usage dropped dramatically. Of course this requires closing the PdfDocument object, which means I have to reopen it each time.

With this discovery I'm going to try a few more things to see if I can improve the code even further.
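
For what it's worth, the pattern that worked for me looks roughly like this (simplified; WriteEntry and entries stand in for my actual code):

Code:
// Generating each chunk in its own method means nothing keeps the document
// alive once the method returns, so a forced collection afterwards helps.
void ExportChunk(string fileName, int start, int count)
{
    PdfDocument document = new PdfDocument();
    for (int i = start; i < start + count; i++)
        WriteEntry(document, entries[i]);   // one page with text + image
    document.Save(fileName);
    document.Close();
}

// Caller:
for (int start = 0; start < entries.Count; start += 100)
{
    ExportChunk("chunk" + start + ".pdf", start, Math.Min(100, entries.Count - start));
    GC.Collect();
    GC.WaitForPendingFinalizers();
}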

Author:  karbunko [ Tue Nov 23, 2010 11:06 am ]
Post subject:  Re: Memory limitations?

Hi everyone, I'm a Spanish developer (sorry for my English).

Based on the GDI sample ConcatenateDocuments, Variant 1, I have developed a method that, instead of saving the file to disk:

outputDocument.Save(filename);

returns a byte[]:

// Save the document...
using (MemoryStream strm2 = new MemoryStream())
{
    outputDocument.Save(strm2);
    byte[] byteArr = strm2.ToArray();
    return byteArr;
}

When the PDF is big (84 MB, for example) the program crashes with the same exception:

System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at System.IO.MemoryStream.set_Capacity(Int32 value)


I have read this thread, but I can't work out how to improve my code.

Any clue?

Author:  jeffhare [ Tue Nov 23, 2010 3:37 pm ]
Post subject:  Re: Memory limitations?

If you're generating huge documents with lots of images, you may need to change your technique: produce smaller documents with MigraDoc and then assemble them into the final document using PDFsharp, rather than trying to render the entire huge document with MigraDoc at once.
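
The split-and-assemble approach would look roughly like this (a sketch only; the chunk content and file names are placeholders):

Code:
// Render a small chunk with MigraDoc...
Document doc = new Document();
Section section = doc.AddSection();
section.AddParagraph("Entries 1-100 go here...");   // your real content

PdfDocumentRenderer renderer = new PdfDocumentRenderer(true);
renderer.Document = doc;
renderer.RenderDocument();
renderer.PdfDocument.Save("chunk1.pdf");

// ...repeat for each chunk, then assemble the chunk files with PDFsharp
// (PdfReader.Open with PdfDocumentOpenMode.Import and outputDocument.AddPage),
// as shown elsewhere in this thread.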

Don't know if this is an option, but with the 2 GB limit in .NET (32-bit), the garbage collector will not be able to free the images until the entire thing is rendered.

To follow up on the last poster, I have also changed MigraDoc to allow disk-based images to be specified and loaded at rendering time. I did this to control the memory footprint when generating documents with large, deeply nested tables containing images.

-Jeff

Author:  karbunko [ Tue Nov 23, 2010 4:05 pm ]
Post subject:  Re: Memory limitations?

Thank you very much for the answer, Jeff.

But I don't quite understand you.

Why MigraDoc?
I'm only using PDFsharp.dll.

Why images?
I have 500 small PDFs, originally created by Crystal Reports and stored in a database.
Each PDF page has a small JPG inside, but that isn't important, is it?

I just have to join them into a new PDF (a byte[]) and finally send it back to the database.

The signature of my function is:

/// <summary>
/// Imports all pages from a list of documents. Based on ConcatenateDocuments, Variant 1.
/// </summary>
public static byte[] ConcatenarPDFs(System.Collections.Generic.List<byte[]> files);


Thanks again.

Author:  jeffhare [ Tue Nov 23, 2010 5:12 pm ]
Post subject:  Re: Memory limitations?

* OK, so you're only using PDFsharp, not MigraDoc.
* You are assembling a large PDF from many small PDF documents and running out of memory.
* No, the JPG image isn't important in this case.

Are you certain that you are disposing of each small PDF document after you append it to the main, large document?

If you're using Variant 1, see whether wrapping your inputDocument in a using statement helps; it forces disposal of each inputDocument after it has been appended.

Code:
   
// Iterate files
foreach (string file in files)
{
    // Open the document to import pages from it.
    using (PdfDocument inputDocument = PdfReader.Open(file, PdfDocumentOpenMode.Import))
    {
        // Append its pages to the main document.
        for (int idx = 0; idx < inputDocument.PageCount; idx++)
            outputDocument.AddPage(inputDocument.Pages[idx]);
    }
}

Author:  karbunko [ Tue Nov 23, 2010 5:42 pm ]
Post subject:  Re: Memory limitations?

Yes, I do (or at least I think so).

This is the whole code of my wrapper.


Code:
/// <summary>
/// Imports all pages from a list of documents. Based on the ConcatenateDocuments sample, Variant 1.
/// </summary>
public static byte[] ConcatenarPDFs(System.Collections.Generic.List<byte[]> files)
{
    // Open the output document
    PdfDocument outputDocument = new PdfDocument();
    byte[] contenido = null;

    System.Diagnostics.Trace.WriteLine ("ConcatenarPDFs - for init");

    // Iterate files
    //foreach (byte[] contenido in files)//replaced trying to improve memory usage
    for (int i = 0; i < files.Count; i++)
    {
        contenido = files[i];
        using (MemoryStream strm = new MemoryStream(contenido))
        {
            // Open the document to import pages from it.
            PdfDocument inputDocument = PdfReader.Open(strm, PdfDocumentOpenMode.Import);

            // Iterate pages
            int count = inputDocument.PageCount;
            for (int idx = 0; idx < count; idx++)
            {
                // Get the page from the external document...
                PdfPage page = inputDocument.Pages[idx];
                // ...and add it to the output document.
                outputDocument.AddPage(page);
            }
            strm.Close();
        }
        files[i] = null;//to improve memory usage
    }

    System.Diagnostics.Trace.WriteLine ("ConcatenarPDFs - for done");

    //to improve memory usage
    files.Clear();

    System.Diagnostics.Trace.WriteLine ("ConcatenarPDFs - clear done");

    try
    {
        // Save the document...
        using (MemoryStream strm2 = new MemoryStream())
        {
            outputDocument.Save(strm2);
            byte[] byteArr = strm2.ToArray();
            return byteArr;
        }
    }
    catch (Exception exc)
    {
        System.Diagnostics.Trace.WriteLine ("ConcatenarPDFs - Error: " + exc.Message);
        System.Diagnostics.Trace.WriteLine (exc.StackTrace);
        throw exc;
    }
}


Thanks again.

Author:  karbunko [ Tue Nov 23, 2010 5:44 pm ]
Post subject:  Re: Memory limitations?

And this is the log:

ConcatenarPDFs - for init
ConcatenarPDFs - for done
ConcatenarPDFs - clear done
ConcatenarPDFs - Error: Exception of type 'System.OutOfMemoryException' was thrown.
at System.IO.MemoryStream.set_Capacity(Int32 value)
at System.IO.MemoryStream.EnsureCapacity(Int32 value)
at System.IO.MemoryStream.Write(Byte[] buffer, Int32 offset, Int32 count)
at PdfSharp.Pdf.IO.PdfWriter.Write(Byte[] bytes)
at PdfSharp.Pdf.IO.PdfWriter.WriteStream(PdfDictionary value, Boolean omitStream)
at PdfSharp.Pdf.PdfDictionary.WriteDictionaryStream(PdfWriter writer)
at PdfSharp.Pdf.PdfDictionary.WriteObject(PdfWriter writer)
at PdfSharp.Pdf.PdfDocument.DoSave(PdfWriter writer)
at PdfSharp.Pdf.PdfDocument.Save(Stream stream, Boolean closeStream)

Author:  jeffhare [ Tue Nov 23, 2010 10:24 pm ]
Post subject:  Re: Memory limitations?

Try this: note that the PdfDocument declaration is now inside the using statement. No guarantees here, but this may cause the small PdfDocuments to be disposed of earlier.

Code:
/// <summary>
/// Imports all pages from a list of documents. Based on the ConcatenateDocuments sample, Variant 1.
/// </summary>
public static byte[] ConcatenarPDFs(System.Collections.Generic.List<byte[]> files)
{
    // Open the output document
    PdfDocument outputDocument = new PdfDocument();
    byte[] contenido = null;

    System.Diagnostics.Trace.WriteLine ("ConcatenarPDFs - for init");

    // Iterate files
    //foreach (byte[] contenido in files)//replaced trying to improve memory usage
    for (int i = 0; i < files.Count; i++)
    {
        contenido = files[i];
        using (MemoryStream strm = new MemoryStream(contenido))
        {
            // Open the document to import pages from it.
            using (PdfDocument inputDocument = PdfReader.Open(strm, PdfDocumentOpenMode.Import))
            {
                // Iterate pages
                int count = inputDocument.PageCount;
                for (int idx = 0; idx < count; idx++)
                {
                    // Get the page from the external document...
                    PdfPage page = inputDocument.Pages[idx];
                    // ...and add it to the output document.
                    outputDocument.AddPage(page);
                }
                strm.Close();
            }
        }
        files[i] = null;//to improve memory usage
    }

    System.Diagnostics.Trace.WriteLine ("ConcatenarPDFs - for done");

    //to improve memory usage
    files.Clear();

    System.Diagnostics.Trace.WriteLine ("ConcatenarPDFs - clear done");

    try
    {
        // Save the document...
        using (MemoryStream strm2 = new MemoryStream())
        {
            outputDocument.Save(strm2);
            byte[] byteArr = strm2.ToArray();
            return byteArr;
        }
    }
    catch (Exception exc)
    {
        System.Diagnostics.Trace.WriteLine ("ConcatenarPDFs - Error: " + exc.Message);
        System.Diagnostics.Trace.WriteLine (exc.StackTrace);
        throw exc;
    }
}

Author:  karbunko [ Wed Nov 24, 2010 8:35 am ]
Post subject:  Re: Memory limitations?

Hi Jeff,

with your code, the same exception occurs.

After that test I tried another one: adding this line to your code
Code:
outputDocument.PrepareForSave();
before saving, but it crashes too.

Is there a bug in the Save() method?
The total size of the PDF shouldn't be more than about 200 MB, yet RAM usage grows far beyond that until the error occurs.

Thanks.

Author:  jeffhare [ Wed Nov 24, 2010 2:25 pm ]
Post subject:  Re: Memory limitations?

So, it looks like you're loading all the little PDFs into a List of byte[], then looping through and adding each one to the main PDF, and finally returning yet another byte[] for the entire document. That's certainly going to eat tons of memory, as you end up with probably three copies of every little PDF in memory.

At a minimum, you should free each byte[] when you're finished with it.

It would be much better to load each little PDF, append it to the main PDF, then free that byte[] and go on to the next little PDF file. That way you only have the main document in memory, plus the current little PDF's byte[]. So, instead of passing a List of byte[] to 'ConcatenarPDFs', maybe just pass a list of file names and process each one this way, as in the sketch below.

This would probably cut down on memory usage.
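
Something like this (a sketch only, assuming the little PDFs can be read from files; note that the final Save still needs memory for the whole output document):

Code:
/// <summary>
/// Concatenates PDFs given by file name, keeping only one input document
/// in memory at a time (sketch only).
/// </summary>
public static byte[] ConcatenarPDFs(System.Collections.Generic.List<string> fileNames)
{
    PdfDocument outputDocument = new PdfDocument();

    foreach (string fileName in fileNames)
    {
        // Only the current little PDF is held in memory here.
        using (PdfDocument inputDocument = PdfReader.Open(fileName, PdfDocumentOpenMode.Import))
        {
            for (int idx = 0; idx < inputDocument.PageCount; idx++)
                outputDocument.AddPage(inputDocument.Pages[idx]);
        }
    }

    // The whole output document is still built in memory at this point.
    using (MemoryStream strm = new MemoryStream())
    {
        outputDocument.Save(strm);
        return strm.ToArray();
    }
}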

Author:  karbunko [ Wed Nov 24, 2010 4:19 pm ]
Post subject:  Re: Memory limitations?

Hi,
Yesterday I already did the minimum you're suggesting, with this line of my code above, inside the for loop:
Code:
files[i] = null;//to improve memory usage

Didn't I?

However, I agree with you that I have the data twice in memory.
How can I use Append?
outputDocument.Append() doesn't exist as such!

Is there a suitable sample in the code I downloaded?


Thanks again.

Author:  jeffhare [ Wed Nov 24, 2010 4:32 pm ]
Post subject:  Re: Memory limitations?

At this point, I believe you may need to do a little memory profiling to see where the memory is being used.

You said about 1500 pages.

Perhaps you should try an experiment: take a single little PDF and add it 1500 times to the master PDF document (see the sketch at the end of this post), and see if you still run out of memory.
If so, the document may simply be too large to create this way.

Without the application, it's hard to know what else is going on.

I didn't mean to imply that there was an Append *method*. What you're doing is appending/adding pages to the end of the master document.
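
A quick sketch of that test (little.pdf stands for any one of your small documents):

Code:
// Add the same small PDF 1500 times and watch the memory usage.
PdfDocument output = new PdfDocument();
for (int i = 0; i < 1500; i++)
{
    using (PdfDocument part = PdfReader.Open("little.pdf", PdfDocumentOpenMode.Import))
    {
        for (int idx = 0; idx < part.PageCount; idx++)
            output.AddPage(part.Pages[idx]);
    }
}
output.Save("test.pdf");   // saving to a file avoids the final big MemoryStream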

Author:  karbunko [ Thu Nov 25, 2010 10:59 am ]
Post subject:  Re: Memory limitations?

OK, thanks for your help.
For the moment (we're in a hurry) we have to roll back our code to a version that uses PDFsam, a Java utility we had been using in previous years. I don't much like using it, because it isn't .NET code, but I'm not the boss and it works...

When I have a bit more time (who knows when...) I want to recode my function as you suggest.

Thanks again, Jeff.
