PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly

All times are UTC





Post new topic Reply to topic  [ 16 posts ] 
Post subject: Memory limitations?
Posted: Sat Dec 01, 2007 7:54 pm

Joined: Sat Dec 01, 2007 7:22 pm
Posts: 15
I'm using PdfSharp to generate quite a large PDF (over 1500 pages). The content is generated dynamically from a large dataset in a local file. Each page has some text and an image.

What I'm finding is that during PDF generation my program crashes after a certain number of pages (always somewhere past 1000) with varying exceptions (so far I've counted three, two of them memory-related).
The problem is that this happens on a DIFFERENT image each time. I've checked the images: they all seem to be in supported formats, and none of them are corrupted.
My guess is that PdfSharp has some kind of memory limitation, although that doesn't explain the generic GDI+ exception.

I'm using the precompiled DLL I got from these forums (posted by Thomas Hövel), which seems to be an older version (0.9.653.0), so I'm not sure whether that has any bearing on this problem.

Someone please help, as this is driving me crazy!

Here are the exceptions...

Exception 1:
Quote:
System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at PdfSharp.Pdf.Advanced.PdfImage.ReadTrueColorMemoryBitmap(Int32 components, Int32 bits, Boolean hasAlpha)
at PdfSharp.Pdf.Advanced.PdfImage.InitializeNonJpeg()
at PdfSharp.Pdf.Advanced.PdfImage..ctor(PdfDocument document, XImage image)
at PdfSharp.Pdf.Advanced.PdfImageTable.GetImage(XImage image)
at PdfSharp.Pdf.PdfPage.GetImageName(XImage image)
at PdfSharp.Drawing.Pdf.XGraphicsPdfRenderer.GetImageName(XImage image)
at PdfSharp.Drawing.Pdf.XGraphicsPdfRenderer.Realize(XImage image)
at PdfSharp.Drawing.Pdf.XGraphicsPdfRenderer.DrawImage(XImage image, Double x, Double y, Double width, Double height)
at PdfSharp.Drawing.XGraphics.DrawImage(XImage image, Double x, Double y, Double width, Double height)
at moviedb.ExportDatabaseDialog.DrawImage(String image, XGraphics gfx, Double y, Double maxWidth, Double maxHeight)
at moviedb.ExportDatabaseDialog.BtnStart_Click(Object sender, EventArgs e)
at System.Windows.Forms.Control.OnClick(EventArgs e)
at System.Windows.Forms.Button.OnClick(EventArgs e)
at System.Windows.Forms.Button.OnMouseUp(MouseEventArgs mevent)
at System.Windows.Forms.Control.WmMouseUp(Message& m, MouseButtons button, Int32 clicks)
at System.Windows.Forms.Control.WndProc(Message& m)
at System.Windows.Forms.ButtonBase.WndProc(Message& m)
at System.Windows.Forms.Button.WndProc(Message& m)
at System.Windows.Forms.Control.ControlNativeWindow.OnMessage(Message& m)
at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m)
at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)



Exception 2:
Quote:
System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at System.IO.MemoryStream.set_Capacity(Int32 value)
at System.IO.MemoryStream.EnsureCapacity(Int32 value)
at System.IO.MemoryStream.Write(Byte[] buffer, Int32 offset, Int32 count)
at PdfSharp.SharpZipLib.Zip.Compression.Streams.DeflaterOutputStream.Deflate()
at PdfSharp.SharpZipLib.Zip.Compression.Streams.DeflaterOutputStream.Write(Byte[] buf, Int32 off, Int32 len)
at PdfSharp.Pdf.Filters.FlateDecode.Encode(Byte[] data)
at PdfSharp.Pdf.Advanced.PdfImage.ReadTrueColorMemoryBitmap(Int32 components, Int32 bits, Boolean hasAlpha)
at PdfSharp.Pdf.Advanced.PdfImage.InitializeNonJpeg()
at PdfSharp.Pdf.Advanced.PdfImage..ctor(PdfDocument document, XImage image)
at PdfSharp.Pdf.Advanced.PdfImageTable.GetImage(XImage image)
at PdfSharp.Drawing.Pdf.XGraphicsPdfRenderer.GetImageName(XImage image)
at PdfSharp.Drawing.Pdf.XGraphicsPdfRenderer.Realize(XImage image)
at PdfSharp.Drawing.Pdf.XGraphicsPdfRenderer.DrawImage(XImage image, Double x, Double y, Double width, Double height)
at PdfSharp.Drawing.XGraphics.DrawImage(XImage image, Double x, Double y, Double width, Double height)
at moviedb.ExportDatabaseDialog.DrawImage(String image, XGraphics gfx, Double y, Double maxWidth, Double maxHeight)
at moviedb.ExportDatabaseDialog.BtnStart_Click(Object sender, EventArgs e)
at System.Windows.Forms.Control.OnClick(EventArgs e)
at System.Windows.Forms.Button.OnClick(EventArgs e)
at System.Windows.Forms.Button.OnMouseUp(MouseEventArgs mevent)
at System.Windows.Forms.Control.WmMouseUp(Message& m, MouseButtons button, Int32 clicks)
at System.Windows.Forms.Control.WndProc(Message& m)
at System.Windows.Forms.ButtonBase.WndProc(Message& m)
at System.Windows.Forms.Button.WndProc(Message& m)
at System.Windows.Forms.Control.ControlNativeWindow.OnMessage(Message& m)
at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m)
at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)


Exception 3:
Quote:
System.Runtime.InteropServices.ExternalException: A generic error occurred in GDI+.
at System.Drawing.Image.Save(Stream stream, ImageCodecInfo encoder, EncoderParameters encoderParams)
at System.Drawing.Image.Save(Stream stream, ImageFormat format)
at PdfSharp.Pdf.Advanced.PdfImage.ReadTrueColorMemoryBitmap(Int32 components, Int32 bits, Boolean hasAlpha)
at PdfSharp.Pdf.Advanced.PdfImage.InitializeNonJpeg()
at PdfSharp.Pdf.Advanced.PdfImage..ctor(PdfDocument document, XImage image)
at PdfSharp.Pdf.Advanced.PdfImageTable.GetImage(XImage image)
at PdfSharp.Pdf.PdfPage.GetImageName(XImage image)
at PdfSharp.Drawing.Pdf.XGraphicsPdfRenderer.GetImageName(XImage image)
at PdfSharp.Drawing.Pdf.XGraphicsPdfRenderer.Realize(XImage image)
at PdfSharp.Drawing.Pdf.XGraphicsPdfRenderer.DrawImage(XImage image, Double x, Double y, Double width, Double height)
at PdfSharp.Drawing.XGraphics.DrawImage(XImage image, Double x, Double y, Double width, Double height)
at moviedb.ExportDatabaseDialog.DrawImage(String image, XGraphics gfx, Double y, Double maxWidth, Double maxHeight)
at moviedb.ExportDatabaseDialog.BtnStart_Click(Object sender, EventArgs e)
at System.Windows.Forms.Control.OnClick(EventArgs e)
at System.Windows.Forms.Button.OnClick(EventArgs e)
at System.Windows.Forms.Button.OnMouseUp(MouseEventArgs mevent)
at System.Windows.Forms.Control.WmMouseUp(Message& m, MouseButtons button, Int32 clicks)
at System.Windows.Forms.Control.WndProc(Message& m)
at System.Windows.Forms.ButtonBase.WndProc(Message& m)
at System.Windows.Forms.Button.WndProc(Message& m)
at System.Windows.Forms.Control.ControlNativeWindow.OnMessage(Message& m)
at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m)
at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)
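All three traces fail inside the poster's own DrawImage(String image, XGraphics gfx, Double y, Double maxWidth, Double maxHeight) helper, while PDFsharp converts a non-JPEG image (PdfImage.InitializeNonJpeg). That helper's code isn't shown, so the following is only a hedged sketch of one thing worth checking: that every XImage is disposed right after it has been drawn. The class name ImageHelper, the extra x parameter and the "shrink but never enlarge" scaling rule are assumptions for illustration, not the poster's code.

```csharp
using System;
using PdfSharp.Drawing;

public static class ImageHelper
{
    // Scales (width, height) down to fit inside (maxWidth, maxHeight), keeping the aspect ratio.
    // Assumption: images are only ever shrunk, never enlarged.
    public static (double Width, double Height) FitInto(double width, double height, double maxWidth, double maxHeight)
    {
        double scale = Math.Min(1.0, Math.Min(maxWidth / width, maxHeight / height));
        return (width * scale, height * scale);
    }

    // Hypothetical stand-in for the poster's DrawImage helper (an x parameter is added here).
    public static void DrawImage(string imagePath, XGraphics gfx, double x, double y, double maxWidth, double maxHeight)
    {
        // Dispose the XImage as soon as it has been drawn, so the image data it holds
        // can be released early instead of staying alive until the document is finished.
        using (XImage image = XImage.FromFile(imagePath))
        {
            var size = FitInto(image.PointWidth, image.PointHeight, maxWidth, maxHeight);
            gfx.DrawImage(image, x, y, size.Width, size.Height);
        }
    }
}
```

The scaling is split into a separate pure function so it can be checked without any image files; only DrawImage touches PDFsharp.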


Post subject:
Posted: Sun Dec 02, 2007 4:13 pm

Joined: Sat Dec 01, 2007 7:22 pm
Posts: 15
OK, well, it's clear why the program keeps crashing: after just over 700 entries, the process is taking up one GIGABYTE of memory.

This seems like an awfully large memory footprint, so I'm assuming it's because PdfSharp keeps everything in memory uncompressed and only compresses it to PDF when the document is saved.

Ok, so fine: my solution is to then dump every 100 or so entries to a separate PDF and then merge them all together. So I tried a little experiment: I wrote 300 entries to the PDF and stopped, but even after invoking the .Close() method on my PdfDocument object, the memory footprint of my program was still HUGE! I tried using the .Dispose() method but it had no effect.

Is this a bug in the library? How can I release the completed PDF from memory?

I'm using the latest version of PdfSharp, 1.0.898.0.


Post subject:
Posted: Mon Dec 03, 2007 10:28 am
PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3101
Location: Cologne, Germany
ACS wrote:
Ok, so fine: my solution is to then dump every 100 or so entries to a separate PDF and then merge them all together. So I tried a little experiment: I wrote 300 entries to the PDF and stopped, but even after invoking the .Close() method on my PdfDocument object, the memory footprint of my program was still HUGE!

I don't know if there's a memory leak in PDFsharp.

But for a real test, create several files within one process: after writing each file (with 100 or 300 pages), display a message box and note the process size.

In theory, the memory should be freed after closing the file, but the process will not shrink immediately.
However, freed memory will be reused, so the process size shouldn't change much while creating files #2, #3, and so on.
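A minimal sketch of the test Thomas describes, printing figures instead of showing a message box: it creates several files in one process and records the managed heap size and the working set after each one. MemoryProbe and the createFile callback are hypothetical names; the callback stands for whatever code writes one PDF. GC.GetTotalMemory(true) forces a full collection before measuring, so it reflects live objects only, whereas the working set (the "process size") may stay high even after memory has been freed.

```csharp
using System;
using System.Diagnostics;

public static class MemoryProbe
{
    // Calls the (hypothetical) createFile callback 'runs' times in the same process
    // and returns the managed heap size measured after each run.
    public static long[] MeasureRuns(Action<int> createFile, int runs)
    {
        var managedBytes = new long[runs];
        for (int i = 0; i < runs; i++)
        {
            createFile(i);

            // true: perform a full collection first, so only live objects are counted
            managedBytes[i] = GC.GetTotalMemory(true);

            // Working set = physical memory currently used by the process
            long workingSet;
            using (Process current = Process.GetCurrentProcess())
            {
                workingSet = current.WorkingSet64;
            }

            Console.WriteLine("File #{0}: managed heap {1:N0} KB, working set {2:N0} KB",
                i + 1, managedBytes[i] / 1024, workingSet / 1024);
        }
        return managedBytes;
    }
}
```

If the managed heap figure stays roughly flat from file #2 onward, the memory is being reused as Thomas expects; a steady climb would point to something still holding references.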

It's by design that PDFsharp keeps the complete PDF file in memory: only the items that are actually used are written to the PDF file at the end.
This works fine, unless there are many pages and many images.

_________________
Regards
Thomas Hoevel
PDFsharp Team


Post subject:
Posted: Tue Dec 04, 2007 2:33 am

Joined: Sat Dec 01, 2007 7:22 pm
Posts: 15
Yes, there are many pages and images (1500+ pages and almost as many images).

With some tinkering I managed to improve the code by forcing garbage collection. Forcing the garbage collector within the same method had no effect, but as soon as I moved the PDF code into a separate method and forced a collection after that method returned, memory usage dropped dramatically. Of course, this required closing the PdfDocument object, so I have to reopen it each time.
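The approach described above (write the entries in chunks to separate files, building each chunk in its own method and collecting after it returns) could be sketched like this, assuming PDFsharp's PdfDocument, XGraphics.FromPdfPage and Save(string). ChunkedExport, ExportChunk, the file-name pattern and the placeholder rectangle standing in for the real text and image are all hypothetical. A likely explanation for why the in-method collection had no effect (not confirmed in this thread) is that in unoptimized Debug builds the JIT keeps local variables alive until the method returns.

```csharp
using System;
using System.IO;
using System.Runtime.CompilerServices;
using PdfSharp.Drawing;
using PdfSharp.Pdf;

public static class ChunkedExport
{
    // Writes 'entryCount' entries in chunks of 'chunkSize' to separate files and returns their paths.
    public static string[] ExportAll(int entryCount, int chunkSize, string outputDirectory)
    {
        int chunkCount = (entryCount + chunkSize - 1) / chunkSize;
        var paths = new string[chunkCount];
        for (int chunk = 0; chunk < chunkCount; chunk++)
        {
            int first = chunk * chunkSize;
            int count = Math.Min(chunkSize, entryCount - first);
            paths[chunk] = Path.Combine(outputDirectory, $"export_part{chunk:D3}.pdf");

            ExportChunk(paths[chunk], first, count);

            // The chunk's PdfDocument is unreachable once ExportChunk has returned,
            // so a forced collection here can actually reclaim it.
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();
        }
        return paths;
    }

    // NoInlining keeps this in its own stack frame, so its locals are gone when it returns.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void ExportChunk(string path, int firstEntry, int entryCount)
    {
        using (PdfDocument document = new PdfDocument())
        {
            for (int i = 0; i < entryCount; i++)
            {
                PdfPage page = document.AddPage();
                using (XGraphics gfx = XGraphics.FromPdfPage(page))
                {
                    // Placeholder for drawing entry (firstEntry + i): its text plus one image.
                    gfx.DrawRectangle(XPens.Black, 50, 50, 200, 100);
                }
            }
            document.Save(path);
        }
    }
}
```

The resulting part files can then be merged with PDFsharp's import mode, as in the ConcatenateDocuments sample discussed later in this thread.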

With this discovery I'm going to try a few more things to see if I can improve the code even further.


Post subject: Re: Memory limitations?
Posted: Tue Nov 23, 2010 11:06 am

Joined: Tue Nov 23, 2010 9:39 am
Posts: 8
Hi everyone, I'm a Spanish developer (sorry for my English).

Based on the GDI sample ConcatenateDocuments (Variant1), I have developed a method that, instead of saving the file to disk with

outputDocument.Save(filename);

returns a byte[]:

// Save the document...
using (MemoryStream strm2 = new MemoryStream())
{
    outputDocument.Save(strm2);
    byte[] byteArr = strm2.ToArray();
    return byteArr;
}

When the PDF is big (84 MB, for example), the program crashes with the same exception:

System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at System.IO.MemoryStream.set_Capacity(Int32 value)


I have read your posts, but I can't work out how to improve my code.

Any clue?


Post subject: Re: Memory limitations?
Posted: Tue Nov 23, 2010 3:37 pm
Supporter

Joined: Thu May 27, 2010 7:40 pm
Posts: 59
Location: New Hampshire, USA
If you're generating huge documents with lots of images, you may need to change your technique: produce smaller documents with MigraDoc and then assemble them into the final document with PDFsharp, rather than rendering the entire huge document with MigraDoc.

I don't know if this is an option for you, but with the 2 GB limit of 32-bit .NET processes, the garbage collector will not be able to free the images until the entire document has been rendered.
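Whether a process is subject to that limit can be checked at run time. A minimal sketch (the ProcessInfo class is hypothetical) using Environment.Is64BitProcess, which exists since .NET Framework 4.0 and is therefore newer than the versions discussed here. By default, a 32-bit Windows process gets about 2 GB of user-mode address space unless the executable is marked large-address-aware.

```csharp
using System;

public static class ProcessInfo
{
    // Reports the bitness of the current process; pointers are 4 bytes
    // in a 32-bit process and 8 bytes in a 64-bit one.
    public static string Describe()
    {
        string bitness = Environment.Is64BitProcess ? "64-bit" : "32-bit";
        return $"{bitness} process, IntPtr.Size = {IntPtr.Size}";
    }
}
```

Logging this once at startup makes it obvious whether an "AnyCPU, prefer 32-bit" or x86 build is the reason a large export runs out of address space.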

To follow up on the last poster: I have also changed MigraDoc to allow disk-based images to be specified and loaded at rendering time. I did this to control the memory footprint when generating documents with large, deeply nested tables containing images.

-Jeff


Post subject: Re: Memory limitations?
Posted: Tue Nov 23, 2010 4:05 pm

Joined: Tue Nov 23, 2010 9:39 am
Posts: 8
Thank you very much for your answer, Jeff.

But I don't quite understand you.

Why MigraDoc?
I'm only using PDFSharp.dll.

Why images?
I have 500 small PDFs, originally created by Crystal Reports and stored in a database.
Each PDF page contains a small JPG, but that isn't important, is it?

I just have to join them into a new PDF (as a byte[]) and finally store it back in the database.

The signature of my function is:

/// <summary>
/// Imports all pages from a list of documents. Based on ConcatenateDocuments.Variant1
/// </summary>
public static byte[] ConcatenarPDFs(System.Collections.Generic.List<byte[]> files);


Thanks again.


Post subject: Re: Memory limitations?
Posted: Tue Nov 23, 2010 5:12 pm
Supporter

Joined: Thu May 27, 2010 7:40 pm
Posts: 59
Location: New Hampshire, USA
* OK, so you're only using PDFsharp, not MigraDoc.
* You are assembling a large PDF from many small PDF documents and running out of memory.
* No, the JPG image isn't important in this case.

Are you certain that you are disposing of each small PDF document after you append it to the main large document?

If you're using Variant1, try wrapping your inputDocument in a using statement to force disposal of each inputDocument after it is appended.

Code:
// Iterate files
foreach (string file in files)
{
    // Open the document to import pages from it.
    using (PdfDocument inputDocument = PdfReader.Open(file, PdfDocumentOpenMode.Import))
    {
        // Append the pages to the main document.
        for (int idx = 0; idx < inputDocument.PageCount; idx++)
        {
            outputDocument.AddPage(inputDocument.Pages[idx]);
        }
    }
}


Post subject: Re: Memory limitations?
Posted: Tue Nov 23, 2010 5:42 pm

Joined: Tue Nov 23, 2010 9:39 am
Posts: 8
Yes, I do (or at least I think so).

This is the complete code of my wrapper.


Code:
/// <summary>
/// Imports all pages from a list of documents. Based on the Concatenar.Variant1 sample
/// </summary>
public static byte[] ConcatenarPDFs(System.Collections.Generic.List<byte[]> files)
{
    // Open the output document
    PdfDocument outputDocument = new PdfDocument();
    byte[] contenido = null;

    System.Diagnostics.Trace.WriteLine ("ConcatenarPDFs - for init");

    // Iterate files
    //foreach (byte[] contenido in files)//replaced trying to improve memory usage
    for (int i = 0; i < files.Count; i++)
    {
        contenido = files[i];
        using (MemoryStream strm = new MemoryStream(contenido))
        {
            // Open the document to import pages from it.
            PdfDocument inputDocument = PdfReader.Open(strm, PdfDocumentOpenMode.Import);

            // Iterate pages
            int count = inputDocument.PageCount;
            for (int idx = 0; idx < count; idx++)
            {
                // Get the page from the external document...
                PdfPage page = inputDocument.Pages[idx];
                // ...and add it to the output document.
                outputDocument.AddPage(page);
            }
            strm.Close();
        }
        files[i] = null;//to improve memory usage
    }

    System.Diagnostics.Trace.WriteLine ("ConcatenarPDFs - for done");

    //to improve memory usage
    files.Clear();

    System.Diagnostics.Trace.WriteLine ("ConcatenarPDFs - clear done");

    try
    {
        // Save the document...
        using (MemoryStream strm2 = new MemoryStream())
        {
            outputDocument.Save(strm2);
            byte[] byteArr = strm2.ToArray();
            return byteArr;
        }
    }
    catch (Exception exc)
    {
        System.Diagnostics.Trace.WriteLine ("ConcatenarPDFs - Error: " + exc.Message);
        System.Diagnostics.Trace.WriteLine (exc.StackTrace);
        throw; // rethrow without resetting the stack trace
    }
}


Thanks again.


Post subject: Re: Memory limitations?
Posted: Tue Nov 23, 2010 5:44 pm

Joined: Tue Nov 23, 2010 9:39 am
Posts: 8
And this is the log:

ConcatenarPDFs - for init
ConcatenarPDFs - for done
ConcatenarPDFs - clear done
ConcatenarPDFs - Error: Exception of type 'System.OutOfMemoryException' was thrown.
at System.IO.MemoryStream.set_Capacity(Int32 value)
at System.IO.MemoryStream.EnsureCapacity(Int32 value)
at System.IO.MemoryStream.Write(Byte[] buffer, Int32 offset, Int32 count)
at PdfSharp.Pdf.IO.PdfWriter.Write(Byte[] bytes)
at PdfSharp.Pdf.IO.PdfWriter.WriteStream(PdfDictionary value, Boolean omitStream)
at PdfSharp.Pdf.PdfDictionary.WriteDictionaryStream(PdfWriter writer)
at PdfSharp.Pdf.PdfDictionary.WriteObject(PdfWriter writer)
at PdfSharp.Pdf.PdfDocument.DoSave(PdfWriter writer)
at PdfSharp.Pdf.PdfDocument.Save(Stream stream, Boolean closeStream)


Post subject: Re: Memory limitations?
Posted: Tue Nov 23, 2010 10:24 pm
Supporter

Joined: Thu May 27, 2010 7:40 pm
Posts: 59
Location: New Hampshire, USA
Try this. Note that the input PdfDocument declaration is now inside a using statement. No guarantees here, but this may cause the small PdfDocuments to be disposed earlier.

Code:
/// <summary>
/// Imports all pages from a list of documents. Based on the Concatenar.Variant1 sample
/// </summary>
public static byte[] ConcatenarPDFs(System.Collections.Generic.List<byte[]> files)
{
    // Open the output document
    PdfDocument outputDocument = new PdfDocument();
    byte[] contenido = null;

    System.Diagnostics.Trace.WriteLine ("ConcatenarPDFs - for init");

    // Iterate files
    //foreach (byte[] contenido in files)//replaced trying to improve memory usage
    for (int i = 0; i < files.Count; i++)
    {
        contenido = files[i];
        using (MemoryStream strm = new MemoryStream(contenido))
        {
            // Open the document to import pages from it.
            using (PdfDocument inputDocument = PdfReader.Open(strm, PdfDocumentOpenMode.Import))
            {
                // Iterate pages
                int count = inputDocument.PageCount;
                for (int idx = 0; idx < count; idx++)
                {
                    // Get the page from the external document...
                    PdfPage page = inputDocument.Pages[idx];
                    // ...and add it to the output document.
                    outputDocument.AddPage(page);
                }
                strm.Close();
            }
        }
        files[i] = null;//to improve memory usage
    }

    System.Diagnostics.Trace.WriteLine ("ConcatenarPDFs - for done");

    //to improve memory usage
    files.Clear();

    System.Diagnostics.Trace.WriteLine ("ConcatenarPDFs - clear done");

    try
    {
        // Save the document...
        using (MemoryStream strm2 = new MemoryStream())
        {
            outputDocument.Save(strm2);
            byte[] byteArr = strm2.ToArray();
            return byteArr;
        }
    }
    catch (Exception exc)
    {
        System.Diagnostics.Trace.WriteLine ("ConcatenarPDFs - Error: " + exc.Message);
        System.Diagnostics.Trace.WriteLine (exc.StackTrace);
        throw; // rethrow without resetting the stack trace
    }
}


Post subject: Re: Memory limitations?
Posted: Wed Nov 24, 2010 8:35 am

Joined: Tue Nov 23, 2010 9:39 am
Posts: 8
Hi Jeff,

With your code, the same exception occurs.

After that, I tried something else: I added this line to your code
Code:
outputDocument.PrepareForSave();
before saving, but it crashes too.

Is there a bug in the Save() method?
The total size of the PDF shouldn't exceed 200 MB, but RAM usage keeps growing far beyond that until the error occurs.

Thanks.


Post subject: Re: Memory limitations?
Posted: Wed Nov 24, 2010 2:25 pm
Supporter

Joined: Thu May 27, 2010 7:40 pm
Posts: 59
Location: New Hampshire, USA
So it looks like you're loading all the little PDFs into a List of byte[], then looping through them, adding each to the main PDF file, and finally returning yet another byte[] for the entire document. That's certainly going to eat tons of memory, as you probably end up with three copies of every little PDF in memory.

At a minimum, you should free each byte[] when you're finished with it.

It would be much better to load each little PDF, append it to the main PDF, then free that byte[] and go on to the next little PDF file. This way, you only have the main document in memory, plus the current little PDF's byte[]. So, instead of passing a List of byte[] to 'ConcatenarPDFs', maybe just pass a list of file names and process each one this way.

This would probably cut down on memory usage.
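A minimal sketch of that suggestion, assuming the small PDFs can be written to (or read from) disk first. It takes file paths instead of a List of byte[], opens each input with PdfReader.Open(path, PdfDocumentOpenMode.Import), imports its pages as in the ConcatenateDocuments sample, and saves the result straight to a file, so no MemoryStream/ToArray() copy of the finished document is needed. ConcatenateFiles and its parameters are hypothetical; the finished file could then be streamed into the database.

```csharp
using System.Collections.Generic;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

public static class PdfConcatenation
{
    // Hypothetical file-based variant of ConcatenarPDFs: besides the growing output
    // document, only the current input document is held in memory.
    public static void ConcatenateFiles(IEnumerable<string> inputPaths, string outputPath)
    {
        using (PdfDocument outputDocument = new PdfDocument())
        {
            foreach (string path in inputPaths)
            {
                // Open the document to import pages from it; dispose it once its pages are added.
                using (PdfDocument inputDocument = PdfReader.Open(path, PdfDocumentOpenMode.Import))
                {
                    for (int idx = 0; idx < inputDocument.PageCount; idx++)
                    {
                        // AddPage imports the external page into the output document.
                        outputDocument.AddPage(inputDocument.Pages[idx]);
                    }
                }
            }

            // Saving to a path avoids holding an extra byte[] copy of the finished document.
            outputDocument.Save(outputPath);
        }
    }
}
```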


Post subject: Re: Memory limitations?
Posted: Wed Nov 24, 2010 4:19 pm

Joined: Tue Nov 23, 2010 9:39 am
Posts: 8
Hi,
yesterday I already did the minimum you're suggesting (with this line of my code above, inside the for loop):
Code:
files[i] = null;//to improve memory usage

Didn't I?

However, I agree with you that I have the data twice in memory.
How can I use Append?
outputDocument.Append() doesn't exist 'as is'!!!

Is there a suitable sample in the code I downloaded?


Thanks again.


Top
 Profile  
Reply with quote  
 Post subject: Re: Memory limitations?
PostPosted: Wed Nov 24, 2010 4:32 pm 
Offline
Supporter
User avatar

Joined: Thu May 27, 2010 7:40 pm
Posts: 59
Location: New Hampshire, USA
At this point, I believe you may need to do some memory profiling to see where the memory is being used.

You said about 1500 pages.

Perhaps you should try an experiment: take a single little PDF, add it 1500 times to the master PDF document, and see whether you still run out of memory.
If so, then maybe this is simply too large to create?
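Jeff's experiment could look roughly like this, assuming PDFsharp's import API as used in ConcatenateDocuments. RepeatImportExperiment, Run and the every-100-copies logging are hypothetical. The sample is reopened on every iteration on the assumption that PDFsharp tracks already-imported objects per source document, so that each copy is imported in full, like a separate input file would be.

```csharp
using System;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

public static class RepeatImportExperiment
{
    // Imports every page of one sample PDF 'copies' times into a single document,
    // logs the managed heap size every 100 copies, saves the result and
    // returns the final page count.
    public static int Run(string samplePath, int copies, string outputPath)
    {
        using (PdfDocument outputDocument = new PdfDocument())
        {
            for (int i = 0; i < copies; i++)
            {
                // Reopen the sample so each copy behaves like a separate input file.
                using (PdfDocument inputDocument = PdfReader.Open(samplePath, PdfDocumentOpenMode.Import))
                {
                    for (int idx = 0; idx < inputDocument.PageCount; idx++)
                        outputDocument.AddPage(inputDocument.Pages[idx]);
                }

                if ((i + 1) % 100 == 0)
                    Console.WriteLine("{0} copies: managed heap {1:N0} MB",
                        i + 1, GC.GetTotalMemory(false) / (1024 * 1024));
            }

            int pageCount = outputDocument.PageCount;
            outputDocument.Save(outputPath);
            return pageCount;
        }
    }
}
```

Running it with copies = 1500 and watching whether the logged figure grows roughly linearly gives a first idea of whether the import itself or something else in the application is consuming the memory.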

Without the application, it's hard to know what else is going on.

I didn't mean to imply that there was an Append *method*. What you're doing is appending/adding pages to the end of the master document.


Post subject: Re: Memory limitations?
Posted: Thu Nov 25, 2010 10:59 am

Joined: Tue Nov 23, 2010 9:39 am
Posts: 8
OK, thanks for your help.
For now (since we're in a hurry), we have to roll back our code to a version that uses PDFSam, a Java utility we had been using in previous years. I don't really like using it, because it isn't .NET code, but I'm not the boss, and it works...

When I have a bit more time (who knows when...), I want to rewrite my function as you suggested.

Thanks again, Jeff.

