PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly

All times are UTC


Posted: Mon Mar 26, 2012 3:22 pm

Joined: Thu Feb 17, 2011 12:27 pm
Posts: 11
I'm constantly hitting "out of memory" exceptions. They come from deep inside the PDFsharp library and usually look like this:
Code:
   in PdfSharp.Pdf.Advanced.PdfImage.ReadTrueColorMemoryBitmap(Int32 components, Int32 bits, Boolean hasAlpha)
   in PdfSharp.Pdf.Advanced.PdfImage.InitializeNonJpeg()
   in PdfSharp.Pdf.Advanced.PdfImage..ctor(PdfDocument document, XImage image)
   in PdfSharp.Pdf.Advanced.PdfImageTable.GetImage(XImage image)
   in PdfSharp.Drawing.XForm.GetImageName(XImage image)
   in PdfSharp.Drawing.Pdf.XGraphicsPdfRenderer.GetImageName(XImage image)
   in PdfSharp.Drawing.Pdf.XGraphicsPdfRenderer.Realize(XImage image)
   in PdfSharp.Drawing.Pdf.XGraphicsPdfRenderer.DrawImage(XImage image, Double x, Double y, Double width, Double height)
   in PdfSharp.Drawing.XGraphics.DrawImage(XImage image, Double x, Double y, Double width, Double height)


This happens when I create a PDF file with a number of large PNGs. The exceptions usually get thrown around the 25th to 30th PNG file. The PNG files are 32-bit, 4000x4000 pixels each.
I'm pretty sure that this is not a problem with the amount of available memory. The process that gets those exceptions eats no more than 800 MB (private working set) on a 64-bit machine with 4 GB of RAM. My PDF building code sometimes eats even more RAM (up to 2 GB), but these "out of memory" exceptions get thrown only when I'm using these large PNGs. I can add hundreds of JPEGs to a PDF and everything works fine. But when I use large PNGs, I get exceptions almost instantly.

The other thing that intrigues me is how slowly PNGs are processed. Why does adding 32-bit ARGB files take so much time?


Posted: Tue Mar 27, 2012 10:10 am
PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3095
Location: Cologne, Germany
SiliconMind wrote:
Why does adding 32-bit ARGB files take so much time?
PDFsharp adds masks for images with transparency (ARGB includes the alpha channel that specifies the opacity of the pixels).

If you don't need transparency, then use RGB bitmaps for faster processing.

ReadTrueColorMemoryBitmap reads the image data from the operating system and creates the byte arrays that will be included in the PDF.
So this is the place where out of memory exceptions occur when you only add images to the PDF.

_________________
Regards
Thomas Hoevel
PDFsharp Team


Posted: Tue Mar 27, 2012 10:55 am

Joined: Thu Feb 17, 2011 12:27 pm
Posts: 11
Thomas Hoevel wrote:
PDFsharp adds masks for images with transparency (ARGB includes the alpha channel that specifies the opacity of the pixels).

If you don't need transparency, then use RGB bitmaps for faster processing.

I need transparency, so I can't use JPEGs. I've noticed that you're using a MemoryBitmap to copy the image's byte data into an array, and then you process that array. This is a quite noticeable overhead: at one point your code needs three times the amount of memory required for the original bitmap (one copy for the System.Drawing.Bitmap, a second for the MemoryStream to which you save the bitmap, and a third for the byte array you use for processing). For large images this is really a lot of memory (3 x 64 MB per image in my case). Why not just use Bitmap.LockBits?
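
The 3 x 64 MB figure can be reproduced with a little arithmetic. The following is only a sketch of that calculation; the class and method names are made up for illustration and are not part of PDFsharp.

```csharp
using System;

// Hypothetical helper (not PDFsharp code): estimates the raw pixel memory
// discussed in the post above.
public static class ImageMemoryEstimate
{
    // Uncompressed size of one bitmap in bytes.
    public static long BytesPerImage(int width, int height, int bitsPerPixel)
    {
        return (long)width * height * (bitsPerPixel / 8);
    }

    // Peak size if 'copies' full copies of the pixel data exist at once
    // (Bitmap + MemoryStream + byte array gives copies = 3).
    public static long PeakBytes(int width, int height, int bitsPerPixel, int copies)
    {
        return BytesPerImage(width, height, bitsPerPixel) * copies;
    }
}
```

For a 4000x4000 image at 32 bits per pixel, BytesPerImage gives 64,000,000 bytes, and PeakBytes with three copies gives 192,000,000 bytes.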

Thomas Hoevel wrote:
ReadTrueColorMemoryBitmap reads the image data from the operating system and creates the byte arrays that will be included in the PDF.
So this is the place where out of memory exceptions occur when you only add images to the PDF.

Ok, but why the exception? There is clearly enough system memory available.


Posted: Tue Mar 27, 2012 11:47 am
PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3095
Location: Cologne, Germany
SiliconMind wrote:
Why not just use Bitmap.LockBits?
Interesting idea, but currently we use BitmapImage internally and LockBits is not available. Must check how many follow-up changes this will require.

_________________
Regards
Thomas Hoevel
PDFsharp Team


Posted: Tue Mar 27, 2012 12:19 pm

Joined: Thu Feb 17, 2011 12:27 pm
Posts: 11
Thomas Hoevel wrote:
SiliconMind wrote:
Why not just use Bitmap.LockBits?
Interesting idea, but currently we use BitmapImage internally and LockBits is not available. Must check how many follow-up changes this will require.

I think that using Bitmap.LockBits inside ReadTrueColorMemoryBitmap won't affect other code. Using LockBits would significantly reduce memory usage and speed up processing. There would be no need to create two additional copies of the image.

Anyway, I did some debugging. The exception is thrown when ReadTrueColorMemoryBitmap tries to allocate the imageBits byte array on line 362. But I have no idea why.


Posted: Tue Mar 27, 2012 1:31 pm
PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3095
Location: Cologne, Germany
SiliconMind wrote:
I think that using Bitmap.LockBits inside ReadTrueColorMemoryBitmap won't affect other code.
I hope I didn't miss something: We cannot currently call Bitmap.LockBits inside ReadTrueColorMemoryBitmap because we don't have a Bitmap there. What we have is a BitmapSource, not a Bitmap.

To get a Bitmap instead of a BitmapSource, we'd have to change code outside this file. I don't know what consequences this may have.

_________________
Regards
Thomas Hoevel
PDFsharp Team


Posted: Tue Mar 27, 2012 1:45 pm

Joined: Thu Feb 17, 2011 12:27 pm
Posts: 11
Thomas Hoevel wrote:
I hope I didn't miss something: We cannot currently call Bitmap.LockBits inside ReadTrueColorMemoryBitmap because we don't have a Bitmap there. What we have is a BitmapSource, not a Bitmap.

To get a Bitmap instead of a BitmapSource, we'd have to change code outside this file. I don't know what consequences this may have.


ReadTrueColorMemoryBitmap uses the XImage.gdiImage field to get the byte data for processing. XImage.gdiImage is just a System.Drawing.Image, so I think it is possible to use LockBits. Or it's me who's missing something, which is possible, as I do not know the big picture of the whole PDFsharp library source.


Posted: Tue Mar 27, 2012 2:58 pm
PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3095
Location: Cologne, Germany
The GDI+ build uses System.Drawing.Image which is the base class of System.Drawing.Bitmap.
LockBits is implemented by Bitmap, not Image.
The WPF build uses System.Windows.Media.Imaging.BitmapSource.

I wrongly assumed you were using the WPF build. It might be worth trying the WPF build just in case this problem is caused by a limitation of GDI+ resources and not a Large Object Heap (LOH) fragmentation problem.
I'd try the WPF build.

_________________
Regards
Thomas Hoevel
PDFsharp Team


Posted: Tue Mar 27, 2012 3:46 pm

Joined: Thu Feb 17, 2011 12:27 pm
Posts: 11
Thomas Hoevel wrote:
The GDI+ build uses System.Drawing.Image which is the base class of System.Drawing.Bitmap.
LockBits is implemented by Bitmap, not Image.
The WPF build uses System.Windows.Media.Imaging.BitmapSource.

I wrongly assumed you were using the WPF build. It might be worth trying the WPF build just in case this problem is caused by a limitation of GDI+ resources and not a Large Object Heap (LOH) fragmentation problem.
I'd try the WPF build.

Doh, you're right. I forgot about the WPF implementation. But even with the current WPF version, the issue with the additional byte array and MemoryStream remains.

A first, admittedly small, workaround is not to copy the MemoryStream into a byte array (lines 362-365) but to use MemoryStream.GetBuffer() instead. That way there is one less byte array to allocate. I've tested this, and the OutOfMemoryException was not thrown until around the 40th PNG file (previously it was around the 25th). So that is an improvement.

The LockBits route is still possible. For the WPF variant you could create a temporary WriteableBitmap using the WriteableBitmap(BitmapSource) constructor, then use the WriteableBitmap.BackBuffer property, which returns a pointer to the bitmap contents, just like the BitmapData.Scan0 returned by a call to System.Drawing.Bitmap.LockBits().

Another advantage of this approach is that you don't need that ugly ReadWord stuff and all those hardcoded values for testing bitmap compatibility.
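
For the GDI+ side, the LockBits idea could look roughly like the sketch below. It is not PDFsharp code: the class and method names are hypothetical, it assumes you already have a System.Drawing.Bitmap (a System.Drawing.Image would first have to be cast to Bitmap), and it needs System.Drawing, so it is Windows-only on current .NET.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;

// Hypothetical helper (not PDFsharp code): reads 32-bit ARGB pixel data
// straight from a Bitmap via LockBits, without an intermediate MemoryStream.
public static class LockBitsSketch
{
    // Returns the pixels as tightly packed rows, 4 bytes per pixel,
    // in the in-memory byte order B, G, R, A.
    public static byte[] CopyArgbPixels(Bitmap bitmap)
    {
        var rect = new Rectangle(0, 0, bitmap.Width, bitmap.Height);
        BitmapData data = bitmap.LockBits(rect, ImageLockMode.ReadOnly,
                                          PixelFormat.Format32bppArgb);
        try
        {
            int rowBytes = bitmap.Width * 4;
            byte[] pixels = new byte[rowBytes * bitmap.Height];
            for (int y = 0; y < bitmap.Height; y++)
            {
                // Stride may differ from rowBytes (and may be negative),
                // so each row is copied separately.
                IntPtr row = IntPtr.Add(data.Scan0, y * data.Stride);
                Marshal.Copy(row, pixels, y * rowBytes, rowBytes);
            }
            return pixels;
        }
        finally
        {
            bitmap.UnlockBits(data);
        }
    }
}
```

This still allocates one managed array for the result, but it skips the MemoryStream copy; code that processes the rows directly from Scan0 could avoid even that array.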


Posted: Wed Mar 28, 2012 1:29 pm

Joined: Thu Feb 17, 2011 12:27 pm
Posts: 11
If you need a copy of the MemoryStream data, you can just use MemoryStream.ToArray() instead of this:
Code:
imageBits = new byte[streamLength];
memory.Seek(0, SeekOrigin.Begin);
memory.Read(imageBits, 0, streamLength);


But remember that MemoryStream.ToArray() creates a copy of the byte array encapsulated by the MemoryStream. To reduce memory usage and improve performance, you should use MemoryStream.GetBuffer().
Note, however, that MemoryStream.GetBuffer() returns the entire allocated buffer, including the unused space at the end. So you should never use imageBits.Length if you want to know the actual data length inside the array. Use a helper variable instead:
Code:
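int streamLength = (int)memory.Length;  // Length is a long; the cast is needed to compile
byte[] imageBits = memory.GetBuffer();  // no copy; only the first streamLength bytes are valid
memory.Close();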


As a matter of fact, MemoryStream.Close() and MemoryStream.Dispose() do nothing to release the memory used by the MemoryStream; the buffer stays alive as long as something references it. So thanks to MemoryStream.GetBuffer() you can reuse the byte array and avoid creating a new one.
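
The difference between GetBuffer() and ToArray() can be checked with a small stand-alone sketch. The class and method names are hypothetical; only the MemoryStream members are real .NET API.

```csharp
using System;
using System.IO;

// Hypothetical demo (not PDFsharp code) of GetBuffer() versus ToArray().
public static class GetBufferSketch
{
    // Fills a MemoryStream with 'count' bytes (count >= 1) in two writes.
    // For larger counts the second write makes the internal buffer grow,
    // which usually leaves unused capacity behind the data.
    public static MemoryStream CreateSample(int count)
    {
        var memory = new MemoryStream();
        byte[] chunk = new byte[count - 1];
        memory.Write(chunk, 0, chunk.Length);
        memory.WriteByte(0xFF);
        return memory;
    }

    // Valid data is buffer[0 .. ValidLength); the rest of the array
    // returned by GetBuffer() is unused capacity.
    public static int ValidLength(MemoryStream memory)
    {
        return checked((int)memory.Length);
    }
}
```

With `var memory = GetBufferSketch.CreateSample(1001)`, memory.GetBuffer() returns the same array on every call and that array is at least 1001 bytes long, while memory.ToArray() returns a fresh array of exactly 1001 bytes each time.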

I've attached a modified PdfImage.cs file with the appropriate changes. I didn't have time to try the WriteableBitmap.BackBuffer / Bitmap.LockBits solution, but I still believe that it is the right way to go. You should consider it.

I've also noticed that inside the FlateDecode.Encode() method you do something like this:
Code:
MemoryStream.Capacity = MemoryStream.Length;
return MemoryStream.GetBuffer();

Changing the memory stream's capacity creates another copy of the byte array used by that memory stream. It's another easy way to hit the LOH issue. You could return the whole buffer and then write only the actual data to the PDF, without the padding bytes. You would have to store the real data length, or use a reverse loop before writing the bytes to the file to find the actual data length inside the buffer array.
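
A sketch of the "return the whole buffer, write only the actual data" idea follows. The names are hypothetical and this is not how FlateDecode.Encode() is actually written; it only shows that the valid length can be taken from MemoryStream.Length instead of trimming Capacity.

```csharp
using System;
using System.IO;

// Hypothetical helper (not PDFsharp code): writes the contents of a
// MemoryStream to another stream without trimming Capacity and without
// ToArray(), so no extra copy of the buffer is allocated.
public static class BufferWriteSketch
{
    public static void WriteValidBytes(MemoryStream encoded, Stream output)
    {
        // GetBuffer() throws UnauthorizedAccessException if the stream was
        // created over a caller-supplied array that is not publicly visible.
        byte[] buffer = encoded.GetBuffer();
        int validLength = checked((int)encoded.Length);
        output.Write(buffer, 0, validLength);  // padding bytes are skipped
    }
}
```

The built-in MemoryStream.WriteTo(Stream) method does the same thing in one call, so `encoded.WriteTo(output)` is an alternative when only the stream, and not the raw array, needs to be passed around.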


Attachments:
File comment: Large Object Heap fragmentation fix
PdfImage.zip [10.29 KiB]
Downloaded 716 times