PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly

 Post subject: Image compression
PostPosted: Tue Feb 20, 2007 2:18 pm 

Joined: Tue Feb 20, 2007 1:36 pm
Posts: 8
It would be most welcome if the library could compress images (not reduce resolution, as is sometimes appropriate).
Here are the results of some tests I did:
I started with a 100 page TIF file (A4, resolution 1200 dpi). BTW, such high resolution is absolutely necessary when a scanned document is to be printed on an offset press.

First, I opened the TIF in Acrobat (V 7) and saved as PDF. The file size barely grew, from 30.507.933 Bytes to 30.561.981 Bytes.

Then I used PDFsharp to do the equivalent (TIF acquired through System.Drawing.Image.FromFile, each page passed to PDFsharp through XImage.FromGdiPlusImage and then inserted in the output PDF with XGraphics.DrawImage). The conversion took about four times as long, and the resulting file size was 100.594.087 Bytes, i.e. more than three times as much.

Another consideration is the amount of memory needed during conversion. My understanding is that all newly created PDF pages have to be kept in memory by PDFsharp, until they are finally saved to file. My first test, done with a similar TIF file, but with 1012 pages in it, ran into an OutOfMemoryException. I expect that pages with compressed images on them would need far less memory during processing.
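To put a number on the memory question, the uncompressed size of one scanned page can be estimated from the paper size and the resolution. The following is only a rough sketch in Python: the helper name `raw_page_bytes` is made up for this illustration, and it assumes a decoded bitmap whose rows are padded to whole bytes, which may not match what GDI+ or PDFsharp actually hold in memory.

```python
MM_PER_INCH = 25.4


def raw_page_bytes(width_mm, height_mm, dpi, bits_per_pixel):
    """Estimate the uncompressed bitmap size of one page.

    Returns (width in pixels, height in pixels, size in bytes).
    Hypothetical helper, not part of any library.
    """
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    # Assumption: each row is padded up to a whole number of bytes.
    bytes_per_row = (width_px * bits_per_pixel + 7) // 8
    return width_px, height_px, bytes_per_row * height_px


if __name__ == "__main__":
    # A4 is 210 mm x 297 mm; compare bitonal with 24-bit RGB.
    for bpp in (1, 24):
        w, h, size = raw_page_bytes(210, 297, 1200, bpp)
        print(f"{bpp:2d} bit/pixel: {w} x {h} px, {size / 2**20:.1f} MiB")
```

Under these assumptions a bitonal A4 page at 1200 dpi is roughly 17 MB uncompressed, and over 400 MB if it is expanded to 24-bit RGB, so the pixel format used in memory matters a lot for a job with hundreds of pages.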


 Post subject:
PostPosted: Tue Feb 20, 2007 5:38 pm 
PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3095
Location: Cologne, Germany
I cannot explain why the file is so much bigger.

Have you tried a release build? The debug build by default produces "verbose" PDF files that are bigger.

Images in the PDF file use lossless LZ compression (except for JPEG images - those are copied byte by byte into the PDF file).

Not sure if the verbose mode can account for a factor 3 - I don't expect that.

I'd like to know which image format and compression was used for the TIFF file. If it was JPEG or CCITT/FAX, then this could be the reason: PDFsharp uses the standard LZ compression, but other methods may be better for your scanned image.
Or maybe the image got converted to 24 bit RGB - this could explain factor 3.

PDFsharp does not read the files - it relies on GDI+ to read them; the 8-bit-to-24-bit-conversion could occur here.

Long story short: we do compress image data. I'd like to know what happens there.

BTW: all pages are kept in memory. With 1000 scanned pages this really could be a problem, but for most applications this approach is appropriate.

_________________
Regards
Thomas Hoevel
PDFsharp Team


 Post subject:
PostPosted: Mon Mar 05, 2007 8:48 am 

Joined: Tue Feb 20, 2007 1:36 pm
Posts: 8
Differences in file size really seem to be caused by differing compression schemes:
A 100 page TIF (CCITT G4): 30.507.933 Bytes,
the same TIF (LZW): 100.349.200 Bytes.

PDFsharp created a file of size 100.521.556 Bytes from the G4, so the result is consistent.

I wish somebody (perhaps a knowledgeable user?) would turn his/her attention to image import and export in the library, including questions of different (= optimal) compression schemes for differing content types! From my experience I can say that GDI+ as an intermediate would have to go, though...

And, it would be nice to have more control over memory allocation, creation of temporary files or whatever is necessary to successfully process really large files.

Peter


 Post subject: Re: Image compression
PostPosted: Fri Jul 10, 2009 1:08 pm 

Joined: Fri Jul 10, 2009 1:05 pm
Posts: 1
Hi

Is there any way to store images with CCITT G4 compression?
I see the same effect: almost all images converted by this library are more than 200% bigger :(

Grzegorz


 Post subject: Re: Image compression
PostPosted: Mon Jul 13, 2009 9:19 am 
PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3095
Location: Cologne, Germany
PDF doesn't support TIFF.
It supports JPEG and LZW.

AFAIR it doesn't support G4 (but I'll check that eventually).

Current implementations of PDFsharp use LZW for any image that is not JPEG.

_________________
Regards
Thomas Hoevel
PDFsharp Team


 Post subject: Re: Image compression
PostPosted: Sat Jul 18, 2009 4:54 pm 

Joined: Tue Jul 07, 2009 12:17 am
Posts: 6
You might take a look at this article:
http://www.codeproject.com/KB/GDI-plus/ ... uick&fr=26

It describes how to convert images into bitonal format, which is required for CCITT4 compression, and how to handle multiple page TIFFs.

You would need to go into PdfSharp.Pdf.Advanced.PdfImage, and create a method like "InitializeCcitt4Tiff()", and have some flag in the XImage that specifies it should be bitonal. This flag would be used in the ctor of the PdfImage class.
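The bitonal conversion itself is just thresholding followed by bit packing. Here is a minimal sketch of that idea in Python, not the GDI+ code from the linked article: the function name `to_bitonal`, the threshold of 128 and the polarity (1 = white) are all assumptions chosen for illustration, and the actual CCITT G4 encoding is a separate step that is not shown.

```python
def to_bitonal(gray_rows, threshold=128):
    """Threshold 8-bit grayscale rows and pack them to 1 bit per pixel.

    Pixels >= threshold become 1 (white), all others 0 (black).
    Bits are packed most significant bit first, and every row is
    padded with zero bits to a whole number of bytes.
    """
    packed_rows = []
    for row in gray_rows:
        out = bytearray((len(row) + 7) // 8)
        for x, value in enumerate(row):
            if value >= threshold:
                out[x // 8] |= 0x80 >> (x % 8)
        packed_rows.append(bytes(out))
    return packed_rows


if __name__ == "__main__":
    # Nine pixels: alternating black/white, then one extra white pixel.
    row = [0, 255, 0, 255, 0, 255, 0, 255, 255]
    print(to_bitonal([row])[0].hex())
```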

Good luck, and post back if you have code to contribute!


 Post subject: Re: Image compression
PostPosted: Sun Jul 19, 2009 12:59 pm 

Joined: Tue Feb 20, 2007 1:36 pm
Posts: 8
First of all, it is true that "PDF doesn't support TIFF", but it does support the same encoded (= compressed) image formats that are most widely found in TIFF files. Besides LZW/Flate and DCT (JPEG), which are appropriate for color and grayscale images, CCITT Fax G3 and G4 are also available for monochrome images. PDF, not surprisingly, inherited these capabilities from PostScript.

When you deal with images in PDF, you deal with so-called "Filtered Streams". They consist of the encoded image data and, in addition, information in the stream dictionary about the filter(s) needed to decode the data.
The above is knowledge I took from the specs, but, as I am an empirically minded person, I wanted to verify this for myself. So I created an image of a small black square (15 x 15) in the middle of an empty page (35 x 35 pixels) and saved that to a TIFF G4 file. Then I saved the same image to a PDF file. When I looked at the results in a binary editor, I could see that the identical encoded image data can be found in both files, namely "ff c9 c3 1f ff ff ff ff fc 7f f0 01 00 10" (hex representation). In the PDF it looks like this:
<<
/Type /XObject
/Subtype /Image
/Name /Im0
/Filter [ /CCITTFaxDecode ]
/DecodeParms [ << /K -1 /Columns 35 /Rows 35 >> ]
/Width 35
/Height 35
/ColorSpace /DeviceGray
/BitsPerComponent 1
/Length 7 0 R
>>
stream
-- here the binary data --
endstream

(Remark: When I imported the TIFF file in Acrobat and saved to PDF, the image was re-encoded to Flate, with an increase of size)
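To make the structure concrete, the object above can be assembled by hand around the 14 encoded bytes. This is only a sketch: the function name `ccitt_image_object` and the object number are made up, and /Length is written as a direct number here instead of the indirect reference (7 0 R) in the listing.

```python
# The encoded G4 data quoted above, taken from the hex dump.
G4_DATA = bytes.fromhex("ff c9 c3 1f ff ff ff ff fc 7f f0 01 00 10")


def ccitt_image_object(obj_num, data, columns, rows):
    """Wrap already-encoded CCITT G4 data in a PDF image XObject.

    Hypothetical helper: the data is copied as is, nothing is re-encoded.
    """
    dictionary = (
        "<<\n"
        "/Type /XObject\n"
        "/Subtype /Image\n"
        "/Name /Im0\n"
        "/Filter [ /CCITTFaxDecode ]\n"
        f"/DecodeParms [ << /K -1 /Columns {columns} /Rows {rows} >> ]\n"
        f"/Width {columns}\n"
        f"/Height {rows}\n"
        "/ColorSpace /DeviceGray\n"
        "/BitsPerComponent 1\n"
        f"/Length {len(data)}\n"  # direct length instead of "7 0 R"
        ">>\n"
    )
    header = f"{obj_num} 0 obj\n{dictionary}stream\n".encode("ascii")
    return header + data + b"\nendstream\nendobj\n"


if __name__ == "__main__":
    obj = ccitt_image_object(5, G4_DATA, 35, 35)
    print(obj[: obj.index(b"stream\n")].decode("ascii"))
```

This only produces the image object; a complete PDF file additionally needs a page that references it, a cross-reference table and a trailer.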

This leads me to the question whether it shouldn't be possible to directly import G4 encoded pages from a TIFF file into a PDF document without re-encoding the images.
Instead of GDI+, one would probably have to use libtiff (or GraphicsMagick) to access the image(s) and metadata.
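As a rough idea of what such a direct import involves, the encoded strips can be located by reading the TIFF directory, without decoding any pixels. The sketch below uses only the Python standard library rather than libtiff; the function names are made up, and it makes several simplifying assumptions: classic TIFF (not BigTIFF), first page only, and only BYTE/SHORT/LONG fields. Tags such as FillOrder and PhotometricInterpretation are ignored, and since each strip is encoded on its own, a page with more than one strip cannot simply be concatenated into a single PDF stream.

```python
import struct

# TIFF field types handled by this sketch: 1 = BYTE, 3 = SHORT, 4 = LONG.
FIELD_FORMATS = {1: "B", 3: "H", 4: "I"}

IMAGE_WIDTH, IMAGE_LENGTH, COMPRESSION = 256, 257, 259
STRIP_OFFSETS, STRIP_BYTE_COUNTS = 273, 279
COMPRESSION_CCITT_G4 = 4


def read_first_ifd(data):
    """Return {tag: tuple of values} for the first image directory."""
    byte_order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if byte_order is None:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack_from(byte_order + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    (entry_count,) = struct.unpack_from(byte_order + "H", data, ifd_offset)
    tags = {}
    for i in range(entry_count):
        entry = ifd_offset + 2 + 12 * i  # every entry is 12 bytes long
        tag, field_type, count = struct.unpack_from(
            byte_order + "HHI", data, entry)
        code = FIELD_FORMATS.get(field_type)
        if code is None:
            continue  # field type not needed for this sketch
        fmt = f"{byte_order}{count}{code}"
        if struct.calcsize(fmt) <= 4:
            position = entry + 8  # small values are stored inline
        else:
            (position,) = struct.unpack_from(
                byte_order + "I", data, entry + 8)
        tags[tag] = struct.unpack_from(fmt, data, position)
    return tags


def raw_g4_strips(data):
    """Return (width, height, [encoded strips]) for a G4 first page."""
    tags = read_first_ifd(data)
    if tags.get(COMPRESSION, (1,))[0] != COMPRESSION_CCITT_G4:
        raise ValueError("first page is not CCITT G4 compressed")
    strips = [data[offset:offset + size]
              for offset, size in zip(tags[STRIP_OFFSETS],
                                      tags[STRIP_BYTE_COUNTS])]
    return tags[IMAGE_WIDTH][0], tags[IMAGE_LENGTH][0], strips
```

For a single-strip page, the returned bytes together with the width and height are the pieces that a /CCITTFaxDecode image stream needs; further pages would be found by following the next-directory offset stored after the last entry.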


Attachments:
File comment: file examples
square.zip [2 KiB]
 Post subject: Re: Image compression
PostPosted: Mon Jul 20, 2009 9:09 am 
PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3095
Location: Cologne, Germany
I stand corrected.

I put "/CCITTFaxDecode" on our TODO list, but we won't address it before the release scheduled for this summer is out.

"/CCITTFaxDecode" is a lossless compression, so using GDI+ and re-compressing the image costs nothing but CPU time (but maybe there's a GDI+ flag that indicates FAX compression (I'll check that)).

_________________
Regards
Thomas Hoevel
PDFsharp Team


 Post subject: Re: Image compression
PostPosted: Wed Sep 30, 2009 11:12 am 

Joined: Tue Sep 08, 2009 12:09 pm
Posts: 3
Thomas Hoevel wrote:
I stand corrected.

I put "/CCITTFaxDecode" on our TODO list, but we won't address it before the release scheduled for this summer is out.

"/CCITTFaxDecode" is a lossless compression, so using GDI+ and re-compressing the image costs nothing but CPU time (but maybe there's a GDI+ flag that indicates FAX compression (I'll check that)).


Any progress on this?


 Post subject: Re: Image compression
PostPosted: Wed Sep 30, 2009 12:55 pm 
PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3095
Location: Cologne, Germany
kostadinnm wrote:
Any progress on this?

Not yet - I have to work for projects we get paid for ...

_________________
Regards
Thomas Hoevel
PDFsharp Team


 Post subject: Re: Image compression
PostPosted: Tue Aug 23, 2011 3:35 pm 

Joined: Fri Aug 12, 2011 5:51 pm
Posts: 5
Any luck with this? I have a PDF file that contains two images, both compressed using CCITTFaxDecode, but I cannot extract them using PDFsharp.


 Post subject: Re: Image compression
PostPosted: Tue Aug 23, 2011 3:42 pm 
PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3095
Location: Cologne, Germany
Extracting images was left as an exercise to the reader (see Export Images sample).

Back to topic: CCITT compression is implemented in the publicly available version of PDFsharp.

_________________
Regards
Thomas Hoevel
PDFsharp Team


 Post subject: Re: Image compression
PostPosted: Mon Aug 29, 2011 4:19 pm 

Joined: Fri Aug 12, 2011 5:51 pm
Posts: 5
What do you mean by "publicly available version of PDFsharp"? Can you point me to that version?

Thanks


 Post subject: Re: Image compression
PostPosted: Tue Aug 30, 2011 8:33 am 
PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3095
Location: Cologne, Germany
michael.hidalgo wrote:
Can you point me to that version?

http://pdfsharp.codeplex.com/releases/view/37054

Please note that PDFsharp only supports encoding of CCITT images (the topic of this thread), but not decoding (your question).

_________________
Regards
Thomas Hoevel
PDFsharp Team


 Post subject: Re: Image compression
PostPosted: Tue Aug 30, 2011 2:22 pm 

Joined: Fri Aug 12, 2011 5:51 pm
Posts: 5
Thanks for the information

