PDFsharp & MigraDoc Foundation

Question regarding merging of PDF's

Author:  Efflixi [ Tue Oct 26, 2010 5:59 pm ]
Post subject:  Question regarding merging of PDF's

Taking the sample code and rewriting a good portion of it, I have managed to make a module that can merge any number of PDFs and bookmark them all. I have some questions, though. These PDFs contain no text, only scanned images of text from legal documents. OCR software isn't 100% accurate, so I can't replace these images for legal reasons: if even one word is wrong, the whole document could be invalid. The pages were all scanned in black and white, but judging from the size of the original PDFs, they are being stored as color images inside the PDF. When I do my merge, is there a way to save these images in monochrome to save space?

Edit: Forgot to mention I'm using the WPF version of PDFsharp.

Author:  Thomas Hoevel [ Wed Oct 27, 2010 7:33 am ]
Post subject:  Re: Question regarding merging of PDF's

With version 1.31 we added CCITT compression for black/white images. This normally provides better compression rates for scanned bitonal images.
Just make sure your images are bitonal and not greyscale.
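
To put rough numbers on why bit depth matters, here is a small sketch that computes the raw (uncompressed) size of one scanned page at different bit depths. The page size and resolution (US Letter at 300 dpi) are assumptions chosen for illustration, and `raw_image_bytes` is a hypothetical helper, not part of PDFsharp. The figures are sizes before any compression such as CCITT or JPEG is applied.

```python
import math


def raw_image_bytes(width_px, height_px, bits_per_pixel):
    """Uncompressed size in bytes; each row is padded to a whole byte."""
    row_bytes = math.ceil(width_px * bits_per_pixel / 8)
    return row_bytes * height_px


# Assumed example page: 8.5 x 11 inches scanned at 300 dpi.
width, height = 2550, 3300

for label, bits in [("bitonal", 1), ("greyscale", 8), ("RGB color", 24)]:
    print(f"{label:10s} {raw_image_bytes(width, height, bits):>12,d} bytes")
```

Under these assumptions the same page takes about 1.05 MB as bitonal data, 8.4 MB as greyscale, and 25.2 MB as RGB, which is why a bitonal source gives the compressor far less data to start with.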

All images that are not JPEG files will be converted to the PDF image format.
So for scanned greyscale files, you should save them as JPEG and they will be copied as they are.

Author:  Efflixi [ Wed Oct 27, 2010 12:45 pm ]
Post subject:  Re: Question regarding merging of PDF's

I think I need to clarify a little. I am working with existing PDFs, and I have no control over how they are generated. I am taking these PDFs and merging them. The pages are pure text but stored as color images, which makes the PDFs huge. During my merge, is there some way to re-save the images as bitonal?

Author:  Thomas Hoevel [ Wed Oct 27, 2010 1:15 pm ]
Post subject:  Re: Question regarding merging of PDF's

Sorry, I misunderstood you.

The images in the PDF files are left untouched when you merge them.

Reducing image file size to shrink the PDF file is not my area of expertise.
I think the Export Images sample can be used as a starting point, although many cases are not implemented yet.
Sometimes two filters are applied to one image (for example, JPEG or CCITT images can additionally be Flate-encoded), so your code must handle both filters to get at the image.
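
As a rough illustration of the two-filter case, the sketch below works on plain bytes and filter names rather than on PDFsharp objects. It assumes the image stream bytes and the `/Filter` entry have already been read from the image dictionary; `normalize_filters` and `peel_filters` are hypothetical helper names. It undoes any leading `/FlateDecode` filter and stops at the image codec, and it ignores `/DecodeParms` (such as predictors), which real files may require.

```python
import zlib


def normalize_filters(filter_entry):
    """Return /Filter as a list: it may be absent, a single name, or an array."""
    if filter_entry is None:
        return []
    if isinstance(filter_entry, str):
        return [filter_entry]
    return list(filter_entry)


def peel_filters(data, filter_entry):
    """Undo leading /FlateDecode filters and stop at the first other filter.

    Returns (payload, remaining_filters). Assumption: no /DecodeParms apply.
    """
    filters = normalize_filters(filter_entry)
    while filters and filters[0] == "/FlateDecode":
        data = zlib.decompress(data)
        filters = filters[1:]
    return data, filters


# Usage: a stand-in JPEG payload that was additionally Flate-encoded.
jpeg = b"\xff\xd8\xff\xe0" + b"stand-in jpeg body" + b"\xff\xd9"
stream = zlib.compress(jpeg)

payload, remaining = peel_filters(stream, ["/FlateDecode", "/DCTDecode"])
print(payload == jpeg, remaining)
```

If the remaining filter is `/DCTDecode`, the payload can be written out as a JPEG file as it is. If it is `/CCITTFaxDecode`, the payload is raw fax-encoded data and still needs a suitable container (for example, a TIFF header) or a decoder before it can be viewed.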

The better approach would be to optimize image size while scanning.

All times are UTC