PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly

All times are UTC


Posted: Tue Oct 26, 2010 5:59 pm

Joined: Tue Oct 26, 2010 5:49 pm
Posts: 2
Taking the sample code and rewriting a good portion of it, I have managed to make a module that can merge any number of PDFs and bookmark them all. I have some questions, though. These PDFs contain no text, only scanned images of text from legal documents. We all know that OCR software isn't 100% accurate, so I can't replace these images with recognized text for legal reasons: if even one word is wrong, the whole document could be invalid. However, the documents are all scanned in black and white, yet judging from the size of the original PDFs, the scans are being saved as color images inside the PDF. When I am doing my merge, is there a way for me to save these images in monochrome to save space?


Edit: Forgot to mention I'm using the WPF version of PDFsharp.
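
To put rough numbers on the space question above, here is a minimal Python sketch comparing the raw (uncompressed) size of one scanned page stored at 24 bits per pixel with the same page stored at 1 bit per pixel. The page size (US Letter at 300 dpi), the threshold of 128, and the helper names `raw_image_bytes` and `pack_bitonal` are assumptions made for illustration; the thread does not say how the scans are actually stored, and this is not PDFsharp code. Any compression applied inside the PDF would shrink both figures.

```python
def raw_image_bytes(width, height, bits_per_pixel):
    """Uncompressed size in bytes; each row is padded to a whole byte."""
    row_bytes = (width * bits_per_pixel + 7) // 8
    return row_bytes * height


def pack_bitonal(gray_rows, threshold=128):
    """Threshold 8-bit gray rows and pack them 8 pixels per byte, MSB first.

    A set bit means "white" here (sample value >= threshold). The threshold
    is an assumed value, not something taken from the thread.
    """
    packed = bytearray()
    for row in gray_rows:
        byte = 0
        count = 0
        for value in row:
            byte = (byte << 1) | (1 if value >= threshold else 0)
            count += 1
            if count == 8:
                packed.append(byte)
                byte, count = 0, 0
        if count:  # pad the final partial byte of the row with zero bits
            packed.append(byte << (8 - count))
    return bytes(packed)


# Assumed page: US Letter (8.5 x 11 in) at 300 dpi = 2550 x 3300 pixels.
color_size = raw_image_bytes(2550, 3300, 24)
bitonal_size = raw_image_bytes(2550, 3300, 1)
print(color_size, bitonal_size)
```

Before compression, the 1-bit version is roughly 24 times smaller. Thresholding changes the stored pixel data, so whether it is acceptable for legal scans is a separate decision.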


Posted: Wed Oct 27, 2010 7:33 am
PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3101
Location: Cologne, Germany
With version 1.31 we added CCITT compression for black/white images. This normally provides better compression rates for scanned bitonal images.
Just make sure your images are bitonal and not greyscale.

All images that are not JPEG files will be converted to the PDF image format.
So for scanned greyscale files, you should save them as JPEG and they will be copied as they are.
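
As a rough illustration of the bitonal versus greyscale distinction, the following Python sketch checks whether a sequence of 8-bit samples uses at most two distinct values, meaning it could be stored at 1 bit per pixel without losing information. The function name `classify_samples` is hypothetical, and this is not how PDFsharp decides which compression to use; it only inspects pixel content.

```python
def classify_samples(gray_samples):
    """Return 'bitonal' if at most two distinct sample values occur,
    otherwise 'greyscale'. Stops early once a third value is seen."""
    distinct = set()
    for value in gray_samples:
        distinct.add(value)
        if len(distinct) > 2:
            return "greyscale"
    return "bitonal"


print(classify_samples([0, 255, 255, 0]))    # two values only
print(classify_samples([0, 17, 128, 255]))   # intermediate grey levels
```

A scan that looks black and white can still contain intermediate grey levels along the edges of letters, in which case it would be classified as greyscale here.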

_________________
Regards
Thomas Hoevel
PDFsharp Team


Posted: Wed Oct 27, 2010 12:45 pm

Joined: Tue Oct 26, 2010 5:49 pm
Posts: 2
I think I need to clarify a little bit. I am working with pre-saved PDFs, and I have no control over how they are generated. I am taking these PDFs and merging them. The PDFs contain nothing but text, but it is saved as color images, making the PDFs huge. During my merge, is there some way for me to re-save them as bitonal?


Posted: Wed Oct 27, 2010 1:15 pm
PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3101
Location: Cologne, Germany
Sorry, I got you wrong.

The images in the PDF files are left untouched when you merge them.

Reducing image file size to shrink the PDF file is not my area of expertise.
I think the Export Images sample can be used as a starting point, although many cases are not implemented yet.
Sometimes two filters are applied to one image (for example JPEG or CCITT images can be FlateEncoded), so your code must handle both filters to get at the image.
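
The two-filter case can be sketched in Python as follows. This is a language-neutral illustration, not code from the Export Images sample: the function names `normalize_filters` and `unwrap_stream` are hypothetical, and the sketch assumes the `/Filter` entry has already been read as either a single name or a list of names. It undoes `/FlateDecode` with `zlib` and stops at `/DCTDecode` or `/CCITTFaxDecode`, leaving the payload for an image decoder. Flate predictors given in `/DecodeParms` are ignored here.

```python
import zlib

# Filters whose output is still an encoded image that needs an image codec.
IMAGE_CODEC_FILTERS = {"/DCTDecode", "/CCITTFaxDecode"}


def normalize_filters(filter_entry):
    """A /Filter entry may be absent, a single name, or an array of names."""
    if filter_entry is None:
        return []
    if isinstance(filter_entry, str):
        return [filter_entry]
    return list(filter_entry)


def unwrap_stream(data, filter_entry):
    """Undo leading /FlateDecode filters and stop at the first image codec.

    Returns (payload, remaining_filters). Filters are processed in the
    order listed, which is the decoding order for PDF streams.
    """
    filters = normalize_filters(filter_entry)
    while filters:
        name = filters[0]
        if name == "/FlateDecode":
            data = zlib.decompress(data)
            filters = filters[1:]
        elif name in IMAGE_CODEC_FILTERS:
            break  # payload is JPEG or CCITT data; hand it to an image decoder
        else:
            raise NotImplementedError("unsupported filter: " + name)
    return data, filters


# Usage: a stand-in "JPEG" payload that was additionally Flate-encoded.
fake_jpeg = b"\xff\xd8\xff\xe0" + b"payload" * 10
payload, remaining = unwrap_stream(zlib.compress(fake_jpeg),
                                   ["/FlateDecode", "/DCTDecode"])
print(payload == fake_jpeg, remaining)
```

Returning the remaining filters lets the caller decide whether the payload can be written out directly (JPEG) or needs further decoding (CCITT).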

The better approach would be to optimize image size while scanning.

_________________
Regards
Thomas Hoevel
PDFsharp Team

