All times are UTC


Compress images in an existing PDF?

Moderator: Stefan Lange


hattonjohn

Post subject: Compress images in an existing PDF?

Posted: Thu Nov 03, 2016 9:08 pm

Joined: Thu Sep 29, 2011 1:39 pm
Posts: 5

Hi,
The open source Windows/Linux book-making software, bloomlibrary.org, uses an embedded Firefox to make PDFs. Unfortunately, the resulting PDFs are huge because Firefox saves images using only FlateDecode (zipped). If I run the PDF through Ghostscript, the images get nicely compressed again, marked as DCTDecode.

However, we want to keep our installer small, and we already ship with PDFSharp. We don't want to add Ghostscript.

Before we dive into this, should it be feasible to open the PDF with PDFSharp, walk through each image, compress it (in C#), and then put the compressed version back in? Any advice on how to approach that?

thanks
jh


phirewind

Post subject: Re: Compress images in an existing PDF?

Posted: Wed May 10, 2017 3:00 pm

Joined: Wed May 10, 2017 2:35 pm
Posts: 8

Has anyone ever answered this question? I need to do the same thing. I have working code already that converts any image into a JPEG at a certain quality (I'm using 50%), but I've seen this same question asked over and over and there is never a response. "This is not possible with PDFSharp" is a valid answer, if that is the answer, and is much more helpful than silence.
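
For readers following along, the JPEG conversion step mentioned here could look roughly like the sketch below. It is not the poster's actual code: the class and method names are hypothetical, and it assumes System.Drawing is available (Windows, or the System.Drawing.Common package).

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of PDFsharp.
public static class JpegRecompressor
{
    // Encodes an in-memory image as JPEG at the given quality (0-100).
    public static byte[] EncodeAsJpeg(Image image, long quality)
    {
        // Look up the built-in JPEG encoder.
        ImageCodecInfo jpegCodec = ImageCodecInfo.GetImageEncoders()
            .First(c => c.FormatID == ImageFormat.Jpeg.Guid);

        using (var parameters = new EncoderParameters(1))
        using (var output = new MemoryStream())
        {
            // The quality parameter must be passed as a long.
            parameters.Param[0] = new EncoderParameter(
                System.Drawing.Imaging.Encoder.Quality, quality);
            image.Save(output, jpegCodec, parameters);
            return output.ToArray();
        }
    }
}
```

Calling `JpegRecompressor.EncodeAsJpeg(bitmap, 50L)` would correspond to the 50% setting mentioned in the post. Getting a bitmap out of an arbitrary PDF image stream is a separate problem that this sketch does not address.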


phirewind

Post subject: Re: Compress images in an existing PDF?

Posted: Thu May 11, 2017 1:31 pm

Joined: Wed May 10, 2017 2:35 pm
Posts: 8

To ask a more specific question: Here is a function adapted from the oft-repeated sample code parsing through all of the images in a PDF. Assume that I have already converted the images to JPG (another issue that is a separate question regarding the ExportToImage function and non-JPG images), so all I have to do is read the replacement files from disk and insert them in the right places. For example, a 20-page document that is just scanned paper pages.

The following code does that and sets the xObject.Elements entries to match what a PDF contains when it is created with JPG images. However, when I open the resulting PDF, I get the message "An error exists on this page. Acrobat may not display the page correctly", and all the pages are blank. The PDF file size suggests it has the right data (it is reduced from 22 MB to 3 MB, matching a file converted through other desktop applications), but it will not display. I'm assuming there are other steps needed to correct either the xObject or a Resources entry. Any ideas?

Code:

private static void ProcessImagesPDFSharp()
{
   PdfDocument pdf = PdfReader.Open(@"test\test.pdf");

   int imageCount = 0;
   // Iterate pages
   foreach (PdfPage page in pdf.Pages)
   {
      // Get resources dictionary
      PdfDictionary resources = page.Elements.GetDictionary("/Resources");
      if (resources != null)
      {
         // Get external objects dictionary
         PdfDictionary xObjects = resources.Elements.GetDictionary("/XObject");
         if (xObjects != null)
         {
            ICollection<PdfItem> items = xObjects.Elements.Values;
            // Iterate references to external objects
            foreach (PdfItem item in items)
            {
               PdfReference reference = item as PdfReference;
               if (reference != null)
               {
                  PdfDictionary xObject = reference.Value as PdfDictionary;
                  // Is external object an image?
                  if (xObject != null && xObject.Elements.GetString("/Subtype") == "/Image")
                  {
                     // Replace this object with a JPG file
                     xObject.Stream.Value = File.ReadAllBytes($@"test\page {++imageCount}.jpg").ToArray();
                     xObject.Elements.SetValue("/Length", new PdfInteger(xObject.Stream.Value.Length));
                     xObject.Elements.SetValue("/ColorSpace", new PdfString("/DeviceRGB"));
                     xObject.Elements.SetValue("/Filter", new PdfString("/DCTDecode"));
                     xObject.Elements.SetValue("/Type", new PdfString("/XObject"));
                     xObject.Elements.Remove("/DecodeParams");
                  }
               }
            }
         }
      }
   }
   pdf.Save(@"test\out.pdf");
}


Thomas Hoevel

Post subject: Re: Compress images in an existing PDF?

Posted: Thu May 11, 2017 1:53 pm

PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3096
Location: Cologne, Germany

Hi!

phirewind wrote:

"This is not possible with PDFSharp" is a valid answer, if that is the answer, and is much more helpful than silence.

PDFsharp is open source and "This is not possible with PDFsharp" is hardly ever a valid answer.
There could be a rather simple solution for files from scanner "Foo 9100" while a general solution will be much more complicated.

phirewind wrote:

However, when I open the resulting PDF, I get the message "An error exists on this page. Acrobat may not display the page correctly"

There is an error in a PDF file which we do not see. So you cannot expect more than speculation from us.
Maybe some other properties are incorrect ("/Width" or "/Height" or something else).
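
If "/Width" or "/Height" is indeed the problem, one way to rule it out is to read the dimensions from the replacement JPEG itself rather than keep the old image's values. The following is a minimal sketch that uses only the base class library; `JpegInfo` and `ReadFrameHeader` are hypothetical names, and it handles only the common case of a well-formed file with a single frame header.

```csharp
using System;

// Hypothetical helper, not part of PDFsharp.
public static class JpegInfo
{
    // Returns width, height and component count from the first SOF segment.
    public static (int Width, int Height, int Components) ReadFrameHeader(byte[] jpeg)
    {
        if (jpeg == null || jpeg.Length < 4 || jpeg[0] != 0xFF || jpeg[1] != 0xD8)
            throw new ArgumentException("Not a JPEG stream (missing SOI marker).");

        int pos = 2;
        while (pos + 4 <= jpeg.Length)
        {
            if (jpeg[pos] != 0xFF)
                throw new ArgumentException("Expected a marker at offset " + pos + ".");

            byte marker = jpeg[pos + 1];
            if (marker == 0xFF) { pos++; continue; }            // fill byte
            if (marker == 0x01 || (marker >= 0xD0 && marker <= 0xD7))
            {
                pos += 2;                                       // marker without a length field
                continue;
            }

            int length = (jpeg[pos + 2] << 8) | jpeg[pos + 3];  // big-endian, includes itself

            // SOF0..SOF15, excluding DHT (C4), JPG (C8) and DAC (CC).
            bool isSof = marker >= 0xC0 && marker <= 0xCF
                         && marker != 0xC4 && marker != 0xC8 && marker != 0xCC;
            if (isSof)
            {
                if (pos + 9 >= jpeg.Length)
                    throw new ArgumentException("Truncated SOF segment.");
                // Layout after the marker: length(2) precision(1) height(2) width(2) components(1)
                int height = (jpeg[pos + 5] << 8) | jpeg[pos + 6];
                int width = (jpeg[pos + 7] << 8) | jpeg[pos + 8];
                int components = jpeg[pos + 9];
                return (width, height, components);
            }

            pos += 2 + length;                                  // skip to the next marker
        }
        throw new ArgumentException("No SOF marker found.");
    }
}
```

The component count is returned as well because it hints at the color space: one component usually means grayscale and three usually means a color image.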

_________________
Regards
Thomas Hoevel
PDFsharp Team


phirewind

Post subject: Re: Compress images in an existing PDF?

Posted: Thu May 11, 2017 2:07 pm

Joined: Wed May 10, 2017 2:35 pm
Posts: 8

I will send the file via pm/email, as I had to redact certain proprietary information from the scanned document.


phirewind

Post subject: Re: Compress images in an existing PDF?

Posted: Thu May 11, 2017 3:16 pm

Joined: Wed May 10, 2017 2:35 pm
Posts: 8

I was able to sufficiently redact the samples and create a single-page test. The "in.pdf" in this case is actually significantly smaller than the "out.pdf", but the first relevant issue is the ability to replace one image with another and maintain PDF integrity.

And by the way, thanks in advance for any assistance. I know it helps to have a very specific question to answer.

Attachments:

File comment: Contains in.pdf, out.pdf, and page 1.jpg

sample.zip [175.89 KiB]


hattonjohn

Post subject: Re: Compress images in an existing PDF?

Posted: Thu May 11, 2017 3:21 pm

Joined: Thu Sep 29, 2011 1:39 pm
Posts: 5

www.pdf-online.com says:

File: out.pdf
Compliance: pdf1.4
Result: Document does not conform to PDF/A.
Details:
Validating file "out.pdf" for conformance level pdf1.4
The value of the key Type must not be of type string.
The value of the key Type is (null) but must be XObject.
The value of the key ColorSpace must not be of type string.
The image's sample stream's computed length 1053150 is different to the actual length 118121.
The color space is invalid.
The document does not conform to the requested standard.
The document doesn't conform to the PDF reference (missing required entries, wrong value types, etc.).
Done.
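
Read against the code posted earlier in the thread, the messages about Type and ColorSpace are consistent with PDF name values having been written as strings: that code sets "/Type", "/ColorSpace" and "/Filter" with `new PdfString(...)`. In PDF, `/XObject` is a name while `(/XObject)` is a string, and they are different object types. The length mismatch may follow from the same cause if the validator did not recognize the filter, but that is a guess. A minimal sketch of writing these entries as names, assuming PDFsharp's `DictionaryElements.SetName` helper (the class name `NameVersusString` is hypothetical):

```csharp
using PdfSharp.Pdf;

// Hypothetical demo class.
public static class NameVersusString
{
    // Builds a dictionary whose image-related entries are PDF names, not strings.
    public static PdfDictionary BuildImageEntries()
    {
        var dict = new PdfDictionary();
        dict.Elements.SetName("/Type", "/XObject");
        dict.Elements.SetName("/Subtype", "/Image");
        dict.Elements.SetName("/ColorSpace", "/DeviceRGB");
        dict.Elements.SetName("/Filter", "/DCTDecode");
        return dict;
    }
}
```

Using `SetValue(key, new PdfName("/XObject"))` should be equivalent; the point is only that the value is a `PdfName` rather than a `PdfString`.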


hattonjohn

Post subject: Re: Compress images in an existing PDF?

Posted: Thu May 11, 2017 3:34 pm

Joined: Thu Sep 29, 2011 1:39 pm
Posts: 5

I don't know if any of the above PDF/A check is actually relevant. Another product, pdfHarmony, said:

page: 001 Could not find the XObject named 'Im1';

pdfHarmony reported no errors with your in.pdf.


phirewind

Post subject: Re: Compress images in an existing PDF?

Posted: Thu May 11, 2017 3:44 pm

Joined: Wed May 10, 2017 2:35 pm
Posts: 8

The first check may be relevant. I also forgot to correct the image size on the rebuilt sample and had misspelled "/DecodeParms", but after correcting both, the same issue remained.

When I use xObject.Elements.SetValue, it stores values as objects with a Value property, but not the same type of object (it adds other properties). I verified the values in a debug step-through as they were applied. The PdfItem class doesn't appear to have a usable constructor, so I can't write "new PdfItem('value')". I may be chasing red herrings there, and other people seem to use .SetValue in the same manner successfully, but it is slightly curious. I can also see that the correct value is stored in the Length key, but there must be some calculation Acrobat performs with the given info that comes up with the wrong estimate and chokes.
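
On the constructor question: `PdfItem` is an abstract base class, so values are created through concrete types such as `PdfName` and `PdfInteger`, or through the typed helpers on `Elements`. A hedged sketch of the whole replacement step under those assumptions follows. `ImageReplacer` and `ReplaceWithJpeg` are hypothetical names; width, height and component count are assumed to come from the JPEG file itself; and cases such as soft masks, indexed color spaces or CMYK JPEGs are not handled.

```csharp
using System;
using PdfSharp.Pdf;

// Hypothetical helper, not part of PDFsharp.
public static class ImageReplacer
{
    // Replaces the data of an image XObject with a baseline JPEG file.
    public static void ReplaceWithJpeg(PdfDictionary xObject, byte[] jpeg,
                                       int width, int height, int components)
    {
        // Map the JPEG component count to a device color space.
        string colorSpace;
        switch (components)
        {
            case 1: colorSpace = "/DeviceGray"; break;
            case 3: colorSpace = "/DeviceRGB"; break;
            default:
                throw new NotSupportedException("Only 1- or 3-component JPEGs are handled here.");
        }

        // The JPEG file bytes are the stream content as-is; DCTDecode is the PDF filter for JPEG.
        if (xObject.Stream == null)
            xObject.CreateStream(jpeg);
        else
            xObject.Stream.Value = jpeg;

        var e = xObject.Elements;
        e.SetName("/Type", "/XObject");        // names, not strings
        e.SetName("/Subtype", "/Image");
        e.SetName("/ColorSpace", colorSpace);
        e.SetName("/Filter", "/DCTDecode");
        e.SetInteger("/Width", width);         // must match the new image
        e.SetInteger("/Height", height);
        e.SetInteger("/BitsPerComponent", 8);  // assumes an 8-bit JPEG
        e.SetInteger("/Length", jpeg.Length);
        e.Remove("/DecodeParms");              // parameters of the old filter no longer apply
    }
}
```

Inside the loop from the posted code, the call would replace the block of SetValue lines, for example `ImageReplacer.ReplaceWithJpeg(xObject, bytes, w, h, n)`. Whether this is sufficient for a particular file still depends on what else that file's image dictionaries contain.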


phirewind

Post subject: Re: Compress images in an existing PDF?

Posted: Fri May 26, 2017 2:05 pm

Joined: Wed May 10, 2017 2:35 pm
Posts: 8

So, no ideas how to replace an image in PDFSharp without corrupting the PDF?


TH-Soft

Post subject: Re: Compress images in an existing PDF?

Posted: Fri May 26, 2017 6:32 pm

PDFsharp Expert

Joined: Sat Mar 14, 2015 10:15 am
Posts: 916
Location: CCAA

phirewind wrote:

So, no ideas how to replace an image in PDFSharp without corrupting the PDF?

As I understand it, your code corrupts the PDF file by inserting incorrect values and/or values in incorrect formats.

_________________
Best regards
Thomas
(Freelance Software Developer with several years of MigraDoc/PDFsharp experience)
