PDFsharp & MigraDoc Foundation
https://forum.pdfsharp.net/

Compress an all-text PDF?
https://forum.pdfsharp.net/viewtopic.php?f=2&t=4253

Author:  DotnetMe [ Wed May 19, 2021 1:39 pm ]
Post subject:  Compress an all-text PDF?

My app produces a really large all-text PDF and it seems like I should be able to get it much smaller.

If I take my example PDF of 176 MB, 7-Zip compresses the file down to 4 MB.

Most of the PDF's size is in the content streams. I have tested optimization with Adobe and many online services, but they reduce the size very little. I also tested printing to a new PDF, and that only increased the size.

The caveat is that the PDF is produced by another application, and I have little control over how it is produced.

Any suggestions on how I might greatly reduce the size?

Author:  TH-Soft [ Wed May 19, 2021 2:07 pm ]
Post subject:  Re: Compress an all-text PDF?

Hi!
DotnetMe wrote:
Any suggestions on how I might greatly reduce the size?
I don't think opening and saving the file with PDFsharp will produce a smaller file than Adobe Reader does.

Not much can be said for sure without seeing the PDF.

Creating the PDF with PDFsharp right from the start would give you better control over the size.

Content streams can be "zipped" inside the PDF. Maybe the streams contain redundant information, allowing 7-Zip to achieve much better compression across streams.
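To make the cross-stream point concrete, here is a small self-contained C# sketch (plain .NET, no PDFsharp) that Deflate-compresses many similar "page content streams" one at a time, the way each PDF stream carries its own /FlateDecode filter, and then compresses the same bytes as a single blob. The page text is invented boilerplate, not taken from the PDF in question, and `FakePageContent` is a hypothetical helper. `DeflateStream` writes raw Deflate, which is the algorithm behind FlateDecode minus the small zlib header and checksum.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Raw Deflate size of a byte array (PDF's /FlateDecode uses the same
// algorithm, wrapped in a small zlib header and checksum).
static int DeflatedSize(byte[] data)
{
    using var output = new MemoryStream();
    // leaveOpen: true so the MemoryStream survives disposing the DeflateStream
    using (var deflate = new DeflateStream(output, CompressionLevel.Optimal, leaveOpen: true))
    {
        deflate.Write(data, 0, data.Length);
    } // disposing the DeflateStream writes the final block
    return (int)output.Length;
}

// Hypothetical page content stream: 40 boilerplate text lines shared by
// every page, plus one page-number line that differs.
static byte[] FakePageContent(int pageNumber)
{
    var sb = new StringBuilder();
    for (int line = 0; line < 40; line++)
        sb.Append($"BT /F1 10 Tf 72 {700 - line * 15} Td (Standard clause {line} applies.) Tj ET\n");
    sb.Append($"BT /F1 10 Tf 72 40 Td (Page {pageNumber}) Tj ET\n");
    return Encoding.ASCII.GetBytes(sb.ToString());
}

const int PageCount = 200;
long rawTotal = 0;
int perStreamTotal = 0;
using var allPages = new MemoryStream();

for (int p = 1; p <= PageCount; p++)
{
    byte[] content = FakePageContent(p);
    rawTotal += content.Length;
    perStreamTotal += DeflatedSize(content);    // each stream compressed on its own, as in a PDF
    allPages.Write(content, 0, content.Length); // the same bytes, kept together
}

// One pass over all pages, so later pages can refer back to earlier ones.
int singleBlob = DeflatedSize(allPages.ToArray());

Console.WriteLine($"uncompressed:        {rawTotal:N0} bytes");
Console.WriteLine($"Deflate per stream:  {perStreamTotal:N0} bytes");
Console.WriteLine($"Deflate as one blob: {singleBlob:N0} bytes");
```

With identical boilerplate on every page, the single-blob figure should come out far smaller than the per-stream total, because only the combined pass can reuse text from earlier pages. Exact numbers depend on the zlib build in the .NET runtime.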

Author:  DotnetMe [ Wed May 19, 2021 2:23 pm ]
Post subject:  Re: Compress an all-text PDF?

TH-Soft wrote:
Content streams can be "zipped" inside the PDF. Maybe the streams contain redundant information, allowing 7-Zip to achieve much better compression across streams.

So the content streams might currently not be compressed at all? Can PDFsharp replace a content stream in an existing PDF?

Is there an example of compressing a content stream? Maybe I can then piece together how to replace the existing stream with a compressed one.
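One way to answer the first question is to look at each page's content streams and check for a /Filter entry: a stream without one is stored uncompressed. The following is a sketch written against the PDFsharp 1.5x object model (`PdfReader.Open`, `PdfPage.Contents`, `PdfDictionary.Elements`, `PdfReference.Value`). It has not been run against this file, "report.pdf" is a placeholder name, and member names may differ in other PDFsharp versions.

```csharp
using System;
using PdfSharp.Pdf;
using PdfSharp.Pdf.Advanced;
using PdfSharp.Pdf.IO;

// Placeholder input file name; replace with the PDF to inspect.
PdfDocument document = PdfReader.Open("report.pdf", PdfDocumentOpenMode.Import);

for (int p = 0; p < document.PageCount; p++)
{
    // PDFsharp presents /Contents as an array (PdfContents), even when
    // the page has a single content stream.
    PdfArray contents = document.Pages[p].Contents;
    for (int i = 0; i < contents.Elements.Count; i++)
    {
        // Entries are normally indirect references to stream dictionaries.
        PdfItem item = contents.Elements[i];
        PdfDictionary stream = (item is PdfReference reference ? reference.Value : item) as PdfDictionary;
        if (stream?.Stream == null)
            continue;

        // No /Filter entry means the stream bytes are stored uncompressed.
        PdfItem filter = stream.Elements["/Filter"];
        Console.WriteLine($"page {p + 1}, stream {i}: {stream.Stream.Length:N0} bytes, " +
                          $"filter = {(filter == null ? "none (uncompressed)" : filter.ToString())}");
    }
}
```

If every stream already reports /FlateDecode, re-compressing with the same filter is unlikely to shrink the file much.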

The PDFs are created with Crystal Reports and are highly inefficient garbage.

I really appreciate the help.

Author:  DotnetMe [ Wed May 19, 2021 4:13 pm ]
Post subject:  Re: Compress an all-text PDF?

I am not yet sure how I can compress the content stream.

In my test I am creating a new PDF with compression options set, and then adding each page of the source PDF to the new PDF.

The compression settings seem to have no effect. The PDF files are the same size.

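Code:
using PdfSharp.Pdf; // PdfDocument, PdfFlateEncodeMode

PdfDocument outputDocument = new PdfDocument();
outputDocument.Options.NoCompression = false;          // do not disable compression on save
outputDocument.Options.CompressContentStreams = true;  // Flate-encode content streams
outputDocument.Options.FlateEncodeMode = PdfFlateEncodeMode.BestCompression;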


UPDATE:

My testing method definitely works. I performed the same steps with compression turned off for the new PDF, and the new PDF was double the size. I then took the uncompressed PDF and ran it through again with compression on. The result was the same size as my original compressed PDF.
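The copy-and-save experiment described above can be sketched as a small program. It follows the page-import pattern from PDFsharp's concatenation samples (`PdfReader.Open` with `PdfDocumentOpenMode.Import`, then `AddPage`) and sets the same options as the snippet above. The helper name `CopyPages` and the three file names are hypothetical, and the sketch has not been tested here.

```csharp
using System;
using System.IO;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

// Hypothetical helper: copy every page of sourcePath into a new document
// and save it to targetPath with content-stream compression on or off.
static void CopyPages(string sourcePath, string targetPath, bool compress)
{
    PdfDocument input = PdfReader.Open(sourcePath, PdfDocumentOpenMode.Import);
    var output = new PdfDocument();
    output.Options.NoCompression = !compress;
    output.Options.CompressContentStreams = compress;
    output.Options.FlateEncodeMode = PdfFlateEncodeMode.BestCompression;

    for (int i = 0; i < input.PageCount; i++)
        output.AddPage(input.Pages[i]); // imports the page from the source document

    output.Save(targetPath);
}

// Placeholder file names.
CopyPages("report.pdf", "report-uncompressed.pdf", compress: false);
CopyPages("report-uncompressed.pdf", "report-recompressed.pdf", compress: true);

foreach (string path in new[] { "report.pdf", "report-uncompressed.pdf", "report-recompressed.pdf" })
    Console.WriteLine($"{path}: {new FileInfo(path).Length:N0} bytes");
```

If the recompressed copy ends up the same size as the original, as reported above, the original's content streams were most likely already Flate-compressed, so applying the same filter again has little left to gain.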

So I guess it comes down to the compression method not being that effective for PDFs?

Maybe Flate is the best it can currently do. I compared the PDFs in a binary editor, and I can see the portions being compressed. Still, there seems to be a lot of room for Adobe to improve PDF compression.

Oh well... I guess further file size reduction is beyond what is available in the PDF spec.
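One reason Flate falls so far short of 7-Zip here is that Deflate can only refer back 32 KiB (RFC 1951), and in a PDF each stream is compressed separately. 7-Zip's default LZMA-based .7z format uses a dictionary measured in megabytes over the whole file, so it can match text repeated much farther apart. The sketch below (plain .NET; random bytes stand in for text so that only the long-range repeat is compressible; `RepeatCost` is a hypothetical helper) measures what a second copy of a 16 KiB block costs when it appears inside versus outside that window.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Raw Deflate size of a byte array.
static long DeflatedSize(byte[] data)
{
    using var output = new MemoryStream();
    using (var deflate = new DeflateStream(output, CompressionLevel.Optimal, leaveOpen: true))
        deflate.Write(data, 0, data.Length);
    return output.Length;
}

// Concatenate byte arrays into one buffer.
static byte[] Concat(params byte[][] parts)
{
    using var ms = new MemoryStream();
    foreach (var part in parts) ms.Write(part, 0, part.Length);
    return ms.ToArray();
}

// Extra compressed bytes caused by appending a second copy of `block`
// after `gap` bytes of unrelated, incompressible filler.
static long RepeatCost(byte[] block, int gap, Random rng)
{
    var filler = new byte[gap];
    rng.NextBytes(filler);
    long once = DeflatedSize(Concat(block, filler));
    long twice = DeflatedSize(Concat(block, filler, block));
    return twice - once;
}

var random = new Random(42);
var repeated = new byte[16 * 1024]; // 16 KiB block that recurs
random.NextBytes(repeated);

long near = RepeatCost(repeated, 8 * 1024, random);  // copies start 24 KiB apart: inside the window
long far = RepeatCost(repeated, 48 * 1024, random);  // copies start 64 KiB apart: outside the window

Console.WriteLine($"second copy, 24 KiB back: +{near:N0} bytes");
Console.WriteLine($"second copy, 64 KiB back: +{far:N0} bytes");
```

The nearby repeat should cost only a small fraction of the block, while the distant one costs roughly the whole 16 KiB, because Deflate cannot see that far back.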

Author:  rsoeung [ Sun Jun 06, 2021 4:45 am ]
Post subject:  Re: Compress an all-text PDF?

Could you provide an example of the pdf if that's ok?

Author:  DotnetMe [ Sun Jun 06, 2021 5:33 am ]
Post subject:  Re: Compress an all-text PDF?

I'm not permitted to, but it is entirely text with no images, and 50% of the text on each page is repeated. Think legal document.

Author:  rsoeung [ Sun Jun 06, 2021 6:10 pm ]
Post subject:  Re: Compress an all-text PDF?

That's fine. Almost everything in a PDF can be compressed; however, PDFsharp currently doesn't offer that capability. I'm in the middle of writing an extension to PDFsharp that compresses more than just content streams. I'll post it in this thread once it's complete.

Author:  DotnetMe [ Mon Jun 07, 2021 3:56 pm ]
Post subject:  Re: Compress an all-text PDF?

Excellent... thanks!

Author:  DotnetMe [ Fri Oct 21, 2022 12:13 pm ]
Post subject:  Re: Compress an all-text PDF?

Just checking whether there has been any progress on adding the compression.

All times are UTC