PDFsharp & MigraDoc Forum

PDFsharp - A .NET library for processing PDF & MigraDoc - Creating documents on the fly
Posted: Fri Apr 15, 2011 8:30 am
I'm trying to extract the original PNG image from a PDF using PDFsharp, but am having problems.
I based my code on the sample at the following link, but the ExportAsPngImage function does not work for me.
http://www.pdfsharp.net/wiki/Default.as ... eSupport=1

This is the code I use for extracting and saving the PNG:
Code:
        static void ExportAsPngImage(PdfDictionary image, ref int count)
        {
            int width = image.Elements.GetInteger(PdfSharp.Pdf.Advanced.PdfImage.Keys.Width);
            int height = image.Elements.GetInteger(PdfSharp.Pdf.Advanced.PdfImage.Keys.Height);
            int bitsPerComponent = image.Elements.GetInteger(PdfSharp.Pdf.Advanced.PdfImage.Keys.BitsPerComponent);
            System.Drawing.Imaging.PixelFormat pixelFormat = new System.Drawing.Imaging.PixelFormat();
            switch (bitsPerComponent)
            {   
                case 1:
                    pixelFormat = System.Drawing.Imaging.PixelFormat.Format1bppIndexed;
                    break;
                case 8:
                    pixelFormat = System.Drawing.Imaging.PixelFormat.Format8bppIndexed;
                    break;
                case 24:
                    pixelFormat = System.Drawing.Imaging.PixelFormat.Format24bppRgb;
                    break;
                default:
                    throw new Exception("Unknown pixel format " + bitsPerComponent);
            }
            Bitmap bmp = new Bitmap(width, height, pixelFormat);
            PdfSharp.Pdf.Filters.FlateDecode fd = new PdfSharp.Pdf.Filters.FlateDecode();
            byte[] arr = fd.Decode(image.Stream.Value);
            System.Drawing.Imaging.BitmapData bmd = bmp.LockBits(new Rectangle(0, 0, width, height), System.Drawing.Imaging.ImageLockMode.WriteOnly, pixelFormat);
            System.Runtime.InteropServices.Marshal.Copy(arr, 0, bmd.Scan0, arr.Length);
            bmp.UnlockBits(bmd);
            bmp.Save("c:\\bmp1.png", System.Drawing.Imaging.ImageFormat.Png);
        }

The result is that I get an image, but it's a really weird one:
[Image]

Any help would be really appreciated!
Thanks in advance


Posted: Mon Apr 18, 2011 11:28 am
ineedhelp wrote:
Result is that I'm getting an image, but it's a really weird one:

Each row of image data in a BMP file starts at a DWORD (4-byte) boundary; in a PDF each row starts at a BYTE boundary.
Most images have a width that is a multiple of 4, so there is no problem with them.

You must copy the image data row by row and start each row at a DWORD boundary.
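
A minimal sketch of that row-by-row copy, in case it helps. The class and method names (RowAlignment, PdfRowBytes, BmpStride, PadRows) are made up for this illustration and are not part of PDFsharp. It assumes the image stream has already been decoded and that its rows are packed to whole bytes, as described in the PDF Reference.

```csharp
using System;

public static class RowAlignment
{
    // Bytes per row in the decoded PDF image data: rows are padded only to a full byte.
    public static int PdfRowBytes(int width, int components, int bitsPerComponent)
    {
        return (width * components * bitsPerComponent + 7) / 8;
    }

    // Bytes per row in a Windows bitmap: rounded up to a multiple of 4 (one DWORD).
    public static int BmpStride(int pdfRowBytes)
    {
        return (pdfRowBytes + 3) & ~3;
    }

    // Copies the data row by row so that every row starts at a DWORD boundary.
    public static byte[] PadRows(byte[] pdfData, int pdfRowBytes, int height)
    {
        int stride = BmpStride(pdfRowBytes);
        byte[] result = new byte[stride * height]; // padding bytes stay 0
        for (int y = 0; y < height; y++)
        {
            Buffer.BlockCopy(pdfData, y * pdfRowBytes, result, y * stride, pdfRowBytes);
        }
        return result;
    }
}
```

For a 10 pixel wide 8-bit indexed image, PdfRowBytes(10, 1, 8) is 10 and BmpStride(10) is 12, so two padding bytes follow each row. When copying into a locked System.Drawing bitmap it is probably safer to compare this value with BitmapData.Stride than to rely on it blindly.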

_________________
Regards
Thomas Hoevel
PDFsharp Team


Posted: Tue Apr 19, 2011 1:04 pm
Thanks Thomas, it works nicely that way.
I still have one issue: I save the file as a bitmap, but the color palette is wrong.
Is there any way I can get the color palette used for the image in the PDF using PDFsharp?


Posted: Tue Apr 19, 2011 1:46 pm
IIRC the colour palette is a separate object in the PDF file. So you have to find the colour palette and merge it into the BMP file.

_________________
Regards
Thomas Hoevel
PDFsharp Team


Posted: Wed Apr 20, 2011 6:52 am
Thanks Thomas for the assistance, but I can't find the color palette.
Neither in the image itself (it's not the ColorSpace, is it?) nor in the PDF itself.
So I'm still getting "wrong" colors in my output image.

This is the code I use to convert the BYTE-aligned array to a DWORD-aligned one (the trick is that I put it in a two-dimensional array first, with padding bytes at the end of each row, and then put it back into a one-dimensional array), in case someone needs it:
Code:
        // Requires: using System.Drawing; using System.IO; using PdfSharp.Pdf;
        static void ExportAsPngImage(PdfDictionary image, ref int count)
        {
            int width = image.Elements.GetInteger(PdfSharp.Pdf.Advanced.PdfImage.Keys.Width);
            int height = image.Elements.GetInteger(PdfSharp.Pdf.Advanced.PdfImage.Keys.Height);
            int bitsPerComponent = image.Elements.GetInteger(PdfSharp.Pdf.Advanced.PdfImage.Keys.BitsPerComponent);
            System.Drawing.Imaging.PixelFormat pixelFormat = new System.Drawing.Imaging.PixelFormat();
            // The /ColorSpace entry, fetched while looking for the palette; it is not used below.
            PdfSharp.Pdf.PdfArray arr = image.Elements.GetArray(PdfSharp.Pdf.Advanced.PdfImage.Keys.ColorSpace);
            switch (bitsPerComponent)
            {
                case 1:
                    pixelFormat = System.Drawing.Imaging.PixelFormat.Format1bppIndexed;
                    break;
                case 8:
                    pixelFormat = System.Drawing.Imaging.PixelFormat.Format8bppIndexed;
                    break;
                case 24:
                    pixelFormat = System.Drawing.Imaging.PixelFormat.Format24bppRgb;
                    break;
                default:
                    throw new Exception("Unknown pixel format " + bitsPerComponent);
            }

            // UnfilteredValue is used instead of an explicit FlateDecode call on image.Stream.Value.
            byte[] origineel_byte_boundary = image.Stream.UnfilteredValue;
            byte[] resultaat_byte_boundary = null;
            int new_width = width;
            int allignment = 4;

            // NOTE: the row copy below assumes one byte per pixel (8 bits per component, indexed or grayscale).
            // Image data in BMP files always starts at a DWORD boundary, in PDF it starts at a BYTE boundary.
            // Most images have a width that is a multiple of 4, so there is no problem with them.
            // You must copy the image data line by line and start each line at the DWORD boundary.
            if (new_width % allignment != 0)
            {
                while (new_width % allignment != 0)
                    new_width++;
                byte[,] copy_dword_boundary = new byte[height, new_width];

                for (int y = 0; y < height; y++)
                {
                    for (int x = 0; x < new_width; x++)
                    {
                        if (x < width)
                            // inside the original row: take the byte from the original array
                            copy_dword_boundary[y, x] = origineel_byte_boundary[x + (y * width)];
                        else // pad the end of the row with 0
                            copy_dword_boundary[y, x] = 0;
                    }
                }
                resultaat_byte_boundary = new byte[new_width * height];
                int counter = 0;
                for (int x = 0; x < copy_dword_boundary.GetLength(0); x++)
                {
                    for (int y = 0; y < copy_dword_boundary.GetLength(1); y++)
                    {   // put the 2-dimensional array back into a 1-dimensional array
                        // (the code as first posted used the undefined name "teller" as the index here)
                        resultaat_byte_boundary[counter] = copy_dword_boundary[x, y];
                        counter++;
                    }
                }
            }
            else
            {
                resultaat_byte_boundary = new byte[new_width * height];
                origineel_byte_boundary.CopyTo(resultaat_byte_boundary, 0);
            }

            // The declaration of bmp was missing from the code as first posted.
            Bitmap bmp = new Bitmap(width, height, pixelFormat);
            System.Drawing.Imaging.BitmapData bmd = bmp.LockBits(new Rectangle(0, 0, width, height), System.Drawing.Imaging.ImageLockMode.WriteOnly, pixelFormat);
            System.Runtime.InteropServices.Marshal.Copy(resultaat_byte_boundary, 0, bmd.Scan0, resultaat_byte_boundary.Length);
            bmp.UnlockBits(bmd);
            using (FileStream fs = new FileStream("C:\\" + String.Format("Image{0}.png", count), FileMode.Create, FileAccess.Write))
            {
                bmp.Save(fs, System.Drawing.Imaging.ImageFormat.Png);
            }
        }


Posted: Wed Apr 20, 2011 11:53 am
The colour palette is referenced by the "/ColorSpace" entry of the image.
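
For reference, the PDF Reference describes the colour space of a palette image as an array of the form [/Indexed base hival lookup], where lookup (a string or a stream) holds the palette bytes and hival is the highest valid index. The sketch below only covers the last step, turning those bytes into ARGB values, and assumes a DeviceRGB base (3 bytes per entry). IndexedPalette and FromRgbLookup are hypothetical names, and how the lookup bytes are read out of the PDF with PDFsharp is deliberately left open here.

```csharp
using System;

public static class IndexedPalette
{
    // Converts the lookup table of an Indexed colour space with a DeviceRGB base
    // into 32-bit ARGB values (0xAARRGGBB), one per palette entry.
    public static int[] FromRgbLookup(byte[] lookup, int hival)
    {
        int entries = hival + 1; // hival is the highest index, so there is one more entry
        if (lookup.Length < entries * 3)
            throw new ArgumentException("Lookup table is shorter than 3 * (hival + 1) bytes.");

        int[] argb = new int[entries];
        for (int i = 0; i < entries; i++)
        {
            int r = lookup[3 * i];
            int g = lookup[3 * i + 1];
            int b = lookup[3 * i + 2];
            argb[i] = unchecked((int)0xFF000000) | (r << 16) | (g << 8) | b; // fully opaque
        }
        return argb;
    }
}
```

With System.Drawing, the values could then be written into the bitmap's palette: read bmp.Palette, set Entries[i] = Color.FromArgb(argb[i]) for each entry, and assign the palette object back to bmp.Palette.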

_________________
Regards
Thomas Hoevel
PDFsharp Team


Posted: Thu Aug 20, 2015 8:45 pm
Well... I tried a VB adaptation of this code and ran into three difficulties:
1 - resultaat_byte_boundary = new byte[new_width * height]; //-> this throws an exception saying the destination is "not long enough", so I changed it to
resultaat_byte_boundary = new byte[origineel_byte_boundary.Length];

2 - resultaat_byte_boundary[teller] = copy_dword_boundary[x, y]; // no "teller" variable is defined anywhere... I replaced it with "counter"

3 - and in fact I cannot get any usable result: the best I got was a crash of the program after saving the bitmap, and the generated bitmap was completely scrambled.


I changed my mind and tried to process image.Stream.UnfilteredValue directly.
But now there are other difficulties: where can I find documentation about the meaning of that data? How do I compute each pixel's color for the various PNG formats from the raw data in image.Stream.UnfilteredValue?

I managed to get the real data in some cases, but not all: sometimes I get bad colors, sometimes I get a scrambled bitmap... Where is the appropriate documentation?

Please help!!!


Posted: Fri Aug 21, 2015 6:43 am
chris31415 wrote:
Where's the appropriate documentation?
Try Adobe:
http://www.adobe.com/devnet/pdf/pdf_ref ... chive.html
http://www.adobe.com/devnet/pdf/pdf_reference.html

The 1.4 version should contain all you need.
PDF files contain PDF images. Conversion to Windows BMP is rather simple. The former is BYTE aligned, the latter is DWORD aligned, and IIRC you have to swap top and bottom lines. Not sure, but maybe you also have to swap the R and B colour components (BMP stores pixels as BGR).
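
A rough sketch of those two adjustments for 24-bit data, under these assumptions: DeviceRGB samples in a PDF are stored in R, G, B order, a Windows 24-bit bitmap keeps each pixel as B, G, R, and a BMP file stores its rows bottom-up. PixelOrder, SwapRedBlue and FlipRows are names invented for this example. The row flip should only matter when the BMP file is written by hand; a bitmap locked through System.Drawing normally exposes its rows top-down.

```csharp
using System;

public static class PixelOrder
{
    // Swaps the first and third byte of every 3-byte pixel in place (RGB <-> BGR).
    // rowBytes is the distance between row starts and may include padding.
    public static void SwapRedBlue(byte[] data, int rowBytes, int width, int height)
    {
        for (int y = 0; y < height; y++)
        {
            int row = y * rowBytes;
            for (int x = 0; x < width; x++)
            {
                int p = row + 3 * x;
                byte tmp = data[p];
                data[p] = data[p + 2];
                data[p + 2] = tmp;
            }
        }
    }

    // Returns a copy with the row order reversed (top-down <-> bottom-up).
    public static byte[] FlipRows(byte[] data, int rowBytes, int height)
    {
        byte[] result = new byte[rowBytes * height];
        for (int y = 0; y < height; y++)
        {
            Buffer.BlockCopy(data, y * rowBytes, result, (height - 1 - y) * rowBytes, rowBytes);
        }
        return result;
    }
}
```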

There are several filters. And multiple filters can be applied to a single stream.
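
To illustrate what a filter chain means: the /Filter entry of a stream can be a single name or an array of names, and decoding applies them in the listed order. The sketch below chains two of the simpler filters from the PDF Reference, ASCIIHexDecode and RunLengthDecode, implemented by hand. FilterChain and its methods are hypothetical helpers, not PDFsharp API, and the filter names are used without the leading slash purely for this example. FlateDecode, the usual filter for image data, is left out here.

```csharp
using System;
using System.Collections.Generic;

public static class FilterChain
{
    // ASCIIHexDecode: pairs of hex digits, whitespace ignored, '>' marks end of data.
    public static byte[] AsciiHexDecode(byte[] input)
    {
        var output = new List<byte>();
        int high = -1; // pending high nibble, -1 if none
        foreach (byte c in input)
        {
            if (c == (byte)'>') break;
            int v;
            if (c >= '0' && c <= '9') v = c - '0';
            else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
            else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
            else continue; // whitespace (and anything else) is skipped in this sketch
            if (high < 0) { high = v; }
            else { output.Add((byte)(high * 16 + v)); high = -1; }
        }
        if (high >= 0) output.Add((byte)(high * 16)); // odd digit count: a trailing 0 is assumed
        return output.ToArray();
    }

    // RunLengthDecode: length byte 0..127 copies the next length + 1 bytes,
    // 129..255 repeats the next byte 257 - length times, 128 marks end of data.
    public static byte[] RunLengthDecode(byte[] input)
    {
        var output = new List<byte>();
        int i = 0;
        while (i < input.Length)
        {
            int len = input[i++];
            if (len == 128) break;
            if (len < 128)
            {
                for (int k = 0; k <= len && i < input.Length; k++) output.Add(input[i++]);
            }
            else if (i < input.Length)
            {
                byte b = input[i++];
                for (int k = 0; k < 257 - len; k++) output.Add(b);
            }
        }
        return output.ToArray();
    }

    // Applies the named filters one after another, in the order given.
    public static byte[] Decode(byte[] data, IEnumerable<string> filterNames)
    {
        var decoders = new Dictionary<string, Func<byte[], byte[]>>
        {
            { "ASCIIHexDecode", AsciiHexDecode },
            { "RunLengthDecode", RunLengthDecode }
        };
        foreach (string name in filterNames)
        {
            Func<byte[], byte[]> decode;
            if (!decoders.TryGetValue(name, out decode))
                throw new NotSupportedException("Filter not handled in this sketch: " + name);
            data = decode(data);
        }
        return data;
    }
}
```

In the thread above, image.Stream.UnfilteredValue was used to obtain already decoded data, so a hand-written chain like this is mainly useful for understanding what happens to the stream.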

_________________
Best regards
Thomas
(Freelance Software Developer with several years of MigraDoc/PDFsharp experience)

