PDFsharp & MigraDoc Forum https://forum.pdfsharp.net/ |
extract image from pdf and save as png https://forum.pdfsharp.net/viewtopic.php?f=2&t=1621 |
Author: | ineedhelp [ Fri Apr 15, 2011 8:30 am ] |
Post subject: | extract image from pdf and save as png |
I'm trying to extract the original PNG image from a PDF using PDFsharp, but I'm having problems. I based my code on the sample at the following link, but the ExportAsPngImage function does not work for me. http://www.pdfsharp.net/wiki/Default.as ... eSupport=1
This is the code I use to extract and save the PNG:
Code:
static void ExportAsPngImage(PdfDictionary image, ref int count)
{
    int width = image.Elements.GetInteger(PdfSharp.Pdf.Advanced.PdfImage.Keys.Width);
    int height = image.Elements.GetInteger(PdfSharp.Pdf.Advanced.PdfImage.Keys.Height);
    int bitsPerComponent = image.Elements.GetInteger(PdfSharp.Pdf.Advanced.PdfImage.Keys.BitsPerComponent);
    System.Drawing.Imaging.PixelFormat pixelFormat = new System.Drawing.Imaging.PixelFormat();
    switch (bitsPerComponent)
    {
        case 1:
            pixelFormat = System.Drawing.Imaging.PixelFormat.Format1bppIndexed;
            break;
        case 8:
            pixelFormat = System.Drawing.Imaging.PixelFormat.Format8bppIndexed;
            break;
        case 24:
            pixelFormat = System.Drawing.Imaging.PixelFormat.Format24bppRgb;
            break;
        default:
            throw new Exception("Unknown pixel format " + bitsPerComponent);
    }
    Bitmap bmp = new Bitmap(width, height, pixelFormat);
    PdfSharp.Pdf.Filters.FlateDecode fd = new PdfSharp.Pdf.Filters.FlateDecode();
    byte[] arr = fd.Decode(image.Stream.Value);
    System.Drawing.Imaging.BitmapData bmd = bmp.LockBits(new Rectangle(0, 0, width, height), System.Drawing.Imaging.ImageLockMode.WriteOnly, pixelFormat);
    System.Runtime.InteropServices.Marshal.Copy(arr, 0, bmd.Scan0, arr.Length);
    bmp.UnlockBits(bmd);
    bmp.Save("c:\\bmp1.png", System.Drawing.Imaging.ImageFormat.Png);
}
The result is that I get an image, but it's a really weird one.
Any help would be really appreciated! Thanks in advance |
Author: | Thomas Hoevel [ Mon Apr 18, 2011 11:28 am ] |
Post subject: | Re: extract image from pdf and save as png |
ineedhelp wrote: Result is that I'm getting an image, but it's a really weird one:
In BMP files, each line of image data starts at a DWORD boundary; in PDF, each line starts at a BYTE boundary. Most images have a width that is a multiple of 4, so their lines already end on a DWORD boundary and there is no problem with them. You must copy the image data line by line and start each line at a DWORD boundary. |
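The line-by-line copy described above can be sketched as follows. This is a minimal sketch, not PDFsharp code: the class and method names (RowPadding, PackedRowBytes, AlignedRowBytes, PadRows) are hypothetical, and bitsPerPixel is assumed to be the bits per component multiplied by the number of colour components (8 for an 8-bit indexed or grey image, 24 for 8-bit RGB).

```csharp
using System;

public static class RowPadding
{
    // Bytes in one tightly packed (BYTE-aligned) row, as stored in the PDF stream.
    public static int PackedRowBytes(int width, int bitsPerPixel)
    {
        return (width * bitsPerPixel + 7) / 8;
    }

    // Bytes in one DWORD-aligned row, as expected by BMP-style bitmaps.
    public static int AlignedRowBytes(int width, int bitsPerPixel)
    {
        return ((width * bitsPerPixel + 31) / 32) * 4;
    }

    // Copies the image row by row so that every row starts on a DWORD boundary.
    public static byte[] PadRows(byte[] packed, int width, int height, int bitsPerPixel)
    {
        int srcStride = PackedRowBytes(width, bitsPerPixel);
        int dstStride = AlignedRowBytes(width, bitsPerPixel);
        if (packed.Length < srcStride * height)
            throw new ArgumentException("Source buffer is shorter than width and height imply.");

        byte[] padded = new byte[dstStride * height]; // padding bytes stay 0
        for (int y = 0; y < height; y++)
        {
            Buffer.BlockCopy(packed, y * srcStride, padded, y * dstStride, srcStride);
        }
        return padded;
    }
}
```

For example, a 3-pixel-wide 8-bit image has 3 bytes per row in the PDF and 4 bytes per row after padding. When the target is a System.Drawing bitmap locked with LockBits, it is probably safer to take the destination row length from BitmapData.Stride than to compute it.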
Author: | ineedhelp [ Tue Apr 19, 2011 1:04 pm ] |
Post subject: | Re: extract image from pdf and save as png |
Thanks Thomas, that works nicely. I still have one issue: I save the file as a bitmap, but the color palette is wrong. Is there any way to get the color palette used for the image in the PDF using PDFsharp? |
Author: | Thomas Hoevel [ Tue Apr 19, 2011 1:46 pm ] |
Post subject: | Re: extract image from pdf and save as png |
IIRC the colour palette is a separate object in the PDF file. So you have to find the colour palette and merge it into the BMP file. |
Author: | ineedhelp [ Wed Apr 20, 2011 6:52 am ] |
Post subject: | Re: extract image from pdf and save as png |
Thanks for the assistance, Thomas, but I can't find the color palette, neither in the image itself (it's not colorspace, is it?) nor elsewhere in the PDF. So I'm still getting "wrong" colors in my output image. This is the code I use to convert the BYTE-aligned array to a DWORD-aligned one, in case someone needs it (the trick is that I first put the data in a two-dimensional array, with padding bytes at the end of each line, and then put it back in a one-dimensional array):
Code:
static void ExportAsPngImage(PdfDictionary image, ref int count)
{
    int width = image.Elements.GetInteger(PdfSharp.Pdf.Advanced.PdfImage.Keys.Width);
    int height = image.Elements.GetInteger(PdfSharp.Pdf.Advanced.PdfImage.Keys.Height);
    int bitsPerComponent = image.Elements.GetInteger(PdfSharp.Pdf.Advanced.PdfImage.Keys.BitsPerComponent);
    System.Drawing.Imaging.PixelFormat pixelFormat = new System.Drawing.Imaging.PixelFormat();
    PdfSharp.Pdf.PdfArray arr = image.Elements.GetArray(PdfSharp.Pdf.Advanced.PdfImage.Keys.ColorSpace);
    switch (bitsPerComponent)
    {
        case 1:
            pixelFormat = System.Drawing.Imaging.PixelFormat.Format1bppIndexed;
            break;
        case 8:
            pixelFormat = System.Drawing.Imaging.PixelFormat.Format8bppIndexed;
            break;
        case 24:
            pixelFormat = System.Drawing.Imaging.PixelFormat.Format24bppRgb;
            break;
        default:
            throw new Exception("Unknown pixel format " + bitsPerComponent);
    }
    PdfSharp.Pdf.Filters.FlateDecode fd = new PdfSharp.Pdf.Filters.FlateDecode();
    byte[] origineel_byte_boundary = image.Stream.UnfilteredValue; //fd.Decode(image.Stream.Value);
    byte[] resultaat_byte_boundary = null;
    int new_width = width;
    int allignment = 4;
    //Image data in BMP files always starts at a DWORD boundary, in PDF it starts at a BYTE boundary.
    //Most images have a width that is a multiple of 4, so there is no problem with them.
    //You must copy the image data line by line and start each line at the DWORD boundary.
    if (new_width % allignment != 0)
    {
        while (new_width % allignment != 0)
            new_width++;
        byte[,] copy_dword_boundary = new byte[height, new_width];
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < new_width; x++)
            {
                if (x <= width && (x + (y * width) != origineel_byte_boundary.Length))
                    // while not at end of line, take the original array
                    copy_dword_boundary[y, x] = origineel_byte_boundary[x + (y * width)];
                else
                    // fill new array with ending 0
                    copy_dword_boundary[y, x] = 0;
            }
        }
        resultaat_byte_boundary = new byte[new_width * height];
        int counter = 0;
        for (int x = 0; x < copy_dword_boundary.GetLength(0); x++)
        {
            for (int y = 0; y < copy_dword_boundary.GetLength(1); y++)
            {
                // put 2dim array back in 1dim array
                resultaat_byte_boundary[teller] = copy_dword_boundary[x, y];
                counter++;
            }
        }
    }
    else
    {
        resultaat_byte_boundary = new byte[new_width * height];
        origineel_byte_boundary.CopyTo(resultaat_byte_boundary, 0);
    }
    System.Drawing.Imaging.BitmapData bmd = bmp.LockBits(new Rectangle(0, 0, new_width, height), System.Drawing.Imaging.ImageLockMode.WriteOnly, pixelFormat);
    System.Runtime.InteropServices.Marshal.Copy(resultaat_byte_boundary, 0, bmd.Scan0, resultaat_byte_boundary.Length);
    bmp.UnlockBits(bmd);
    using (FileStream fs = new FileStream("C:\\" + String.Format("Image{0}.png", count), FileMode.Create, FileAccess.Write))
    {
        bmp.Save(fs, System.Drawing.Imaging.ImageFormat.Png);
    }
} |
Author: | Thomas Hoevel [ Wed Apr 20, 2011 11:53 am ] |
Post subject: | Re: extract image from pdf and save as png |
The colour palette is referenced by the "/ColorSpace" entry of the image. |
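Once the palette bytes have been located through "/ColorSpace", turning them into colours could look like the sketch below. It assumes an Indexed colour space with a DeviceRGB base, where the lookup table holds 3 bytes (R, G, B) per entry and hival is the highest valid index. How the lookup bytes are read with PDFsharp is not shown; IndexedPalette and ToArgb are hypothetical names.

```csharp
using System;

public static class IndexedPalette
{
    // Converts an Indexed lookup table with a DeviceRGB base (3 bytes per entry)
    // into opaque 0xAARRGGBB values. hival is the highest valid palette index.
    public static int[] ToArgb(byte[] lookup, int hival)
    {
        int entries = hival + 1;
        if (lookup.Length < entries * 3)
            throw new ArgumentException("Lookup table is shorter than hival implies.");

        int[] argb = new int[entries];
        for (int i = 0; i < entries; i++)
        {
            int r = lookup[i * 3];
            int g = lookup[i * 3 + 1];
            int b = lookup[i * 3 + 2];
            argb[i] = unchecked((int)0xFF000000) | (r << 16) | (g << 8) | b;
        }
        return argb;
    }
}
```

With System.Drawing, the values can then be applied to an indexed bitmap by reading bmp.Palette into a local ColorPalette, setting Entries[i] = Color.FromArgb(argb[i]) for each index, and assigning the object back to bmp.Palette. Other base colour spaces (DeviceGray, DeviceCMYK) have a different number of bytes per entry and are not handled here.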
Author: | chris31415 [ Thu Aug 20, 2015 8:45 pm ] |
Post subject: | Re: extract image from pdf and save as png |
Well... I tried a VB adaptation of this code and ran into three difficulties:
1 - resultaat_byte_boundary = new byte[new_width * height]; //-> this throws an exception saying the destination is "not long enough", so I changed it to resultaat_byte_boundary = new byte[origineel_byte_boundary.Length];
2 - resultaat_byte_boundary[teller] = copy_dword_boundary[x, y]; // no "teller" variable was defined anywhere, so I replaced it with "counter"
3 - In fact I cannot get any usable result: the best I had was a crash of the program after saving the bitmap, and the generated bitmap was completely scrambled.
I changed my mind and tried to process image.Stream.UnfilteredValue directly. But that raises other difficulties: where can I find documentation about the meaning of that data? How do I compute each pixel's color for the various PNG formats from the raw stream in image.Stream.UnfilteredValue? I managed to get the real data in some cases, but not all: sometimes I get bad colors, sometimes I get a scrambled bitmap... Where's the appropriate documentation? Please help! |
Author: | TH-Soft [ Fri Aug 21, 2015 6:43 am ] |
Post subject: | Re: extract image from pdf and save as png |
chris31415 wrote: Where's the appropriate documentation?
Try Adobe: http://www.adobe.com/devnet/pdf/pdf_ref ... chive.html http://www.adobe.com/devnet/pdf/pdf_reference.html
The 1.4 version should contain all you need. PDF files contain PDF images. Conversion to Windows BMP is rather simple: the former is BYTE aligned, the latter is DWORD aligned, and IIRC you have to swap top and bottom lines. Not sure, but maybe you also have to swap R and G colour components. There are several filters, and multiple filters can be applied to a single stream. |
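The conversion steps listed above can be sketched for the 8-bit RGB case as follows. It assumes the PDF stream has already been fully decoded to tightly packed R, G, B bytes with the top row first. BMP pixel data stores each pixel as blue, green, red, so the sketch swaps the first and third bytes (R and B) rather than R and G. BmpRows and RgbTopDownToBgrBottomUp are hypothetical names.

```csharp
using System;

public static class BmpRows
{
    // Converts tightly packed, top-down RGB rows into bottom-up,
    // DWORD-aligned BGR rows as used for 24-bit BMP pixel data.
    public static byte[] RgbTopDownToBgrBottomUp(byte[] rgb, int width, int height)
    {
        int srcStride = width * 3;               // BYTE-aligned row length
        int dstStride = (width * 3 + 3) / 4 * 4; // DWORD-aligned row length
        if (rgb.Length < srcStride * height)
            throw new ArgumentException("Source buffer is shorter than width and height imply.");

        byte[] bmp = new byte[dstStride * height]; // padding bytes stay 0
        for (int y = 0; y < height; y++)
        {
            int src = y * srcStride;
            int dst = (height - 1 - y) * dstStride; // top row becomes the last row
            for (int x = 0; x < width; x++)
            {
                bmp[dst + x * 3] = rgb[src + x * 3 + 2];     // blue
                bmp[dst + x * 3 + 1] = rgb[src + x * 3 + 1]; // green
                bmp[dst + x * 3 + 2] = rgb[src + x * 3];     // red
            }
        }
        return bmp;
    }
}
```

The vertical flip matters when writing BMP pixel data directly. When copying into a System.Drawing bitmap through LockBits, Scan0 normally points at the top row, so in that case only the padding and the channel swap would be expected to apply.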
All times are UTC |