PDFsharp & MigraDoc Forum • View topic - Extract image example fails on large images

All times are UTC

Extract image example fails on large images

Moderator: Stefan Lange

vbcrlfuser

Post subject: Extract image example fails on large images

Posted: Wed May 19, 2010 2:52 pm

Joined: Wed May 19, 2010 2:40 pm
Posts: 5

Apologies if this has been dealt with before. I searched the forum for 'large image' 'split image' and other key words describing what I think I am dealing with but did not find an answer.

Need to extract images from a PDF file. Used the PdfSharp Extract Image example as a starting point. Even added the ability to extract TIFF files with the use of libtiff.

Things were working great. Until a large image was encountered. It is packed differently in the PDF.

The smaller image is packed in the PDF (by scanner software) like this...

<</Type/XObject
/Subtype/Image
/Width 2344
/Height 1654
/BitsPerComponent 1
/ColorSpace/DeviceGray
/Filter /CCITTFaxDecode
/DecodeParms <</Columns 2344 /Rows 1654>>
/Length 22493

The larger image is packed in the PDF (by scanner software) like this...

<</Type/XObject
/Subtype/Image
/Width 2344
/Height 1654
/BitsPerComponent 1
/ColorSpace/DeviceGray
/Decode[1 0]
/Length 484622

Because the /Filter entry is missing, the PdfSharp Extract Image code of course fails to decode the image. I also noticed that /DecodeParms is replaced with /Decode[1 0], which I take to mean this large image has been broken into two smaller objects in positions 1 and 0 of some sub-part. Can anyone lend a hand here?

It seems that after parsing the /Subtype/Image token, another step is needed to check whether /DecodeParms or /Decode[1 0] is present and, if /Decode is, to drop down one more loop to collect the data. But I don't know how to piece the sub-parts together.

Thanks!

vbcrlfuser

Post subject: Re: Extract image example fails on large images

Posted: Wed May 19, 2010 3:01 pm

Joined: Wed May 19, 2010 2:40 pm
Posts: 5

Wait a minute... looking at some other posts, please do not tell me that if the /Filter is missing, one must look at the /ColorSpace and the /Decode and decode this oneself?

Should I take this to mean...

<</Type/XObject
/Subtype/Image
/Width 2344
/Height 1654
/BitsPerComponent 1
/ColorSpace/DeviceGray
/Decode[1 0]
/Length 484622

This is a black and white image, 1 bit per pixel, and the values are 0 = black, 1 = white in one large contiguous block of data?
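
One way to check the "one large contiguous block" reading is to compare /Length with the size that unfiltered samples would occupy. The sketch below assumes each row is padded to a whole byte and that DeviceGray has one component per pixel; the class and method names are made up for illustration and are not part of PDFsharp.

```csharp
using System;

static class StreamLengthCheck
{
    // Bytes per row when every row is padded up to a whole byte.
    public static int PackedRowBytes(int width, int bitsPerComponent, int components)
    {
        return (width * bitsPerComponent * components + 7) / 8;
    }

    // Total size of an unfiltered image stream with byte-aligned rows.
    public static int ExpectedLength(int width, int height, int bitsPerComponent, int components)
    {
        return PackedRowBytes(width, bitsPerComponent, components) * height;
    }
}
```

For the dictionary above, PackedRowBytes(2344, 1, 1) is 293 and ExpectedLength(2344, 1654, 1, 1) is 484622, which equals the /Length value. That is consistent with a single uncompressed block of 1654 rows of 293 bytes each.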

Thomas Hoevel

Post subject: Re: Extract image example fails on large images

Posted: Thu May 20, 2010 7:22 am

PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3131
Location: Cologne, Germany

Hi!

"/Decode[1 0]" simply inverts the image.
There is only one part of the image.
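
As a rough illustration of what "inverts" means for 1-bit DeviceGray data: with /Decode [1 0], a sample bit of 0 maps to white and 1 maps to black, the reverse of the default. A minimal sketch, assuming the stream is unfiltered and using a hypothetical helper name, is to flip every bit so the data can then be treated as if it had the default /Decode [0 1]:

```csharp
using System;

static class DecodeArray
{
    // Returns a copy of the samples with every bit flipped.
    // Padding bits at the end of each row are flipped too, which is harmless
    // because they lie outside the image width.
    public static byte[] InvertBits(byte[] samples)
    {
        byte[] result = new byte[samples.Length];
        for (int i = 0; i < samples.Length; i++)
        {
            result[i] = (byte)(samples[i] ^ 0xFF);
        }
        return result;
    }
}
```

An alternative that leaves the bytes untouched is to tell the target format that 0 means white: for a Format1bppIndexed Bitmap that would mean swapping the two palette entries, and for TIFF it corresponds to the Photometric.MINISWHITE setting used in the libtiff code later in this thread.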

_________________
Regards
Thomas Hoevel
PDFsharp Team

vbcrlfuser

Post subject: Re: Extract image example fails on large images

Posted: Thu May 20, 2010 7:23 pm

Joined: Wed May 19, 2010 2:40 pm
Posts: 5

OK, so the problem now is purely a graphical one of pulling pixels and stuffing them into a Bitmap, which is beyond my skill set. And yes, it is off topic, but I'm hoping someone can help. Thanks!

This gets close but each successive row in the image appears offset. So I know it is word alignment padding or stride or some such with Bitmaps. Just not sure how to calculate.

Code:

static void ExportBitmapImage(PdfDictionary image)
{

   int bitsPerComponent = image.Elements.GetInteger(PdfImage.Keys.BitsPerComponent);
   int width = image.Elements.GetInteger(PdfImage.Keys.Width);
   int height = image.Elements.GetInteger(PdfImage.Keys.Height);
   byte[] pixels = image.Stream.Value;
   
   if (bitsPerComponent == 1) {

      Bitmap bmp = new Bitmap(width, height,PixelFormat.Format1bppIndexed);

      BitmapData bmpData = bmp.LockBits(new Rectangle(0, 0, width, height), ImageLockMode.ReadWrite, bmp.PixelFormat);

      // please help here converting contiguous byte array into what Bitmap wants or requires

      Marshal.Copy(pixels, 0, bmpData.Scan0, pixels.Length);
     
      bmp.UnlockBits(bmpData);
         
      bmp.Save("not-quite-right.bmp");

   }

}

JeffJohnson

Post subject: Re: Extract image example fails on large images

Posted: Fri May 21, 2010 2:19 pm

Joined: Thu Feb 25, 2010 2:44 pm
Posts: 14

My co-worker found a solution on StackOverflow, although the code posted there was not perfect because it assumed it would work for all images when in fact it only works for monochrome (hence my comment in the switch block):

Code:

// Assume you've already obtained the image dictionary
// in the variable 'xObject'
string filter = xObject.Elements.GetName(PdfImage.Keys.Filter);

switch (filter)
{
   // ...
   // Other cases omitted for clarity
   // ...
   case "/FlateDecode":
      byte[] raw = Filtering.FlateDecode.Decode(xObject.Stream.Value);
      int width = xObject.Elements.GetInteger(PdfImage.Keys.Width);
      int height = xObject.Elements.GetInteger(PdfImage.Keys.Height);
      int bitsPerComponent = xObject.Elements.GetInteger(PdfImage.Keys.BitsPerComponent);
      PixelFormat pixelFormat;
      
      switch (bitsPerComponent)
      {
         case 1:
            pixelFormat = PixelFormat.Format1bppIndexed;
            break;
         case 8:
            // TODO: The Marshal.Copy code below will only work with monochrome
            // bitmaps, so color bitmaps need to be handled differently
            // (By the way, PDFsharp forum, I have written code to handle this,
            // at least for non-transparent color images. I'll post it once it
            // handles transparency too.)
            pixelFormat = PixelFormat.Format24bppRgb;
            break;
         default:
            throw new Exception(String.Format("Unknown pixel format {0}.", bitsPerComponent));
      }
      
      Bitmap bitmap = new Bitmap(width, height, pixelFormat);
      BitmapData bitmapData = bitmap.LockBits(new Rectangle(0, 0, width, height), ImageLockMode.WriteOnly, pixelFormat);
      Marshal.Copy(raw, 0, bitmapData.Scan0, raw.Length);
      bitmap.UnlockBits(bitmapData);
      using (MemoryStream imageStream = new MemoryStream())
      {
         bitmap.Save(imageStream, ImageFormat.Jpeg);
         // Do something useful with imageStream
      }
      break;
}

It worked on the one or two tests I put it through. Let me know if it works for you.

vbcrlfuser

Post subject: Re: Extract image example fails on large images

Posted: Fri May 21, 2010 3:02 pm

Joined: Wed May 19, 2010 2:40 pm
Posts: 5

For the critical parts that is the same code as posted. Any other ideas?

Was this code operational on any PDF? Or by any chance just against PDFs where the streams in question contained raw pixels from Bitmaps that were Format1bppIndexed?

The reason I ask is that the following code produces the correct image using the same concept, only against libtiff in raw format rather than the C# Bitmap class. So I'm wondering if this is a problem with Bitmaps and Format1bppIndexed not supporting this operation?

Code:

byte[] pixels = xobject.Stream.Value;

int tif = TIFFOpen("c:\\example.tif", "w");

TIFFSetField(tif, (uint)BitMiracle.LibTiff.Classic.TiffTag.IMAGEWIDTH, (uint)width);

TIFFSetField(tif, (uint)BitMiracle.LibTiff.Classic.TiffTag.IMAGEHEIGHT, (uint)height);

TIFFSetField(tif, (uint)BitMiracle.LibTiff.Classic.TiffTag.COMPRESSION, 
                      (uint)BitMiracle.LibTiff.Classic.Compression.NONE);

TIFFSetField(tif, (uint)BitMiracle.LibTiff.Classic.TiffTag.PHOTOMETRIC,
                      (uint)BitMiracle.LibTiff.Classic.Photometric.MINISWHITE);

TIFFSetField(tif, (uint)BitMiracle.LibTiff.Classic.TiffTag.BITSPERSAMPLE, 
                      (uint)bitsPerComponent);

TIFFSetField(tif, (uint)BitMiracle.LibTiff.Classic.TiffTag.SAMPLESPERPIXEL, 1);

IntPtr pointer = Marshal.AllocHGlobal(pixels.Length);

Marshal.Copy(pixels, 0, pointer, pixels.Length);

TIFFWriteRawStrip(tif, 0, pointer, pixels.Length);

// Release the unmanaged buffer once the strip has been written
Marshal.FreeHGlobal(pointer);

TIFFClose(tif);

Again I'm new to this but here are the tags for the stream in question. And there is no /Filter tag so this is just raw pixels right? 1 bit per pixel inverting black and white?

<</Type/XObject
/Subtype/Image
/Width 2344
/Height 1654
/BitsPerComponent 1
/ColorSpace/DeviceGray
/Decode[1 0]
/Length 484622

vbcrlfuser

Post subject: Re: Extract image example fails on large images

Posted: Fri May 21, 2010 3:09 pm

Joined: Wed May 19, 2010 2:40 pm
Posts: 5

Attached is a screenshot of the output when using the Bitmap, LockBits, Marshal.Copy approach. Sorry, I cannot send the original; it contains information of a sensitive nature. You can see it looks like a stride/padding type of issue.

Attachment: screenshot-small.png (example output with Bitmap LockBit Marshal.Copy method)

Soldier-B

Post subject: Re: Extract image example fails on large images

Posted: Fri May 21, 2010 3:55 pm

Joined: Tue Oct 14, 2008 6:15 pm
Posts: 32
Location: USA

If the image is a bitmap then the pixel data would be stored from the bottom to the top, left to right...which could explain the funky output.

http://en.wikipedia.org/wiki/BMP_file_format

JeffJohnson

Post subject: Re: Extract image example fails on large images

Posted: Fri May 21, 2010 6:27 pm

Joined: Thu Feb 25, 2010 2:44 pm
Posts: 14

Aha! The image I was testing just happened to have the right width to cause each scan line to naturally end on a 32-bit boundary (i.e., each line had a multiple of 4 bytes). When I added 4 pixels to the width to change this, the resulting extracted bitmap was shifted, just like yours. This happened regardless of whether the image was FlateDecoded or stored uncompressed. (I tested this because I was pretty sure it made no difference whether the image was compressed or not.)

So it looks like the direct copy method is a bust 75% of the time and you need to write code to pad out each scan line to a 4-byte boundary. I guess I really should get around to implementing my ExtractIndexedImageGrayscale() method, huh?

And no, Soldier-B, it has nothing to do with bottom-to-top; it's all about the 4-byte boundary padding.

Thomas Hoevel

Post subject: Re: Extract image example fails on large images

Posted: Tue May 25, 2010 8:07 am

PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3131
Location: Cologne, Germany

With PDF bitmaps, each row is padded to a BYTE boundary (multiple of 8 bits).
With Windows bitmaps, each row is padded to a DWORD boundary (multiple of 32 bits).
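
The two padding rules can be turned into a small repacking step. The following is only a sketch under the assumption that the PDF data is already unfiltered; RowPadding and its methods are hypothetical names, not PDFsharp or GDI+ API. It copies each byte-aligned source row into a buffer whose rows are DWORD-aligned.

```csharp
using System;

static class RowPadding
{
    // Row length in bytes when rows are padded to a byte boundary (PDF image data).
    public static int ByteAlignedStride(int width, int bitsPerPixel)
    {
        return (width * bitsPerPixel + 7) / 8;
    }

    // Row length in bytes when rows are padded to a 32-bit boundary (Windows bitmaps).
    public static int DwordAlignedStride(int width, int bitsPerPixel)
    {
        return (width * bitsPerPixel + 31) / 32 * 4;
    }

    // Copies the rows one by one, leaving the extra padding bytes of each
    // destination row at zero.
    public static byte[] PadRowsToDword(byte[] source, int width, int height, int bitsPerPixel)
    {
        int srcStride = ByteAlignedStride(width, bitsPerPixel);
        int dstStride = DwordAlignedStride(width, bitsPerPixel);
        if (source.Length < srcStride * height)
        {
            throw new ArgumentException("Source buffer is shorter than width and height imply.", "source");
        }

        byte[] padded = new byte[dstStride * height];
        for (int y = 0; y < height; y++)
        {
            Buffer.BlockCopy(source, y * srcStride, padded, y * dstStride, srcStride);
        }
        return padded;
    }
}
```

For the 2344-pixel-wide, 1-bit image in this thread the source stride is 293 bytes and the padded stride is 296 bytes, so a single Marshal.Copy of the unpadded data drifts by 3 bytes (24 pixels) per row. The padded array can be copied to Scan0 in one call only if BitmapData.Stride equals the computed 296; otherwise copying row by row to Scan0 + y * Stride is the safer route.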

_________________
Regards
Thomas Hoevel
PDFsharp Team
