Hi,
I've been using the provided example
http://www.pdfsharp.net/wiki/ExportImages-sample.ashx to extract images from each page in a PDF. I've extended it to also support /FlateDecode (for images whose colour space is RGB), and this is working fine (although I'd love to know how to handle CMYK, since many of our PDFs use it).
But I have a few PDF documents where some or all of the images are not detected AT ALL (i.e. no /XObject /Image items are found when processing the PDF page by page, even though the images are clearly there if you open the file in Adobe Reader). If I open the PDF in Notepad++ I can see the /XObject /Image items, so I know they are present in the file.
So I approached the problem in a different manner. I used the "Internals" class to call "GetAllObjects()" and read through every object, regardless of which page it belongs to. Code snippet below:
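For context, the page-by-page check in the wiki sample works roughly like the sketch below. It only looks at the /XObject entries listed directly in each page's own /Resources dictionary. The class name and the use of args[0] as the file path are placeholders for illustration, and the sketch just prints what it finds instead of calling the sample's ExportImage routine.

```csharp
using System;
using PdfSharp.Pdf;
using PdfSharp.Pdf.Advanced;
using PdfSharp.Pdf.IO;

static class PageImageScan   // hypothetical name, for illustration only
{
    static void Main(string[] args)
    {
        // args[0]: path to the PDF to inspect (placeholder)
        PdfDocument document = PdfReader.Open(args[0]);
        int pageNumber = 0;

        foreach (PdfPage page in document.Pages)
        {
            pageNumber++;

            // Only the page's own resource dictionary is examined here
            PdfDictionary resources = page.Elements.GetDictionary("/Resources");
            if (resources == null) continue;

            PdfDictionary xObjects = resources.Elements.GetDictionary("/XObject");
            if (xObjects == null) continue;

            foreach (PdfItem item in xObjects.Elements.Values)
            {
                // XObjects are normally stored as indirect references
                PdfReference reference = item as PdfReference;
                if (reference == null) continue;

                PdfDictionary xObject = reference.Value as PdfDictionary;
                if (xObject != null && xObject.Elements.GetString("/Subtype") == "/Image")
                {
                    Console.WriteLine("Page " + pageNumber + ": image " + reference.ObjectID);
                }
            }
        }
    }
}
```

An image that is not listed directly in a page's /XObject dictionary will never be reported by a loop of this shape.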
Code:
// Get a list of all objects in the document, regardless of page
PdfObject[] arrPDFObjects = objPDFDocument.Internals.GetAllObjects();
if (arrPDFObjects != null) {
    Console.WriteLine("Number of objects: " + arrPDFObjects.Length);
    foreach (PdfObject objThisPDFObject in arrPDFObjects) {
        PdfReference objThisPdfObjectReference = objThisPDFObject.Reference;
        if (objThisPdfObjectReference != null) {
            PdfDictionary xObject = objThisPdfObjectReference.Value as PdfDictionary;
            // Is the object a dictionary that describes an image?
            if (xObject != null && xObject.Elements.GetString("/Subtype") == "/Image") {
                Console.WriteLine("Image found. Id = " + objThisPdfObjectReference.ObjectID);
                // Export the image
                ExportImage(xObject, ref valImageCount);
            }
        }
    }
} else {
    Console.WriteLine("No objects");
}
So while I can extract the images, I don't understand why they aren't found when I process the PDF page by page (using the sample on your Wiki). These images are clearly on the page since you can see them in Adobe Reader (i.e. they aren't orphaned objects).
So I guess my question is:
Is there some other way for an image to appear on a page that the provided example does not detect, and if so, how should I be detecting these images?
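One possibility worth checking (an assumption, not something confirmed for these particular files): the PDF format allows a page to draw a form XObject (/Subtype /Form) that carries its own /Resources dictionary, and images referenced from there are not listed in the page's own /XObject dictionary. A minimal sketch of a recursive search along those lines follows. NestedImageFinder and CollectImages are hypothetical names, not part of PDFsharp.

```csharp
using System.Collections.Generic;
using PdfSharp.Pdf;
using PdfSharp.Pdf.Advanced;

static class NestedImageFinder   // hypothetical helper, not a PDFsharp class
{
    // Collects image XObjects reachable from a resources dictionary,
    // descending into form XObjects and their own /Resources.
    public static void CollectImages(PdfDictionary resources,
                                     List<PdfDictionary> images,
                                     HashSet<PdfDictionary> visited)
    {
        if (resources == null) return;

        PdfDictionary xObjects = resources.Elements.GetDictionary("/XObject");
        if (xObjects == null) return;

        foreach (PdfItem item in xObjects.Elements.Values)
        {
            PdfReference reference = item as PdfReference;
            if (reference == null) continue;

            PdfDictionary xObject = reference.Value as PdfDictionary;
            // Skip non-dictionaries and anything already seen (guards against cycles)
            if (xObject == null || !visited.Add(xObject)) continue;

            string subtype = xObject.Elements.GetString("/Subtype");
            if (subtype == "/Image")
            {
                images.Add(xObject);
            }
            else if (subtype == "/Form")
            {
                // A form XObject may reference further images in its own resources
                CollectImages(xObject.Elements.GetDictionary("/Resources"), images, visited);
            }
        }
    }
}
```

For each page, this could be called as CollectImages(page.Elements.GetDictionary("/Resources"), images, visited) with a fresh list and set, and each collected dictionary then passed to ExportImage. If the missing images show up this way, nested forms are the cause. If not, something else is going on.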