PDFsharp & MigraDoc Foundation
https://forum.pdfsharp.net/

Test if a file is valid
https://forum.pdfsharp.net/viewtopic.php?f=2&t=3465

Author:  ppower [ Fri Sep 23, 2016 2:11 pm ]
Post subject:  Test if a file is valid

I'm processing a huge list of PDF files and I need to determine which files are valid and which are corrupted. When I find a corrupted file, I can call a service that supplies a good copy, which I then substitute in the list.
That recovery has a cost in terms of processing power and bandwidth, so I would like to call that service as rarely as possible.
I only need to recover the corrupted files, so I need a function that I can invoke for each file in the list.

I would like an evaluation function, based on a free library, that tells me whether a PDF file is good or corrupted, in the same way that Acrobat Reader either displays the pages or shows an alert with the error.
My files could be corrupted for several reasons:

- it was transferred with the wrong mode during an earlier file transfer (FTP binary/ASCII)
- it was truncated by an earlier, unstable FTP upload
- it is damaged for unknown reasons
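
For the truncation case in particular, a cheap byte-level pre-check might rule out some files before any library has to parse them. The following is only a minimal sketch under stated assumptions: LooksStructurallyIntact and ReadFully are hypothetical helper names (not part of PdfSharp or iTextSharp), the 1024-byte probe size is an arbitrary choice, and the check only looks for the "%PDF-" header near the start and the "%%EOF" marker near the end of the file.

```vb
Imports System.IO
Imports System.Text

Module PdfPreCheck
  ' Hypothetical helper: returns False when the file lacks the "%PDF-" header
  ' near its start or the "%%EOF" marker near its end.
  Function LooksStructurallyIntact(fileName As String) As Boolean
    Const probeSize As Integer = 1024 ' assumption: markers lie within this window
    Using fs As New FileStream(fileName, FileMode.Open, FileAccess.Read)
      If fs.Length < 16 Then Return False
      Dim size As Integer = CInt(Math.Min(probeSize, fs.Length))

      ' Look for the header in the first bytes of the file.
      Dim head(size - 1) As Byte
      ReadFully(fs, head)
      If Encoding.ASCII.GetString(head).IndexOf("%PDF-", StringComparison.Ordinal) < 0 Then
        Return False
      End If

      ' Look for the end-of-file marker in the last bytes of the file.
      Dim tail(size - 1) As Byte
      fs.Seek(-size, SeekOrigin.End)
      ReadFully(fs, tail)
      Return Encoding.ASCII.GetString(tail).LastIndexOf("%%EOF", StringComparison.Ordinal) >= 0
    End Using
  End Function

  ' Fills the buffer, stopping early only if the stream ends.
  Private Sub ReadFully(s As Stream, buffer As Byte())
    Dim offset As Integer = 0
    While offset < buffer.Length
      Dim n As Integer = s.Read(buffer, offset, buffer.Length - offset)
      If n = 0 Then Exit While
      offset += n
    End While
  End Sub
End Module
```

A file that passes this check is not proven valid: an ASCII-mode transfer usually leaves both markers in place while damaging the binary content in between, so a full open attempt with a PDF library is still needed for those files. The sketch is only meant to reject obviously truncated files cheaply.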

So far I have developed this solution, which is far from optimal because of features that PdfSharp does not implement:

Code:
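Dim doLoaded As Boolean = False
Dim tested As Integer
Dim numberOfPages As Integer
Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
Dim doc As PdfSharp.Pdf.PdfDocument = Nothing

Try
  ' quick'n dirty test: TestPdfFile returns 0 if the file is not a PDF
  tested = PdfSharp.Pdf.IO.PdfReader.TestPdfFile(file)

  ' open attempt with iTextSharp (fully qualified to avoid a clash with PdfSharp's PdfReader)
  reader = New iTextSharp.text.pdf.PdfReader(New iTextSharp.text.pdf.RandomAccessFileOrArray(file, True), Nothing)
  numberOfPages = reader.NumberOfPages
  doLoaded = (tested <> 0) AndAlso (numberOfPages > 0)
  If (Not doLoaded) Then Return False

  ' open attempt with PdfSharp
  doc = PdfSharp.Pdf.IO.PdfReader.Open(file, PdfSharp.Pdf.IO.PdfDocumentOpenMode.ReadOnly)
  Dim np As Integer = doc.PageCount
  doLoaded = (tested <> 0) AndAlso (np > 0)
  If doLoaded Then
    ' access every page so that PdfSharp has to resolve each one
    For i As Integer = 0 To np - 1
      Dim a As PdfSharp.Pdf.PdfPage = doc.Pages.Item(i)
    Next
  End If
Catch ex As Exception
  ' A single handler: a second "Catch ex As Exception" would never be reached.
  ' PdfSharp throws this for iref streams after iTextSharp has already opened the file,
  ' so the file is treated as valid; any other exception marks it as corrupted.
  If ex.Message = "Cannot handle iref streams. The current implementation of PDFsharp cannot handle this PDF feature introduced with Acrobat 6." Then
    doLoaded = True
  Else
    doLoaded = False
  End If
Finally
  If (Not reader Is Nothing) Then
    reader.Close()
  End If
End Try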


Any suggestions?
