PDFsharp & MigraDoc Foundation
https://forum.pdfsharp.net/

Generate PDF from any file uploaded by user
https://forum.pdfsharp.net/viewtopic.php?f=2&t=1412

Author:  nidhi.vithlani [ Tue Nov 09, 2010 10:14 am ]
Post subject:  Generate PDF from any file uploaded by user

Hello,

I need to generate a PDF from any file uploaded by a user of the application, whether it is txt, doc, docx, xls, xlsx, jpg, gif, etc.

Can I achieve this using PDFsharp and/or MigraDoc?

Any help with this is much appreciated. Thanks.

Author:  mikesowerbutts [ Wed Nov 10, 2010 1:07 pm ]
Post subject:  Re: Generate PDF from any file uploaded by user

Hi,

As long as you can extract data from the uploaded file, you can put that data into a PDF. For example, with a .txt file it would be simple:
- read the text from the uploaded FileStream (or save the FileStream, then open it and read the text) into a string
- create new PDFsharp/MigraDoc objects
- add the text
- render the pages to PDF (if you used MigraDoc)
- save the PDF or output it to the browser
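
The steps above could be sketched roughly as follows with MigraDoc. This is a minimal sketch, not tested against a specific release: the class name TextToPdf and the method Convert are hypothetical, and the PdfDocumentRenderer(bool) constructor is assumed to be available as in the MigraDoc 1.3x builds current at the time of this thread (later releases may differ). One paragraph is added per source line so that the result does not depend on how MigraDoc treats line breaks inside a text run.

```csharp
using System;
using System.IO;
using MigraDoc.DocumentObjectModel;
using MigraDoc.Rendering;

public static class TextToPdf
{
    // Hypothetical helper: converts a plain-text file into a PDF file.
    public static void Convert(string textPath, string pdfPath)
    {
        // 1. Read the uploaded text into a string.
        string text = File.ReadAllText(textPath);

        // 2. Create the MigraDoc objects.
        Document document = new Document();
        Section section = document.AddSection();

        // 3. Add the text, one paragraph per source line.
        string[] lines = text.Replace("\r\n", "\n").Replace('\r', '\n').Split('\n');
        foreach (string line in lines)
        {
            section.AddParagraph(line);
        }

        // 4. Render the pages to PDF (assumed constructor: true = Unicode).
        PdfDocumentRenderer renderer = new PdfDocumentRenderer(true);
        renderer.Document = document;
        renderer.RenderDocument();

        // 5. Save the PDF; in a web application a stream could be used instead.
        renderer.PdfDocument.Save(pdfPath);
    }
}
```

A call such as TextToPdf.Convert(@"C:\My_Files\Test.txt", @"C:\My_Files\Test.pdf") would then produce the file, with MigraDoc handling page breaks.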

With regard to simply converting any file format into PDF, it's not really that simple. I would allow a particular set of file types to be uploaded, and then specify rules for extracting data from each. Image files can be turned into PDFs directly, but they MUST be saved on the server first, as images can't be added to a PDF from a memory stream; they have to exist on the file system the code is running from.
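
One way to express "a particular set of file types, with a rule for each" is a lookup table keyed by extension. The sketch below uses only the .NET base class library; the names UploadRules, IsSupported and ExtractText are hypothetical, and only the .txt rule is filled in. It covers text extraction only, so image uploads would need a separate path.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class UploadRules
{
    // Hypothetical allow-list: extension -> function that turns the file into text.
    private static readonly Dictionary<string, Func<string, string>> Extractors =
        new Dictionary<string, Func<string, string>>(StringComparer.OrdinalIgnoreCase)
        {
            { ".txt", path => File.ReadAllText(path) },
            // Further formats (.docx, .xlsx, ...) would register their own rule here.
        };

    // True if an extraction rule exists for the file's extension.
    public static bool IsSupported(string fileName)
    {
        return Extractors.ContainsKey(Path.GetExtension(fileName));
    }

    // Runs the rule for the file's extension, or rejects the upload.
    public static string ExtractText(string path)
    {
        string extension = Path.GetExtension(path);
        Func<string, string> extractor;
        if (!Extractors.TryGetValue(extension, out extractor))
        {
            throw new NotSupportedException("Unsupported file type: " + extension);
        }
        return extractor(path);
    }
}
```

Checking IsSupported before accepting the upload keeps unknown formats from ever reaching the PDF step.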

If you need it to be truly any format, then I would suggest having something like CutePDF installed on the users' machines to "print to PDF" (instead of pieces of paper on a printer, a PDF file is created). Then the native applications they used to create/view the files, together with CutePDF, can do the work.

Hope this helps,

Mike

Author:  nidhi.vithlani [ Thu Nov 18, 2010 6:24 am ]
Post subject:  Re: Generate PDF from any file uploaded by user

Hi mikesowerbutts,

I am trying the following code:

Code:
    private void Test()
    {
        string sourceFileName = @"C:\My_Files\Test.txt";
        //string sourceFileName = @"C:\My_Files\Test.doc";
        //string sourceFileName = @"C:\My_Files\Test.xlsx";

        StringBuilder FileTextBuilder = new StringBuilder();
        using (FileStream ReadStream = File.OpenRead(sourceFileName))
        {
            byte[] DataTransit = new byte[ReadStream.Length + 1];
            UTF8Encoding DataEncoding = new UTF8Encoding(true);
            while (ReadStream.Read(DataTransit, 0, DataTransit.Length) > 0)
            {
                FileTextBuilder.Append(DataEncoding.GetString(DataTransit));
            }
        }

        string sourceContent = FileTextBuilder.ToString();

        PdfDocument document = new PdfDocument();
        document.Info.Title = "Created with PDFsharp";

        PdfPage page = document.AddPage();
        XGraphics gfx = XGraphics.FromPdfPage(page);
        XFont font = new XFont("Verdana", 10, XFontStyle.Regular);

        gfx.DrawString(sourceContent, font, XBrushes.Black,
          new XRect(0, 0, page.Width, page.Height),
          XStringFormats.TopLeft);

        string destinationFileName = @"C:\My_Files\Test.pdf";
        document.Save(destinationFileName);
        document.Close();
    }


Now the problem is that when I read a well-formatted text file, the resulting PDF file does not keep the same formatting, as it expects a STRING parameter and not a BYTE[]. In the case of a .doc file, junk characters are written to the PDF.

Let me know if this is the approach you suggested, or something else.

Thank you.

Author:  mikesowerbutts [ Thu Nov 18, 2010 12:30 pm ]
Post subject:  Re: Generate PDF from any file uploaded by user

Hi,

I would suggest a more specific way of extracting data for each filetype, something more like this:
Code:
// Needs: using System; using System.IO; using System.Text;
//        using PdfSharp.Drawing; using PdfSharp.Pdf;
private void Test()
{
    string sourceFileName = @"C:\My_Files\Test.txt";
    //string sourceFileName = @"C:\My_Files\Test.doc";
    //string sourceFileName = @"C:\My_Files\Test.xlsx";

    StringBuilder FileTextBuilder = new StringBuilder();
    string TBFormatted = "";
    // Compare the real extension; Contains(".doc") would also match ".docx".
    string extension = Path.GetExtension(sourceFileName).ToLowerInvariant();

    if (extension == ".txt")
    {
        using (FileStream ReadStream = File.OpenRead(sourceFileName))
        {
            byte[] DataTransit = new byte[ReadStream.Length];
            UTF8Encoding DataEncoding = new UTF8Encoding(true);
            int bytesRead;
            while ((bytesRead = ReadStream.Read(DataTransit, 0, DataTransit.Length)) > 0)
            {
                // Decode only the bytes actually read, not the whole buffer.
                FileTextBuilder.Append(DataEncoding.GetString(DataTransit, 0, bytesRead));
            }
        }
        //read in your .txt file to a string and format it. Pdfsharp/MigraDoc like to use \r for new line characters, whereas other formatters use \n and sometimes \r\n, it will depend on the content
        TBFormatted = FormatTXT(FileTextBuilder.ToString());
    }
    else if (extension == ".doc")
    {
        //read in your .doc file to a string and format it
        TBFormatted = ExtractDOC(sourceFileName);
    }
    else if (extension == ".xlsx")
    {
        //read in your .xlsx file to a string and format it.
        TBFormatted = ExtractXLSX(sourceFileName);
    }

    string sourceContent = TBFormatted;
    PdfDocument document = new PdfDocument();
    document.Info.Title = "Created with PDFsharp";

    PdfPage page = document.AddPage();
    XGraphics gfx = XGraphics.FromPdfPage(page);
    XFont font = new XFont("Verdana", 10, XFontStyle.Regular);

    gfx.DrawString(sourceContent, font, XBrushes.Black,
      new XRect(0, 0, page.Width, page.Height),
      XStringFormats.TopLeft);

    string destinationFileName = @"C:\My_Files\Test.pdf";
    document.Save(destinationFileName);
    document.Close();
}

// The three helpers below are placeholders to be filled in per format.
private string FormatTXT(string input)
{
    return input; // placeholder: adjust line endings etc. here
}
private string ExtractDOC(string fileName)
{
    throw new NotImplementedException(); // placeholder: read the Office 2003 format here
}
private string ExtractXLSX(string fileName)
{
    throw new NotImplementedException(); // placeholder: read the Open XML format here
}


That code probably isn't 100% correct, as I've not tested it and I'm slightly unfamiliar with the way you're reading the text from the file, but hopefully you get the idea.
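
A note on the lost formatting: XGraphics.DrawString draws its text as a single line and does not break it at newline characters. PDFsharp ships an XTextFormatter class (namespace PdfSharp.Drawing.Layout) that lays text out inside a rectangle and honours line breaks. The sketch below assumes the PDFsharp 1.x API used elsewhere in this thread; the class name MultiLinePdf, the method Save and the 40-point margin are illustrative choices only. XTextFormatter does not paginate, so text that overflows the rectangle is not carried to a new page.

```csharp
using System;
using System.IO;
using PdfSharp.Drawing;
using PdfSharp.Drawing.Layout;
using PdfSharp.Pdf;

public static class MultiLinePdf
{
    // Hypothetical helper: writes multi-line text onto one PDF page.
    public static void Save(string text, string pdfPath)
    {
        PdfDocument document = new PdfDocument();
        PdfPage page = document.AddPage();

        using (XGraphics gfx = XGraphics.FromPdfPage(page))
        {
            XFont font = new XFont("Verdana", 10, XFontStyle.Regular);

            // Leave a 40-point margin on every side (arbitrary value).
            XRect box = new XRect(40, 40,
                page.Width.Point - 80, page.Height.Point - 80);

            // XTextFormatter wraps long lines and breaks at newlines;
            // it supports top-left alignment.
            XTextFormatter formatter = new XTextFormatter(gfx);
            formatter.DrawString(text, font, XBrushes.Black, box,
                XStringFormats.TopLeft);
        }

        document.Save(pdfPath);
    }
}
```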

I read text in like this:
Code:
public static string LoadFile(string filePath)
        {
            // The using block closes and disposes the reader, even on errors.
            using (TextReader tr = new StreamReader(filePath))
            {
                return tr.ReadToEnd();
            }
        }
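
Since text files can arrive with \r\n, \n or \r line endings depending on where they were created, it may help to bring them to one convention right after loading. A minimal sketch using only the .NET base class library follows; TextLoader and LoadNormalized are hypothetical names, and the choice of \n as the target is an assumption that should match whatever the later drawing code expects.

```csharp
using System;
using System.IO;
using System.Text;

public static class TextLoader
{
    // Hypothetical helper: reads a text file and converts all line endings to "\n".
    public static string LoadNormalized(string filePath)
    {
        string text;
        // Default to UTF-8, but let a byte order mark override the encoding.
        using (StreamReader reader = new StreamReader(filePath, Encoding.UTF8, true))
        {
            text = reader.ReadToEnd();
        }

        // Windows (\r\n) first, then any remaining lone \r.
        return text.Replace("\r\n", "\n").Replace('\r', '\n');
    }
}
```

The order of the two Replace calls matters: handling \r\n first avoids turning one Windows line break into two.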


I think that because the formats are different, you will have to extract the data from them in different ways. .txt shouldn't be too much of an issue, but for .doc and .xls look at the Office 2003 formats, and for .docx and .xlsx look at the Office 2007 formats and Open XML (the format used to save Office 2007+ files). Open XML has its own set of objects for interpreting Office 2007 files (and also OpenOffice.org, I believe); see http://openxmldeveloper.org for more info.
Because different file formats store data in different ways, I think you should limit the number of possible upload formats, as you will have to address each one separately.
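
To illustrate how format-specific the extraction is: a .docx file is a ZIP package whose main text lives in word/document.xml, in w:t elements grouped into w:p paragraphs. The sketch below reads it without the Open XML SDK mentioned above, using System.IO.Compression (which assumes .NET 4.5 or later) and LINQ to XML. DocxText and Extract are hypothetical names, and the approach is deliberately crude: it ignores tables, headers, footnotes and styling, and may repeat text that sits in nested paragraphs such as text boxes.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml.Linq;

public static class DocxText
{
    // WordprocessingML main namespace used by word/document.xml.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: returns the body text, one line per paragraph.
    public static string Extract(string docxPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
            {
                throw new InvalidDataException("word/document.xml not found.");
            }

            using (Stream stream = entry.Open())
            {
                XDocument xml = XDocument.Load(stream);
                StringBuilder text = new StringBuilder();

                foreach (XElement paragraph in xml.Descendants(W + "p"))
                {
                    // Concatenate the text runs of this paragraph.
                    foreach (XElement run in paragraph.Descendants(W + "t"))
                    {
                        text.Append(run.Value);
                    }
                    text.Append('\n');
                }
                return text.ToString();
            }
        }
    }
}
```

The older binary .doc and .xls formats cannot be read this way, which is one more reason to keep the list of accepted formats short.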

Hope this helps,

Mike

All times are UTC