Thursday, March 25, 2010

Manipulating PDF files

A customer wanted a product that adds a watermark or stamp to a PDF document. My colleague Kim Chee found out that stamping and watermarking can easily be done with pdftk. In our product, we call the system's pdftk binary to manipulate the PDF files.
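As a rough illustration of what calling pdftk from Python can look like, here is a minimal sketch. The helper names `pdftk_command` and `run_pdftk` are made up for this example and are not part of our product. It assumes pdftk is installed and on the PATH, and it relies on pdftk's `stamp` operation (overlay drawn on top of each page) and `background` operation (overlay placed underneath, the usual choice for a watermark).

```python
import subprocess


def pdftk_command(infile, overlay, outfile, operation="stamp"):
    """Build the argument list for a pdftk stamp or background call.

    Hypothetical helper: 'stamp' draws the overlay on top of each page,
    'background' places it underneath the page content.
    """
    if operation not in ("stamp", "background"):
        raise ValueError("operation must be 'stamp' or 'background'")
    return ["pdftk", infile, operation, overlay, "output", outfile]


def run_pdftk(infile, overlay, outfile, operation="stamp"):
    """Run pdftk and raise CalledProcessError if it exits with an error."""
    subprocess.check_call(pdftk_command(infile, overlay, outfile, operation))
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces, and `check_call` only returns once pdftk has finished.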

To generate a dynamic stamp or watermark which includes the username and time, we use reportlab.

One hurdle was reading the PDF file so that it could be written to a temporary file on the filesystem. The PDF files in our product are stored in a custom content type, in an array field of files. To read the PDF correctly from a field, I had to call str() on its raw value (str(field.getRaw(obj)) in the code below).
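The temporary-file step on its own might look like the sketch below. It assumes the PDF content is already available as a byte string (in our product that is what the str() call produces). The name `write_temp_pdf` and the sample data are invented for this example.

```python
import os
import shutil
import tempfile


def write_temp_pdf(data, workdir, name="infile.pdf"):
    """Write raw PDF bytes into workdir and return the resulting path."""
    path = os.path.join(workdir, name)
    # binary mode, so the PDF bytes are written unchanged
    with open(path, "wb") as handle:
        handle.write(data)
    return path


def process_in_tempdir(data):
    """Write data to a scratch directory, read it back, then clean up."""
    workdir = tempfile.mkdtemp()
    try:
        path = write_temp_pdf(data, workdir)
        # an external tool such as pdftk would work on `path` here
        with open(path, "rb") as handle:
            return handle.read()
    finally:
        # remove the scratch directory even if something above failed
        shutil.rmtree(workdir)
```

Using `shutil.rmtree` in a `finally` block is one way to make sure the scratch directory is removed without spawning an extra `rm` process.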

Below is some code from our product. The file and the code framework were generated by ArchGenXML, where we created a workflow transition script.

##code-section module-header #fill in your manual code here
from DateTime import DateTime
import os, sys
import shlex
import subprocess
from Products.CMFCore.utils import getToolByName
import tempfile
from reportlab.pdfgen.canvas import Canvas
from reportlab.lib.units import cm
##/code-section module-header


def approveDocument(obj, event):
    """generated workflow subscriber."""
    # do only change the code section inside this function.
    if not event.transition \
       or event.transition.id not in ['approve'] \
       or obj != event.object:
        return
    ##code-section approveDocument #fill in your manual code here

    # create stamp file
    tmp = tempfile.mkdtemp()
    stamppath = "%s/stamp.pdf" % tmp
    canvas = Canvas(stamppath)
    canvas.setFillColorRGB(1,0.75,0, alpha=0.75)
    user_id = event.status.get('actor')
    mtool = getToolByName(obj, 'portal_membership')
    member = mtool.getMemberById(user_id)
    fullname = member.getProperty('fullname')
    now = DateTime()
    message = "Approved by %s [%s] on %s" % (fullname, user_id,
            now.strftime('%Y/%m/%d %H:%M') )
    canvas.drawString(1*cm, 1*cm, message)
    # write the stamp PDF to disk so pdftk can read it
    canvas.save()

    # iterate over documents
    fields = obj.getField('documents').Schema().fields()
    file_fields = [field for field in fields if field.type == 'file']
    for i, field in enumerate(file_fields):
        # check if document is a pdf
        content_type = field.getContentType(obj)
        if content_type != 'application/pdf':
            obj.plone_log("File in field %s is not a PDF, skipping." % field)
            continue

        # get raw file
        document = obj.getDocuments()[i]
        assert document.filename == field.getFilename(obj)
        file = str(field.getRaw(obj))

        # write pdf document to tmp file
        infile_name = tmp + '/infile.pdf'
        infile = open(infile_name, 'wb')
        infile.write(file)
        infile.close()

        # create stamped version of PDF
        outfile_name = tmp + '/outfile.pdf'
        command_line = 'pdftk %s stamp %s output %s' % (infile_name, 
                stamppath, outfile_name)
        args = shlex.split(command_line)
        stampresult = subprocess.Popen(args)
        # wait for stamping to finish before going further
        sts = os.waitpid(stampresult.pid, 0)[1]

        # read stamped pdf from file
        outfile = open(outfile_name, 'rb')
        stamped = outfile.read()
        outfile.close()

        # replace existing PDF with stamped version
    # clean up
    command_line = 'rm -rf %s' % tmp
    args = shlex.split(command_line)
    p = subprocess.Popen(args)


    ##/code-section approveDocument

1 comment:

Matt Hamilton said...

Funnily enough, I was doing *exactly* the same thing today. I needed to add company letterhead to some generated PDFs and discovered pyPdf, which enables just this.

from pyPdf import PdfFileReader, PdfFileWriter
input1 = PdfFileReader(file("/temp/1269429735.pdf", "rb"))
background = PdfFileReader(file("/temp/letterhead.pdf", "rb"))
page1 = input1.getPage(1)
out = open('/temp/out.pdf', 'wb')
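outpdf = PdfFileWriter()
# assumed continuation (not in the original comment): merge the first
# letterhead page into the page and write the result
page1.mergePage(background.getPage(0))
outpdf.addPage(page1)
outpdf.write(out)
out.close()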