How to Generate PDF Files in Python with Xhtml2pdf, WeasyPrint or Unoconv

Generating PDF files programmatically is a frequent task when developing applications that export reports, invoices, or questionnaires. This article covers three common tools for creating PDFs, including how to install each one and how it converts documents.

PDF stands for Portable Document Format. It was originally developed by Adobe and has since become an open standard (ISO 32000) for documents. Creating a single PDF file from a Microsoft Word document is easy: use Word's menu, the print dialog on Linux or macOS, or Adobe Acrobat. When you need to generate tens, hundreds, or even thousands of PDF files, however, it is better to automate the task. For generating PDFs with Python, we have chosen the following solutions: Xhtml2pdf, WeasyPrint, and Unoconv.

How to Generate PDF in Python with Xhtml2pdf

A common difficulty for HTML to PDF converters is that PDF has concepts that are absent in HTML, such as page size. Xhtml2pdf deals with this problem by adding specific markup tags for such tasks, for example repeating headers and footers on every page. Xhtml2pdf is an HTML/CSS to PDF converter and Python library that can be used with any Python framework, such as Django.
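
As a rough illustration of that page-specific markup, the sketch below builds an HTML document whose @page rule sets the page size and declares a footer frame that is repeated on every page. The frame names, element id, and point measurements are illustrative assumptions modelled on the pattern in the Xhtml2pdf documentation, and build_html and render_pdf are hypothetical helper names.

```python
# Sketch of xhtml2pdf page markup: a page size plus a repeating footer frame.
# Frame names, the footer id, and all measurements are illustrative assumptions.
PAGE_HEAD = """<html>
<head>
<style>
    @page {
        size: a4 portrait;
        @frame content_frame {
            left: 50pt; width: 495pt; top: 50pt; height: 700pt;
        }
        @frame footer_frame {
            -pdf-frame-content: footer_content;
            left: 50pt; width: 495pt; top: 780pt; height: 20pt;
        }
    }
</style>
</head>
<body>
    <div id="footer_content">Page <pdf:pagenumber> of <pdf:pagecount></div>
"""
PAGE_TAIL = "</body>\n</html>\n"


def build_html(body):
    """Wrap body markup in the paginated template above."""
    return PAGE_HEAD + body + PAGE_TAIL


def render_pdf(html, output_path):
    """Render HTML to a PDF file with xhtml2pdf; return True on success."""
    # Imported lazily so build_html can be used without xhtml2pdf installed.
    from xhtml2pdf import pisa

    with open(output_path, "wb") as output_file:
        status = pisa.CreatePDF(html, dest=output_file)
    return not status.err
```

Calling render_pdf(build_html("<p>Hello</p>"), "hello.pdf") should then produce a one-page A4 document with the footer placed in its own frame.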

Under the hood, the tool creates PDFs with ReportLab, a common open source Python library for generating PDF files. ReportLab has its own XML-based markup language called Report Markup Language (RML). Therefore, you can think of Xhtml2pdf as another markup language for the ReportLab library.

Installation

You can install Xhtml2pdf with pip:

pip install xhtml2pdf

Generating PDFs

Xhtml2pdf usage example:

from io import BytesIO

from django.http import HttpResponse
from django.template.loader import get_template
from xhtml2pdf import pisa


def html_to_pdf_directly(request):
	# Render the Django template to an HTML string.
	template = get_template("template_name.html")
	html = template.render({"pagesize": "A4"})

	# Convert the HTML and collect the PDF in an in-memory buffer.
	result = BytesIO()
	pdf = pisa.pisaDocument(BytesIO(html.encode("utf-8")), dest=result)

	if not pdf.err:
		return HttpResponse(result.getvalue(), content_type="application/pdf")
	else:
		return HttpResponse("Errors")

Create a view that converts HTML to PDF. To hold the PDF temporarily, use an in-memory buffer, which provides an efficient file-like interface. Pass the converter two things: the rendered HTML to convert and the buffer that will receive the result. If the conversion reports no errors, read the value from the buffer and return it in an HttpResponse with the application/pdf content type.
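
The same buffer pattern also works outside a Django view, for example when many files have to be written in one run. The following is a minimal sketch under stated assumptions: html_to_pdf_bytes and write_pdfs are hypothetical helper names, and the pages argument is assumed to be a dictionary that maps a file name to an HTML string.

```python
from io import BytesIO
from pathlib import Path


def html_to_pdf_bytes(html):
    """Convert an HTML string to PDF bytes with xhtml2pdf; None on failure."""
    # Imported lazily so the rest of the module loads without xhtml2pdf.
    from xhtml2pdf import pisa

    buffer = BytesIO()
    status = pisa.CreatePDF(html, dest=buffer)
    return None if status.err else buffer.getvalue()


def write_pdfs(pages, out_dir, convert=html_to_pdf_bytes):
    """Write one PDF per entry of pages (name -> HTML); return failed names."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    failed = []
    for name, html in pages.items():
        pdf_bytes = convert(html)
        if pdf_bytes is None:
            # Record the failure and move on to the next document.
            failed.append(name)
            continue
        (out_path / f"{name}.pdf").write_bytes(pdf_bytes)
    return failed
```

Passing the converter as a parameter keeps the loop independent of any one library, so another converter can be swapped in without touching the file handling.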

How to Convert HTML to PDF in Python with WeasyPrint

WeasyPrint is another visual rendering engine that exports HTML/CSS content to PDF. Its main focus is supporting web standards for printing. WeasyPrint is free to download and use under a BSD license. It relies on several libraries, but it is not based on a browser rendering engine such as Gecko or WebKit. Its CSS layout engine is written in Python and was created specifically for pagination tasks. Older releases supported Python 2.7 and 3.3 or higher; current releases require Python 3. In addition, WeasyPrint works with Django.
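
A minimal sketch of basic WeasyPrint usage follows. Because the engine targets paged output, the page size and margins can be set with a standard CSS @page rule. The helper names paged_html and render_with_weasyprint and the default values are assumptions made for illustration; the HTML class and its write_pdf method come from WeasyPrint's Python API.

```python
def paged_html(body, size="A4", margin="2cm"):
    """Wrap body markup in a document whose @page rule sets the page box."""
    return (
        "<html><head><style>"
        f"@page {{ size: {size}; margin: {margin}; }}"
        "</style></head>"
        f"<body>{body}</body></html>"
    )


def render_with_weasyprint(html, output_path=None):
    """Render an HTML string with WeasyPrint.

    Returns the PDF as bytes when output_path is None; otherwise writes
    the file to output_path and returns None.
    """
    # Imported lazily so paged_html works without WeasyPrint installed.
    from weasyprint import HTML

    return HTML(string=html).write_pdf(output_path)
```

For example, render_with_weasyprint(paged_html("<h1>Report</h1>"), "report.pdf") should write an A4 document with 2 cm margins.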

Features

The tool can load HTML and CSS from files, data URIs, FTP, and HTTP. It follows HTTP redirects, but it does not support more complex features, such as authentication or cookies. The solution also supports CSS stylesheets both linked by the <link> element and embedded in the