May 9, 2018

How to Generate PDF Files in Python with Xhtml2pdf, WeasyPrint or Unoconv

Programmatic generation of PDF files is a frequent task when developing applications that export reports, bills, or questionnaires. In this article, we will consider three common tools for creating PDFs, including how to install them and how each one performs the conversion.

PDF stands for Portable Document Format. It was originally developed by Adobe and has since become an open standard for documents. Creating a single PDF file from a Microsoft Word document is easy: use Word's menu, the print dialog on Linux or macOS, or Adobe Acrobat. When you need to generate tens, hundreds, or even thousands of PDF files, however, it is better to automate the task. For generating PDFs with Python, we have chosen the following solutions: Xhtml2pdf, WeasyPrint, and Unoconv.

How to Generate PDF in Python with Xhtml2pdf

The main difficulty for any HTML to PDF converter is that PDF has many concepts that are absent in HTML, such as page size. Xhtml2pdf deals with this problem by adding specific markup tags that handle such tasks, for example repeating headers and footers on every page. Xhtml2pdf is an HTML/CSS to PDF converter and a Python library that can be used with any Python framework, such as Django.

Under the hood, the tool uses ReportLab, a common open source Python library for generating PDF files. ReportLab also has its own XML-based markup language, called Report Markup Language (RML). Therefore, you can think of Xhtml2pdf as another markup language for the ReportLab library.
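
As an illustration of the Xhtml2pdf-specific markup mentioned above, here is a minimal sketch that sets the page size and places a footer with a page number on every page. The @frame rule, the -pdf-frame-content property, and the <pdf:pagenumber> tag follow the Xhtml2pdf documentation. The function names, the frame coordinates, and the file name in the usage comment are assumptions made for this example.

```python
# Page rules understood by Xhtml2pdf: an A4 page with one frame for the
# flowing content and one static frame that is repeated on every page.
PAGE_CSS = """
@page {
    size: a4 portrait;
    @frame content_frame {
        left: 50pt; width: 495pt; top: 50pt; height: 700pt;
    }
    @frame footer_frame {
        -pdf-frame-content: footer_content;
        left: 50pt; width: 495pt; top: 770pt; height: 20pt;
    }
}
"""


def build_html(title, body_html):
    """Wrap body_html in a document that carries the page markup above."""
    return (
        "<html><head><style>" + PAGE_CSS + "</style></head><body>"
        # This element is pulled into footer_frame by its id
        '<div id="footer_content">Page <pdf:pagenumber></div>'
        "<h1>" + title + "</h1>" + body_html +
        "</body></html>"
    )


def write_pdf(html, path):
    """Convert html to a PDF file; returns True when no errors occurred."""
    # Imported here so that build_html can be used without xhtml2pdf installed
    from xhtml2pdf import pisa

    with open(path, "wb") as output:
        status = pisa.CreatePDF(html, dest=output)
    return not status.err


# Usage (requires xhtml2pdf):
# write_pdf(build_html("Report", "<p>Hello, PDF.</p>"), "report.pdf")
```

Keeping the page rules in one constant makes it easy to reuse the same layout for every document in a batch.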

Installation

You can install Xhtml2pdf with the usual pip command:

pip install xhtml2pdf

Generating PDFs

Xhtml2pdf usage example:

import io

from django.http import HttpResponse
from django.template.loader import get_template
from xhtml2pdf import pisa


def html_to_pdf_directly(request):
    # Render the Django template to an HTML string
    template = get_template("template_name.html")
    html = template.render({"pagesize": "A4"})

    # Convert the HTML and collect the PDF in an in-memory buffer
    result = io.BytesIO()
    pdf = pisa.pisaDocument(io.BytesIO(html.encode("utf-8")), dest=result)

    if not pdf.err:
        return HttpResponse(result.getvalue(), content_type="application/pdf")
    return HttpResponse("Errors")

Create a view that converts HTML into a PDF file. To hold the PDF temporarily, use an io.BytesIO buffer, which provides an efficient file-like object interface (on Python 2, the cStringIO module played this role). Pass the rendered HTML as the source and the buffer that will receive the result as the destination. Then return an HttpResponse object with the proper content type, using the value read from the buffer as the response body.
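
The buffer-and-convert steps described above can also be separated from the view, which makes them reusable outside Django. The sketch below assumes pisa.CreatePDF as the converter. The helper name html_to_pdf_bytes and its optional create_pdf parameter are hypothetical names introduced for this example.

```python
import io


def html_to_pdf_bytes(html, create_pdf=None):
    """Convert an HTML string to PDF and return the PDF as bytes.

    create_pdf is an optional replacement for pisa.CreatePDF (hypothetical
    parameter, handy for testing without xhtml2pdf installed).
    """
    if create_pdf is None:
        # Imported lazily so the module loads even without xhtml2pdf
        from xhtml2pdf import pisa
        create_pdf = pisa.CreatePDF

    buffer = io.BytesIO()                    # temporary in-memory storage
    status = create_pdf(html, dest=buffer)   # the converter writes into the buffer
    if status.err:
        raise ValueError("PDF conversion reported errors")
    return buffer.getvalue()


# Usage inside a Django view (requires xhtml2pdf and Django):
# return HttpResponse(html_to_pdf_bytes(html), content_type="application/pdf")
```

Raising an exception instead of returning an "Errors" response lets the caller decide how a failed conversion should be reported.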

How to Convert HTML to PDF in Python with WeasyPrint

WeasyPrint is a visual rendering engine that can export HTML/CSS content to PDF. Its main focus is supporting web standards for printing. WeasyPrint is free to download and use under a BSD license. The solution relies on several libraries, but it is not based on a browser rendering engine such as Gecko or WebKit. WeasyPrint's CSS layout engine is written in Python and was created specifically for pagination. At the time of writing, it supported Python 2.7 and Python 3.3 or higher. In addition, WeasyPrint works with Django.

Features

The tool can read ordinary HTML and CSS files as well as resources addressed by HTTP, FTP, and data URIs. WeasyPrint follows HTTP redirects, but it does not support more complex features, such as authentication or cookies. The solution also supports CSS stylesheets, both linked with the <link rel="stylesheet"> element and embedded in the <style> element. When it comes to images, WeasyPrint supports the <img>, <embed>, and <object> elements and several image formats, including:

  • JPEG,
  • GIF,
  • SVG,
  • and PNG.

The tool does not, however, rasterize SVG images. Instead, it renders such images as vectors in the output PDF file.
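
Because data URIs are supported, an image can be embedded directly in the HTML instead of being fetched over HTTP. Below is a minimal sketch of that idea. The helper names and the output path in the usage comment are assumptions for this example, and only HTML(string=...).write_pdf() is WeasyPrint's own API.

```python
import base64


def to_data_uri(data, mime_type):
    """Encode raw bytes as a data URI, e.g. for an <img> src attribute."""
    encoded = base64.b64encode(data).decode("ascii")
    return "data:" + mime_type + ";base64," + encoded


def build_html_with_logo(svg_bytes):
    """Return a small HTML page that embeds an SVG logo inline."""
    src = to_data_uri(svg_bytes, "image/svg+xml")
    return (
        '<html><body><img src="' + src + '" alt="logo">'
        "<p>Invoice</p></body></html>"
    )


def write_pdf(html, path):
    """Render the HTML string to a PDF file with WeasyPrint."""
    # Imported lazily so the helpers above work without WeasyPrint installed
    from weasyprint import HTML

    HTML(string=html).write_pdf(path)


# Usage (requires WeasyPrint):
# with open("logo.svg", "rb") as f:
#     write_pdf(build_html_with_logo(f.read()), "invoice.pdf")
```

Embedding images this way keeps the document self-contained, which helps when the rendering machine has no access to the original image URLs.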

Furthermore, WeasyPrint supports attachments, bookmarks, and hyperlinks. Hyperlinks are clickable in the output, provided that your PDF viewer supports them. WeasyPrint handles both internal anchors (<a href="#name">) and external hyperlinks. PDF viewers usually display bookmarks in a sidebar, and attachments are embedded in the converted PDF file.
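
To make the two kinds of links concrete, here is a small sketch that builds a document with a table of contents made of internal anchors plus one external link. The function name and the (anchor_id, title, body_html) tuple layout are assumptions for this example. We also assume that WeasyPrint derives bookmarks from heading elements, so the h2 headings double as bookmark entries.

```python
from html import escape


def build_document(sections, source_url):
    """Build HTML with a linked table of contents.

    sections is a list of (anchor_id, title, body_html) tuples.
    """
    toc_items = []
    bodies = []
    for anchor_id, title, body_html in sections:
        safe_id = escape(anchor_id, quote=True)
        safe_title = escape(title)
        # Internal link: jumps to the element with the matching id
        toc_items.append(
            '<li><a href="#' + safe_id + '">' + safe_title + "</a></li>"
        )
        # Heading that serves as the link target
        bodies.append(
            '<h2 id="' + safe_id + '">' + safe_title + "</h2>" + body_html
        )
    # External link: remains a hyperlink in the rendered PDF
    safe_url = escape(source_url, quote=True)
    source = '<p>Source: <a href="' + safe_url + '">' + safe_url + "</a></p>"
    return (
        "<html><body><ul>" + "".join(toc_items) + "</ul>"
        + "".join(bodies) + source + "</body></html>"
    )


# Usage (requires WeasyPrint):
# from weasyprint import HTML
# HTML(string=build_document(sections, "https://example.org/")).write_pdf("out.pdf")
```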

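To apply a stylesheet from your Django static files when rendering a page, pass it to write_pdf:

import os

from django.conf import settings
from weasyprint import CSS, HTML

# Load a stylesheet from the project's static files
stylesheet = CSS(os.path.join(settings.STATIC_ROOT, "css/main.css"))

# Example: render a web page to a PDF file using that stylesheet
HTML("http://yourwebsite.org/").write_pdf(
    "/yourdirectory/file.pdf",
    stylesheets=[stylesheet],
)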

To convert an existing HTML template into a PDF file, use a Django view such as the following:

from weasyprint import HTML

from django.template.loader import get_template
from django.http import HttpResponse


def pdf_generation(request):
    # Render the template to an HTML string before handing it to WeasyPrint
    html_template = get_template("template/home_page.html")
    html_string = html_template.render()

    # write_pdf() without a target returns the PDF as bytes
    pdf_file = HTML(string=html_string).write_pdf()
    response = HttpResponse(pdf_file, content_type="application/pdf")
    response["Content-Disposition"] = 'inline; filename="home_page.pdf"'
    return response

How to Use Python to Generate PDFs with Unoconv

Unoconv stands for Universal Office Converter. It is a command-line tool for converting files that LibreOffice or OpenOffice can open into various formats, including PDF. To convert a file, the tool connects to a running office instance that acts as a listener. If it does not find an available listener, Unoconv starts its own office instance.

Features

Unoconv can convert between any file formats supported by OpenOffice, which amounts to more than 100 formats.

Supported export formats include:

  • doc
  • odt
  • html
  • pdf
  • txt
  • xhtml
  • rtf
  • ooxml, etc.

You can also use this tool for batch processing and apply your own style templates to a file you need to convert. If needed, Unoconv can start OpenOffice automatically for the conversion. The solution supports OpenOffice on common desktop operating systems, such as Windows, Linux, and macOS. To process your documents centrally, you can use Unoconv in both client and server environments.
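
Batch processing can be driven from Python by building one Unoconv command for a whole list of files. The sketch below is written under a few assumptions: unoconv is on the PATH, and the -f (format), -o (output) and -t (style template) options behave as described in the Unoconv manual. The function names and keyword parameters are hypothetical.

```python
import subprocess


def build_unoconv_command(input_files, output_format="pdf",
                          output_dir=None, template=None):
    """Return the argument list for converting several files in one call."""
    command = ["unoconv", "-f", output_format]
    if output_dir is not None:
        # Assumption: an existing directory passed to -o receives the output
        command += ["-o", output_dir]
    if template is not None:
        # Assumption: -t imports styles from the given template file
        command += ["-t", template]
    return command + list(input_files)


def convert_batch(input_files, **options):
    """Run Unoconv once for all files; raises CalledProcessError on failure."""
    subprocess.check_call(build_unoconv_command(input_files, **options))


# Usage (requires unoconv and LibreOffice):
# convert_batch(["report1.docx", "report2.odt"], output_dir="pdf_out")
```

Passing all files to a single Unoconv call avoids starting a new office instance for every document.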

Installation

To install Unoconv on a Debian-based Linux distribution such as Ubuntu, run the following apt-get command:

$ sudo apt-get install -Vy libreoffice unoconv

Python: PDF Creation using Unoconv

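The example below uses the python-docx library to edit a Word document before converting it. Install the library with pip:

$ pip install python-docx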

import subprocess

from docx import Document


def do_your_stuff(paragraph):
    """Placeholder: put your own paragraph-editing logic here."""


# Edit the Microsoft Word file
document = Document("yourfile.docx")
for paragraph in document.paragraphs:
    do_your_stuff(paragraph)

document.save("yourdocument_new.docx")

# Convert the saved document to PDF with Unoconv
try:
    subprocess.check_call(
        ["/usr/bin/python3", "/usr/bin/unoconv", "-f", "pdf",
         "-o", "yourdocument_new.pdf", "yourdocument_new.docx"]
    )
except subprocess.CalledProcessError as e:
    print("CalledProcessError", e)

This approach lets you edit a Microsoft Word file before converting it. Open the document you need to edit with python-docx, apply the necessary changes, and save the result. Then call Unoconv through subprocess to convert the saved document into PDF, as in the code above.

The important differences between these three tools lie in the number of file and graphics formats they support, as well as in how precisely they convert complex content with advanced elements, such as hyperlinks and custom styles. The list of tools for generating PDFs in Python from other file formats is longer, but we have covered the solutions we have used for our own tasks. One way or another, each of these tools will significantly simplify your work when you need to create lots of PDF files.
