May 09, 2018

How to Generate PDF Files in Python with Xhtml2pdf, WeasyPrint or Unoconv

Vladimir Sidorenko

Programmatic generation of PDF files is a frequent task when developing applications that export reports, bills, or questionnaires. In this article, we will consider three common tools for creating PDFs, including their installation and conversion principles.

PDF stands for Portable Document Format. It was originally developed by Adobe and has since become an open standard for documents (ISO 32000). Creating a single PDF file from a Microsoft Word document can easily be done through Word's menu, the print dialog in Linux or macOS, or Adobe Acrobat. However, when you need to generate tens, hundreds, or even thousands of PDF files, it is better to automate the task. For generating PDFs with Python, we have chosen the following solutions: Xhtml2pdf, WeasyPrint, and Unoconv.

How to Generate PDF in Python with Xhtml2pdf

The main difficulty for all HTML-to-PDF converters is that PDF has numerous features that are absent in HTML, such as page size. Xhtml2pdf deals with this problem by adding specific markup that handles such tasks, for example repeating headers and footers on every page. Xhtml2pdf is an HTML/CSS to PDF converter and a Python library that can be used with any Python framework, such as Django.

To create PDFs, the tool uses ReportLab, a common open source Python library for generating PDF files. ReportLab has its own XML-based markup language called Report Markup Language (RML). Therefore, you can think of Xhtml2pdf as another markup language for the ReportLab library.

Installation

You can install Xhtml2pdf with pip:

pip install xhtml2pdf

Generating PDFs

Xhtml2pdf usage example:

from io import BytesIO

from django.http import HttpResponse
from django.template.loader import get_template
from xhtml2pdf import pisa


def html_to_pdf_directly(request):
	# Render the Django template to an HTML string
	template = get_template("template_name.html")
	html = template.render({'pagesize': 'A4'})
	# Convert the HTML and collect the PDF in an in-memory buffer
	result = BytesIO()
	pdf = pisa.pisaDocument(BytesIO(html.encode('utf-8')), dest=result)
	if not pdf.err:
		return HttpResponse(result.getvalue(), content_type='application/pdf')
	return HttpResponse('Errors')

This view converts an HTML template into a PDF file. The original Python 2 version of this snippet used cStringIO for temporary storage; in Python 3, io.BytesIO provides the same efficient file-like interface. The view renders the template to HTML, passes that HTML as the source to be converted, and supplies a second buffer that receives the resulting PDF. Finally, it reads the value from the buffer and returns it in an HttpResponse with the application/pdf content type.
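
The page-level markup mentioned earlier (page size, a footer repeated on every page) can be sketched as follows. This is a minimal example that assumes xhtml2pdf's @page/@frame CSS extensions and its <pdf:pagenumber> tag; the helper names build_html and render_pdf, the margins, and the frame dimensions are illustrative assumptions rather than part of the view above.

```python
import html
from string import Template

# The @page rule sets the page size and margins; the @frame rule reserves
# a footer area that xhtml2pdf fills with the element whose id matches
# -pdf-frame-content and repeats on every page.
PAGE_TEMPLATE = Template("""<html>
<head>
<style>
@page {
    size: $pagesize;
    margin: 1cm;
    margin-bottom: 2.5cm;
    @frame footer {
        -pdf-frame-content: footer_content;
        bottom: 1cm;
        margin-left: 1cm;
        margin-right: 1cm;
        height: 1cm;
    }
}
</style>
</head>
<body>
<div id="footer_content">Page <pdf:pagenumber></div>
<h1>$title</h1>
<p>$body</p>
</body>
</html>""")


def build_html(title, body, pagesize="a4"):
    """Fill the page template; title and body are escaped as plain text."""
    return PAGE_TEMPLATE.substitute(
        pagesize=pagesize,
        title=html.escape(title),
        body=html.escape(body),
    )


def render_pdf(source_html, target_path):
    """Write source_html to target_path as PDF; return True on success."""
    # Imported here so that build_html() works without xhtml2pdf installed.
    from xhtml2pdf import pisa

    with open(target_path, "wb") as target:
        status = pisa.CreatePDF(source_html, dest=target)
    return not status.err
```

Calling render_pdf(build_html("Report", "Hello"), "report.pdf") would then produce a one-page A4 document with a numbered footer, provided xhtml2pdf is installed.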

How to Convert HTML to PDF in Python with WeasyPrint

WeasyPrint is another visual rendering engine that can export HTML/CSS content to PDF. Its main focus is supporting web standards for printing. WeasyPrint is a free tool that is available to download and use under a BSD license. The solution relies on several libraries, but it is not based on a browser rendering engine such as Gecko or WebKit. WeasyPrint's CSS layout engine is written in Python and, at the time of writing, supports Python 2.7 and 3.3 or higher. It is worth noting that this engine was designed specifically for pagination. In addition, WeasyPrint can be used with Django.

Features

The tool can read local HTML and CSS files, as well as content served over HTTP, FTP, and data URIs. WeasyPrint follows HTTP redirects, but it does not support more complex features, such as authentication or cookies. The solution also supports CSS stylesheets both linked by the <link> element and embedded in the <style> element. When it comes to images, WeasyPrint supports various elements, such as <object>, <img>, and <embed>, and image formats, including:

  • JPEG
  • GIF
  • SVG
  • PNG

The tool does not, however, rasterize SVG images. Instead, it renders such images as vectors in the output PDF file.

Furthermore, WeasyPrint supports attachments, bookmarks, and hyperlinks. Hyperlinks are clickable as long as your PDF viewer supports them. WeasyPrint supports both internal anchors (<a href="#name">) and external hyperlinks. PDF viewers usually display bookmarks in a sidebar, and attachments are embedded in the generated PDF file.

Installation

WeasyPrint has many dependencies, some of which are system libraries, but the Python package itself can be installed with a pip command:

pip install weasyprint

Then, verify that the tool is installed and executable by running the following command, which prints your WeasyPrint version:

weasyprint --version

Generating PDF

To convert HTML/CSS content into a PDF without any changes, you can simply run the following command:

weasyprint http://your_website_address.org ./path_for_storing_your_PDF/file_name.pdf

If you need to change the styles of your content in the resulting PDF file, pass your own stylesheets:

from weasyprint import HTML, CSS

HTML('http://your_website_address.org/').write_pdf('/path_for_storing_your_PDF/file_name.pdf',
	stylesheets=[CSS(string='body { color: red }')])

The WeasyPrint usage examples below show how to generate PDFs when some changes are required.

To add a CSS file, run the following code:

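import os

from django.conf import settings
from weasyprint import HTML, CSS

# Load a stylesheet from the project's static files
css = CSS(os.path.join(settings.STATIC_ROOT, 'css/main.css'))

# Example: render a page to PDF with that stylesheet applied
HTML('http://yourwebsite.org/').write_pdf('/yourdirectory/file.pdf',
	stylesheets=[css])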

To convert an existing HTML template into a PDF file, use the following Django view:

from weasyprint import HTML
from django.template.loader import get_template
from django.http import HttpResponse

def pdf_generation(request):
	# Render the Django template to an HTML string
	html_string = get_template('template/home_page.html').render()
	# Without a target argument, write_pdf() returns the PDF as bytes
	pdf_file = HTML(string=html_string).write_pdf()
	response = HttpResponse(pdf_file, content_type='application/pdf')
	# Show the PDF in the browser and suggest a file name for saving
	response['Content-Disposition'] = 'inline; filename="home_page.pdf"'
	return response
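
Because WeasyPrint targets pagination, page size, margins, and page numbers can be controlled from CSS alone. The following is a minimal sketch, assuming the standard CSS Paged Media rules (@page, @bottom-center, and the page and pages counters); the function names paged_css and render_with_page_numbers and the default values are illustrative assumptions.

```python
def paged_css(size="A4", margin="2cm"):
    """Return CSS Paged Media rules: page size, margins and a page counter."""
    return (
        "@page {\n"
        f"    size: {size};\n"
        f"    margin: {margin};\n"
        "    @bottom-center {\n"
        '        content: "Page " counter(page) " of " counter(pages);\n'
        "    }\n"
        "}\n"
    )


def render_with_page_numbers(html_string, target_path):
    """Render html_string to target_path with a numbered footer on each page."""
    # Imported here so that paged_css() works without WeasyPrint installed.
    from weasyprint import HTML, CSS

    HTML(string=html_string).write_pdf(
        target_path,
        stylesheets=[CSS(string=paged_css())],
    )
```

Keeping the page rules in a separate stylesheet means the same HTML template can serve both the website and the printed version.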

How to Use Python to Generate PDFs with Unoconv

Unoconv stands for Universal Office Converter. It is a command line solution for converting files that LibreOffice/OpenOffice can open into various formats, including PDF. To convert a file, the tool connects to a running office instance, called a listener, which opens the file and exports it. If no listener is available, Unoconv starts its own office instance.

Features

Unoconv can convert any file format supported by OpenOffice, which covers more than 100 file formats.

Supported export formats include:

  • doc
  • odt
  • html
  • pdf
  • txt
  • xhtml
  • rtf
  • ooxml, etc.

You can also use this tool for batch processing and apply your own style templates to the file you need to convert. If needed, Unoconv can start OpenOffice automatically. The solution supports OpenOffice on common desktop operating systems, such as Windows, Linux, and macOS. To process your documents centrally, you can use Unoconv in both client and server environments.

Installation

To install Unoconv on a Debian-based Linux distribution such as Ubuntu, run the following apt-get command:

$ sudo apt-get install -Vy libreoffice unoconv

Python: PDF Creation using Unoconv

$ pip install python-docx

import subprocess

from docx import Document

# Edit the Microsoft Word file
document = Document('yourfile.docx')
for paragraph in document.paragraphs:
	do_your_stuff()  # placeholder for your own editing logic

document.save('yourdocument_new.docx')

# Convert the saved document to PDF: -f sets the output format,
# -o the output file, and the last argument is the input file
try:
	subprocess.check_call(['/usr/bin/python3', '/usr/bin/unoconv', '-f', 'pdf', '-o', 'yourdocument_new.pdf', 'yourdocument_new.docx'])
except subprocess.CalledProcessError as e:
	print('CalledProcessError', e)

This example edits a Microsoft Word file with python-docx before converting it with Unoconv. To do so, open the document you need to edit, apply the necessary changes, and save the result. Then call Unoconv on the saved file to convert it into PDF.

The important differences between these three tools lie in the number of file and graphics formats they support, as well as in how precisely they convert complex content with advanced elements, such as hyperlinks and custom styles. The list of tools for generating PDFs in Python from other file formats is longer, but we have covered the solutions we have used for our own tasks. One way or another, each of these tools will significantly simplify your work when you need to create lots of PDF files.
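
The batch processing mentioned in the feature list can be sketched by passing several input files to a single Unoconv call. This is a minimal example, assuming that unoconv is available on the PATH and that it writes each PDF next to its input file when no output option is given; the function names build_unoconv_command and convert_directory are illustrative assumptions.

```python
import subprocess
from pathlib import Path


def build_unoconv_command(paths, fmt="pdf"):
    """Build one unoconv call that converts every file in paths to fmt."""
    return ["unoconv", "-f", fmt, *[str(path) for path in paths]]


def convert_directory(directory, pattern="*.docx"):
    """Convert all files matching pattern in directory; return expected PDFs."""
    paths = sorted(Path(directory).glob(pattern))
    if not paths:
        return []
    # Raises subprocess.CalledProcessError if unoconv exits with an error.
    subprocess.check_call(build_unoconv_command(paths))
    # Assumption: each output is written next to its input with a .pdf suffix.
    return [path.with_suffix(".pdf") for path in paths]
```

A single call avoids starting a new office instance for every document, which is where most of the conversion time goes when no listener is running.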
