Generating PDF files programmatically is a common task in applications that export reports, bills, or questionnaires. In this article, we look at three common tools for creating PDFs, covering how to install them and how each one performs the conversion.
PDF stands for Portable Document Format. It was originally developed by Adobe and has since become an open standard for documents. Creating a single PDF file from a Microsoft Word document is easy: you can use Word's menu, the print dialog in Linux or macOS, or Adobe Acrobat Reader. When you need to generate tens, hundreds, or even thousands of PDF files, however, it is better to automate the task. For generating PDFs with Python, we have chosen the following solutions: Xhtml2pdf, WeasyPrint, and Unoconv.
Xhtml2pdf is an HTML/CSS to PDF converter and a Python library that can be used with any Python framework, such as Django. The main difficulty for every HTML to PDF converter is that PDF has many concepts that HTML lacks, page size for example. Xhtml2pdf deals with this problem by adding specific markup that covers such tasks, for instance repeating headers and footers on every page.
To create PDFs, the tool uses ReportLab, a common open source Python library for generating PDF files. ReportLab has its own XML-based markup language called Report Markup Language (RML). Therefore, you can think of Xhtml2pdf as another markup language for the ReportLab library.
You can install Xhtml2pdf with pip:
pip install xhtml2pdf
Xhtml2pdf usage example:
from io import BytesIO

from django.http import HttpResponse
from django.template.loader import get_template
from xhtml2pdf import pisa


def html_to_pdf_directly(request):
    # Render the Django template to an HTML string
    template = get_template("template_name.html")
    html = template.render({"pagesize": "A4"})
    # Convert the HTML into PDF, writing the result into an in-memory buffer
    result = BytesIO()
    pdf = pisa.pisaDocument(BytesIO(html.encode("utf-8")), dest=result)
    if not pdf.err:
        return HttpResponse(result.getvalue(), content_type="application/pdf")
    return HttpResponse("Errors")
The view above renders an HTML template and converts the result into a PDF. To hold the PDF temporarily, it uses io.BytesIO (the Python 3 replacement for the cStringIO library), which provides an efficient in-memory file-like object. pisaDocument takes two such objects: the source that contains the HTML to convert and the destination that receives the final result. Finally, the view reads the bytes from the buffer with getvalue() and returns them in an HttpResponse object with the application/pdf content type.
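The page-level markup mentioned earlier (page size, headers and footers repeated on every page) can be tried outside Django as well. The following is a minimal sketch, not a definitive layout: the names PAGE_CSS, build_invoice_html, and render_pdf are hypothetical, and the frame coordinates are assumptions chosen to fit an A4 page. It relies on Xhtml2pdf's @frame rules and its pdf:pagenumber tag.

```python
from html import escape
from io import BytesIO

# Page-level rules that plain HTML lacks: paper size plus fixed frames for
# a header, the main content, and a footer. The coordinates are assumptions
# that fit inside an A4 page (roughly 595 x 842 pt).
PAGE_CSS = """
@page {
    size: a4 portrait;
    @frame header_frame {
        -pdf-frame-content: header_content;
        left: 50pt; width: 495pt; top: 40pt; height: 40pt;
    }
    @frame content_frame {
        left: 50pt; width: 495pt; top: 90pt; height: 660pt;
    }
    @frame footer_frame {
        -pdf-frame-content: footer_content;
        left: 50pt; width: 495pt; top: 780pt; height: 30pt;
    }
}
"""


def build_invoice_html(title, body_html):
    """Return an HTML document whose header and footer go into static frames."""
    return (
        f"<html><head><style>{PAGE_CSS}</style></head><body>"
        f'<div id="header_content">{escape(title)}</div>'
        '<div id="footer_content">Page <pdf:pagenumber></div>'
        f"{body_html}</body></html>"
    )


def render_pdf(html):
    """Convert an HTML string to PDF bytes; raise ValueError on failure."""
    # Imported lazily so build_invoice_html() works without xhtml2pdf installed.
    from xhtml2pdf import pisa

    buffer = BytesIO()
    status = pisa.CreatePDF(html, dest=buffer)
    if status.err:
        raise ValueError(f"xhtml2pdf reported {status.err} error(s)")
    return buffer.getvalue()
```

Calling render_pdf(build_invoice_html("Invoice", "<p>Hello</p>")) should return the PDF as bytes, which you can write to a file or pass to an HttpResponse as in the view above.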
WeasyPrint is another visual rendering engine that can export HTML/CSS content to PDF. Its main focus is supporting web standards for printing. WeasyPrint is free to download and use under a BSD license. It relies on several libraries, but it is not based on an existing browser rendering engine such as Gecko or WebKit. Its CSS layout engine is written in Python and was created specifically for pagination. At the time of writing, it supported Python 2.7 and 3.3 or higher; later releases require Python 3. WeasyPrint can also be used with Django.
The tool can read regular HTML and CSS files as well as data URIs, FTP, and HTTP URLs. It follows HTTP redirects, but it does not support more complex features, such as authentication or cookies. The solution also supports CSS stylesheets, both linked by the <link> element and embedded in the <style> element. When it comes to images, WeasyPrint supports the <embed>, <img>, and <object> elements and a number of image formats, including SVG.
The tool does not, however, rasterize SVG images. Instead, it renders such images as vectors in the output PDF file.
Furthermore, WeasyPrint supports attachments, bookmarks, and hyperlinks. Hyperlinks are clickable provided that your PDF viewer supports them. They can be internal anchors, which point to another part of the same document, or external links. PDF viewers usually display bookmarks in a sidebar, and attachments are embedded in the converted PDF file.
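To apply an additional stylesheet during conversion, wrap it in a CSS object and pass it to write_pdf():
import os
from django.conf import settings
from weasyprint import CSS, HTML

# Build the stylesheet path safely, whether or not STATIC_ROOT ends with a slash
stylesheet = CSS(os.path.join(settings.STATIC_ROOT, "css/main.css"))
# Example: render a web page to a PDF file using that stylesheet
HTML("http://yourwebsite.org/").write_pdf("/yourdirectory/file.pdf", stylesheets=[stylesheet])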
To convert an existing HTML template into a PDF file, use a Django view such as the following:
from django.http import HttpResponse
from django.template.loader import get_template
from weasyprint import HTML


def pdf_generation(request):
    # Render the template to an HTML string first; HTML(string=...) expects text
    html = get_template("template/home_page.html").render()
    # With no target argument, write_pdf() returns the PDF as bytes
    pdf_file = HTML(string=html).write_pdf()
    response = HttpResponse(pdf_file, content_type="application/pdf")
    response["Content-Disposition"] = 'inline; filename="home_page.pdf"'
    return response
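Outside Django, the pagination, hyperlink, and bookmark support described above can be tried with a plain HTML string. This is a minimal sketch under stated assumptions: PRINT_CSS, build_report_html, and render_pdf are hypothetical names, example.com stands in for a real external URL, and the stylesheet is only one possible use of CSS paged media. WeasyPrint is expected to turn the h1 headings into bookmarks and the anchors into clickable links.

```python
from html import escape

# CSS paged-media rules: paper size, margins, and a page counter in the footer.
PRINT_CSS = """
@page {
    size: A4;
    margin: 2cm;
    @bottom-center { content: "Page " counter(page) " of " counter(pages); }
}
h1 { page-break-before: always; }
"""


def build_report_html(sections):
    """Build one HTML document from (title, text) pairs.

    Each <h1> gets an id so the table of contents can link to it internally.
    """
    toc = "".join(
        f'<li><a href="#section-{i}">{escape(title)}</a></li>'
        for i, (title, _) in enumerate(sections)
    )
    body = "".join(
        f'<h1 id="section-{i}">{escape(title)}</h1><p>{escape(text)}</p>'
        for i, (title, text) in enumerate(sections)
    )
    return (
        f"<html><head><style>{PRINT_CSS}</style></head>"
        f"<body><ul>{toc}</ul>{body}"
        '<p><a href="https://example.com/">External link</a></p>'
        "</body></html>"
    )


def render_pdf(html):
    """Return the rendered PDF as bytes."""
    # Imported lazily so build_report_html() works without WeasyPrint installed.
    from weasyprint import HTML

    return HTML(string=html).write_pdf()
```

For example, render_pdf(build_report_html([("Intro", "First section"), ("Results", "Second section")])) should produce a PDF in which every section starts on a new page.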
Unoconv stands for Universal Office Converter. It is a command line tool for converting LibreOffice/OpenOffice files into various formats, including PDF. To convert a file, the tool connects to a running office instance, called a listener, which does the actual work. If it does not find an available listener, Unoconv starts its own office instance.
Unoconv can convert any file format that OpenOffice supports, which amounts to more than 100 file formats. PDF is among the supported export formats.
You can also use this tool for batch processing and apply your own style templates to a file you need to convert. The solution supports OpenOffice on common desktop operating systems, such as Windows, Linux, and macOS. To process your documents centrally, you can use Unoconv in both client and server environments.
To install Unoconv on a Debian-based Linux distribution, run the following apt-get command, then install python-docx with pip:
$ sudo apt-get install -Vy libreoffice unoconv
$ pip install python-docx
import subprocess

from docx import Document

# Edit the Microsoft Word file with python-docx
document = Document("yourfile.docx")
for paragraph in document.paragraphs:
    do_your_stuff()  # placeholder for your own editing logic
document.save("yourdocument_new.docx")

# Convert the saved file to PDF by running unoconv with the system Python
try:
    subprocess.check_call([
        "/usr/bin/python3", "/usr/bin/unoconv",
        "-f", "pdf",
        "-o", "yourdocument_new.pdf",
        "yourdocument_new.docx",
    ])
except subprocess.CalledProcessError as e:
    print("CalledProcessError", e)
You can edit a Microsoft Word file before converting it. The code above opens the document with python-docx, applies your changes to its paragraphs, and saves the result under a new name. It then calls Unoconv through subprocess to convert the saved document into PDF.
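The batch processing and style templates mentioned earlier can be sketched in the same way. The helper names build_unoconv_command and convert_folder are hypothetical, and the sketch assumes that unoconv is available on your PATH. It uses the -f (format), -o (output), and -t (template) options and passes all source files in a single call.

```python
import subprocess
from pathlib import Path


def build_unoconv_command(sources, output_dir, fmt="pdf", template=None):
    """Return the argument list for one unoconv call that converts all sources."""
    command = ["unoconv", "-f", fmt, "-o", str(output_dir)]
    if template is not None:
        # -t imports the styles from a template file before converting
        command += ["-t", str(template)]
    command += [str(source) for source in sources]
    return command


def convert_folder(folder, output_dir, pattern="*.docx", template=None):
    """Convert every matching file in a folder to PDF.

    Returns the paths where the converted files are expected to appear.
    """
    sources = sorted(Path(folder).glob(pattern))
    if not sources:
        return []
    # -o is treated as a target directory when the directory already exists
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    subprocess.check_call(
        build_unoconv_command(sources, output_dir, template=template)
    )
    return [Path(output_dir) / (source.stem + ".pdf") for source in sources]
```

Converting all files in one call avoids starting an office instance for each document, which matters when you process hundreds of files.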
The main differences between these three tools lie in the number of file and graphics formats they support and in how precisely they convert complex content with advanced elements, such as hyperlinks, custom styles, or cookie-protected resources. The list of tools for generating PDFs from other file formats with Python is longer, but we have covered the solutions we have used for our own tasks. Either way, each of these tools will make your work considerably easier when you need to create many PDF files.