HTML to PDF transformation with wkhtmltopdf

Introduction

In some Alfresco implementations, you have to generate documents from metadata. There are many options, from text documents to PDF, depending on the constraints of the project. In my case, I had to generate a printable and exportable document from metadata. My idea was to generate an HTML file and use the HTML to PDF transformer in Alfresco. The document had to be attractive and easy to read, which implies the use of CSS files. Here is the result with the default transformer:

The result is definitely not acceptable, so I started to look for a new transformer and discovered the tool wkhtmltopdf. It is a simple shell utility that converts HTML to PDF using the WebKit rendering engine from Qt 4.8. After some local tests, I decided to integrate it into Alfresco. Moreover, this tool is available on MacOS, Linux and Windows.

Creation of the transformer

I created a file named transformer-context.xml in the folder alfresco/WEB-INF/classes/alfresco/extension. The first thing to do is to declare the transformer worker and the transformer that wraps it:

<bean id="transformer.worker.Html2pdf" class="org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker">
	<property name="mimetypeService">
		<ref bean="mimetypeService" />
	</property>
	<property name="checkCommand">
		<bean class="org.alfresco.util.exec.RuntimeExec">
			<property name="commandMap">
				<map>
					<entry key=".*">
						<value>${wkhtmltopdf.exe} -V</value>
					</entry>
				</map>
			</property>
			<property name="errorCodes">
				<value>1</value>
			</property>
		</bean>
	</property>
	<property name="transformCommand">
		<bean class="org.alfresco.util.exec.RuntimeExec">
			<property name="commandMap">
				<map>
					<entry key=".*">
						<value>${wkhtmltopdf.exe} ${source} ${target}</value>
					</entry>
				</map>
			</property>
			<property name="errorCodes">
				<value>1</value>
			</property>
		</bean>
	</property>
	<property name="explicitTransformations">
		<list>
			<bean class="org.alfresco.repo.content.transform.ExplictTransformationDetails">
				<property name="sourceMimetype">
					<value>text/html</value>
				</property>
				<property name="targetMimetype">
					<value>application/pdf</value>
				</property>
			</bean>
		</list>
	</property>
</bean>

<bean id="transformer.html2pdf" class="org.alfresco.repo.content.transform.ProxyContentTransformer" parent="baseContentTransformer">
	<property name="worker">
		<ref bean="transformer.worker.Html2pdf" />
	</property>
</bean>
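
The command line in transformCommand is the natural place to pass extra options to wkhtmltopdf. The fragment below is only a sketch of a variant of that property, not part of the original configuration: the quiet and page-size options exist in wkhtmltopdf, but choosing A4 is an assumption made for illustration. The ${wkhtmltopdf.exe} placeholder is expected to be resolved from a property you define yourself, for example in alfresco-global.properties, pointing at the wkhtmltopdf binary on your server.

```xml
<!-- Sketch only: variant of the transformCommand property of transformer.worker.Html2pdf -->
<property name="transformCommand">
	<bean class="org.alfresco.util.exec.RuntimeExec">
		<property name="commandMap">
			<map>
				<entry key=".*">
					<!-- Assumed options: suppress console output and force the A4 page size -->
					<value>${wkhtmltopdf.exe} --quiet --page-size A4 ${source} ${target}</value>
				</entry>
			</map>
		</property>
		<property name="errorCodes">
			<value>1</value>
		</property>
	</bean>
</property>
```

If you try this, run the same command by hand on a sample HTML file first to check that your installed version of wkhtmltopdf accepts the options.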

By default, Alfresco relies on OpenOffice (through JodConverter) for this transformation. We therefore need to redefine the relevant bean and add an unsupportedTransformations block at the end, so that this transformer is no longer used to transform HTML to PDF.

<bean id="transformer.JodConverter.Html2Pdf" class="org.alfresco.repo.content.transform.ComplexContentTransformer" parent="baseComplexContentTransformer">
	<property name="transformers">
		<list>
			<ref bean="transformer.JodConverter" />
			<ref bean="transformer.JodConverter" />
		</list>
	</property>
	<property name="intermediateMimetypes">
		<list>
			<value>application/vnd.oasis.opendocument.text</value>
		</list>
	</property>
	<property name="supportedTransformations">
		<list>
			<bean class="org.alfresco.repo.content.transform.SupportedTransformation">
				<property name="sourceMimetype">
					<value>text/html</value>
				</property>
				<property name="targetMimetype">
					<value>application/pdf</value>
				</property>
			</bean>
		</list>
	</property>
	<property name="explicitTransformations">
		<list>
			<bean class="org.alfresco.repo.content.transform.ExplictTransformationDetails">
				<property name="sourceMimetype">
					<value>text/html</value>
				</property>
				<property name="targetMimetype">
					<value>application/pdf</value>
				</property>
			</bean>
		</list>
	</property>
	<property name="unsupportedTransformations">
		<list>
			<bean class="org.alfresco.repo.content.transform.SupportedTransformation">
				<property name="sourceMimetype">
					<value>text/html</value>
				</property>
				<property name="targetMimetype">
					<value>application/pdf</value>
				</property>
			</bean>
		</list>
	</property>
</bean>

Finally, we override the failover transformer transformer.JodConverter.2Pdf so that it uses our new transformer.

<bean id="transformer.JodConverter.2Pdf" class="org.alfresco.repo.content.transform.FailoverContentTransformer" parent="unregisteredBaseContentTransformer">
	<property name="transformers">
		<list>
			<ref bean="transformer.JodConverter" />
			<ref bean="transformer.html2pdf" />
		</list>
	</property>
	<property name="supportedTransformations">
		<list>
			<bean class="org.alfresco.repo.content.transform.SupportedTransformation">
				<property name="targetMimetype">
					<value>application/pdf</value>
				</property>
			</bean>
		</list>
	</property>
</bean>
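
All the beans shown in this article go into the same transformer-context.xml file. As a rough sketch, assuming the DTD-style Spring header commonly found in Alfresco extension files of that generation (adapt it if your version uses a schema-based header), the file skeleton could look like this:

```xml
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>

<beans>
	<!-- 1. New worker and proxy: transformer.worker.Html2pdf and transformer.html2pdf -->
	<!-- 2. Redefined transformer.JodConverter.Html2Pdf with its unsupportedTransformations block -->
	<!-- 3. Redefined transformer.JodConverter.2Pdf referencing transformer.html2pdf -->
</beans>
```

Alfresco has to be restarted for the overridden beans to be picked up.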

Conclusion

The result is quite impressive: the SWF preview looks exactly like the HTML page. In our case, we used a FreeMarker template to generate HTML content from metadata and then generated the PDF from that HTML content.

More information about wkhtmltopdf:
