MediaWiki and XSL-FO

At work, we’re using MediaWiki for both an internal and a public wiki (two separate installations, to provide watertight bulkheads between them).

Recently, the administrator of the public wiki has been looking for ways to automatically generate PDF files. One extension in particular, Extension:Pdf Export, seemed to be useful; but on closer inspection we found out that the component it relied on, htmldoc, could only handle HTML 3.2.

Thus we were stranded. We tried PHP dompdf for a while, but it threw a fatal exception on the code output by MediaWiki, so that was not an option either.

But it seems like MediaWiki always generates XHTML-compliant output, which means that it’s possible to use an XSLT/XSL-FO processor. And Apache FOP seems to be a good choice right now; it’s Java-based, meaning that we can run it on the Unix box with no problems (we hope!).
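Since an XSLT processor needs well-formed XML as input, it may be worth checking a saved page before handing it over. Here is a minimal sketch in Python, using only the standard library; the function name is_well_formed is made up for illustration and is not part of MediaWiki or FOP.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xhtml_text):
    """Return True if the given markup parses as well-formed XML.

    Hypothetical helper for illustration only. It checks
    well-formedness, not validity against the XHTML DTD.
    """
    try:
        ET.fromstring(xhtml_text)
    except ET.ParseError:
        return False
    return True
```

One caveat: this parser does not load the XHTML DTD, so a page using named entities such as &nbsp; is reported as not well-formed by this check, even though a DTD-aware parser might accept it.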

So, essentially, the way this could work would be to take the output from the PHP code described in Extension:Pdf Export above, but instead of running it through htmldoc, we run it through fop, kind of like this:

fop -xml generated.xhtml -xsl mediawiki-to-fo.xsl -pdf output.pdf

Here, generated.xhtml is the file saved from the MediaWiki plugin, mediawiki-to-fo.xsl is a stylesheet that converts the XHTML into suitable XSL-FO definitions, and output.pdf is the generated result. FOP turns out to be quick and expedient.
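If the call ends up being scripted rather than typed by hand, the command above could be wrapped roughly like this. This is only a sketch in Python: the function names build_fop_command and render_pdf are hypothetical, and it assumes a fop executable is on the PATH. The only flags used are the three from the command above.

```python
import subprocess
from pathlib import Path


def build_fop_command(xhtml_path, xsl_path, pdf_path, fop_executable="fop"):
    """Return the argument list for the fop call shown above.

    Hypothetical helper; assumes the fop launcher is on the PATH
    unless another executable path is passed in.
    """
    return [
        fop_executable,
        "-xml", str(xhtml_path),  # XHTML saved from MediaWiki
        "-xsl", str(xsl_path),    # stylesheet producing XSL-FO
        "-pdf", str(pdf_path),    # resulting PDF file
    ]


def render_pdf(xhtml_path, xsl_path, pdf_path, fop_executable="fop"):
    """Run fop; raises CalledProcessError if fop exits with an error."""
    command = build_fop_command(xhtml_path, xsl_path, pdf_path, fop_executable)
    subprocess.run(command, check=True)
    return Path(pdf_path)
```

Calling render_pdf("generated.xhtml", "mediawiki-to-fo.xsl", "output.pdf") would then run the same command as above. Passing the arguments as a list rather than one string avoids shell quoting problems with odd file names.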

Of course, this leaves us with writing the .xsl file, which is going to take some time. An excellent start is available at IBM developerWorks.

All that remains now is putting the pieces together and we should have a simple, efficient plugin that generates beautiful PDF documents. If everything works as expected, that is … ;)
