Saturday, November 12, 2011

Pandoc - The Markup Converter

If you checked out the packages I linked to in my previous post, you might have noticed that their descriptions were poorly formatted (plain text without any oomph). This is because PyPI doesn't support Markdown syntax. Since I like to write Markdown and would rather not convert my files to pure reST by hand, I decided to handle the conversion in my setup.py (package metadata).

During my research I came across Pandoc. There are even Python bindings available for it! Sadly, those bindings require the exact path to the Pandoc executable. That isn't great, since I want the conversion to work anywhere. There might be some nice way to locate it, but I didn't bother. Instead I found a simpler solution that works just as well.

The Solution

By chance I came across a small snippet that uses the subprocess module. Here's my generalized version of it:
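A minimal sketch of such a subprocess-based converter follows. It assumes Python 3.7+ (for `subprocess.run(..., capture_output=True)`) and a `pandoc` executable somewhere on the PATH, so no hard-coded path is needed. The function name `convert` and the fall-back-to-plain-text behavior are illustrative choices, not necessarily what the original snippet did.

```python
import shutil
import subprocess


def convert(text, from_format="markdown", to_format="rst", fallback=True):
    """Convert `text` between markup formats by piping it through pandoc.

    Assumes the `pandoc` executable is on PATH. If it is missing and
    `fallback` is True, the text is returned unchanged so that setup.py
    still works (the description just shows up unformatted on PyPI).
    """
    pandoc = shutil.which("pandoc")  # locate pandoc via PATH, no fixed path
    if pandoc is None:
        if fallback:
            return text
        raise RuntimeError("pandoc executable not found on PATH")
    # pandoc reads from stdin and writes to stdout when given no files;
    # --from/--to select the input and output formats.
    result = subprocess.run(
        [pandoc, "--from", from_format, "--to", to_format],
        input=text,
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError if pandoc fails
    )
    return result.stdout
```

In setup.py this could be used as `long_description=convert(open("README.md").read())`, assuming the description lives in a hypothetical README.md. Other conversions only need different format arguments, e.g. `convert(html_text, "html", "markdown")`.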

Pandoc supports a huge variety of formats (Markdown, reST, Textile, HTML, LaTeX, ...). Just plug in the ones you wish to use and go. :)

There's one gotcha to keep in mind, though. My original files used some GitHub-specific syntax, which the converter obviously can't handle. To solve this I might have to write a little preprocessor of my own. For now, though, I can live with the current solution just fine. Things would have been a lot easier if PyPI supported Markdown. Oh well. :)
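Such a preprocessor could be quite small. The sketch below assumes that the troublesome GitHub syntax is fenced code blocks delimited by triple backticks (an assumption; the exact syntax isn't named above) and rewrites them as four-space indented code blocks, which plain Markdown understands. The name `indent_fenced_blocks` is hypothetical.

```python
import re

# A GitHub-style fence: a line starting with ``` (optionally followed by a
# language name), the code body, then a closing ``` on its own line.
FENCE = re.compile(r"^```[^\n]*\n(.*?)^```[ \t]*$", re.MULTILINE | re.DOTALL)


def indent_fenced_blocks(markdown):
    """Rewrite ``` fenced blocks as 4-space indented code blocks.

    Assumes each fence is already surrounded by blank lines, since an
    indented code block cannot directly follow a paragraph. The language
    name after the opening fence is dropped.
    """
    def replace(match):
        lines = match.group(1).splitlines()
        # Indent non-empty lines; keep blank lines blank.
        return "\n".join("    " + line if line else "" for line in lines)

    return FENCE.sub(replace, markdown)
```

The output of this function would then be fed to the converter instead of the raw file contents.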