Pandoc Multiple Files



Pandoc
Original author(s): John MacFarlane
Initial release: 10 August 2006 (14 years ago)
Written in: Haskell
Operating system: Unix-like, Windows
Platform: IA-32, x64
License: GNU GPLv2
Website: pandoc.org

Pandoc is a free and open-source document converter, widely used as a writing tool (especially by scholars)[1] and as a basis for publishing workflows.[2] It was created by John MacFarlane, a philosophy professor at the University of California, Berkeley.[3]

Functionality

Pandoc dubs itself a 'markup format' converter. It can take a document in one of the supported formats and convert only its markup to another format. Maintaining the look and feel of the document is not a priority.[4]

Plug-ins for custom formats can also be written in Lua, which has been used to create an exporting tool for the Journal Article Tag Suite, for example.[5]

An included CiteProc option allows Pandoc to use bibliographic data from reference management software in any of four formats: BibTeX, BibLaTeX, CSL JSON or CSL YAML.[6] The information is automatically transformed into a citation in various styles (such as APA, Chicago, or MLA) using an implementation of the Citation Style Language.[6] This allows the program to serve as a simpler alternative to LaTeX for producing academic writing.[7]

Supported file formats

Pandoc's most thoroughly supported file format is an extended version of Markdown,[8] but it can also read many other formats, including:

  • FictionBook (FB2)
  • Jira wiki markup
  • Journal Article Tag Suite (JATS)
  • Markdown: Strict, CommonMark, GitHub Flavored Markdown (GFM), MultiMarkdown (MMD) and Markdown Extra (PHP Extra) variants
  • OpenDocument (ODT)
  • Office Open XML: Microsoft Word variant
  • txt2tags (t2t)
  • Wiki markup: MediaWiki, Muse, TikiWiki, TWiki and Vimwiki variants

It can create files in the following formats, which are not necessarily the same as the input formats:

  • DocBook: Versions 4 and 5
  • EPUB: Versions 2 and 3[9]
  • FictionBook (FB2)
  • HTML: HTML4 and HTML5 variants, respectively compliant with XHTML 1.0 Transitional and XHTML Strict
  • InDesign ICML
  • Jira wiki markup
  • Journal Article Tag Suite (JATS)
  • Markdown: Strict, CommonMark, GitHub Flavored Markdown (GFM), MultiMarkdown (MMD) and Markdown Extra (PHP Extra) variants
  • OpenDocument (ODT/ODF)
  • Office Open XML: Microsoft Word and Microsoft PowerPoint variants
  • PDF (needs a third-party add-on like ConTeXt, pdfroff, wkhtmltopdf, weasyprint or prince)[10]
  • Rich Text Format (RTF)
  • Web-based slideshows: LaTeX Beamer, Slideous, Slidy, DZSlides, reveal.js and S5 variants[11]
  • Wiki markup: DokuWiki, MediaWiki, Muse, TikiWiki, TWiki and Vimwiki variants

References

  1. Mullen, Lincoln (23 February 2012). 'Pandoc Converts All Your (Text) Documents'. The Chronicle of Higher Education Blogs: ProfHacker. Retrieved 27 June 2014.
    - McDaniel, W. Caleb (28 September 2012). 'Why (and How) I Wrote My Academic Book in Plain Text'. W. Caleb McDaniel at Rice University. Retrieved 27 June 2014.
    - Healy, Kieran (23 January 2014). 'Plain Text, Papers, Pandoc'. Retrieved 27 June 2014.
    - Ovadia, Steven (2014). 'Markdown for Librarians and Academics'. Behavioral & Social Sciences Librarian. 33 (2): 120–124. doi:10.1080/01639269.2014.904696. ISSN 0163-9269. S2CID 62762368.
  2. Till, Kaitlyn; Simas, Shed; Larkai, Velma (14 April 2014). 'The Flying Narwhal: Small mag workflow'. Publishing @ SFU. Retrieved 11 March 2018.
    - Maxwell, John (1 November 2013). 'Building Publishing Workflows with Pandoc and Git'. Publishing @ SFU. Retrieved 27 June 2014.
    - Maxwell, John (26 February 2014). 'On Pandoc'. eBound Canada: Digital Production Workshop, Vancouver, BC. Archived from the original on 28 February 2015. Retrieved 27 June 2014.
    - Maxwell, John (1 November 2013). 'Building Publishing Workflows with Pandoc and Git'. Publishing @ SFU. Retrieved 12 April 2019.
    - Krewinkel, Albert; Robert Winkler (8 May 2017). 'Formatting Open Science: agilely creating multiple document formats for academic manuscripts with Pandoc Scholar'. PeerJ Computer Science. 3: e112. doi:10.7717/peerj-cs.112. Retrieved 25 May 2017.
  3. 'John MacFarlane'. Department of Philosophy. University of California, Berkeley. Retrieved 25 July 2014.
  4. 'Pandoc User's Guide'. pandoc.org. Description. Retrieved 22 January 2019. ...one should not expect perfect conversions between every format and every other. Pandoc attempts to preserve the structural elements of a document, but not formatting details...
  5. Fenner, Martin (12 December 2013). 'From Markdown to JATS XML in one Step'. Gobbledygook. Retrieved 27 June 2014.
  6. 'Citations'. Pandoc User's Guide. Retrieved 2021-04-08.
  7. Tenen, Dennis; Grant Wythoff (19 March 2014). 'Sustainable Authorship in Plain Text using Pandoc and Markdown'. The Programming Historian. Retrieved 27 June 2014.
  8. 'Pandoc's Markdown'. Pandoc User's Guide. Retrieved 2019-08-01.
  9. Mullen, Lincoln (20 March 2012). 'Make Your Own E-Books with Pandoc'. The Chronicle of Higher Education Blogs: ProfHacker. Retrieved 27 June 2014.
  10. 'Getting started with pandoc'. pandoc.org. Creating a PDF. Retrieved 22 January 2019.
  11. See as an example MacFarlane, John (17 May 2014). 'Pandoc for Haskell Hackers'. BayHac 2014, Mountain View, CA. Retrieved 27 June 2014. The source file is written in Markdown.

Retrieved from 'https://en.wikipedia.org/w/index.php?title=Pandoc&oldid=1018572588'

Either you've already heard of pandoc, or if you have searched online for markdown to pdf or similar, you are sure to have come across it. This tutorial will help you use pandoc to generate pdf and epub from a GitHub style markdown file. The main motivation for this blog post is to highlight the customizations I made to generate pdf and epub versions for self-publishing my ebooks. It wasn't easy to arrive at the set-up I ended up with, so I hope this will be useful for those looking to use pandoc to generate pdf and epub formats. This guide is specifically aimed at technical books that have code snippets.


Installation

If you use a Debian-based distro like Ubuntu, the steps below are enough for the demos in this tutorial. If you get an error or warning, search for that issue online and you'll likely find what else has to be installed.

I first downloaded the deb file from pandoc: releases and installed it, followed by the packages needed for pdf generation.

For more details and a guide for other operating systems, refer to pandoc: installation.


Minimal example

Once pandoc is working on your system, try generating a sample pdf without any customization.
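
A minimal sketch of such an invocation, wrapped in a shell function so the file names are easy to swap. The function name md2pdf_min is made up for illustration; the gfm input format and the sample_1 file names are the ones used in this tutorial.

```shell
# Convert a GitHub style markdown file to pdf with pandoc's default settings.
# md2pdf_min is a hypothetical helper name, not part of pandoc.
md2pdf_min() {
    # $1 = input markdown file, $2 = output file (type inferred from extension)
    pandoc "$1" -f gfm -o "$2"
}

# Usage: md2pdf_min sample_1.md sample_1.pdf
```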

See the learnbyexample.github.io repo for all the input and output files referred to in this tutorial.

Here sample_1.md is the input markdown file and -f is used to specify that the input format is GitHub style markdown. The -o option names the output file, and the output type is inferred from its extension. The default output is probably good enough. But I wished to customize hyperlinks, inline code style, add page breaks between chapters, etc. This blog post will discuss these customizations one by one.

pandoc has its own flavor of markdown with many useful extensions — see pandoc: pandocs-markdown for details. GitHub style markdown is recommended if you wish to use the same source (or with minor changes) in multiple places.

It is advised to use markdown headers in order without skipping levels. For example, H1 for chapter headings and H2 for chapter sub-sections is fine; H1 for chapter headings and H3 for sub-sections is not. Using the former can give automatic index navigation on ebook readers.

On Evince reader, the index navigation for above sample looks like this:


Chapter breaks

As observed in the previous demo, there are no chapter breaks by default. Searching for a solution online, I got this piece of tex code:
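
A minimal sketch of such a snippet, assuming the sectsty package (it provides the \sectionfont command) and a file name, chapter_break.tex, chosen here for illustration. The shell lines below only write the two tex lines to that file.

```shell
# Write the header snippet to chapter_break.tex (file name is an assumption).
# sectsty provides \sectionfont; adding \clearpage to it starts every
# top-level section (H1 in the markdown source) on a new page.
cat > chapter_break.tex <<'EOF'
\usepackage{sectsty}
\sectionfont{\clearpage}
EOF
```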

This can be added using -H option. From pandoc manual,

-H FILE, --include-in-header=FILE

Include contents of FILE, verbatim, at the end of the header. This can be used, for example, to include special CSS or JavaScript in HTML documents. This option can be used repeatedly to include multiple files in the header. They will be included in the order specified. Implies --standalone.

The pandoc invocation now looks like:
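
As a sketch, assuming the tex snippet has been saved as chapter_break.tex (a file name picked for this example) and using a made-up function name:

```shell
# Same conversion as before, plus a tex file included in the LaTeX header.
# md2pdf_chapter and chapter_break.tex are hypothetical names.
md2pdf_chapter() {
    pandoc "$1" \
        -f gfm \
        -H chapter_break.tex \
        -o "$2"
}

# Usage: md2pdf_chapter sample_1.md sample_1_chapter_break.pdf
```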

You can add further customization to headings, for example use \sectionfont{\underline\clearpage} to underline chapter names or \sectionfont{\LARGE\clearpage} to allow chapter names to get even bigger. Here are some more links to read about various customizations:


Changing settings via -V option

-V KEY[=VAL], --variable=KEY[:VAL]

Set the template variable KEY to the value VAL when rendering the document in standalone mode. This is generally only useful when the --template option is used to specify a custom template, since pandoc automatically sets the variables used in the default templates. If no VAL is specified, the key will be given the value true.

The -V option lets you change variable values to customize settings like page size, font, link color, etc. As more settings are changed, it is better to use a simple script to call pandoc instead of typing the whole command on the terminal.

  • mainfont is for normal text
  • monofont is for code snippets
  • geometry for page size and margins
  • linkcolor to set hyperlink color
  • to increase default font size, use -V fontsize=12pt
    • See stackoverflow: change font size if you need even bigger size options

Using xelatex as the pdf-engine allows you to use any font installed on the system. One reason I chose DejaVu was that it supported Greek and other Unicode characters that were causing errors with other fonts. See tex.stackexchange: Using XeLaTeX instead of pdfLaTeX for some more details.

The pandoc invocation is now through a script:
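
A sketch of what such a script could contain, written as a shell function (the same body works in a script file with "$1" and "$2" as its arguments). The concrete values for link color, paper size and margin are assumptions for illustration, as are the name md2pdf and the chapter_break.tex file; the DejaVu fonts and the xelatex engine follow the text above.

```shell
# Collect the -V settings in one place instead of retyping them.
# md2pdf is a hypothetical name; color, paper size and margin are example values.
md2pdf() {
    pandoc "$1" \
        -f gfm \
        --include-in-header chapter_break.tex \
        -V linkcolor:blue \
        -V geometry:a4paper \
        -V geometry:margin=2cm \
        -V mainfont="DejaVu Serif" \
        -V monofont="DejaVu Sans Mono" \
        --pdf-engine=xelatex \
        -o "$2"
}

# Usage: md2pdf sample_1.md sample_1_settings.pdf
```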

Do compare the pdf generated side by side with previous output before proceeding.

On my system, DejaVu Serif did not have italic variation installed, so I had to use sudo apt install ttf-dejavu-extra to get it.


Syntax highlighting

One option to customize syntax highlighting for code snippets is to save one of the pandoc themes and edit it. See stackoverflow: What are the available syntax highlighters? for available themes and more details (as a good practice on stackoverflow, go through all answers and comments; the linked/related sections on the sidebar are useful as well).
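
One way to save a theme, sketched here on the assumption that the installed pandoc provides the --print-highlight-style option and that the built-in pygments style is the starting point. The function name save_theme is made up; pygments.theme is the file name used later in this tutorial.

```shell
# Dump pandoc's built-in pygments style as an editable JSON theme file.
# save_theme is a hypothetical helper name.
save_theme() {
    pandoc --print-highlight-style=pygments > pygments.theme
}

# Usage: save_theme   (then edit pygments.theme in a text editor)
```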

Edit the above file to customize the theme. Use sites like colorhexa to help with color choices, hex values, etc. For this demo, the below settings are changed:

Inline code

Similar to changing background color for code snippets, I found a solution online to change background color for inline code snippets.
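
A sketch of one such approach: redefine \texttt so that its argument is wrapped in a colored box. The color name InlineBg and its hex value are assumptions for illustration, and since a \colorbox cannot break across lines, long inline code may overflow the margin. The shell lines only write the tex snippet to inline_code.tex.

```shell
# Write a header snippet that gives inline code a light background.
# InlineBg and F4F4F4 are example choices, not values from the tutorial.
cat > inline_code.tex <<'EOF'
\usepackage{xcolor}
\definecolor{InlineBg}{HTML}{F4F4F4}
\let\oldtexttt\texttt
\renewcommand{\texttt}[1]{\colorbox{InlineBg}{\oldtexttt{#1}}}
EOF
```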

Add --highlight-style pygments.theme and --include-in-header inline_code.tex to the script and generate the pdf again.
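
Put together, the script body might look like the sketch below. It is written as a shell function named after the md2pdf_syn.sh script; chapter_break.tex and the -V values (link color, paper size, margin) are assumptions carried over for illustration.

```shell
# pdf generation with a custom highlighting theme and inline code styling.
# The -V values and chapter_break.tex are example choices.
md2pdf_syn() {
    pandoc "$1" \
        -f gfm \
        --include-in-header chapter_break.tex \
        --include-in-header inline_code.tex \
        --highlight-style pygments.theme \
        -V linkcolor:blue \
        -V geometry:a4paper \
        -V geometry:margin=2cm \
        -V mainfont="DejaVu Serif" \
        -V monofont="DejaVu Sans Mono" \
        --pdf-engine=xelatex \
        -o "$2"
}

# Usage: md2pdf_syn sample_2.md sample_2_syn.pdf
```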

With pandoc sample_2.md -f gfm -o sample_2.pdf the output would be:

With ./md2pdf_syn.sh sample_2.md sample_2_syn.pdf the output is:


For my Python re(gex)? book, by chance I found that using ruby instead of python for syntax highlighting of REPL code snippets was better. A snapshot of the ./md2pdf_syn.sh sample_3.md sample_3.pdf result is shown below. With the python directive, string output gets treated as a comment and the color for boolean values isn't easy to distinguish from string values. The ruby directive treats string values as expected and boolean values are easier to spot.


Bullet styling

This stackoverflow Q&A helped for bullet styling.
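
A sketch of how bullets can be restyled from a header file, assuming the enumitem package and a file name, bullet_style.tex, chosen here. The three symbols are example choices, one per nesting level.

```shell
# Write a header snippet that sets the bullet symbol for each list level.
# bullet_style.tex and the chosen symbols are assumptions for illustration.
cat > bullet_style.tex <<'EOF'
\usepackage{enumitem}
\setlist[itemize,1]{label=$\bullet$}
\setlist[itemize,2]{label=$\circ$}
\setlist[itemize,3]{label=$\star$}
EOF
```

The file would then be passed to pandoc with another --include-in-header option.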

Comparing pandoc sample_4.md -f gfm -o sample_4.pdf vs ./md2pdf_syn_bullet.sh sample_4.md sample_4_bullet.pdf gives:


PDF properties

This tex.stackexchange Q&A helped to change metadata. See also pspdfkit: What’s Hiding in Your PDF? and discussion on HN.
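
A sketch of setting pdf metadata through hyperref, with placeholder title and author values and a file name, pdf_properties.tex, chosen for this example:

```shell
# Write a header snippet that sets the pdf title and author fields.
# The file name and both values are placeholders.
cat > pdf_properties.tex <<'EOF'
\usepackage{hyperref}
\hypersetup{
  pdftitle={My awesome book},
  pdfauthor={learnbyexample},
}
EOF
```

As with the other snippets, the file is added to the pandoc call with --include-in-header.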

./md2pdf_syn_bullet_prop.sh sample_4.md sample_4_bullet_prop.pdf gives:


Adding table of contents

There's a handy option --toc to automatically include a table of contents at the top of the generated pdf. You can control the number of levels using the --toc-depth option; the default is 3 levels. You can also change the default string Contents to something else using the -V toc-title option.
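
A minimal sketch of these three options on their own, with a depth of 2 and a replacement title picked as example values; the function name is made up:

```shell
# Add a table of contents limited to two heading levels, with a custom title.
# md2pdf_toc, the depth and the title string are example choices.
md2pdf_toc() {
    pandoc "$1" \
        -f gfm \
        --toc \
        --toc-depth=2 \
        -V toc-title="Table of contents" \
        -o "$2"
}

# Usage: md2pdf_toc sample_1.md sample_1_toc.pdf
```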

./md2pdf_syn_bullet_prop_toc.sh sample_1.md sample_1_toc.pdf gives:


Adding cover image

To add something prior to table of contents, cover image for example, you can use a tex file and include it verbatim. Create a tex file (named as cover.tex here) with content as shown below:
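
A sketch of such a file, assuming the cover image is named cover.png (a placeholder) and sits next to the markdown source:

```shell
# Write cover.tex: place the image, then suppress the page number on that page.
# cover.png is a placeholder image name.
cat > cover.tex <<'EOF'
\includegraphics{cover.png}
\thispagestyle{empty}
EOF
```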

Then, modify the previous script md2pdf_syn_bullet_prop_toc.sh by adding --include-before-body cover.tex and tada, you get the cover image before the table of contents. \thispagestyle{empty} helps to avoid a page number on the cover page, see also tex.stackexchange: clear page.

The bash script invocation is now ./md2pdf_syn_bullet_prop_toc_cover.sh sample_5.md sample_5.pdf.

You'll need at least one image in input markdown file, otherwise settings won't apply to the cover image and you may end up with weird output. sample_5.md used in the command above includes an image. And be careful to use escapes if the image path can contain tex metacharacters.


Stylish blockquote

By default, blockquotes (lines starting with > in markdown) are just indented in the pdf output. To make them stand out, tex.stackexchange: change the background color and border of blockquote helped.

Create quote.tex with the contents as shown below. You can change the colors to suit your own preferred style.
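
A sketch of one way to do this, assuming the tcolorbox package: define a colored box and redefine the quote environment, which pandoc uses for blockquotes in LaTeX output, to use it. The box name myquote and the red shades are example choices.

```shell
# Write quote.tex: blockquotes get a tinted background and a colored frame.
# myquote and the colors are assumptions for illustration.
cat > quote.tex <<'EOF'
\usepackage{tcolorbox}
\newtcolorbox{myquote}{colback=red!5!white, colframe=red!75!black}
\renewenvironment{quote}{\begin{myquote}}{\end{myquote}}
EOF
```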

The bash script invocation is now ./md2pdf_syn_bullet_prop_toc_cover_quote.sh sample_5.md sample_5_quote.pdf. The difference between default and styled blockquote is shown below.


Customizing epub

For a long time, I thought epub didn't make sense for programming books. Turned out, I wasn't using the right ebook readers. FBReader is good for novels but not ebooks with code snippets. When I used atril and calibre ebook-viewer, the results were good.

I didn't know how to use css before trying to generate the epub version. Somehow, I managed to take the default epub.css provided by pandoc and customize it as close as possible to the pdf version. The modified epub.css is available from the learnbyexample.github.io repo. The bash script to generate the epub is shown below and invoked as ./md2epub.sh sample_5.md sample_5.epub. Note that pygments.theme is same as the pdf customization discussed before.
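
A sketch of what an epub script along these lines could contain, written as a shell function named after md2epub.sh. The title and author values are placeholders, and the exact set of options is an assumption; epub.css and pygments.theme are the files mentioned above.

```shell
# Generate an epub with a custom stylesheet and highlighting theme.
# Title and author are placeholder metadata values.
md2epub() {
    pandoc "$1" \
        -f gfm \
        --toc \
        --highlight-style pygments.theme \
        --css epub.css \
        --metadata=title:"My awesome book" \
        --metadata=author:"learnbyexample" \
        -o "$2"
}

# Usage: md2epub sample_5.md sample_5.epub
```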


Resource links

More options and workflows for generating ebooks:

  • pandoc-latex-template — a clean pandoc LaTeX template to convert your markdown files to PDF or LaTeX
  • Jupyter Book — open source project for building beautiful, publication-quality books and documents from computational material
    • See also fastdoc — the output of fastdoc is an asciidoc file for each input notebook. You can then use asciidoctor to convert that to HTML, DocBook, epub, mobi, and so forth
  • Asciidoctor
  • Sphinx

Miscellaneous

  • picular: search engine for colors and colorhexa