© Thomas Mailund 2019
T. Mailund, Introducing Markdown and Pandoc, https://doi.org/10.1007/978-1-4842-5149-2_5

5. Translating Documents

Thomas Mailund (Aarhus N, Denmark)

Once we have a document written in Markdown, we want to translate it into other file formats using Pandoc. First, though, we have to download Pandoc. Go to Pandoc’s installation guide at http://pandoc.org/installing.html and follow the instructions relevant for your platform.

Formatting a Markdown Document with Pandoc

For our first example, we can take the small Markdown document shown here:
# This is a test document
Here is some text in the document.
* This is a list
* With two items
If we save this Markdown document in a file called input.md, we can translate it into an HTML file, output.html, using the command:
pandoc -o output.html input.md

The -o option specifies the output file. The input.md file is specified without any options. You do not need any options for input files, and you can provide more than one. If you provide more than one input file, they are in effect concatenated before Pandoc processes them, so if you want to construct a book from several chapters you have written in separate files, you can provide them on the command line in the order you want the chapters to appear in the book.
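As a minimal sketch of the multi-file case, the following creates two throwaway chapter files and hands them to Pandoc in reading order. The scratch directory and the names 01-intro.md, 02-usage.md, and book.html are placeholders chosen for illustration, and the pandoc call is skipped if Pandoc is not installed.

```shell
# Work in a scratch directory so nothing in your project is overwritten.
workdir=$(mktemp -d)

# Two hypothetical chapters, each in its own file.
printf '# Chapter one\n\nText of the first chapter.\n' > "$workdir/01-intro.md"
printf '# Chapter two\n\nText of the second chapter.\n' > "$workdir/02-usage.md"

# The inputs are processed in the order they are listed.
if command -v pandoc >/dev/null 2>&1; then
    pandoc -o "$workdir/book.html" "$workdir/01-intro.md" "$workdir/02-usage.md"
fi
```

Swapping the two file names on the command line should swap the order of the chapters in book.html.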

Pandoc figures out the input and output formats from the file extensions, so if you use the preceding command, it will know that the input is Markdown (filename suffix .md) and that the output should be HTML (filename suffix .html). You can also make the formats explicit: use the option --from to specify the input format and --to to specify the output format. In most cases, you will not need to specify the formats, because the filenames contain all the information Pandoc needs, but sometimes different formats share the same filename suffix, such as the EPUB and EPUB3 formats, which both use the suffix .epub. In those cases, you need the options.

If you specify the input and output formats explicitly, you can also use pandoc as a filter: pipe the input into it and read the formatted document from its output. You could, for example, write
cat input.md |
    pandoc --from markdown --to html \
    > output.html

In itself there is little use for this, but combined with preprocessing (Chapter 10) and filters (Chapter 11), it is very handy.

Back to the output of pandoc. If you run the previous command
pandoc -o output.html input.md
in your terminal, then the output.html file should now contain the following HTML:
<h1 id="this-is-a-test-document">
    This is a test document
</h1>
<p>Here is some text in the document.</p>
<ul>
<li>This is a list</li>
<li>With two items</li>
</ul>

If you are not familiar with HTML, this might not be readable, but I hope that you can at least recognize the elements from the input Markdown.

This HTML is not a complete HTML file. It is a fragment that corresponds to the Markdown document, but it is missing the header and footer markup that a complete HTML page needs. By default, Pandoc creates HTML markup that can be added to a web page, not a standalone document. To get the header and footer added as well, you can use the option --standalone.
pandoc --standalone -o output.html input.md

The --standalone option is needed for HTML output if you want a complete document. If you choose an output format that is typically not meaningful as a fragment, such as PDF documents (suffix .pdf), EPUB documents (suffix .epub or .epub3), or Word files (suffix .docx), Pandoc will automatically create complete documents, and the --standalone option is not needed.

If you run pandoc --standalone -o output.html input.md, you will get a warning:
[WARNING] This document format requires a nonempty
  <title> element.
  Please specify either 'title' or 'pagetitle' in
  the metadata,
  e.g. by using --metadata pagetitle="..." on the
  command line.
  Falling back to 'input'
But despite the warning, you will get an HTML document that contains all the elements such a document needs:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"
      lang="" xml:lang="">
<head>
  <meta charset="utf-8" />
  <meta name="generator" content="pandoc" />
  <meta name="viewport"
        content="width=device-width,
                  initial-scale=1.0,
                  user-scalable=yes" />
  <title>input</title>
  <style>
      code{white-space: pre-wrap;}
      span.smallcaps{font-variant: small-caps;}
      span.underline{text-decoration: underline;}
      div.column{display: inline-block;
                vertical-align: top; width: 50%;}
  </style>
</head>
<body>
<h1 id="this-is-a-test-document">
    This is a test document
</h1>
<p>Here is some text in the document.</p>
<ul>
<li>This is a list</li>
<li>With two items</li>
</ul>
</body>
</html>

There is more general HTML here than there is in our text, but luckily we do not need to worry about it; we can focus on the Markdown and let Pandoc worry about the rest.

The warning we got has to do with the title we get in the line
<title>input</title>

Pandoc just took the name from the input file but was hoping to get the title explicitly provided. You can do this using metadata; see Chapter 8. Ignore it for now.
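If you would rather silence the warning right away, the flag that the warning itself suggests is enough. The following is a minimal sketch, assuming a scratch directory and a copy of the test document; the title text "Test document" is a placeholder, and the pandoc call is skipped if Pandoc is not installed.

```shell
# Recreate the small test document in a scratch directory.
workdir=$(mktemp -d)
printf '# This is a test document\n\nHere is some text in the document.\n' \
    > "$workdir/input.md"

if command -v pandoc >/dev/null 2>&1; then
    # Set the page title on the command line instead of in the document.
    pandoc --standalone --metadata pagetitle="Test document" \
        -o "$workdir/output.html" "$workdir/input.md"

    # Show the <title> line that ended up in the output.
    grep '<title>' "$workdir/output.html"
fi
```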

You can try making different output formats with the commands
pandoc -o output.pdf input.md
pandoc -o output.docx input.md
pandoc -o output.epub input.md
pandoc --to=epub3 -o output.epub input.md

In the last example, we explicitly specify that the EPUB format to use in the output is EPUB3.

Another case where we might want to specify the --to format is PDF output. By default, Pandoc will create PDF files using LaTeX, but you can specify that it should use ConTeXt instead with the command:
pandoc --to=context -o output.pdf input.md
To get a complete list of supported input and output formats, run the commands
pandoc --list-input-formats
and
pandoc --list-output-formats

respectively.

Pandoc can translate between many different input and output formats, but in this book, we will only consider Markdown input and how to translate Markdown to other formats.

Frequently Useful Options

There are many options you can use to influence how Pandoc transforms a document. I refer you to the online manual for a full list. Here, I will list a few that I find particularly useful in my own writing.

Sections and Chapters

First, we consider options that relate to how sections are interpreted. In Markdown we specify the different levels of section headers by the number of hash signs, but when we produce a document, we sometimes have to worry about whether the top-level sections are parts, chapters, or sections. If we are writing a short report or paper, we want the level-one headers to start sections, but if we are writing a book, we want them to start chapters. The default depends on the output we produce, and Pandoc does some guessing for us, but you can choose explicitly what the top level should be using the --top-level-division option. For my books, where I want the level-one headers to be chapter headers, I use --top-level-division=chapter.
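One way to see what the option does is to write LaTeX to the terminal, which needs no TeX installation. This is a sketch under the assumption that the test document is recreated in a scratch directory; the pandoc calls are skipped if Pandoc is not installed.

```shell
# Recreate the small test document in a scratch directory.
workdir=$(mktemp -d)
printf '# This is a test document\n\nHere is some text in the document.\n' \
    > "$workdir/input.md"

if command -v pandoc >/dev/null 2>&1; then
    # Without the option, the level-one header is treated as a section.
    pandoc --to latex "$workdir/input.md"

    # With the option, the same header is treated as a chapter.
    pandoc --to latex --top-level-division=chapter "$workdir/input.md"
fi
```

The first call should emit a \section command for the header and the second a \chapter command.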

Table of Contents

To add a table of contents to your output document, you can use the option --toc or the option --table-of-contents; the first is just a shorter version of the second. You can specify the level of sections you want in the table of contents using the --toc-depth option. When I produce ebooks, I typically only want the table of contents for the chapter level, so I use --toc-depth=1. When I produce PDF, I am happy with the default. You can play around with the option to see what you prefer.
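As a small sketch of the two options together, the following builds a standalone HTML page with a chapter-level table of contents. The input text and the page title "TOC test" are placeholders; --standalone is included because Pandoc only adds the table of contents to complete documents. The pandoc call is skipped if Pandoc is not installed.

```shell
# A hypothetical document with one chapter and one nested section.
workdir=$(mktemp -d)
printf '# Chapter\n\n## Section\n\nSome text.\n' > "$workdir/input.md"

if command -v pandoc >/dev/null 2>&1; then
    # --toc-depth=1 keeps only the level-one headers in the table of contents.
    pandoc --standalone --toc --toc-depth=1 \
        --metadata pagetitle="TOC test" \
        -o "$workdir/output.html" "$workdir/input.md"
fi
```

Raising the number given to --toc-depth should bring the nested section into the table of contents as well.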

Image Extensions

If you want to produce both PDFs and ebooks from your Markdown input, you might want to use PDF vector graphics for figures for the PDF output but bitmap PNG for the ebook version. Using bitmap graphics for the PDF output means you have to worry about the resolution, but using PDF graphics for ebooks doesn’t always work. When you insert images into your document using the
![Figure caption](graphics-file)

syntax, you need to specify the input file name, but if you want to produce both ebooks and PDFs, you don’t want to have to change all the file names depending on which output format you are producing.

You can leave out the filename suffix for graphics files and specify the desired suffix using the
--default-image-extension
option instead. Then, for any graphics file where you haven’t explicitly written the filename suffix, Pandoc will use the default. I always use
--default-image-extension=pdf
when producing PDF documents and
--default-image-extension=png

when producing ebooks.
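To check which file name Pandoc ends up using, you can write an image reference without a suffix and look at the generated HTML. This is a minimal sketch; the path figures/plot is a placeholder, the image does not need to exist for HTML fragment output, and the pandoc call is skipped if Pandoc is not installed.

```shell
# A hypothetical document that refers to an image without a filename suffix.
workdir=$(mktemp -d)
printf '![Figure caption](figures/plot)\n' > "$workdir/input.md"

if command -v pandoc >/dev/null 2>&1; then
    # Pandoc appends the default suffix to figures/plot in the output.
    pandoc --to html --default-image-extension=png "$workdir/input.md"
fi
```

The src attribute in the output should read figures/plot.png; with --default-image-extension=pdf it would read figures/plot.pdf instead.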

Ebook Covers

Ebooks contain cover images together with their text. If you produce ebooks, you want to specify the cover image as well. You can do this using the --epub-cover-image option. If your cover image is in the file cover.png, you write
--epub-cover-image=cover.png

Using Makefiles

For this book, I have all my text in a single book.md document. This is fine for a short book like this one, but usually, I keep each chapter in a separate file. The command line for compiling my books can get rather long, and I often have various options for different output formats that I need to remember, so if I had to use the command line each time I wanted to build a new version of a book, it would quickly become tedious and extremely error-prone. So I use Make ( https://tinyurl.com/nyc2ec2 ) for compiling my books.

If you are not familiar with Make, I will give you sufficient detail to read the Makefile I give as an example, and maybe a starting point for your own Pandoc Makefile, but introducing Make in full detail is beyond the scope of this book. Many programs solve the same problem that Make does, so there are alternatives to choose from if you do not like Make.

The Makefile I use is not sophisticated, and it suffices to know this:
  1. You define a variable by writing
     VARIABLE_NAME := VALUES
  2. You refer to the value that a variable holds using
     $(VARIABLE_NAME)
  3. When you write target: dependencies, you say that you want to have target and that, if any of the dependencies have changed since the last time you constructed target, you need to construct it again. If dependencies of dependencies have changed, then you need to rebuild those dependencies first and then the target. And so on.
  4. The lines after target: dependencies that are indented by a tab are the instructions your computer needs to make target.
  5. The first target: dependencies line in the Makefile is the target that Make will handle if you do not provide another target on the command line.
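The five points can be tried out on a throwaway Makefile before reading a real one. The sketch below is an illustration only: the scratch directory, book.md, and book.html are placeholders, and the -n flag makes Make print the command it would run rather than run it, so Pandoc itself is not needed. The make call is skipped if Make is not installed.

```shell
# Set up a scratch directory with one hypothetical input file.
workdir=$(mktemp -d)
cd "$workdir"
printf '# Title\n\nSome text.\n' > book.md

# Write a tiny Makefile. printf expands \t to the tab character
# that must start each command line under a target.
printf 'PANDOC := pandoc\n\nall: book.html\n\nbook.html: book.md\n\t$(PANDOC) -o $@ book.md\n' > Makefile

# No target is named, so Make handles the first one, all,
# which in turn requires book.html.
if command -v make >/dev/null 2>&1; then
    make -n
fi
```

Here $@ is Make's shorthand for the name of the target being built, so the printed command should be the pandoc call that writes book.html.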

     
The Makefile I use for this book looks roughly like this (although I have left out a few options). I will walk you through it here.
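# Input files: the metadata header first, then the text.
CHAPTERS := header.yml book.md
PANDOC := pandoc
# Options shared by all output formats. Note that --smart exists only in
# Pandoc 1.x; from Pandoc 2.0 on, smart punctuation is on by default
# for Markdown input and the flag must be dropped.
OPTS_ALL := --toc --smart \
            --top-level-division=chapter
PDF_OPTS := $(OPTS_ALL) \
            --default-image-extension=pdf
EPUB_OPTS := $(OPTS_ALL) \
            --default-image-extension=png \
            -t epub3 --toc-depth=1 \
            --epub-cover-image=cover.png
# Default target: build all three formats.
all: book.pdf book.epub book.docx
book.pdf: $(CHAPTERS) Makefile
	$(PANDOC) $(PDF_OPTS) -o $@ $(CHAPTERS)
book.epub: $(CHAPTERS) Makefile
	$(PANDOC) $(EPUB_OPTS) -o $@ $(CHAPTERS)
# The Word document reuses the PDF options.
book.docx: $(CHAPTERS) Makefile
	$(PANDOC) $(PDF_OPTS) -o $@ $(CHAPTERS)
# Remove the generated books.
clean:
	rm book.pdf book.epub book.docx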

I use a variable to hold the input files, which in this case are just the header and the single Markdown file. You can specify metadata (see Chapter 8) on the Pandoc command line or in a YAML file. You can put the metadata in your Markdown text, but then it has to be at the top of the input, and you cannot translate a single chapter without including the first one. I prefer to put my metadata in a separate file, which in this case is header.yml.

I only have one Markdown file for this book. I am writing this in the Ulysses editor where I can split the book into different sections but export it as one combined file. Since the partition into chapters and sections is kept in Ulysses and not in separate files, I do not have a file per chapter.

I put the input files in the variable CHAPTERS. If you have several, you can add them to the CHAPTERS variable. I keep the Pandoc command in a variable as well. I have more than one version installed, and I can switch between them by updating the variable. After that, I put the arguments that all Pandoc runs share in OPTS_ALL and then the PDF-specific and EPUB-specific options in PDF_OPTS and EPUB_OPTS. I make a Word document, but I can use the PDF arguments for this; Pandoc will know how to make a Word document with the same options as used for the PDF. Next are all the targets, dependencies, and commands for making the targets. The targets all and clean are special. The first doesn’t build anything itself; it just triggers a build of its dependencies, which are the three book formats the Makefile knows how to make. The clean target doesn’t have any dependencies, but it will delete the books we generate with the middle three targets.

I then define some options I want to use for all output formats, then options that I only want to use for PDF, and others I only want to use for ebooks. The options I set for PDF output could also have gone in the metadata header, and they wouldn’t interfere with the EPUB output if I put them there, but I have chosen to set them here.

I build an EPUB book in the Makefile, but if you are planning to publish on iBooks, I do not recommend using this file. I find it much easier to format and submit a book using Pages. Pages can read the Word document, and that is how I usually submit a book to iBooks: I make a Word document, open it in Pages, and then submit it. You might think that building a book in the right format and then using iTunes Connect to submit it would be easier. You would be wrong. If you want to publish on Amazon (on Kindle Direct Publishing), you are also better off using their tool Kindle Create. Kindle Create can read the Word file, and you can submit from there. There is a command line tool that can translate from EPUB to the MOBI file format used on Kindle, but Kindle Create is easier to use.
