© Thomas Mailund 2019
T. Mailund, Introducing Markdown and Pandoc, https://doi.org/10.1007/978-1-4842-5149-2_8

8. Metadata

Thomas Mailund, Aarhus N, Denmark

If you go back to the standalone HTML document from Chapter 5, the input was this:
# This is a test document
Here is some text in the document.
  * This is a list
  * With two items
We compiled it like this:
pandoc --standalone -o output.html input.md
You should get the warning:
[WARNING] This document format requires a nonempty <title> element.
  Please specify either 'title' or 'pagetitle' in the metadata,
  e.g. by using --metadata pagetitle="..." on the command line.
  Falling back to 'input'
and the title in your document will then, just as the warning says, look like this:
<title>input</title>
Pandoc inserted a title into your document, but it is set to a default value, input, because we didn’t specify one. Try running this command instead:
pandoc --metadata title="My Title" \
       --standalone -o output.html input.md

If you now read the output.html file, you will see that Pandoc has inserted “My Title” between the title tags and inserted a level-one header that says “My Title”.
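If you set several metadata values this way, the command line grows quickly. As a minimal sketch, assuming you drive Pandoc from a Python script, the following builds the argument list for the preceding command. The helper name pandoc_command is made up for this illustration; only the --metadata, --standalone, and -o options come from Pandoc itself, and the sketch prints the command rather than running it.

```python
import shlex

def pandoc_command(input_path, output_path, metadata):
    """Build an argument list for a standalone Pandoc run.

    `metadata` is a dict; each entry becomes one `--metadata KEY=VALUE`
    option. (Hypothetical helper, not part of Pandoc.)
    """
    args = ["pandoc", "--standalone", "-o", output_path]
    for key, value in metadata.items():
        args += ["--metadata", f"{key}={value}"]
    args.append(input_path)
    return args

cmd = pandoc_command("input.md", "output.html", {"title": "My Title"})

# Print a shell-quoted version of the command for inspection.
print(shlex.join(cmd))
```

To actually run the command, you could pass cmd to subprocess.run, provided that Pandoc is installed and on your path.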

When Pandoc generates a standalone document, it uses metadata such as title and author(s) to fill in some information. This data is usually not specified in the Markdown input—there aren’t any Markdown annotations for defining such metadata—but you can set it using the --metadata option or using YAML (see the following text).

Strictly speaking, there are two types of values used when producing the output: metadata, specified with --metadata, and variables, specified with --variable. The difference is that metadata can be seen and processed by Pandoc and by Pandoc filters (scripts that process your input before it is formatted for the output), while variables are only used in templates. If you set a value using the --metadata option, or in a metadata header, it will also be available to templates, so you can usually stick to metadata. The output isn’t always exactly the same, since filters might do something with metadata that they won’t do with variables, but it is easier to stick with one kind of option. So unless you have good reasons not to, use metadata.

YAML for metadata

There are potentially many values you want to specify as metadata, so you don’t want to rely on command line options for all of them. Luckily, Pandoc can read metadata from a header in your input, written in another language called YAML (originally “Yet Another Markup Language,” now officially “YAML Ain’t Markup Language”). YAML is a different kind of language than Markdown. It is not intended for marking up text but for providing structured data to tools.

You can put a YAML header with metadata at the top of your input text to provide Pandoc with the information. I usually put my metadata in a separate file instead and give that as the first input file when I run Pandoc. Since Pandoc concatenates the input files you give it, this is equivalent to putting the metadata at the top of the document, but it gives me the option of using different headers when I produce output in different formats, and I can easily format different Markdown files with the same metadata.

A YAML header starts with three hyphens --- on a line of their own and is terminated with another three hyphens. Inside the header, you can put key-value information. The keys are followed by a colon, and the values follow the colon. The header I use for this book looks like this:
---
title:
  "My Markdown and Pandoc book"
author:
  - Thomas Mailund
year: 2019
---

It sets three values: the title, the author, and the year I am writing the book, which is all that I need for this book. I didn’t need to put the title in quotes; I could have written it as it is, the same way I write my name in the author field. However, if a title, or any value in general, contains a colon (in particular, a colon followed by a space), you do need to put the value in quotes. Here, I use the quotes to show what that looks like.

You will notice that for the author: field I have a hyphen before my name. I didn’t have to put that there either, but I did to show you a list. When you want a key to refer to a sequence of values, for example, if you have more than one author on a document, you use hyphens before each element in the list. Here, I make author refer to a list of length one. The result is the same as if I hadn’t put my name in a list, but if I had a coauthor, we would need the list syntax.
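To make the role of the delimiter lines concrete, here is a minimal sketch of how a tool could separate a YAML header from the Markdown that follows it. This is not how Pandoc itself is implemented; the function name split_yaml_header is invented for the example, and the sketch only looks at a header at the very top of the text. It accepts either three hyphens or three dots as the closing line, since Pandoc allows both.

```python
def split_yaml_header(text):
    """Return (header, body) for text that may start with a YAML header.

    The header is everything between an opening '---' line and the next
    '---' or '...' line. If no complete header is found, header is None
    and the whole text is returned as the body.
    """
    lines = text.splitlines()
    if not lines or lines[0].rstrip() != "---":
        return None, text
    for i in range(1, len(lines)):
        if lines[i].rstrip() in ("---", "..."):
            header = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:])
            return header, body
    # No closing delimiter: treat everything as body.
    return None, text

doc = """---
title: "My Markdown and Pandoc book"
author:
  - Thomas Mailund
year: 2019
---

# First chapter
"""

header, body = split_yaml_header(doc)
print(header)
```

The header string would then go to a YAML parser, while the body is ordinary Markdown.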

I have this header in a file called header.yml, and I can compile the book into a PDF file with the command
pandoc -o book.pdf header.yml book.md

The actual command line is more complex (see the Makefile in Chapter 5), but this command would suffice to generate a book.

The YAML language is essentially a way of mapping keys to values. In the preceding header, you have three keys: title, author, and year. A key can map to a single scalar value, as title and year do. A key can also map to a list, as author does. Finally, a key can map to nested key-value mappings. Consider this:
author:
    - name: Thomas Mailund
      affiliation: Unseen University
    - name: Karsken Baelg
      affiliation: Brakebills University
Here author is a list; you can see this because there is a hyphen before each author. Each author is a nested mapping with two keys, name and affiliation. You can tell that it is a mapping because each key is followed by a colon with the value to its right. In short, a scalar is a single value, a list is a sequence of items each introduced by a hyphen, and a nested map is a key-value mapping inside another value. For lists and maps, there is a more concise notation. For a list, you can put its values in square brackets and separate them with commas:
author_names: ["Thomas Mailund", "Karsken Baelg"]
For maps, you can use curly brackets instead:
author:
    - { name: "Thomas Mailund",
        affiliation: "Unseen University" }
    - { name: "Karsken Baelg",
        affiliation: "Brakebills University" }

Here, the items in the list have the same structure. They need not have this; it is easier to write code to process it when they do, though.
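To see why a uniform structure is easier to process, consider this sketch. It writes the author metadata by hand as the kind of Python structure a YAML parser would typically produce, a list of dictionaries, rather than parsing the YAML itself. The function name format_author and the output format are my own choices for the illustration.

```python
# The author metadata from the text as a list of dictionaries,
# written by hand here instead of being read by a YAML parser.
authors = [
    {"name": "Thomas Mailund", "affiliation": "Unseen University"},
    {"name": "Karsken Baelg", "affiliation": "Brakebills University"},
]

def format_author(author):
    """Format one author mapping as 'Name (Affiliation)'."""
    return f"{author['name']} ({author['affiliation']})"

# Because every item has the same two keys, one loop handles them all.
bylines = [format_author(a) for a in authors]
print("; ".join(bylines))
```

If one of the items lacked an affiliation key, format_author would fail with a KeyError, and the code would need a special case for it.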

If you have a long text, you can break it into several lines in the YAML file using either | or >:
abstract: |
    This is a very long abstract and
    it is probably the best paper ever.
    We are sure that you will all agree.

The difference between the two is that | preserves line breaks, while > folds them: each single line break is replaced with a space, so the text becomes one long line. You can continue the text in these blocks for as long as you want, provided that you indent each line.
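The following sketch mimics what the two block styles do to the abstract from the example. It is a simplification and not a YAML parser: it assumes the indentation has already been stripped, and it ignores the special handling of blank lines and more deeply indented lines. The function names literal_block and folded_block are invented for the illustration.

```python
lines = [
    "This is a very long abstract and",
    "it is probably the best paper ever.",
    "We are sure that you will all agree.",
]

def literal_block(lines):
    """Approximate the | style: keep every line break."""
    return "\n".join(lines) + "\n"

def folded_block(lines):
    """Approximate the > style: replace line breaks with spaces."""
    return " ".join(lines) + "\n"

literal = literal_block(lines)
folded = folded_block(lines)

print(literal)
print(folded)
```

In both cases, the value ends with a single newline, which is the default behavior for these block styles.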

You can write arbitrarily complex YAML, but Pandoc will only use the metadata that it knows how to process. Which variables are interpreted by Pandoc depends on the output format and the template that you use (see Chapter 9). Check the Pandoc manual at https://tinyurl.com/yyxgole5 for details on the default templates, and see Chapter 9 for how to use metadata in your own templates.
