© Thomas Mailund 2019
T. Mailund, Introducing Markdown and Pandoc, https://doi.org/10.1007/978-1-4842-5149-2_2

2. Why Use Markdown and Pandoc

Thomas Mailund, Aarhus N, Denmark

If you are used to WYSIWYG editors such as Microsoft Word, you might reasonably ask why you should use Markdown files. You can write your document and format it any way you like, and you can export it to different file formats if you wish. For short documents that you only need to format once and to one file format, you do not need Markdown. I will argue that Markdown is still an excellent choice for such documents, but it is in more advanced applications that it really shines.

Even for applications that are just as easy to handle with a WYSIWYG editor, plain text can be a better choice when you need to share documents with others. The de facto file format for sharing is Word files, but not everyone has Word. I don’t. I can import Word files into Pages, which I have, and export to Word, but I don’t know what that does to the formatting. Everyone has an editor that can work on plain text files, and with a plain text file, you know exactly what you are editing. If the text and the formatting are separated, then someone with more artistic skills can handle the formatting while I write the text. One argument for Word might be its tracking of changes. This is an important feature, but plain text files can be put under real version control, for example, with Git and GitHub, and that is superior to Word’s change tracking.

If you need your document in different formats, for example, in a printed progress report and also on a web site, then you can export the document to as many file formats as you need. If you need different typography for the different file formats, however, you might have to do substantial manual work. You might need to change all the document styles by hand, and on the numerous occasions where you need to make changes to your text, you need to change the styles for each file format all over again. If you separate style and text, you avoid this problem altogether.

Using a markup language to annotate your text makes it easier for you to distinguish between the semantic structure of a document and how it is formatted. In the Markdown document, you mark up where headers and lists are, for example, but not how these should be formatted in the final output. The formatting styles are held in different files, and you can easily transform your Markdown input into all the output file formats and styles you need. Furthermore, someone else can work on the style specification while you concentrate on the text. Your Markdown doesn’t have to be in a single file either. You can split it into as many files as you want, and then different authors can work on separate pieces of the text without worrying about how to merge files afterward. With version control, you can even work on the same file in parallel up to a point.

Separating Semantics from Formatting

Most documents have a semantic structure. Texts consist of chapters and sections, plain text and emphasized text, figures and citations, quotes, and lists. When we read a document, these semantic elements are visualized by different fonts, bold and italic text, different font sizes, and we do not directly see the semantic structure. Because we don’t immediately see the structure, it is easy to forget that it is there.

Most word processors separate semantics from formatting. If you take care to use the formatting section when working on a Word document, then the semantic information needed to change styles, that is, the visual representation of all semantic units (e.g., headings), is readily available. Separating the semantics of a document from its formatting is not an exclusive property of markup languages. However, when the separation of formatting and semantics is not enforced, there is a potential for error. If you decide to change the font size of level-two section headers, for example, you can easily do this, but you can equally easily highlight a single section header and reformat that, changing only that single header. That makes this particular header different from all the rest, and if you later modify the formatting of level-two headers, you won’t be changing this one header. Great if this is on purpose; not great if this is not what you wanted.

With WYSIWYG editors, you can separate semantics from formatting, but it is easy to break this separation. With markup languages, you can also define some text elements as special and format them differently from related items, but you have to do this explicitly, so you cannot easily do it by mistake. Keeping the core text purely semantic and separate from formatting is vital in many situations. If you want to translate your text into both paper documents and web pages, you typically want the format to be different in the two resulting documents. If the core text only contains the semantic structure, this is quickly done by having a different mapping from semantic elements to formatting information for each output. These mappings are typically called templates or stylesheets (see Chapter 9). With different stylesheets for different output formats, the formatting is tied to the output text rather than the input text (see Figure 2-1).
Figure 2-1

You can translate the text in multiple documents, or multiple chapters that should be merged into a single document. You combine these with templates for formatting the documents, and using Pandoc you can combine it all to produce the documents you want.
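
As a rough illustration of the idea in Figure 2-1, the following Python sketch builds two Pandoc command lines from the same Markdown sources, one per output format. All file names (`chapter1.md`, `style.css`, `book.latex`, and so on) and the helper `pandoc_command` are hypothetical; the options used (`-o`, `--standalone`, `--css`, `--template`) are standard Pandoc options. The sketch only constructs and prints the commands; it does not run Pandoc.

```python
# Sketch: one set of Markdown sources, one command per output format.
# All file names here are hypothetical placeholders.

def pandoc_command(sources, output, extra_options=()):
    """Return the argument list for a Pandoc run that merges `sources`
    into a single `output` file."""
    return ["pandoc", *sources, "--standalone", *extra_options, "-o", output]


chapters = ["chapter1.md", "chapter2.md"]

# Same text, different formatting: a stylesheet for the web page,
# a LaTeX template for the PDF.
html_cmd = pandoc_command(chapters, "book.html", ["--css=style.css"])
pdf_cmd = pandoc_command(chapters, "book.pdf", ["--template=book.latex"])

for cmd in (html_cmd, pdf_cmd):
    print(" ".join(cmd))
```

With Pandoc installed, each list could be handed to `subprocess.run` to produce the actual files. The point of the sketch is that the chapter files never change between the two commands; only the formatting options do.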

Explicitly representing the semantic elements in the text, rather than implicitly through how the text is formatted, is also essential if you want to automatically make a table of contents or lists of figures and tables. If all sections are marked up explicitly as sections, with headers at different section levels, any tool can scan your document and identify these. If the tools had to guess at the semantic meaning of text elements, based on how the text was formatted, this would be a much harder task.
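
To see how little guessing is needed when sections are marked up explicitly, consider this minimal Python sketch that collects the headers of a Markdown text into an indented outline. It is an illustration under simplifying assumptions, not how Pandoc builds its table of contents: it only recognizes headers written with leading `#` characters, and the names `table_of_contents` and `HEADER` are made up for the example.

```python
import re

FENCE = "`" * 3  # the marker that opens and closes a fenced code block

# A header line: one to six '#' characters, whitespace, then the title.
HEADER = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")


def table_of_contents(markdown_text):
    """Return (level, title) pairs for every '#'-style header in the text."""
    entries = []
    in_code_block = False
    for line in markdown_text.splitlines():
        # Skip fenced code blocks so '#' comments are not taken for headers.
        if line.startswith(FENCE):
            in_code_block = not in_code_block
            continue
        if in_code_block:
            continue
        match = HEADER.match(line)
        if match:
            entries.append((len(match.group(1)), match.group(2)))
    return entries


document = """\
# This is a level one header
This is a paragraph
## This is a level two header
Here is a paragraph
"""

# Print the outline, indenting each title by its header level.
for level, title in table_of_contents(document):
    print("  " * (level - 1) + title)
```

Doing the same for a document where headers are only recognizable by font size or boldness would require heuristics rather than a one-line pattern.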

Using WYSIWYG word processors doesn’t prevent you from structuring your documents as semantic units—they usually support this—but having an explicit markup language makes it much easier to enforce.

Preprocessing Documents

If your documents are in plain text, you also get a lot of options for how to process your document before you format it into a final output. There are a large number of tools that will work well with plain text and let you preprocess your documents.

Preprocessing documents often requires some programming skill, so it might not be the first thing you want to worry about if you are only interested in writing text. Since the option is there, however, you can write your text without worrying about processing it initially and add such steps later.

I write a lot about R programming, and in those books, I have a lot of code examples. There, I use a preprocessor that lets me evaluate the code when processing the documents, so I know that all the code examples work, and so I can get the output of running the code inserted into the documents automatically before I create the output formats.
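
The idea of evaluating code examples while processing a document can be sketched in a few lines of Python. This is a hypothetical illustration, not the preprocessor mentioned above: the `{python}` chunk label, the function name `evaluate_chunks`, and the layout of the inserted output are all assumptions made for the example.

```python
import contextlib
import io
import re

FENCE = "`" * 3  # the marker that opens and closes a fenced code block

# A chunk to evaluate: a fenced block whose opening line is labeled {python}.
CHUNK = re.compile(
    re.escape(FENCE) + r"\{python\}\n(.*?)\n" + re.escape(FENCE),
    re.DOTALL,
)


def evaluate_chunks(markdown_text):
    """Run each {python} chunk and insert its printed output after it."""
    namespace = {}  # shared, so later chunks can use earlier definitions

    def run(match):
        code = match.group(1)
        captured = io.StringIO()
        with contextlib.redirect_stdout(captured):
            exec(code, namespace)  # a broken example raises an error here
        shown = f"{FENCE}python\n{code}\n{FENCE}"
        output = captured.getvalue().rstrip("\n")
        if output:
            shown += f"\n\n{FENCE}\n{output}\n{FENCE}"
        return shown

    return CHUNK.sub(run, markdown_text)


source = f"""\
Adding two numbers:

{FENCE}{{python}}
print(1 + 2)
{FENCE}
"""

print(evaluate_chunks(source))
```

The result is ordinary Markdown in which every example is followed by the output it actually produced, ready to be passed on to the formatting step. Because a failing chunk raises an error, a broken example stops the build instead of slipping into the finished document.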

Preprocessing your documents adds some complications to how you format your text, but the complexities are only there when you need them. If you do not need a preprocessor, then you can ignore that they exist altogether. If you do need preprocessing, then read Chapter 10.

Why Markdown?

There are many different markup languages you can use. HTML (hypertext markup language) is used for web pages. TeX and LaTeX are used for many kinds of text documents but are especially powerful for typesetting mathematics. Markdown is what we focus on in this book.

What makes Markdown particularly pleasant to work with is its simplicity. In HTML, for example, you need to structure your text using tags that enclose every paragraph, every header, every list, and so on. When you edit an HTML document, it is hard to separate the annotations from just the text you want to write. LaTeX has the same problem. The annotation of the text can be hard to ignore when you want to focus on writing.

Worse, if you write your documents in HTML or LaTeX, much of the text is markup codes that specify the formatting. How much, of course, depends on your document, but any markup instructions you add can make the text difficult to read.

Consider this Markdown document:
# This is a level one header
This is a paragraph
## This is a level two header
Here is a paragraph that is followed by
  * an unnumbered
  * list
1. and a numbered
2. list that is
3. three items long

I hope you will agree that the markups here are minimal and that they do not get in the way of reading or writing the text.

For comparison, the HTML version of the same text looks like this:
<h1>This is a level one header</h1>
<p>This is a paragraph</p>
<h2>This is a level two header</h2>
<p>Here is a paragraph that is followed by</p>
<ul>
  <li>an unnumbered</li>
  <li>list</li>
</ul>
<ol type="1">
  <li>and a numbered</li>
  <li>list that is</li>
  <li>three items long</li>
</ol>

It is not terribly complicated, and after looking at it a bit, you can certainly follow the structure of the document, but it is far from as clean as the Markdown file.

The LaTeX version is slightly easier to read than the HTML file, but there are still several formatting instructions that get in the way of just writing.
\section{This is a level one header}
This is a paragraph
\subsection{This is a level two header}
Here is a paragraph that is followed by
\begin{itemize}
\item an unnumbered
\item list
\end{itemize}
\begin{enumerate}
\item and a numbered
\item list that is
\item three items long
\end{enumerate}

Markdown is designed so you can annotate your text with semantic information with little annotation clutter. It is designed such that reading the input text is almost as easy as reading the formatted text. With Markdown you don’t have quite the same power to control your formatting as you do in a language like LaTeX, but the simplicity of Markdown more than makes up for it.

Why Pandoc?

Since Markdown is just a language for adding structure to a text, it is not tied to any particular tool. You can use any Markdown-aware software when you want to process your documents. Many blogging platforms will let you write your text in Markdown and automatically format it for you. Translating Markdown into HTML was, after all, one of the primary motivations for the language. Many text editors now support Markdown as well, both formatting it as you write and exporting it to various file formats, usually with formatting and style options that determine what your output files will look like.

If your editor can export to different file formats and in different styles, then that is obviously the easiest way for you to export your Markdown text. With Pandoc, however, you have a lot of power over how your documents should be processed. Pandoc is vastly more versatile than any Markdown-aware text editor that I am aware of.

If you want to create a simple document with no fluff, it is easy to do so with Pandoc, but easier to do from inside your editor. Try using Pandoc for simple cases though, so you get familiar with the tool. When you get into serious writing, and you want full control of how your final documents will look, then you need the power of Pandoc. The learning curve can be steep, but if you are familiar with using Pandoc for simple documents, then you have a foundation to build on when you explore advanced features.
