© Thomas Mailund 2019
T. Mailund, Introducing Markdown and Pandoc, https://doi.org/10.1007/978-1-4842-5149-2_10

10. Preprocessing

Thomas Mailund, Aarhus N, Denmark

A Markdown document is just plain text, so you can rewrite that text in any way you like before you pass it to Pandoc. Any such rewriting is called preprocessing. Pandoc reads from standard input when it is not given an input file, so we can pipe the result of preprocessing into it on the command line (see Figure 10-1).

Assuming that the preprocessor takes the input file as an argument and writes its output to standard output, a pipeline can look like this:
preprocessor infile.md |
   pandoc --from markdown ... -o outfile

When Pandoc reads from standard input, it has no file extension from which to guess the input format. It falls back to Markdown in that case, but it is clearer to state the format explicitly, and you do this with the --from option.

The preprocessor can do whatever you want it to, as long as its output is something Pandoc can process. The output does not need to be Markdown (you can change the --from option if it is not), but it must be in a format that Pandoc can read. I will use Markdown as my output in the following.
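As a minimal sketch of such a preprocessor, the Python script below reads the file named on the command line, applies a transformation, and writes the result to standard output, so it fits the pipeline above. The preprocess function and the @COURSE@ placeholder are hypothetical and only illustrate the shape of the program.

```python
import sys


def preprocess(text):
    """Hypothetical transformation: expand a made-up @COURSE@ placeholder."""
    return text.replace("@COURSE@", "Markdown and Pandoc")


def main(path):
    # Read the whole input file and write the rewritten text to standard
    # output, where Pandoc can pick it up through a pipe.
    with open(path, encoding="utf-8") as infile:
        sys.stdout.write(preprocess(infile.read()))


# Only run when called as a script with exactly one file name.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Saved under a name of your choice, say preprocess.py, it would take the place of preprocessor in the pipeline: python3 preprocess.py infile.md | pandoc --from markdown ... -o outfile.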
Figure 10-1. Document formatting pipeline with a preprocessing step

Examples

In the following examples, I will use GPP for the first two and Python for the last. GPP is a preprocessor with somewhat limited functionality, but for including files and for selectively including or excluding segments of a file, it works excellently. Getting Python to do the same takes additional work. On the other hand, since Python is a general-purpose programming language, we can get it to do whatever we want with the input document.

Including Files

One use for a preprocessor is to keep information that several documents share in one file and document-specific input, such as a document's body, in another. That is also the idea behind templates, but there are other cases where we might want such a setup.

Imagine that you are teaching a class and hand out exercises every week. Some information, such as the name of the class and the name of the instructor, does not change from week to week, but other information does, for example, the week number.

We can make a file, header.yml, with the general information:
class: Markdown and Pandoc
instructor: Thomas Mailund

The header here is, of course, artificially simple. In practice, you would only bother to include a file of some complexity, but the example shows the principle.

For a specific week, we can then specify the week information, for example, the week number and the actual exercises for that week. Here is such a file; let us call it exercises.md. It holds the exercises for week 14 of the class.
---
#include "header.yml"
week: Week 14
---
# This is an exercise
Do something difficult
# This is another exercises
Do something even more difficult

The #include "header.yml" line is where the preprocessor does its thing.

Notice that the three dashes delimiting the YAML header are not in header.yml. If they were, we couldn't include the file and still set the variable week in exercises.md. When we include it inside the YAML header, we can combine the general variables set in header.yml with the file-specific variables.

If we pipe the document through the preprocessor
gpp < exercises.md
we get this result:
---
class: Markdown and Pandoc
instructor: Thomas Mailund
week: Week 14
---
# This is an exercise
Do something difficult
# This is another exercises
Do something even more difficult
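For comparison, here is a rough sketch of what the include step alone might look like in Python. It is not how GPP works internally; the function name expand_includes and the base_dir parameter are made up for illustration, and the sketch handles neither nested includes nor any other GPP directive.

```python
import re
from pathlib import Path

# A line consisting only of: #include "some-file"
INCLUDE = re.compile(r'^#include\s+"([^"]+)"\s*$')


def expand_includes(text, base_dir="."):
    """Replace each #include "file" line with the contents of that file."""
    out = []
    for line in text.splitlines(keepends=True):
        match = INCLUDE.match(line)
        if match is None:
            out.append(line)  # ordinary line, keep as it is
            continue
        included = Path(base_dir, match.group(1)).read_text(encoding="utf-8")
        # Make sure the next line of the document starts on a new line.
        if included and not included.endswith("\n"):
            included += "\n"
        out.append(included)
    return "".join(out)
```

Calling expand_includes on the text of exercises.md, with header.yml in the same directory, should give the same expanded header as the gpp output shown above.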
We can combine the preprocessed document with a template:
\documentclass{article}
\usepackage{hyperref}
\title{$class$: $week$}
\author{$instructor$}
\begin{document}
\maketitle
$body$
\end{document}
Combining the preprocessor and Pandoc now lets us build a document with our exercises.
gpp exercises.md |
    pandoc --template exercises.tex \
        --from markdown \
        -o exercises.pdf
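If you would rather drive the same pipeline from a script than from the shell, a sketch along the following lines could work. It assumes that gpp and pandoc are both on your PATH; the function names pandoc_command and build are hypothetical, and only the command-line options already used above appear in it.

```python
import subprocess


def pandoc_command(template, output, source_format="markdown"):
    """Build the Pandoc command line used in the text as an argument list."""
    return ["pandoc", "--template", template,
            "--from", source_format, "-o", output]


def build(infile, template, output):
    # Run the preprocessor and capture what it writes to standard output.
    preprocessed = subprocess.run(
        ["gpp", infile], check=True, capture_output=True, text=True
    ).stdout
    # Feed the preprocessed text to Pandoc on standard input.
    subprocess.run(pandoc_command(template, output),
                   input=preprocessed, check=True, text=True)
```

A call such as build("exercises.md", "exercises.tex", "exercises.pdf") would then correspond to the shell pipeline above.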

Conditional Inclusion

Continuing with the exercise example, we could imagine that you have TAs for your class and you want to give them solutions to the exercises. It is easier to have the solutions in the same document as the exercises, but you don't want to hand the solutions to your students. So, what you want is a way to include the solutions when you make documents for the TAs and exclude them otherwise. This is something GPP is excellent at as well.

You can test if a variable is defined using #ifdef. A variable here should not be confused with the variables that Pandoc works with. Remember that the preprocessor sees the document before Pandoc and does not communicate with Pandoc other than piping its output into it.

If we want to include or exclude a block of text, we can put it between #ifdef and #endif. We can do that for the solutions to our exercises:
---
#include "header.yml"
week: Week 14
---
# This is an exercise
Do something difficult
#ifdef SOLUTIONS
This is the solution to the exercise
#endif
# This is another exercises
Do something even more difficult
#ifdef SOLUTIONS
This is the solution to the exercise
#endif
If you build a document from the preceding file, you will not get the solutions in the output. To get them, you need to define SOLUTIONS. You can do this in the file with a #define statement, but for this particular application, we might as well define it on the gpp command line with the -D option. This command line will build a PDF that contains both the exercises and the solutions.
gpp -DSOLUTIONS week14_exercises.md |
    pandoc --template exercises.tex \
        --from markdown \
        -o week14_exercises_solutions.pdf
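To make the mechanism concrete, the following Python sketch mimics just the #ifdef and #endif handling described above. It is an illustration and not GPP's implementation; the function name strip_conditionals and its defined parameter are hypothetical, and it ignores the rest of GPP's directives.

```python
def strip_conditionals(text, defined=()):
    """Keep text inside '#ifdef NAME' ... '#endif' only if NAME is defined."""
    defined = set(defined)
    keep_stack = []  # one entry per open #ifdef: is that block kept?
    out = []
    for line in text.splitlines(keepends=True):
        words = line.split()
        if len(words) == 2 and words[0] == "#ifdef":
            keep_stack.append(words[1] in defined)
        elif words == ["#endif"]:
            if not keep_stack:
                raise ValueError("#endif without matching #ifdef")
            keep_stack.pop()
        elif all(keep_stack):
            # Outside any block, or every enclosing block is kept.
            out.append(line)
    if keep_stack:
        raise ValueError("#ifdef without matching #endif")
    return "".join(out)
```

Calling strip_conditionals(text) drops the solutions, while strip_conditionals(text, {"SOLUTIONS"}) keeps them, which plays the role of the -DSOLUTIONS option. Markdown headings are left alone because a heading line starts with the word # rather than #ifdef.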

Running Code

Leaving the exercises, imagine that you are writing a book about programming and you have code examples. You want to show the result of running the code, so you want to evaluate all your code and insert the result into your document.

For example, you have the code
```python
for i in range(10):
    print(i, end = ' ')
```
```python
for i in range(10):
    print(-i, end = ' ')
```

and you want the first code block to be followed by the numbers 0 to 9 and the second by the numbers 0 to -9.

This Python code iterates over all lines in the input. It uses sys.stdin to read the input, so you must pipe input to it rather than call it with a file name. For each line, it checks whether it is a code block line, that is, whether it starts with three backticks. If it does, and the backticks are followed by python, it starts collecting lines until it sees the end of the block. When it gets there, it evaluates the Python code using exec. This function executes the code and produces any output the code prints, which is what we want here. Since every block is executed in the same environment, functions and variables defined in earlier blocks can be used in later blocks.
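from sys import stdin

def main():
    exec_env = {}    # shared by all blocks, so definitions carry over
    incode = False   # are we inside a python code block?
    codeblock = []   # lines of the current block
    for line in stdin:
        print(line, end="")  # echo every input line unchanged
        if line.startswith("```python"):
            incode = True
            continue
        if incode:
            if line.startswith("```"):
                # end of block: run it; its output lands after the fence
                exec("".join(codeblock), exec_env)
                incode = False
                codeblock = []
                continue
            codeblock.append(line)

if __name__ == "__main__":
    main()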
You can call the preprocessor like this
python3 evalpy.py < eval-python.md
and get this result:
```python
for i in range(10):
    print(i, end = ' ')
```
0 1 2 3 4 5 6 7 8 9
```python
for i in range(10):
    print(-i, end = ' ')
```
0 -1 -2 -3 -4 -5 -6 -7 -8 -9
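The reason definitions carry over between blocks is that every call to exec receives the same dictionary. The short sketch below illustrates this in isolation. The helper run_block is hypothetical; unlike the preprocessor above, it captures the printed output with contextlib.redirect_stdout instead of letting it go straight to standard output, which is one way you could post-process the result before inserting it.

```python
import contextlib
import io


def run_block(source, env):
    """Execute source in env and return whatever it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, env)
    return buffer.getvalue()


env = {}  # shared environment, like exec_env in the preprocessor
run_block("def double(x):\n    return 2 * x\n", env)  # first block defines
result = run_block("print(double(21))\n", env)        # second block uses it
print(result, end="")
```

The second block can call double only because both blocks were executed with the same env dictionary; with a fresh dictionary per block, the call would raise a NameError.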

Exercises

If you have gpp installed, preprocess a document using a flag so that you get different output when you create HTML and when you create LaTeX. You have to set the variables explicitly to do this, but see the next chapter for how to handle output formats in filters.
