DocBook contains markup for the usual variety of front and back matter necessary for books and articles: indexes, glossaries, bibliographies, and tables of contents. In many cases, these components are generated automatically, at least in part, from your document by an external processor, but you can create them by hand, and in either case, store them in DocBook.
Some forms of back matter, such as indexes and glossaries, usually require additional markup in the document to make generation by an application possible. Bibliographies are usually composed by hand like the rest of your text, unless you are automatically selecting bibliographic entries out of some larger database. Our principal concern here is to acquaint you with the kind of markup you need to include in your documents if you want to construct these components.
Front matter, like the table of contents, is almost always generated automatically from the text of a document by the processing application. If you need information about how to mark up a table of contents in DocBook, please consult the reference page for toc.
In some highly structured documents such as reference manuals, you can automate the whole process of generating an index successfully without altering or adding to the original source. You can design a processing application to select the information and compile it into an adequate index. But this is rare.
In most cases—and even in the case of some reference manuals—a useful index still requires human intervention to mark occurrences of words or concepts that will appear in the text of the index.
DocBook distinguishes two kinds of index markers: those that are singular and result in a single page entry in the index itself, and those that are multiple and refer to a range of pages.
You put a singular index marker where the subject it refers to actually occurs in your text:
<para>
  <indexterm><primary>Big Cats</primary>
    <secondary>Tigers</secondary></indexterm>
  The tiger is a very large cat indeed.
</para>
This index term has two levels, primary and secondary. They correspond to an increasing amount of indented text in the resultant index. DocBook allows for three levels of index terms, with the third labeled tertiary.
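As an illustration of the third level, a three-level marker might look like the following sketch. The "Siberian" entry and the sentence around it are invented for this example, and the fragment assumes the DocBook namespace is declared on the document's root element.

```xml
<para>
  <!-- Hypothetical three-level index term: primary, secondary, tertiary -->
  <indexterm>
    <primary>Big Cats</primary>
    <secondary>Tigers</secondary>
    <tertiary>Siberian</tertiary>
  </indexterm>
  The Siberian tiger lives in cold northern forests.
</para>
```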
There are two ways that you can index a range of text. The first is to put index marks at both the beginning and end of the discussion. The mark at the beginning asserts that it is the start of a range, and the mark at the end refers back to the beginning. In this way, the processing application can determine what range of text is indexed. Here’s the previous tiger example recast as starting and ending index terms:
<para>
  <indexterm xml:id="tiger-desc" class="startofrange">
    <primary>Big Cats</primary>
    <secondary>Tigers</secondary></indexterm>
  The tiger is a very large cat indeed…
</para>
⋮
<para>
  So much for tigers<indexterm startref="tiger-desc" class="endofrange"/>.
  Let's talk about leopards.
</para>
Note that the mark at the start of the range identifies itself as the start of a range with the class attribute, and provides an xml:id. The mark at the end of the range points back to the start with its startref attribute.
Another way to mark up a range of text is to specify that the entire content of an element, such as a chapter or section, is the complete range. In this case, all you need is for the index term to point to the xml:id of the element that contains the content in question. The zone attribute of indexterm provides this functionality.
One of the interesting features of this method is that the actual index marks do not have to occur anywhere near the text being indexed. You can collect all of them together, for example, in one file, but it is equally valid to place an index marker near the element it indexes.
Suppose the discussion of tigers in your document comprises a whole text object (such as a sect1 or a chapter) with an xml:id value of tiger-desc. You can put the following tag anywhere in your document to index that range of text:
<indexterm zone="tiger-desc">
  <primary>Big Cats</primary>
  <secondary>Tigers</secondary></indexterm>
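To make the relationship concrete, the sketch below shows the indexed section next to an index term that lives in a different section. The section titles and the second section are assumptions made for illustration; only the xml:id value and the index term itself come from the example above.

```xml
<!-- The element being indexed; its xml:id is the target of zone. -->
<sect1 xml:id="tiger-desc">
  <title>Tigers</title>
  <para>The tiger is a very large cat indeed…</para>
</sect1>

<!-- A different section: the index term still covers all of tiger-desc. -->
<sect1>
  <title>Leopards</title>
  <para>
    <indexterm zone="tiger-desc">
      <primary>Big Cats</primary>
      <secondary>Tigers</secondary>
    </indexterm>
    Let's talk about leopards.
  </para>
</sect1>
```

Because the range is defined by the target element rather than by the position of the marker, moving the indexterm does not change what is indexed.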
DocBook also contains markup for index hits that point to other index hits (e.g., “See Cats, big” or “See also Lions”). See the reference pages for see and seealso.
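A minimal sketch of the two kinds of cross-reference follows. The primary terms chosen here are assumptions for illustration, and the exact wording of the generated entry depends on the stylesheet.

```xml
<!-- "See" reference: sends the reader from one heading to another -->
<indexterm>
  <primary>Big Cats</primary>
  <see>Cats, big</see>
</indexterm>

<!-- "See also" reference: points the reader to a related heading -->
<indexterm>
  <primary>Tigers</primary>
  <seealso>Lions</seealso>
</indexterm>
```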
After you have added the appropriate markup to your document, an external application can use this information to build an index. The resultant index must have information about the page numbers on which the concepts appear. It’s usually the document formatter that builds the index. In this case, the index may never be instantiated as DocBook markup.
However, there are applications that can produce an index marked up in DocBook. The following example includes some one- and two-level indexentry elements (which correspond to the primary and secondary levels in the indexterms themselves) that begin with the letter D:
<index><title>Index</title>
<indexdiv><title>D</title>
<indexentry>
  <primaryie>database (bibliographic), 253, 255</primaryie>
  <secondaryie>structure, 255</secondaryie>
  <secondaryie>tools, 259</secondaryie>
</indexentry>
<indexentry>
  <primaryie>dates (language specific), 179</primaryie>
</indexentry>
<indexentry>
  <primaryie>DC fonts, <emphasis>172</emphasis>, 177</primaryie>
  <secondaryie>Math fonts, 177</secondaryie>
</indexentry>
</indexdiv>
</index>
The structure of indexentry is parallel to the structure of indexterm. Where indexterm has primary, secondary, tertiary, see, and seealso, indexentry has primaryie, secondaryie, tertiaryie, seeie, and seealsoie.
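The remaining elements can be sketched in the same way. The entries and page numbers below are invented for illustration, and the ordering of child elements reflects our understanding of the indexentry content model, so check it against the reference pages before relying on it.

```xml
<indexdiv><title>B</title>
<indexentry>
  <primaryie>Big Cats, 10</primaryie>
  <!-- cross-reference attached to the primary entry -->
  <seealsoie>Lions</seealsoie>
  <secondaryie>Tigers, 12</secondaryie>
  <!-- third level, nested under the preceding secondaryie -->
  <tertiaryie>Siberian, 14</tertiaryie>
</indexentry>
</indexdiv>
<indexdiv><title>C</title>
<indexentry>
  <primaryie>Cats, big</primaryie>
  <!-- no page numbers: the reader is redirected to another entry -->
  <seeie>Big Cats</seeie>
</indexentry>
</indexdiv>
```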
A glossary, like a bibliography, is often constructed by hand. However, some applications are capable of building a skeletal glossary from glossary term markup in the document. If all of your terms are defined in some glossary database, it may even be possible to construct the complete glossary automatically.
To enable automatic glossary generation, or simply automatic linking from glossary terms in the text to glossary entries, you must add markup to your documents. In the text, you mark up a term for compilation later with the inline glossterm tag. This tag can have a linkend attribute whose value is the ID of the actual entry in the glossary.[1]
For instance, if you have this markup in your document:
<glossterm linkend="xml">Extensible Markup Language</glossterm> is a new standard…
your glossary might look like this:
<glossary><title>Example Glossary</title>
⋮
<glossdiv><title>E</title>
<glossentry xml:id="xml"><glossterm>Extensible Markup Language</glossterm>
  <acronym>XML</acronym>
  <glossdef>
    <para>Some reasonable definition here.</para>
    <glossseealso otherterm="sgml"/>
  </glossdef>
</glossentry>
</glossdiv>
⋮
</glossary>
Note that the glossterm tag reappears in the glossary to mark up the term and distinguish it from its definition within the glossentry. The ID that the glossterm in the text references with linkend is the xml:id of the glossentry in the glossary itself. You can use the link between source and glossary to create a link in electronic formats, as we have done with the HTML and PDF forms of the glossary in this book.
You can use the baseform attribute on glossterm and firstterm when the term marked up in context is in a different form, for example, plural. Here is an example:
<para>
  Using <glossterm baseform="DTD">DTDs</glossterm> can be hazardous to your sanity.
</para>
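The same attribute works on firstterm. The sketch below pairs a first use of the plural with a glossary entry whose glossterm matches the base form. The sentence, the xml:id value, and the placeholder definition are assumptions for illustration, and whether a formatter resolves the link from baseform alone depends on the formatter.

```xml
<para>
  <!-- baseform names the form under which the term is defined -->
  <firstterm baseform="DTD">DTDs</firstterm> describe the structure
  of a class of documents.
</para>
⋮
<glossentry xml:id="dtd">
  <glossterm>DTD</glossterm>
  <glossdef>
    <para>Some reasonable definition here.</para>
  </glossdef>
</glossentry>
```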
There are two ways to set up a bibliography in DocBook: you can have the data raw or cooked. When you use “raw” data, you wrap your entry in the biblioentry element and mark up each item individually. The processor determines the display order and supplies punctuation. When you use “cooked” data, you wrap your entry in the bibliomixed element and provide the data in the order in which you want it displayed, and you include the punctuation.
Here’s an example of a raw bibliographical item, wrapped in the biblioentry element:
<biblioentry xreflabel="Kites75">
  <authorgroup>
    <author><firstname>Andrea</firstname><surname>Bahadur</surname></author>
    <author><firstname>Mark</firstname><surname>Shwarek</surname></author>
  </authorgroup>
  <copyright><year>1974</year><year>1975</year>
    <holder>Product Development International Holding N. V.</holder>
  </copyright>
  <isbn>0-88459-021-6</isbn>
  <publisher>
    <publishername>Plenary Publications International, Inc.</publishername>
  </publisher>
  <title>Kites</title>
  <subtitle>Ancient Craft to Modern Sport</subtitle>
  <pagenums>988-999</pagenums>
  <seriesinfo>
    <title>The Family Creative Workshop</title>
    <seriesvolnums>1-22</seriesvolnums>
    <editor>
      <firstname>Allen</firstname>
      <othername role="middle">Davenport</othername>
      <surname>Bragdon</surname>
      <contrib>Editor in Chief</contrib>
    </editor>
  </seriesinfo>
</biblioentry>
The “raw” data in a biblioentry is comprehensive to a fault: there are enough fields to suit a host of different bibliographical styles, and that is the point. An abundance of data requires processing applications to select, punctuate, order, and format the bibliographical data, and it is unlikely that all the information provided will actually be output.
All the “cooked” data in a bibliomixed entry in a bibliography, on the other hand, is intended to be presented to the reader in the form and sequence in which it is provided. It even includes punctuation between the fields of data:
<bibliomixed>
  <bibliomset relation="article">
    <surname>Walsh</surname>, <firstname>Norman</firstname>.
    <title role="article">Introduction to Cascading Style Sheets</title>.
  </bibliomset>
  <bibliomset relation="journal">
    <title>The World Wide Web Journal</title>
    <volumenum>2</volumenum><issuenum>1</issuenum>.
    <publishername>O'Reilly &amp; Associates, Inc.</publishername> and
    <corpname>The World Wide Web Consortium</corpname>.
    <pubdate>Winter, 1996</pubdate></bibliomset>.
</bibliomixed>
Clearly, these two ways of marking up bibliographical entries are suited to different circumstances. You should use one or the other for your bibliography, not both. Strictly speaking, mingling the raw and the cooked may be “kosher” as far as the schema is concerned, but it will almost certainly cause problems for most processing applications.
[1] Some formatters are able to establish the link by examining the content of the terms and the glossary. In that case, the author does not need to make explicit links.