Chapter 4. Development Tools

An amazing variety of development tools is available for Linux. Everyone should be familiar with a few of the important ones.

Linux distributions include many solid, proven development tools; most of the same tools have been included in Unix development systems for years. These tools are neither flashy nor fancy; most of them are command-line tools without a GUI. They have proved themselves through years of use, and it will be worth your while to learn them.

If you are already familiar with Emacs, vi, make, gdb, strace, and ltrace, you are not likely to learn anything new here. However, in the remainder of the book, we assume that you are comfortable with a text editor. Also, nearly all free Unix and Linux source code is built with make, and gdb is one of the most common debuggers available on Linux and Unix. The strace utility (or a similar utility called trace or truss) is available on most Unix systems; the ltrace utility was originally written for Linux and is not available on many systems (as of this writing).

This should not be taken to mean that there are no GUI development tools available for Linux; in fact, quite the reverse is true. There are so many available that the variety can be overwhelming.

At the time of writing, two integrated development environments (IDEs) you might want to consider (and that are likely to be included with your distribution) are KDevelop (http://kdevelop.org/), a part of the KDE desktop environment, and Eclipse (http://eclipse.org/), a Java-based cross-platform environment originally developed by IBM and now maintained by a large consortium. However, using these tools is beyond the scope of this book, and both come with full documentation.

Even though multiple IDEs are available for Linux, they have not been as popular as IDEs have been on other platforms. Even when IDEs are in use, a common rule of thumb when writing Open Source software is that your project should build without the IDE in order to make it possible for developers not comfortable with your choice of IDE to contribute to the project. KDevelop supports this by helping you build projects that use the standard Automake, Autoconf, and Libtool tools used to build many Open Source software projects.

The standard Automake, Autoconf, and Libtool tools themselves are important development tools. They were created to allow you to design your application in such a way that it can be mostly automatically ported to many operating systems. Because these tools are so complex, they are beyond the scope of this book. Also, these tools are changed regularly, and an electronic version of GNU Autoconf, Automake, and Libtool [Vaughan, 2000] is maintained online at http://sources.redhat.com/autobook/.

Editors

Unix developers have traditionally held strong and diverse preferences, especially about editors.

Many programmers’ editors are available for you to try; the two most common are vi and Emacs. Both have more power than they appear to have at first glance, both have a relatively steep learning curve—and they are radically different. Emacs is large; it is really an operating environment of its own. vi is small and is designed to be one piece of the Unix environment. Many clones and alternative versions of each editor have been written, and each has its own following.

Tutorials on vi and Emacs would take far too much space to include in this book. The excellent A Practical Guide to Red Hat® Linux® 8 [Sobell, 2002] includes a detailed chapter on each editor. O’Reilly has published an entire book on each editor: Learning GNU Emacs [Cameron, 1996] and Learning the vi Editor [Lamb, 1990]. Here, we compare only Emacs and vi and tell you how to get online help for each.

Emacs includes a comprehensive set of manuals that explains not only how to use Emacs as an editor, but also how to use Emacs to read and send email and Usenet news, to play games (its gomoku game is not bad), and to run shell commands. In Emacs, you can always execute internal Emacs commands, even those commands that are not bound to keys, by typing the entire name of the command.

In contrast, the documentation available for vi is less generous and less well known. It is exclusively an editor, and many powerful commands are bound to single keystrokes. You switch back and forth between a mode in which typing standard alphabetic characters causes them to be inserted in your document and a mode in which those alphabetic characters are commands; for example, you can use the h, j, k, and l keys as arrow keys to navigate your document.

Both editors allow you to create macros to make your work easier, but their macro languages could hardly be more different. Emacs has a complete programming language called elisp (Emacs Lisp), which is closely related to the Common Lisp programming language. The original vi has a more spartan, stack-based language. Most users merely map keys to simple, one-line vi commands, but those commands often execute programs outside vi to manipulate data within vi. Emacs Lisp is documented in a huge manual that includes a tutorial; documentation for the original vi’s language is relatively sparse.

Some editors allow you to mix and match functionality. You can use Emacs in a vi mode (called viper) that allows you to use the standard vi commands, and one of the vi clones is called vile—“vi like Emacs.”

Emacs

Emacs comes in several flavors. The original Emacs editor was written by Richard Stallman, of Free Software Foundation fame. For years, his GNU Emacs has been the most popular version. Recently, a more graphic-environment-aware variant of GNU Emacs, called XEmacs, has also become popular. XEmacs started life as Lucid Emacs, a set of enhancements to GNU Emacs managed by the now-defunct Lucid Technologies that was intended to be folded back into the official GNU Emacs. Technical differences prevented the teams from merging their code. The two editors remain highly compatible, however, and programmers on both teams regularly borrow code from each other. Because these versions are so similar, we refer to both of them as Emacs.

The best way to become comfortable with the Emacs editor is to follow its tutorial. Run emacs and type ^h t (think “control-help, tutorial”). Type ^x^c to exit Emacs. The tutorial will teach you how to get more information on Emacs. It will not teach you how to get at the Emacs manual that is distributed with Emacs. For that, use ^h i (control-help, info).

Although its user interface may not be as flashy as those of a graphical IDE, Emacs does have powerful features that many programmers want. When you use Emacs to edit C code, for example, Emacs recognizes the file type and enters “C mode,” in which it recognizes C’s syntax and helps you to recognize typos. When you run the compiler from within Emacs, it will recognize error and warning messages, and take you straight to the line at which each error was found when you type a single command, even if it has to read in a new file. It also provides a debugging mode that keeps the debugger in one window and follows the code you are debugging in another window.

vi

If you are a touch typist and like to keep your fingers on the home row,[1] you may appreciate vi, because its command set was designed to minimize finger movement for touch typists. It was also designed for Unix users; if you are familiar with sed or awk or other Unix programs that use standard regular expressions, using ^ to go to the beginning of a line and $ to go to the end of one will feel perfectly natural.

Unfortunately, vi can be harder to learn than Emacs because, although there are vi tutorials similar to the standard Emacs tutorial available, there is no standard way to execute the tutorial from any version of vi. However, most versions, including versions shipped with common Linux distributions, support the :help command.

The most common version of vi, vim (“Vi IMproved”), has many of the development-tool integration features that Emacs provides, including syntax highlighting, automatic indentation, an expressive scripting language, and compiler error parsing.

Make

A mainstay of Unix development is make, a tool that makes it easy for you to describe how to compile programs. Although small programs may need only one command to compile their one source code file into one executable file, it is still easier to type make than to type gcc -O2 -ggdb -DSOME_DEFINE -o foo foo.c. Furthermore, when you have lots of files to compile, and you have changed only a few source files, make will create new object files only when the relevant source files have been modified.

For make to perform this magic, you need to describe all the files in a Makefile. Here is an example:

 1:  # Makefile
 2:
 3:  OBJS = foo.o bar.o baz.o
 4:  LDLIBS = -L/usr/local/lib/ -lbar
 5:
 6:  foo: $(OBJS)
 7:          gcc -o foo $(OBJS) $(LDLIBS)
 8:
 9:  install: foo
10:          install -m 755 foo /usr/bin
11: .PHONY: install
  • Line 1 is a comment; make follows the common Unix tradition of delimiting comments with a # character.

  • Line 3 defines a variable called OBJS as foo.o bar.o baz.o.

  • Line 4 defines another variable, LDLIBS.

  • Line 6 starts the definition of a rule, which states that the file foo depends on (in this case, is built from) the files whose names are contained in the variable OBJS. foo is called the target, and $(OBJS) is called the dependency list. Note the syntax for variable expansion: You wrap the variable name in $(...).

  • Line 7 is a command line that tells you how to build the target from the dependency list. There may be multiple command lines, and the first character in the command line must be a tab.

  • Line 9 is an interesting target. It does not actually try to make a file called install; instead (as you can see in line 10), it installs foo in /usr/bin using the standard install program. But this line brings up an ambiguity in make: What if a file named install exists and it is newer than foo? In that case, when you run the command make install, make says, “`install' is up to date” and quits.

  • Line 11 tells make that install is not a file, and that it should ignore any file named “install” when computing the install dependency. Thus, if the install dependency is invoked (we shall see how to do that later), the command in line 10 will always be invoked. .PHONY is a directive, which alters make’s operation; it tells make, in this case, that the install target is not the name of a file. PHONY targets are often used to take actions such as installation or making a single target name that relies on several other targets all being built, like this:

    all: foo bar baz
    .PHONY: all
    

    Unfortunately, .PHONY is not supported by some versions of make. A less obvious, less efficient, but more portable way of doing the same thing is

    all: foo bar baz FORCE
    FORCE:
    

    This works only if there is no file named “FORCE”.

Items in dependency lists may be file names, but, as far as make is concerned, they are other targets. The foo item in the install dependency list is a target. When make attempts to resolve the install dependency, it sees that it first has to resolve the foo dependency. To resolve the foo dependency, it has to resolve the foo.o, bar.o, and baz.o dependencies.

Note that there are no lines that explicitly tell make how to build foo.o, bar.o, or baz.o. You certainly do not create these files directly in your editor of choice! make provides implied dependencies that you do not have to write. If you have a dependency on a file that ends in .o and you have a file that has the same name except that it ends in .c, make assumes that the object file depends on the source file. Built-in suffix rules provided with make allow you to greatly simplify many Makefiles, and you can write your own suffix rules (explained in the Suffix Rules section later in this chapter) when the built-in rules do not meet your needs.
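For instance, the example Makefile shown earlier needs no explicit rules for the object files at all. Here is a minimal sketch (the CFLAGS setting is just an illustrative assumption) that relies entirely on the built-in rules:

OBJS = foo.o bar.o baz.o
CFLAGS = -O2 -ggdb
LDLIBS = -L/usr/local/lib/ -lbar

foo: $(OBJS)
        gcc -o foo $(OBJS) $(LDLIBS)

When make needs foo.o, the built-in rule compiles foo.c with $(CC) and $(CFLAGS); touching only bar.c causes just bar.o to be recompiled before foo is relinked.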

By default, make exits as soon as any command it runs fails (returns an error). There are two ways to get around this, if you wish to.

The -k argument causes make to build as much as possible, without stopping as soon as a command invocation returns an error. It is useful, for example, when porting; you can build as many object files as possible, then port the files that failed to build without having to wait for intermediate files to build in the meantime.
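For example, a porting run that builds everything it can while capturing the errors for later study might look like this (the log file name is just an illustration):

make -k 2>&1 | tee build.log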

If you know that one command will always return an error, but you wish to ignore the error condition, you can use some shell magic. The /bin/false command always returns an error, so the command

/bin/false

will always cause make to abort its current run unless the -k option is in use. However,

any_command || /bin/true

will never cause make to abort its current run; if any_command returns false, then the shell will run /bin/true and return its exit code, which is guaranteed to be success.
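For instance, a hypothetical clean target might use this idiom so that a missing core file does not stop the run:

clean:
        rm core || /bin/true
        rm -f *.o foo

The first rm fails harmlessly when there is no core file; without the || /bin/true, make would stop before removing the object files.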

Make interprets unrecognized command-line arguments that do not start with a dash (-)[2] as targets to build. So make install will cause make to try to satisfy the install target. If the foo target is not up to date, make will first satisfy the foo dependency by building it, and then it will install it. If you need to build a target that starts with a dash, you will need to precede the target name with a separate double-dash (--) argument.

Complex Command Lines

Each command line is executed in its own subshell, so cd commands in a command line affect only the line on which they are written. You can extend any line in a Makefile over multiple lines by using backslash extension: If you put a backslash (\) as the final character on a line, the line after it will be considered to be part of it, joined to it by a space. Command lines often look like this:

 1:        cd some_directory ; \
 2:          do this to file $(FOO) ; \
 3:          do that
 4:        cd another_directory ; \
 5:          if [ -f some_file ] ; then \
 6:            do something else ; \
 7:          fi ; \
 8:          for i in * ; do \
 9:            echo $$i >> some_file ; \
10:         done

There are only two lines in that fragment, as far as make is concerned. The first command line starts on line 1 and continues through line 3; the second command line starts on line 4 and continues through line 10. There are several points to note here.

  • another_directory is relative not to some_directory, but rather to the directory in which make was run, because the two command lines are executed in different subshells.

  • The lines that constitute each command line are passed to the shell as a single line, so all the ; characters that the shell needs must be there, including the ones that are usually omitted in a shell script because the presence of newline characters implies them. For more information on shell programming, see Learning the bash Shell [Newham, 1995].

  • When you need to dereference a make variable, you can just dereference it normally (that is, $(VAR)), but when you need to dereference a shell variable, you need to escape the $ character by including it twice: $$i.
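Putting these points together, the following sketch (BACKUPDIR and the backup target are invented for illustration; OBJS is the variable from the earlier example) dereferences a make variable and a shell variable in the same command line:

BACKUPDIR = /tmp/backups

backup:
        for f in $(OBJS) ; do \
          cp $$f $(BACKUPDIR)/$$f.bak ; \
        done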

Variables

It often happens that you want to define a variable one component at a time. You might want to write something like this:

OBJS = foo.o
OBJS = $(OBJS) bar.o
OBJS = $(OBJS) baz.o

At this point, you expect OBJS to be defined as foo.o bar.o baz.o—but it is actually defined as $(OBJS) baz.o, because make does not expand variables until they are used.[3] If you do this, as soon as you reference OBJS in a rule, make will enter an infinite loop.[4] For this reason, many Makefiles have sections that look like this:

OBJS1 = foo.o
OBJS2 = bar.o
OBJS3 = baz.o
OBJS = $(OBJS1) $(OBJS2) $(OBJS3)

You will most often see variable declarations like the preceding one when a variable declaration would otherwise be too long for the programmer’s comfort.

Variable expansion brings up a typical issue that a Linux programmer is called on to decide. The GNU tools distributed with Linux are generally more capable than the versions of the tools included with other systems, and GNU make is no exception. The authors of GNU make created another way to do variable assignment that avoids this problem, but not every version of make understands GNU make’s alternative forms of variable assignment. Fortunately, GNU make can be built for any system to which you could easily port source code written on Linux, but do you want to force the people porting your code to other systems to use GNU make? If you do, you can use simple variable assignment:

OBJS := foo.o
OBJS := $(OBJS) bar.o
OBJS := $(OBJS) baz.o

The := operator causes GNU make to evaluate the variable expression at assignment time, rather than wait to evaluate the expression when it is used in a rule. With this code, OBJS does indeed contain foo.o bar.o baz.o.

Simple variable assignment is often useful, but GNU make also has another assignment syntax that deals specifically with this problem, one straight from the C language:

OBJS := foo.o
OBJS += bar.o
OBJS += baz.o

Suffix Rules

This is another context in which you have to decide whether to write standard Makefiles or to use useful GNU extensions. Standard suffix rules are more limited than are the GNU pattern rules, but most situations can be handled sufficiently well by the standard suffix rules, and pattern rules are not supported by many other versions of make.

Suffix rules look like this:

.c.o:
        $(CC) -c $(CFLAGS) $(CPPFLAGS) -o $@ $<
.SUFFIXES: .c .o

This rule says (we sweep the details under the carpet) that make should, unless it is otherwise explicitly instructed, turn a .c file into a .o file by running the attached command line. Each .c file will be treated as though it were explicitly listed as a dependency of the respective .o file in your Makefile.

That suffix rule introduces another of make’s features: automatic variables. It is clear that you need a way to substitute the dependency and target into the command line. The automatic variable $@ stands for the target, $< stands for the first dependency, and $^ stands for all the dependencies.

Several other automatic variables are available and documented in the make manual. All automatic variables can be used in normal rules, as well as in suffix and pattern rules.

The final line of the example introduces another directive. .SUFFIXES tells make that .c and .o are suffixes that make should use to find a way to turn the existing source files into the desired targets.
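Automatic variables are not limited to suffix rules. For example, the link rule from the earlier Makefile could be rewritten as the following sketch, where $@ and $^ are the automatic variables described above:

foo: $(OBJS)
        gcc -o $@ $^ $(LDLIBS)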

Pattern rules are more powerful and, therefore, slightly more complex than suffix rules; it would take too long to cover them in detail here. The equivalent pattern rule to the preceding suffix rule is

%.o : %.c
        $(CC) -c $(CFLAGS) $(CPPFLAGS) -o $@ $<

If you want to know more about make, see Managing Projects with make [Oram, 1993]. GNU make also includes an excellent and easy-to-read manual in Texinfo format, which you can read online, print out, or order as a book from the Free Software Foundation.

Most large Open Source projects use the Automake, Autoconf, and Libtool tools. These tools are essentially collections of knowledge about the peculiarities of different systems and of community standards for how to build projects, so that you have to write only the bits that are actually specific to your project. For example, Automake writes “install” and “uninstall” targets, Autoconf automatically determines the capabilities of the system and configures the software to match the system, and Libtool deals with differences in how shared libraries are managed between different systems. Documenting these three tools is an entire book of its own, GNU Autoconf, Automake, and Libtool [Vaughan, 2000]; Linux Application Development provides the background you need in order to read and use GNU Autoconf, Automake, and Libtool.

The GNU Debugger

Gdb is the Free Software Foundation’s debugger. It is a good command-line debugger, on which several tools have been built, including Emacs’ gdb mode, the graphical Data Display Debugger (DDD),[5] and built-in debuggers in several graphical IDEs. We cover only gdb in this section.

Start gdb by running gdb progname. Gdb will not search the PATH looking for the executable file. Gdb will load the executable’s symbols and then prompt you for what to do next.

There are three ways to inspect a process with gdb:

  • Use the run command to start the program normally.

  • Use the attach command to start inspecting an already-running process. When you attach to a process, the process will be stopped.

  • Inspect an existing core file to determine the state of the process when it was killed. To inspect a core file, start gdb with the command gdb progname corefile.

Before you run a program or attach to an already-running program, you can set breakpoints, list source code, and do anything else that does not necessarily involve a running process.

Gdb does not require that you type entire command names; r suffices for run, n for next, s for step. Furthermore, to repeat the most recent command, simply hit Return. This makes single-stepping easy.

A short selection of useful gdb commands is included here; gdb includes a comprehensive online manual in GNU info format (run info gdb) that explains all of gdb’s options in detail in a tutorial format. Programming with GNU Software [Loukides, 1997] contains a good detailed tutorial on using gdb. Gdb also includes extensive online help available from within gdb; access it with the help command. Specific help on each command is available with help commandname or help topic.

Just like shell commands, gdb commands may take arguments. We use “call help with an argument of command” to mean the same as “type help command”.

Some gdb commands also take format identifiers to identify how to print values. Format identifiers immediately follow the command name and are separated from the command name by a slash. Once you have chosen a format, you do not have to use it each time you repeat the command; gdb will remember the format you chose as the default.

Format identifiers are separated from commands by a / character and are composed of three elements: a count, a format letter, and a size letter. The count and size letters are optional; count defaults to 1, and the size has reasonable defaults based on the format letter.

The format letters are o for octal, x for hexadecimal, d for decimal, u for unsigned decimal, t for binary, f for floating-point, a for address, i for instruction, c for character, and s for string.

The size letters are b for byte, h for half word (2 bytes), w for word (4 bytes), and g for giant (8 bytes).
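For instance, here is a sketch of a session (the variable name, values, and addresses are made up) that uses a bare format letter with print and a full count/format/size specification with x:

(gdb) print/x status
$1 = 0x1a
(gdb) x/4xb &status
0xbffff8a4:     0x1a    0x00    0x00    0x00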

attach, at

Attach to an already-running process. The only argument is the pid of the process to which to attach. This stops the process to which you attach, interrupting any sleep or other interruptible system call in progress. See detach.

backtrace, bt, where, w

Print a stack trace.

break, b

Set a breakpoint. You can specify a function name, a line number of the current file (the file containing the currently executing code), a filename:linenumber pair, or even an arbitrary address with *address. Gdb assigns and tells you a unique number for each breakpoint. See condition, clear, and delete.
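A few hypothetical examples (the addresses and line numbers shown are illustrative):

(gdb) break main
Breakpoint 1 at 0x8048f10: file ladsh4.c, line 23.
(gdb) break ladsh4.c:664
Breakpoint 2 at 0x804a5c0: file ladsh4.c, line 664.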

clear

Clear a breakpoint. Takes the same arguments as break. See delete.

condition

Changes a breakpoint specified by number (see break) to break only if a condition is true. The condition is expressed as an arbitrary expression.

 

(gdb) b 664
Breakpoint 3 at 0x804a5c0: file ladsh4.c, line 664.
(gdb) condition 3 status == 0

delete

Clear a breakpoint by number.

detach

Detach from the currently attached process.

display

Display the value of an expression every time execution stops. Takes the same arguments (including format modifiers) as print. Prints a display number that can be used later to cancel the display. See undisplay.

help

Get help. Called with no argument, provides a summary of the help available. Called with another command as an argument, provides help on that command. Extensively cross-referenced.

jump

Jump to an arbitrary address and continue execution there. The address is the only argument, and it can be specified either as a line number or as an address specified as *address.

list, l

With no argument, list first lists the 10 lines surrounding the current address. Subsequent calls to list list subsequent sections of 10 lines. With an argument of -, lists the previous 10 lines.

 

With a line number, lists the 10 lines surrounding that line. With a filename:linenumber pair, lists the 10 lines surrounding that line. With a function name, lists the 10 lines surrounding the beginning of the function. With an address specified as *address, lists the 10 lines surrounding the code found at that address.

 

With two line specifications separated by commas, lists all the lines between the two specified lines.
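For example (the file name and line numbers are illustrative), the first command below lists the 10 lines around line 664 of ladsh4.c, and the second lists lines 655 through 670 of the current file:

(gdb) list ladsh4.c:664
(gdb) list 655,670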

next, n

Step to the next line of source code in the current function; make function calls without stepping. See step.

nexti

Step to the next machine language instruction; make function calls without stepping. See stepi.

print, p

Print the value of an expression in a comprehensible representation. If you have a char *c, the command print c will print the address of the string, and print *c will print the string itself. Printing structures will expand the structures. You can include casts in your expressions, and gdb will honor them. If the code was compiled with the -ggdb option, enumerated values and preprocessor definitions will be available for you to use in your expressions. See display.

 

The print command takes format identifiers, although with proper types and with typecasts, the format identifiers are rarely necessary. See x.

run, r

Run the current program from the beginning. The arguments to the run command are the arguments that would be used to run the program on the command line. Gdb will do shell-style globbing with * and [], and it will do shell-style redirection with <, >, and >>, but it will not do pipes or here documents.

 

With no arguments, run uses the arguments that were specified in the most recent run command, or in the most recent set args command. To run with no arguments after running with arguments, use the set args command with no extra arguments.
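A hypothetical invocation (the program arguments and file names are invented) that uses both globbing and redirection:

(gdb) run -v *.conf < test-input > test-output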

set

Gdb allows you to change the values of variables, like this:

 
(gdb) set a = argv[5]
 

Also, whenever you print an expression, gdb gives you a shorthand variable, like $1, that you can use to refer to it later. So if you had previously printed argv[5] and gdb had told you that it was $6, you could write the previous assignment as

 
(gdb) set a = $6
 

The set command also has many subcommands, far too numerous to list here. Use help set for more information.

step, s

Step the program instruction by instruction until it reaches a new line of source code. See next.

stepi

Execute exactly one machine language instruction, tracing into function calls. See nexti.

undisplay

Without any argument, cancels all displays. Otherwise, cancels the displays whose numbers are given as arguments. See display.

whatis

Prints the data type of an expression given as its argument.

where, w

See backtrace.

x

The x command is like the print command, except that it is explicitly limited to printing the contents of an address in some arbitrary format. If you do not use a format identifier, gdb will use the most recently specified format identifier.

Tracing Program Actions

Two programs help you trace the actions that an executable is taking. Neither requires the source code; in fact, neither can make use of the source code. Both print out in symbolic, textual form a log of the actions being taken by a program.

The first, strace, prints out a record of each system call that the program makes. The second, ltrace, prints out a record of each library function call that the program makes (and can optionally also trace system calls). These tools can be particularly useful for determining “what went wrong” in obvious failure cases.

For example, consider a system daemon that has been working for quite a while, but then starts exhibiting segmentation faults when you try to start it up. It is likely that the bug has been triggered by a change in some data files, but you do not know which one. The first step might be to run the system daemon under strace, look for the last few files that it opens before taking the segmentation fault, and examine those files for likely causes. Or consider another daemon that is unexpectedly taking lots of CPU time; you can run it under strace first, and then under ltrace if strace does not show clearly what it is doing, to understand what input or conditions are causing it to consume an unexpected amount of CPU time.

Like gdb, strace and ltrace can either be used to run a program from beginning to end, or can attach to running programs. By default, both programs send their output to standard output. Both programs require that their own options come first, followed by the executable to run (when applicable), followed by any options to pass to that executable.

Both programs provide a similar set of options:

-C or --demangle

In ltrace only, decode (or “demangle”) the names of library symbols into recognizable names. This strips leading underscore characters (many glibc functions are internally implemented with versions with leading underscores) and makes C++ library functions readable (C++ encodes type information into symbol names).

-e

In strace only, specify a subset of actions to print. There are many possible specifications described in the strace man page; the most commonly useful specification is -e trace=file, which traces only system calls involved in file I/O and manipulation.

-f

Attempt to “follow fork()”; that is, trace child processes as well. Note that the child process may run without being traced for a short time before strace or ltrace is able to attach to it and trace its actions.

-o filename

Instead of sending the output to standard out, store it in the file named filename.

-p pid

Instead of starting a new instance of a program, attach to the process ID specified in pid.

-S

In ltrace only, report system calls as well as library calls.

-v

In strace only, do not abbreviate large structures in system calls such as the stat() family of calls, termios calls, and others with large structures.

The manual pages for each of the utilities cover these options and others not mentioned here.
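Putting a few of these options together, a typical pair of invocations might look like this (the program name, log file names, and pid are invented):

strace -f -e trace=file -o strace.log ./mydaemon --debug
ltrace -C -S -o ltrace.log -p 12345

The first command records every file-related system call made by mydaemon and its children in strace.log; the second attaches to process 12345 and logs both library calls and system calls.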



[1] If you are a qwerty touch typist, that is. Dvorak touch typists who use vi generally use lots of vi macros to make vi comfortable for them.

[2] The combined minus and hyphen character is often called a dash.

[3] Although this behavior may seem inconvenient, it is an important feature and not a bug. Not expanding variables is critically important for writing the generic suffix rules that create implied dependencies.

[4] Most versions of make, including the GNU version, which is distributed with Linux, will detect that they are in an infinite loop and quit with an error message.
