Do you want to make your program faster? Well, of course you do. But first you should stop and ask yourself, "Do I really need to spend time making this program faster?" Recreational optimization can be fun,[2] but normally there are better uses for your time. Sometimes you just need to plan ahead and start the program when you're going on a coffee break. (Or use it as an excuse for one.) But if your program absolutely must run faster, you should begin by profiling it. A profiler can tell you which parts of your program take the most time to execute, so you won't waste time optimizing a subroutine that has an insignificant effect on the overall execution time.
Perl comes with a profiler, the Devel::DProf
module. You can use it to profile the Perl program in
mycode.pl by typing:
perl -d:DProf mycode.pl
Even though we've called it a profiler--since that's
what it does--the mechanism DProf
employs is the
very same one we discussed earlier in this chapter.
DProf
is just a debugger that records the time Perl
entered and left each subroutine.
When your profiled script terminates,
DProf
will dump the timing information to a file
called tmon.out. The dprofpp
program that came with Perl knows how to analyze
tmon.out and produce a report. You may also use
dprofpp as a frontend for the whole process with
the -p
switch (see described later).
Given this program:
outer(); sub outer { for (my $i=0; $i < 100; $i++) { inner() } } sub inner { my $total = 0; for (my $i=0; $i < 1000; $i++) { $total += $i } } inner();
the output of dprofpp is:
Total Elapsed Time = 0.537654 Seconds User+System Time = 0.317552 Seconds Exclusive Times %Time ExclSec CumulS #Calls sec/call Csec/c Name 85.0 0.270 0.269 101 0.0027 0.0027 main::inner 2.83 0.009 0.279 1 0.0094 0.2788 main::outer
Note that the percentage numbers don't add up to 100. In fact, in this case, they're pretty far off, which should tip you off that you need to run the program longer. As a general rule, the more profiling data you can collect, the better your statistical sample. If we increase the outer loop to run 1000 times instead of 100 times, we'll get more accurate results:
Total Elapsed Time = 2.875946 Seconds User+System Time = 2.855946 Seconds Exclusive Times %Time ExclSec CumulS #Calls sec/call Csec/c Name 99.3 2.838 2.834 1001 0.0028 0.0028 main::inner 0.14 0.004 2.828 1 0.0040 2.8280 main::outer
The first line reports how long the program took to run, from start to finish. The second line displays the total of two different numbers: the time spent executing your code ("user") and the time spent in the operating system executing system calls made by your code ("system"). (We'll have to forgive a bit of false precision in these numbers--the computer's clock almost certainly does not tick every millionth of a second. It might tick every hundredth of a second if you're lucky.)
The "user+system" times can be changed with command-line options
to dprofpp. -r
displays
elapsed time, -s
displays system time only, and
-u
displays user time only.
The rest of the report is a breakdown of the time spent in each
subroutine. The "Exclusive Times" line indicates that when subroutine
outer
called subroutine inner
,
the time spent in inner
didn't count towards
outer
's time. To change this, causing
inner
's time to be counted towards
outer
's, give the -I
option
to dprofpp.
For each subroutine, the following is reported:
%Time
, the percentage of time spent in this
subroutine call; ExclSec
, the time in seconds spent
in this subroutine not including those subroutines called from it;
CumulS
, the time in seconds spent in this
subroutine and those called from it; #Calls
, the
number of calls to the subroutine; sec/call
, the
average time in seconds of each call to the subroutine not including
those called from it; Csec/c
, the average time in
seconds of each call to the subroutine and those called from
it.
Of those, the most useful figure is %Time
,
which will tell you where your time goes. In our case, the
inner
subroutine takes the most time, so we should
try to optimize that subroutine, or find an algorithm that will call
it less. :-)
Options to dprofpp provide access to other
information or vary the way the times are calculated. You can also
make dprofpp run the script for you in the first
place, so you don't have to remember the
-d:DProf
switch:
-p
SCRIPT
Tells dprofpp that it should profile
the given SCRIPT
and then interpret
its profile data. See also -Q
.
-Q
Used with -p
to tell
dprofpp to quit after profiling the script,
without interpreting the data.
-a
Sort output alphabetically by subroutine name rather than by decreasing percentage of time.
-R
Count anonymous subroutines defined in the same package
separately. The default behavior is to count all anonymous
subroutines as one, named
main::__ANON__
.
-I
Display all subroutine times inclusive of child subroutine times.
-l
Sort by number of calls to the subroutines. This may help identify candidates for inlining.
-O
COUNT
Show only the top COUNT
subroutines. The default is 15.
-q
Do not display column headers.
-T
Display the subroutine call tree to standard output. Subroutine statistics are not displayed.
-t
Display the subroutine call tree to standard output. Subroutine statistics are not displayed. A function called multiple (consecutive) times at the same calling level is displayed once, with a repeat count.
-S
Produce output structured by the way your subroutines call one another:
main::inner x 1 0.008s main::outer x 1 0.467s = (0.000 + 0.468)s main::inner x 100 0.468s
Read this as follows: the top level of your program called
inner
once, and it ran for 0.008s elapsed
time, and the top level called outer
once and
it ran for 0.467s inclusively (0s in outer
itself, 0.468s in the subroutines called from
outer
) calling inner
100
times (which ran for 0.468s). Whew, got that?
Branches at the same level (for example,
inner
called once and
outer
called once) are sorted by inclusive
time.
-U
Do not sort. Display in the order found in the raw profile.
-v
Sort by average time spent in subroutines during each call. This may help identify candidates for hand optimization by inlining subroutine bodies.
-g
subroutine
Ignore subroutines except
subroutine
and whatever is called
from it.
Other options are described in dprofpp (1), its standard manpage.
DProf
is not your only choice for
profiler. CPAN also holds Devel::SmallProf
, which
reports the time spent in each line of your program. That can help you
figure out if you're using some particular Perl construct that is
being surprisingly expensive. Most of the built-in functions are
pretty efficient, but it's easy to accidentally write a regular
expression whose overhead increases exponentially with the size of the
input. See also Section
24.2 in Chapter 24, for
other helpful hints.
Now go take that coffee break. You'll need it for the next chapter.