gprof - display call-graph profile data
gprof [-abcCDlsz] [-e function-name] [-E function-name]
[-f function-name] [-F function-name]
[image-file [profile-file...]]
[-n number of functions]
The gprof utility produces an execution profile of a program. The effect
of called routines is incorporated in the profile of each caller. The profile
data is taken from the call graph profile file that is created by programs
compiled with the -xpg option of cc(1), or by the -pg
option with other compilers, or by setting the LD_PROFILE environment
variable for shared objects. See ld.so.1(1). These compiler options
also link in versions of the library routines which are compiled for
profiling. The symbol table in the executable image file image-file
(a.out by default) is read and correlated with the call graph profile
file profile-file (gmon.out by default).
First, execution times for each routine are propagated along the
edges of the call graph. Cycles are discovered, and calls into a cycle are
made to share the time of the cycle. The first listing shows the functions
sorted according to the time they represent, including the time of their
call graph descendants. Below each function entry are shown its (direct)
call-graph children and how their times are propagated to this function. A
similar display above the function shows how this function's time and the
time of its descendants are propagated to its (direct) call-graph
parents.
Cycles are also shown, with an entry for the cycle as a whole and
a listing of the members of the cycle and their contributions to the time
and call counts of the cycle.
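As an illustration of what "discovering cycles" means, the following is a minimal sketch that groups mutually recursive functions by plain reachability. The function name find_cycles and the dictionary layout are hypothetical, the algorithm is not claimed to be the one gprof uses, and direct self-recursion is deliberately left out as a simplification.

```python
def find_cycles(call_graph):
    """Group functions that can reach each other into cycles.

    call_graph: mapping of function name -> set of functions it calls.
    Returns a list of sorted member lists, one per cycle of two or
    more functions.
    """
    def reachable(start):
        # Iterative depth-first walk over everything callable from start.
        seen, stack = set(), list(call_graph.get(start, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(call_graph.get(node, ()))
        return seen

    reach = {f: reachable(f) for f in call_graph}
    cycles, assigned = [], set()
    for f in sorted(call_graph):
        if f in assigned or f not in reach[f]:
            continue  # already grouped, or f can never get back to itself
        # Members are the functions that f reaches and that reach f back.
        members = sorted(g for g in reach[f] if g in reach and f in reach[g])
        assigned.update(members)
        if len(members) > 1:
            cycles.append(members)
    return cycles


graph = {"main": {"a"}, "a": {"b"}, "b": {"a", "c"}, "c": set()}
cycles = find_cycles(graph)
print(cycles)
```

In this hypothetical graph, a and b call each other and so form one cycle; main and c are outside it.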
Next, a flat profile is given, similar to that provided by
prof(1). This listing gives the total execution times and call counts
for each of the functions in the program, sorted by decreasing time.
Finally, an index is given, which shows the correspondence between function
names and call-graph profile index numbers.
A single function may be split into subfunctions for profiling by
means of the MARK macro. See prof(7).
Beware of quantization errors. The granularity of the sampling is
shown, but remains statistical at best. It is assumed that the time for each
execution of a function can be expressed by the total time for the function
divided by the number of times the function is called. Thus the time
propagated along the call-graph arcs to parents of that function is directly
proportional to the number of times that arc is traversed.
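The averaging assumption above can be sketched as follows. This is an illustration only: the function name propagate_to_parents and the sample figures are hypothetical, and the code shows the arithmetic rather than gprof's implementation.

```python
def propagate_to_parents(total_time, arc_counts):
    """Split a function's time among its callers.

    total_time: time attributed to the function (self plus descendants).
    arc_counts: mapping of caller name -> number of calls over that arc.
    """
    total_calls = sum(arc_counts.values())
    if total_calls == 0:
        # No recorded calls: nothing can be propagated.
        return {caller: 0.0 for caller in arc_counts}
    per_call = total_time / total_calls  # assumed average cost of one call
    return {caller: per_call * count for caller, count in arc_counts.items()}


# Hypothetical function taking 6.0 seconds, called 3 times by parse
# and once by render.
shares = propagate_to_parents(6.0, {"parse": 3, "render": 1})
print(shares)
```

Because every call is charged the same average cost, a caller that makes a few unusually expensive calls is charged too little and a caller that makes many cheap calls is charged too much.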
The profiled program must call exit(2) or return normally
for the profiling information to be saved in the gmon.out file.
The following options are supported:
-a
Suppress printing statically declared functions. If this
option is given, all relevant information about the static function (for
instance, time samples, calls to other functions, calls from other functions)
belongs to the function loaded just before the static function in the
a.out file.
-b
Brief. Suppress descriptions of each field in the
profile.
-c
Discover the static call-graph of the program by a
heuristic which examines the text space of the object file. Static-only
parents or children are indicated with call counts of 0. Note that for
dynamically linked executables, the linked shared objects' text segments are
not examined.
-C
Demangle symbol names before printing them out.
-D
Produce a profile file gmon.sum that represents the difference of the
profile information in all specified profile files. This summary profile
file may be given to subsequent executions of gprof (also with -D) to
summarize profile data across several runs of an a.out file. See also the
-s option.
As an example, suppose function A calls function B n times
in profile file gmon.sum, and m times in profile file
gmon.out. With -D, a new gmon.sum file will be created
showing the number of calls from A to B as n-m.
-e function-name
Suppress printing the graph profile entry for routine
function-name and all its descendants (unless they have other ancestors
that are not suppressed). More than one -e option may be given. Only
one function-name may be given with each -e option.
-E function-name
Suppress printing the graph profile entry for routine
function-name (and its descendants) as -e, above, and also exclude the
time spent in function-name (and its descendants) from the total and
percentage time computations. More than one -E option may be given. For
example:
-E mcount -E mcleanup
is the default.
-f function-name
Print the graph profile entry only for routine
function-name and its descendants. More than one -f option may
be given. Only one function-name may be given with each -f
option.
-F function-name
Print the graph profile entry only for routine
function-name and its descendants (as -f, above) and also use
only the times of the printed routines in total time and percentage
computations. More than one -F option may be given. Only one
function-name may be given with each -F option. The -F
option overrides the -E option.
-l
Suppress the reporting of graph profile entries for all
local symbols. This option would be the equivalent of placing all of the local
symbols for the specified executable image on the -E exclusion
list.
-n
Limit the size of the flat and graph profile listings to the
top n functions, where n is the number supplied with the option.
-s
Produce a profile file gmon.sum which represents
the sum of the profile information in all of the specified profile files. This
summary profile file may be given to subsequent executions of gprof
(also with -s) to accumulate profile data across several runs of an
a.out file. See also the -D option.
-z
Display routines which have zero usage (as indicated by
call counts and accumulated time). This is useful in conjunction with the
-c option for discovering which routines were never called. Note that
this has restricted use for dynamically linked executables, since shared
object text space will not be examined by the -c option.
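The call-count arithmetic behind the -s and -D options described above can be sketched as follows. The function combine_arc_counts and the dictionary representation of a profile file are hypothetical; real profile files are binary and also carry time samples, which this sketch ignores.

```python
def combine_arc_counts(first, second, mode):
    """Combine per-arc call counts from two profile files.

    first, second: mapping of (caller, callee) -> call count.
    mode: "sum" mirrors -s, "difference" mirrors -D (first minus second).
    """
    arcs = set(first) | set(second)
    if mode == "sum":
        return {arc: first.get(arc, 0) + second.get(arc, 0) for arc in arcs}
    if mode == "difference":
        return {arc: first.get(arc, 0) - second.get(arc, 0) for arc in arcs}
    raise ValueError("mode must be 'sum' or 'difference'")


# Hypothetical counts: A calls B n = 7 times in gmon.sum, m = 3 in gmon.out.
gmon_sum = {("A", "B"): 7}
gmon_out = {("A", "B"): 3}
summed = combine_arc_counts(gmon_sum, gmon_out, "sum")
differenced = combine_arc_counts(gmon_sum, gmon_out, "difference")
print(summed, differenced)
```

With these figures the summed count is n+m = 10 and the difference is n-m = 4, matching the -D example given earlier.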
PROFDIR
If this environment variable contains a value, place
profiling output within that directory, in a file named
pid.programname. pid is the process ID and
programname is the name of the program being profiled, as determined by
removing any path prefix from the argv[0] with which the program was
called. If the variable is set but contains a null value, no profiling
output is produced. If the variable is not set, profiling output is placed
in the file gmon.out.
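The PROFDIR rules can be summarized in a short sketch. The helper name profile_output_path is hypothetical, and the sketch assumes that the gmon.out case applies when the variable is not set at all.

```python
import posixpath


def profile_output_path(environ, argv0, pid):
    """Return the profile file path, or None if no output is produced.

    environ: mapping of environment variables.
    argv0: the argv[0] with which the program was called.
    pid: the process ID.
    """
    if "PROFDIR" not in environ:
        return "gmon.out"  # assumption: the default applies when unset
    profdir = environ["PROFDIR"]
    if profdir == "":
        return None  # null value: profiling output is suppressed
    program = posixpath.basename(argv0)  # strip any path prefix
    return posixpath.join(profdir, f"{pid}.{program}")


print(profile_output_path({"PROFDIR": "/tmp/prof"}, "/opt/bin/app", 4242))
```

The directory /tmp/prof, the program name app, and the process ID 4242 are made-up values used only to show the pid.programname naming.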
a.out
executable file containing namelist
gmon.out
dynamic call-graph and profile
gmon.sum
summarized dynamic call-graph and profile
$PROFDIR/pid.programname
cc(1), ld.so.1(1), prof(1), exit(2),
pcsample(2), profil(2), malloc(3C), monitor(3C),
malloc(3MALLOC), attributes(7), prof(7)
Graham, S.L., Kessler, P.B., McKusick, M.K., "gprof: A Call
Graph Execution Profiler", Proceedings of the SIGPLAN '82 Symposium on
Compiler Construction, SIGPLAN Notices, Vol. 17, No. 6, pp.
120-126, June 1982.
Linker and Libraries Guide
If the executable image has been stripped and does not have the .symtab
symbol table, gprof reads the global dynamic symbol tables
.dynsym and .SUNW_ldynsym, if present. The symbols in the
dynamic symbol tables are a subset of the symbols that are found in
.symtab. The .dynsym symbol table contains the global symbols
used by the runtime linker. .SUNW_ldynsym augments the information in
.dynsym with local function symbols. In the case where .dynsym
is found and .SUNW_ldynsym is not, only the information for the global
symbols is available. Without local symbols, the behavior is as described for
the -a option.
LD_LIBRARY_PATH must not contain /usr/lib as a
component when compiling a program for profiling. If LD_LIBRARY_PATH
contains /usr/lib, the program will not be linked correctly with the
profiling versions of the system libraries in /usr/lib/libp.
The times reported in successive identical runs may show variances
because of varying cache-hit ratios that result from sharing the cache with
other processes. Even if a program seems to be the only one using the
machine, hidden background or asynchronous processes may blur the data. In
rare cases, the clock ticks initiating recording of the program counter may
beat with loops in a program, grossly distorting measurements. Call
counts are always recorded precisely, however.
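The beating effect can be reproduced with a small simulation. This is a sketch under simplifying assumptions: time is measured in whole hypothetical units, the program is a single loop that repeats forever, and the function names fast and slow are invented for the example.

```python
from collections import Counter


def sample(loop, tick, n_samples):
    """Count which function is running at each sampling tick.

    loop: list of (function name, duration) pairs executed repeatedly.
    tick: interval between samples, in the same units as the durations.
    """
    period = sum(duration for _, duration in loop)
    hits = Counter()
    for i in range(n_samples):
        offset = (i * tick) % period  # position inside the current iteration
        for name, duration in loop:
            if offset < duration:
                hits[name] += 1
                break
            offset -= duration
    return hits


loop = [("fast", 9), ("slow", 1)]  # slow really uses 10% of each iteration
in_step = sample(loop, tick=10, n_samples=100)
out_of_step = sample(loop, tick=7, n_samples=100)
print(in_step, out_of_step)
```

When the sampling interval equals the loop period, every sample lands at the same point of the loop and slow is never seen. With an interval that does not divide the period, the samples spread over the loop and slow receives its true 10% share.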
Only programs that call exit or return from main are
guaranteed to produce a profile file, unless a final call to monitor
is explicitly coded.
Functions such as mcount(), _mcount(),
moncontrol(), _moncontrol(), monitor(), and
_monitor() may appear in the gprof report. These functions are
part of the profiling implementation and thus account for some amount of the
runtime overhead. Since these functions are not present in an unprofiled
application, time accumulated and call counts for these functions may be
ignored when evaluating the performance of an application.
64-bit profiling may be used freely with dynamically linked executables, and
profiling information is collected for the shared objects if the objects are
compiled for profiling. Care must be applied to interpret the profile output,
since it is possible for symbols from different shared objects to have the
same name. If name duplication occurs in the profile output, the module id
prefix before the symbol name in the symbol index listing can be used to
identify the appropriate module for the symbol.
When using the -s or -D option to combine multiple
profile files, care must be taken not to mix 32-bit profile files with
64-bit profile files.
32-bit profiling may be used with dynamically linked executables, but care must
be applied. In 32-bit profiling, shared objects cannot be profiled with
gprof. Thus, when a profiled, dynamically linked program is executed,
only the main portion of the image is sampled. This means that all time
spent outside of the main object, that is, time spent in a shared
object, will not be included in the profile summary; the total time reported
for the program may be less than the total time used by the program.
Because the time spent in a shared object cannot be accounted for,
the use of shared objects should be minimized whenever a program is profiled
with gprof. If desired, the program should be linked to the profiled
version of a library (or to the standard archive version if no profiling
version is available), instead of the shared object to get profile
information on the functions of a library. Versions of profiled libraries
may be supplied with the system in the /usr/lib/libp directory. Refer
to compiler driver documentation on profiling.
Consider an extreme case. A profiled program dynamically linked
with the shared C library spends 100 units of time in some libc
routine, say, malloc(). Suppose malloc() is called only from
routine B and B consumes only 1 unit of time. Suppose further
that routine A consumes 10 units of time, more than any other routine
in the main (profiled) portion of the image. In this case,
gprof will conclude that most of the time is being spent in A
and almost no time is being spent in B. From this it will be almost
impossible to tell that the greatest improvement can be made by looking at
routine B and not routine A. The value of the profiler in this
case is severely degraded; the solution is to use archives as much as
possible for profiling.
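The figures in this extreme case can be worked through in a short sketch. The helper visible_profile is hypothetical, and the example assumes for simplicity that A and B are the only routines in the profiled portion of the image.

```python
def visible_profile(self_times, profiled):
    """Return the fraction of sampled time each profiled routine appears to use.

    self_times: mapping of routine name -> time spent in the routine itself.
    profiled: set of routines in the main (sampled) portion of the image.
    """
    sampled = {name: t for name, t in self_times.items() if name in profiled}
    total = sum(sampled.values())
    return {name: t / total for name, t in sampled.items()}


# Time units from the text: malloc() lives in the shared C library.
self_times = {"A": 10, "B": 1, "malloc": 100}
seen = visible_profile(self_times, profiled={"A", "B"})

# What B costs when the time of its only child, malloc(), is included.
true_b = (self_times["B"] + self_times["malloc"]) / sum(self_times.values())
print(seen, true_b)
```

The sampled profile charges A with 10 of the 11 visible units, about 91%, while B together with malloc() is really responsible for 101 of the 111 units, also about 91%.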
Parents which are not themselves profiled will have the time of their profiled
children propagated to them, but they will appear to be spontaneously invoked
in the call-graph listing, and will not have their time propagated further.
Similarly, signal catchers, even though profiled, will appear to be
spontaneous (although for more obscure reasons). Any profiled children of
signal catchers should have their times propagated properly, unless the signal
catcher was invoked during the execution of the profiling routine, in which
case all is lost.