Ge van Geldorp wrote:
In anticipation of doing some work on performance
issues, I've been
experimenting with our kernel profiler the last few days. For those who
don't know, you can activate the profiler by including the /PROFILE option
on the command line. On each timer interrupt the current EIP will be saved,
after 30 sec a background task will convert the saved EIPs to function
name/file name, count how often each function name occurs and write the
results to a file %system_root%\profiler.log. There are 100 timer ticks per
second, a 30 sec measuring interval will therefore generate 3000 EIPs.
This works very nice, but I'm running into one problem on my 300MHz test
machine: conversion of the EIPs to function name by the background thread
takes about 60 sec. Almost all of that time is spent looking up the correct
.stabs entry from the info in the .sym file. This is done using a linear
search. Since there can be 100000 .stabs entries in a .sym file, going
through these for each of the 3000 measurements can mean 300000000 compares.
Even worse, two passes are made through the .stabs list for each
measurement, one to find the function name and one to find the file name. No
wonder this takes 60 sec.
So, we need to improve the search. I've done some simulations, and when I
use a binary search instead of a linear search, the performance is
dramatically better. Time required to convert all measurements drops from 60
sec to 0.06 sec, a 1000 fold improvement.
To be able to use a binary search, the information in the .sym file needs to
be sorted and uniform (each element contains the same type of data) which it
currently isn't. We have .stabs entries defining a function (N_FUN), source
line (N_SLINE) and source file (N_SO). Currently, we only use the .sym files
to convert an address to function name/file name/source line, not the other
way around (no function name to address for example). I'd like to drop using
the .stab format in the .sym files and change to the following format:
+----------+
|header |
+----------+
|symentries|
+----------+
|strings |
+----------+
The header would just contain the start location and number of symentries
plus start location and total size of the strings. Each symentry would have
the following info:
typedef struct tagSYM_ENTRY {
ULONG_PTR Address;
ULONG FunctionOffset;
ULONG FileOffset;
ULONG SourceLine;
} SYM_ENTRY, *PSYM_ENTRY;
where Address is an address relative to the beginning of the module,
FunctionOffset and FileOffset are offsets from the beginning of the strings
section and SourceLine contains the source line number. The symentries are
sorted by Address (done by rsym). Each Address would only appear once, and
information would be made as complete as possible (e.g. when creating the
symentries from the .stabs in the .nostrip.exe file each FunctionOffset
would be set to point to the function name from the most recently
encountered N_FUN .stab). This will allow us to retrieve all 3 pieces of
information (function name, file name and source line) by doing just a
single binary search.
Comments, thoughts, objections?
Gé van Geldorp.
The only reason I didn't do something like this was because I was
worried about the effect on build-time, which at the moment is already
horrendous. If this sorting can be done quickly ( i.e. have minimal
effect on build time), then I think it's a great idea.
Also, how much larger do you anticipate the sym files getting, and do we
have the kmode address space available to devote to this increase in
size with room to spare?
I apologize if some of these questions are stupid ;)