Optimization Proposal - Ros-dev

4 Jan 2006

Hi,
I have come to the conclusion that using -O2 is beneficial even for DBG
= 1 builds, and that it should be on by default for all builds. The
reason typically given for not optimizing a "Debug" build is that
optimizations supposedly make the assembly code harder to read. I have
found the opposite to be true, and I think the example I include below
will convince you as well. I see the following advantages in using -O2
on a DBG = 1 build:
- -O2 makes the compiler do additional checks. For example, gcc will NOT
warn about uninitialized variables unless optimization (such as -O2) is
enabled, even though they are a very serious programming bug. Apart from
finding more bugs, this also keeps trunk compilable. Right now, I see at
least two commits a week from Thomas or others to fix code which uses
uninitialized variables (I myself have been guilty of this). This means
that some of us, like Thomas, have to constantly fix other people's
mistakes.
- -O2 means fewer last-minute blockers. Because we release in -O2 but
almost never build it like that, this creates a big problem for people
like Andrew or Brandon, who handle the release process and do testing.
Because the -O2 build gets less testing coverage, it is very possible
for a critical bug to be in ROS for a month before anyone notices it at
release time, in which case we will all have to scramble to find a fix
for it.
- -O2 will not undefine DBG or change anything else in the code. All the
advantages, extra error checking and assertions of the DBG = 1 build
would remain.
- -O2 builds run much faster, which greatly helps testing speed.
- -O2 builds are much more likely to bring up race conditions and other
important timing bugs we need to watch out for.
- -O2 means easier debugging. This point is really important because
until I realized how true it was, I didn't want to bring this up. Here
is a pseudo (but real) disassembly of something I've seen in my DBG = 1
kernel binary while debugging:
0x40b845:
push ebp
mov ebp, esp
sub esp, 4
mov [ebp-4], fs:18h
mov eax, [ebp-4]
leave
retn
0x4bc8a5:
push ebp
mov ebp, esp
sub esp, 4
call 0x40b845
mov ecx, [eax+1c]
mov [ebp-4], eax
mov eax, [ebp-4]
leave
retn
0x42b845:
push ebp
mov ebp, esp
sub esp, 4
call 0x4bc8a5
mov ecx, [eax+124]
mov [ebp-4], eax
mov eax, [ebp-4]
leave
retn
KeFooBar:
push ebp
mov ebp, esp
sub esp, 4c
call 0x42b845
mov [ebp-0xc], eax
mov eax, [ebp-0xc]
<..>
leave
retn
This is how it looks with -O2:
KeFooBar:
push ebp
mov ebp, esp
sub esp, 4c
mov eax, fs:124h
<..>
leave
retn
I hope we can all agree on which one of these is readable. The -O2 build
clearly shows you that eax is fs:124h, which you oughta know is
Prcb->CurrentThread; even if you don't, you can easily check in a
header. The non-O2 build calls 3 other functions, 2 of which merely call
other functions themselves (due to lack of symbols you have no way of
knowing what these functions are doing), until we finally get to a
function which reads fs:18h, which you then realize is the PCR. You then
walk back and realize that Pcr->0x1c is the PRCB, and Prcb->0x124 is the
current thread.
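For anyone wondering what kind of source produces the first listing, here is a minimal C sketch of such an accessor chain. All the names and struct layouts in it (DemoPcr, demo_get_pcr and so on) are hypothetical stand-ins, not the real ReactOS definitions, and a plain global replaces the fs segment read so that it can run in user mode. Without optimization gcc typically emits a real call for each of these small functions; with -O2 it is free to inline the chain into the caller, which is the effect described above.

```c
#include <assert.h>

/* Hypothetical stand-ins for the per-processor structures.
 * The real layouts and offsets are different. */
typedef struct DemoThread { int id; } DemoThread;
typedef struct DemoPrcb { DemoThread *current_thread; } DemoPrcb;
typedef struct DemoPcr {
    struct DemoPcr *self;   /* plays the role of the pointer read via fs */
    DemoPrcb *prcb;
} DemoPcr;

static DemoThread demo_thread = { 7 };
static DemoPrcb demo_prcb = { &demo_thread };
static DemoPcr demo_pcr = { &demo_pcr, &demo_prcb };

/* Innermost accessor: returns the PCR (a global here, assumed for the demo). */
static inline DemoPcr *demo_get_pcr(void)
{
    return demo_pcr.self;
}

/* Middle accessor: only forwards to demo_get_pcr and follows one pointer. */
static inline DemoPrcb *demo_get_prcb(void)
{
    return demo_get_pcr()->prcb;
}

/* Outer accessor: only forwards to demo_get_prcb and follows one pointer. */
static inline DemoThread *demo_get_current_thread(void)
{
    return demo_get_prcb()->current_thread;
}

/* Caller in the role of KeFooBar: one source line, three nested calls
 * unless the compiler inlines them. */
int demo_foo_bar(void)
{
    DemoThread *thread = demo_get_current_thread();
    assert(thread != 0);
    return thread->id;
}
```

Compiling this with and without -O2 and comparing the output of gcc -S should show the difference for your own compiler version.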
Yes, this example could easily be destroyed by saying "use a #define
with inline assembly", but I can bring many more; we can't start using
inline assembly everywhere... msvc does an amazing job at optimizing
these things, and even gcc isn't that bad, if only you let it. Code
built without -O2 makes horrible usage of the stack, which means you
have to memorize a lot more addresses than with code which simply stores
values in registers. Because humans are smart, the loops generated by
-O2 are also much closer to what someone who understands assembly is
used to (for example, the loop will use ecx, and not a stack variable
that you need to memorize). I consider myself an expert on assembly
coding, and I simply have great trouble reading non-O2 kernels, so how
exactly does it help debugging?
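As a small illustration of the stack-versus-register point, consider a trivial loop. demo_sum is a made-up function for this sketch. Without optimization gcc typically keeps total and i in stack slots and reloads them on every iteration; with -O2 it usually keeps them in registers for the whole loop. Which registers it picks is up to the compiler, so treat this as something to verify with gcc -S rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: sums the first count elements of values. */
int demo_sum(const int *values, size_t count)
{
    int total = 0;   /* a stack slot at -O0, usually a register at -O2 */
    size_t i;        /* the loop counter, same story */

    /* Debug-style precondition: a null pointer is only valid for count 0. */
    assert(values != NULL || count == 0);

    for (i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}
```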
In the end, I am convinced that the only disadvantage of using -O2 by
default is that it will slightly increase build times. I don't think
this increase is more than a minute or two at most for a complete
build. If this issue is really critical to some people, then perhaps
only core system files should use -O2 (kernel32, ntdll, ntoskrnl, csr,
win32k, drivers, etc).
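To make two of the earlier points concrete (the uninitialized-variable warning, and DBG checks surviving -O2), here is a rough C sketch. DEMO_CHECK, demo_status_for and demo_failures are names invented for illustration, not ReactOS macros or functions. If the initializer on status were removed, the fall-through path would return an indeterminate value; gcc, at least in the versions current as I write this, only reports that through -Wuninitialized when optimization is enabled, because the warning relies on data-flow analysis done by the optimizer. The debug check, on the other hand, is selected purely by the DBG preprocessor symbol, so adding -O2 to the command line does not remove it.

```c
#ifndef DBG
#define DBG 1   /* assumed default, for this sketch only */
#endif

static int demo_check_failures = 0;

#if DBG
/* Debug-only check. It counts failures instead of halting, purely so
 * that the sketch is easy to exercise. */
#define DEMO_CHECK(expr) ((expr) ? (void)0 : (void)++demo_check_failures)
#else
#define DEMO_CHECK(expr) ((void)0)
#endif

/* Hypothetical function: maps a request code to a status code. */
int demo_status_for(int request)
{
    /* Without this initializer, any request other than 1 or 2 would
     * return an indeterminate value. */
    int status = -1;

    DEMO_CHECK(request >= 0);

    if (request == 1)
        status = 0;
    else if (request == 2)
        status = 5;

    return status;
}

/* Number of DEMO_CHECK failures seen so far. */
int demo_failures(void)
{
    return demo_check_failures;
}
```

Building this once with -O0 and once with -O2 should give the same results from demo_status_for and the same failure count, since the optimization level does not touch DBG.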
I know some of the developers on IRC are strongly for this, but I want
to make sure I get a broader opinion.
Best regards,
Alex Ionescu