Hi,
I have come to the conclusion that using -O2 is beneficial even for DBG = 1 builds, and that it should be turned on by default for all builds. The reason typically given for not using optimizations on a "Debug" build is that they apparently make the assembly code harder to read. I have realized otherwise, and with the example that I will include below, I'm sure we will all agree on this. I note the following advantages of using -O2 on a DBG = 1 build as well:
- -O2 makes the compiler do additional checks. For example, gcc will NOT detect uninitialized variables unless -O2 is being used, even though they are a very important programming bug. Apart from finding more bugs, it also keeps trunk compilable. Right now, I see at least two commits every week, by Thomas or others, made to fix some code which used uninitialized variables (I myself have been guilty of this). This means that some of us, like Thomas, have to constantly fix other people's mistakes.
- -O2 means fewer last-minute blockers. Because we release with -O2 but almost never build that way, this creates a big problem for people like Andrew or Brandon, who handle the release process and do testing. Because the -O2 build gets less testing coverage, it is very possible for a critical bug to be in ROS for a month before anyone notices it at release time, in which case we all have to scramble to find a fix for it.
- -O2 will not undefine DBG or change anything else in the code. All the advantages, extra error checking and assertions of the DBG = 1 build would remain.
- -O2 builds are much faster, greatly helping testing speed.
- -O2 builds are much more likely to bring up race conditions and other important timing bugs we need to watch out for.
- -O2 means easier debugging. This point is really important, because until I realized how true it was, I didn't want to bring this up. Here is a pseudo (but real) disassembly of something I've seen in my DBG = 1 kernel binary while debugging:
0x40b845:
    push ebp
    mov ebp, esp
    sub esp, 4
    mov [ebp-4], fs:18h
    mov eax, [ebp-4]
    leave
    retn

0x4bc8a5:
    push ebp
    mov ebp, esp
    sub esp, 4
    call 0x40b845
    mov ecx, [eax+1c]
    mov [ebp-4], eax
    mov eax, [ebp-4]
    leave
    retn

0x42b845:
    push ebp
    mov ebp, esp
    sub esp, 4
    call 0x4bc8a5
    mov ecx, [eax+124]
    mov [ebp-4], eax
    mov eax, [ebp-4]
    leave
    retn

KeFooBar:
    push ebp
    mov ebp, esp
    sub esp, 4c
    call 0x42b845
    mov [ebp-0xc], eax
    mov eax, [ebp-0xc]
    <..>
    leave
    retn
This is how it looks with -O2:
KeFooBar:
    push ebp
    mov ebp, esp
    sub esp, 4c
    mov eax, fs:124h
    <..>
    leave
    retn
I hope we can all agree on which one of these is readable. The -O2 build clearly shows you that eax is fs:124h, which you ought to know is Prcb->CurrentThread; even if you don't, you can easily check in a header. The non-O2 build calls 3 other functions, 2 of which merely call other functions themselves (and due to the lack of symbols you have no way of knowing what these functions are doing), until we finally get to a function which reads fs:18h, which you then realize is the PCR. You then walk back and realize that Pcr->0x1c is the PRCB, and Prcb->0x124 is the current thread.
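To make that call chain concrete, here is a minimal C sketch of the kind of nested accessors that can produce it. The structure and function names (demo_pcr, get_pcr, get_prcb, get_current_thread, demo_foo_bar) are made up for illustration and are not the real kernel definitions, and a plain global stands in for the fs segment so that the snippet compiles anywhere. Without optimization, each small accessor typically becomes a real call with its own stack frame; with -O2, gcc will usually inline all three into a single load.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real kernel structures. */
struct demo_thread { int id; };
struct demo_prcb   { struct demo_thread *current_thread; };
struct demo_pcr    { struct demo_prcb *prcb; };

static struct demo_thread demo_thread0 = { 1 };
static struct demo_prcb   demo_prcb0   = { &demo_thread0 };
static struct demo_pcr    demo_pcr0    = { &demo_prcb0 };

/* Innermost accessor: real code would read the PCR pointer from fs. */
static struct demo_pcr *get_pcr(void)
{
    return &demo_pcr0;
}

/* Calls get_pcr() and follows one pointer. */
static struct demo_prcb *get_prcb(void)
{
    struct demo_pcr *pcr = get_pcr();
    /* This check survives -O2; only defining NDEBUG would remove it. */
    assert(pcr->prcb != NULL);
    return pcr->prcb;
}

/* Calls get_prcb() and follows one more pointer. */
static struct demo_thread *get_current_thread(void)
{
    return get_prcb()->current_thread;
}

/* Caller playing the role of KeFooBar in the listings above. */
int demo_foo_bar(void)
{
    struct demo_thread *thread = get_current_thread();
    return thread->id;
}
```

Compiling this once with gcc -O0 -S and once with gcc -O2 -S, then comparing the output for demo_foo_bar, should show the same contrast as the two KeFooBar listings.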
Yes, this example could easily be destroyed by saying "use a #define with inline assembly", but I can bring many more; we can't start using inline assembly everywhere... msvc does an amazing job at optimizing these things, and even gcc isn't that bad, if only you let it. Code built without -O2 makes horrible use of the stack, which makes you memorize a lot more addresses than code which simply stores values in registers. Because humans are smart, the loops generated by -O2 are also much closer to what someone who understands assembly is used to (for example, the loop will use ecx, and not a stack variable that you need to memorize). I consider myself an expert on assembly coding, and I simply have great trouble reading non-O2 kernels, so how exactly does it help debugging?
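As an illustration of the first point in the list (the extra checks), here is a minimal C sketch of the kind of bug in question. The function name scale_flag is made up for this example. Older gcc documentation describes -Wuninitialized as relying on data-flow information that is only computed when optimizing, so building this with -Wall alone may stay silent, while adding -O2 can produce a warning about result; newer gcc versions may behave differently.

```c
/* Hypothetical example: 'result' is only assigned on one path. */
int scale_flag(int flag)
{
    int result;             /* not initialized on every path */

    if (flag > 0)
        result = flag * 2;  /* the only assignment */

    /* When flag <= 0 this reads an uninitialized variable,
       which is the bug the compiler is expected to report. */
    return result;
}
```

Only calls with a positive argument, such as scale_flag(3), are well defined here; the other path is exactly the mistake the warning is meant to catch.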
In the end, I am convinced that the only disadvantage of using -O2 by default is that it will slightly increase build times. I don't think this increase is more than a minute or two, at most, for a complete build. If this issue is really critical to some people, then perhaps only core system files should use -O2 (kernel32, ntdll, ntoskrnl, csr, win32k, drivers, etc).
I know some of the developers on IRC are strongly for this, but I want to make sure I get a broader opinion.
Best regards,
Alex Ionescu