Hi Timo,
Regarding #1, we actually have this working in our tree. It was required for the new
SYSENTRY C handler. So thank you for the advice and for pointing that out. Unfortunately I
will not credit you for the idea since the code was already written, I hope this is
understandable. We had to do this not for performance, but for compatibility with the Eoi
helper function which is exported. The performance difference between pushad and doing the
instructions manually (and later fixing up) was actually not that big, since pushad has
certain pipeline advantages. However, when dealing with interrupts, it starts becoming
more of a problem (although that code is not in mainline yet). Also your sample code has
some bugs: the last integer register save is duplicated, and it would be better to use a
symbolic name instead of 10 * 4. That value is also wrong, since it depends on whether or
not the trap generates an error code or not. x86 is not easy, have to be careful!
Regarding #2, I unfortunately do not see the need to add more assembly when it is not
needed. There are no subtle compiler breakages involved, since it is impossible for the
compiler to add a DS/ES access for no reason at all (can you come up with an example?),
this is part of the contract the compiler has to make (if compilers did this, it would
break a whole class of embedded and real-mode software). Because the stack is safe, I
think it's worth avoiding the extra lines of assembly. The C inlines will produce
exactly the same code anyway. Also, it's not as easy as you make it seem, since for a
FAST-V86 trap, we don't want to save the registers, just change them. So we need to
insert a check for EFLAGS to see if the V8086 flag is there or not, then conditionally do
the registers, then jump to the C code that will do another conditional check, and pick
the right trap entry C code... it gets quite complicated. Since there are no
"real" issue with doing it in C, other than personal choice and style, I think
we'll stick with what we have.
The patch for #1 will also start introducing some other performance enhancing features,
such as branch prediction hints for the CPU and compiler, and other little boosts. With
the pusha conversion code gone, the C code is actually more efficient than previous
assembly implementations.
Thanks for your comments and suggestions, please keep them coming. Do not be discouraged
by this particular decision regarding #2 (it's also slightly political) as a sign or
pattern that any suggestion will be ignored (you were not the first to bring up #1,
otherwise we'd have done it thanks to you).
Also, if anyone has any experience with MSC++ and could port the GCC-centric macros, that
would be much appreciated. I've given up on MSC++ after the compiler and linker have
tried, year after year, to add more and more shit to what used to be a simple portable
codebase. Buffer this, buffer that, nx this, dep that, safe seh here, hotpatch there,
manifest files to even get it to run... ARGH!
Finally, the recent fixes have solved the double fault problem, but it seems certain
flavours of VMWare use a different Video BIOS than my test machines and that V8086 C
emulation code must be hitting a bug. It could also be we're seeing the BOP and not
handling it. I'll look into this today. Note that the ASM implementation of PUSHF (or
was it POPF) was completely broken. Someone must've made some serious typos when
*ahem* writing the code. Hopefully there are no dependencies on that bug.
-r
Hi,
I suggest the following changes to the current implementation:
1.) Replace the PUSHA in the trap entry code, by a series of MOVs to the
correct stack positions. The same for the trap exit code.
KiTrapFrameFromPushaStack can be removed then. This would result in more
clear, less complex and faster code, with only a few additional
assembly instructions in the trap entry macro. The exit could be done
either by normal call/return or by a jmp to an exit handler.
2.) Segements should be fixed up before entering C code, as not doing so
may introduce possible compiler dependend breakages. It's also much
cleaner and there's no reason to do the same stuff later in inline
assembly instead of direcly in the asm entry point.
The resulting code might looks something like this:
/* Allocate KTRAP_FRAME */
sub esp, KTRAP_FRAME_LENGTH - 10 * 4
/* Save integer registers */
mov [esp + KTRAP_FRAME_EBP], ebp
mov [esp + KTRAP_FRAME_EBX], ebx
mov [esp + KTRAP_FRAME_ESI], esi
mov [esp + KTRAP_FRAME_EDI], edi
mov [esp + KTRAP_FRAME_EAX], eax
mov [esp + KTRAP_FRAME_ECX], ecx
mov [esp + KTRAP_FRAME_EDX], edx
mov [esp + KTRAP_FRAME_EBX], ebx
/* Save segment regs */
mov [esp + KTRAP_FRAME_SEGDS], ds
mov [esp + KTRAP_FRAME_SEGES], es
/* Fixup segment regs */
mov ax, KGDT_R3_DATA | RPL_MASK
mov ds, ax
mov es, ax
Timo
_______________________________________________
Ros-dev mailing list
Ros-dev(a)reactos.org
http://www.reactos.org/mailman/listinfo/ros-dev