Congrats on once again breaking another key part of Windows compatibility.
This function is DOCUMENTED to tear. It was DESIGNED that way on purpose.
Utter failure, once more.
Best regards, Alex Ionescu
On Sat, May 14, 2011 at 12:29 PM, tkreuzer@svn.reactos.org wrote:
Author: tkreuzer Date: Sat May 14 19:29:42 2011 New Revision: 51748
URL: http://svn.reactos.org/svn/reactos?rev=51748&view=rev Log: [NTOSKRNL] Patch by Paolo Bonzini <bonzini [at] gnu [dot] org> Fix implementation of ExInterlockedAddLargeStatistic The old version wasn't really atomic.
See issue #6223 for more details.
Modified: trunk/reactos/ntoskrnl/ex/i386/fastinterlck_asm.S
Modified: trunk/reactos/ntoskrnl/ex/i386/fastinterlck_asm.S URL: http://svn.reactos.org/svn/reactos/trunk/reactos/ntoskrnl/ex/i386/fastinterl...
============================================================================== --- trunk/reactos/ntoskrnl/ex/i386/fastinterlck_asm.S [iso-8859-1] (original) +++ trunk/reactos/ntoskrnl/ex/i386/fastinterlck_asm.S [iso-8859-1] Sat May 14 19:29:42 2011 @@ -4,8 +4,9 @@
- FILE: ntoskrnl/ex/i386/fastinterlck_asm.S
- PURPOSE: FASTCALL Interlocked Functions
- PROGRAMMERS: Alex Ionescu (alex@relsoft.net)
- */
Paolo Bonzini <bonzini [at] gnu [dot] org>- */
/* INCLUDES ******************************************************************/
#include <asm.inc> @@ -31,25 +32,55 @@ */ PUBLIC @ExInterlockedAddLargeStatistic@8 @ExInterlockedAddLargeStatistic@8:
-#ifdef CONFIG_SMP
- /* Do the addition */
- lock add [ecx], edx
- /* Check for carry bit and return */
- jb .l1
- ret
-.l1:
- /* Add carry */
- lock adc dword ptr [ecx+4], 0
-#else
- /* Do the addition and add the carry */
- add dword ptr [ecx], edx
- adc dword ptr [ecx+4], 0
-#endif
- /* Return */
- ret
- push ebp
- push ebx
- mov ebp, ecx
+Again:
- /* Load comparand in eax for cmpxchg */
- mov eax, [ebp]
- /* Compute low word of the result in ebx */
- mov ebx, edx
- add ebx, eax
- /* Carry needs cmpxchg8b */
- jc Slow
- /* Fast path still needs to be atomic, so use cmpxchg and retry if it
fails
* Hopefully it will still get through this path :) */- LOCK cmpxchg [ecx], ebx
- jnz Again
- /* Thats it */
- pop ebx
- pop ebp
- ret
+Slow:
- /* Save increment across cmpxchg8b */
- push edx
- /* Finish loading comparand in edx:eax */
- mov edx, [ebp+4]
- /* Result in ecx:ebx (we know there's carry) */
- lea ecx, [edx+1]
- /* Do a full exchange */
- LOCK cmpxchg8b [ebp]
- /* restore increment */
- pop edx
- /* Need to retry */
- jnz Again
- /* Thats it */
- pop ebx
- pop ebp
- ret
/*ULONG *FASTCALL
Am 15.05.2011 00:06, schrieb Alex Ionescu:
Congrats on once again breaking another key part of Windows compatibility.
Sorry for my ignorance. Could you please enlighten me and explain how code that behaves identically, except in very, very rare cases, where it doesnt fail miserably like the old version, and which might be a tiny piece slower, will break compatibility. Do drivers expect the function to fail randomly? Is the function time critical? Do snakes have legs? Whats the purpose of life?
This function is DOCUMENTED to tear. It was DESIGNED that way on purpose.
I see, "Its not a bug, its a feature"(tm)
Utter failure, once more.
Of course, "As usual"(tm)
Best regards, Alex Ionescu
Look, I don't understand why this is so hard for the project to understand:
Certain Windows functions are BROKEN. Certain Windows algorithms make NO SENSE. Other things are BADLY OPTIMIZED. Other things make WRONG ASSUMPTIONS.
But these set of behaviors and design failures are what billions of customers and pieces of software DEPEND ON. Microsoft cannot change the STACK LAYOUT of IE due to fears that BHOs which look up the file handle up the stack will break. They have SHIMS that LIE about the stack order to make those BHOs work, even as the stack changes.
Microsoft, on this function, writes,
"Remarks ------------------------------
This function is not atomic because it is implemented as two separate locked instructions. An atomic 64-bit read that occurs on another thread during the execution of this intrinsic could result in an inconsistent value being read. "
Can you claim, with absolute CERTAINTY that NOBODY in the WORLD depends on this behavior? That it's two separate intrusctions, and that an inconsistent value can be read? How do you know certain VMs don't look for this code sequence, how do you know security/emulation software doesn't abuse/utilise this? ETC ETC ETC. It's such a simple/short sequence that it's easy to recognize.
As a simple example, HexRays recognizes the particular ASM sequence as InterlockedAddLargeStatistic. Now, in ReactOS, it won't.
What's worse, is that recent Windows kernels define ExInterlocked.... to the compiler intrinsic _Interlocked. Do you suggest to also implement the intrinsic differently, which is in-lined? So now you will inline those 50 lines of (perfect, yes) code in every caller? These functions are called at ISR level usually -- great idea!
Or maybe you keep the (broken) MS implementation in the intrinsic, but use this one for the export? Great, now you introduce two sets of behaviors, and add even more non-determinism.
Do you really believe Microsoft is stupid and didn't realize this? That they didn't change things around just because they forgot? No, it's 20 years too late to change anything, and if you start doing this in ReactOS, you are certain to fail.
Please leave all personal attacks and politics aside, tell me where I'm technically wrong. You cannot prove this better implementation won't break things. You have the compiler intrinsic to worry about. I named at least one piece of software that depends on it. I named critical speed/code size issues with this function. What more is needed to prove the point?
Best regards, Alex Ionescu
On Sat, May 14, 2011 at 4:07 PM, Timo Kreuzer timo.kreuzer@web.de wrote:
Am 15.05.2011 00:06, schrieb Alex Ionescu:
Congrats on once again breaking another key part of Windows compatibility.
Sorry for my ignorance. Could you please enlighten me and explain how code that behaves identically, except in very, very rare cases, where it doesnt fail miserably like the old version, and which might be a tiny piece slower, will break compatibility. Do drivers expect the function to fail randomly? Is the function time critical? Do snakes have legs? Whats the purpose of life?
This function is DOCUMENTED to tear. It was DESIGNED that way on purpose.
I see, "Its not a bug, its a feature"(tm)
Utter failure, once more.
Of course, "As usual"(tm)
Best regards, Alex Ionescu
Ros-dev mailing list Ros-dev@reactos.org http://www.reactos.org/mailman/listinfo/ros-dev
Am 15.05.2011 01:49, schrieb Alex Ionescu:
Look, I don't understand why this is so hard for the project to understand:
Certain Windows functions are BROKEN. Certain Windows algorithms make NO SENSE. Other things are BADLY OPTIMIZED. Other things make WRONG ASSUMPTIONS.
But these set of behaviors and design failures are what billions of customers and pieces of software DEPEND ON. Microsoft cannot change the STACK LAYOUT of IE due to fears that BHOs which look up the file handle up the stack will break. They have SHIMS that LIE about the stack order to make those BHOs work, even as the stack changes.
Software depends on behaviour. If 2 functions behave the same, then its safe to replace one with another. This is a very basic concept that ignores assembly code sequences etc. But if we couldn't rely on this concept the whole project is doomed to fail anyway, unless we copy most of MS code on assembly level, since we have no chance of evaluating where we can safely differ from MS code and where we can't.
Microsoft, on this function, writes,
" Remarks
This function is not atomic because it is implemented as two separate locked instructions. An atomic 64-bit read that occurs on another thread during the execution of this intrinsic could result in an inconsistent value being read.
"
I don't know where you have that from, but current MSDN doesn't show that remark and google gives no results either. So you can safely assume that the majority of people will not know about it. So all their code might depend on is that the function works correctly as descibed, not the opposite.
Can you claim, with absolute CERTAINTY that NOBODY in the WORLD depends on this behavior? That it's two separate intrusctions, and that an inconsistent value can be read? How do you know certain VMs don't look for this code sequence, how do you know security/emulation software doesn't abuse/utilise this? ETC ETC ETC. It's such a simple/short sequence that it's easy to recognize.
As a simple example, HexRays recognizes the particular ASM sequence as InterlockedAddLargeStatistic. Now, in ReactOS, it won't.
Extending this argument, we couldn't succeed, unless we would copy almost every piece of code on assembly level. Implementing ONE function identically doesn't solve this problem. How can we know that certain kind of software doesn't depend on MSVC being used as a compiler? Or that certain instructions appear on the beginning of a function? What about a particular call chain being detected? What about symbols from the MS server? HexRays probably recognizes a lot of things. Because that's its job. But find me a driver that won't run, because this function is not broken.
What's worse, is that recent Windows kernels define ExInterlocked.... to the compiler intrinsic _Interlocked. Do you suggest to also implement the intrinsic differently, which is in-lined? So now you will inline those 50 lines of (perfect, yes) code in every caller? These functions are called at ISR level usually -- great idea!
Or maybe you keep the (broken) MS implementation in the intrinsic, but use this one for the export? Great, now you introduce two sets of behaviors, and add even more non-determinism.
No, it doesn't. Except the assembly sequence everything is fine. One intrinsic (broken) function and one correctly implemented function being executed at the same time on the same memory would work very well together. The only non-deterministic behaviour comes from the broken implementation (running at the same time with another broken implementation).
Do you really believe Microsoft is stupid and didn't realize this? That they didn't change things around just because they forgot? No, it's 20 years too late to change anything, and if you start doing this in ReactOS, you are certain to fail.
When the behaviour is "known" to be broken and must be that way, then why didn't the comments in the function say that? You obviously like to document every single assembly opcode to describe what it does, while every developer who can read assembly knows that anyway, but you forget to document the only and most important information, which is that this function behaves in this broken way on Windows. Maybe that was at a time where you weren't really capable of understanding such issues? When you "wrote" the code, did you even notice the problem?
Please leave all personal attacks and politics aside, tell me where I'm technically wrong. You cannot prove this better implementation won't break things. You have the compiler intrinsic to worry about. I named at least one piece of software that depends on it. I named critical speed/code size issues with this function. What more is needed to prove the point?
Anything that provides substance maybe?
Anyway, I'll leave it to Aleksey to decide if the function needs to be identical or not. I'm not going to waste any more time discussing this completely retarded and irrelevant issue.
Thanks, Timo
"Maybe that was at a time where you weren't really capable of understanding such issues? When you "wrote" the code, did you even notice the problem?"
Since you were unable to formulate a response without any sort of personal attack, it's obviously impossible to try to have a rational discussion with you, so I give up.
You've really become me, 6 years ago. Congratulations, I guess.
Best regards, Alex Ionescu
On Sun, May 15, 2011 at 2:52 AM, Timo Kreuzer timo.kreuzer@web.de wrote:
Maybe that was at a time where you weren't really capable of understanding such issues? When you "wrote" the code, did you even notice the problem?